In the previous blog post, we used PyTorch to implement LeNet-5 by hand. Since only one of the two graphics cards on the machine was used during training, we wanted to train the network on both cards at the same time. Of course, a shallow network like LeNet trained on a small dataset does not need two cards; this post simply shows how to use both.
There are three common approaches to multi-card training:

The first is PyTorch's built-in nn.DataParallel. As the name suggests, it is not fully parallel: only the data is processed in parallel on the two cards, while saving the model and computing the loss are concentrated on one card, which is why the memory usage of the two cards is uneven with this method.

The second is a third-party package developed by others. It fixes the problem that the loss computation is not parallel, and it also includes many other convenient utilities. The author has released it on GitHub; interested readers can take a look.

The third is the most complex of the three. Each GPU computes the gradients for the data assigned to it and then passes the result on to the next GPU. This differs from DataParallel, which gathers everything onto one GPU to compute the loss and update the parameters.
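As a rough sketch of the first method (illustrative names; the model and shapes here are stand-ins, not from the original post):

```python
import torch
import torch.nn as nn

# nn.DataParallel replicates the module on each card, splits the input batch
# along dim 0, runs the replicas in parallel, and gathers the outputs back
# on the first card -- which is why that card uses more memory.
model = nn.Linear(10, 2)  # stand-in for a real network
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=[0, 1]).cuda()

inputs = torch.randn(32, 10)
if torch.cuda.is_available():
    inputs = inputs.cuda()
outputs = model(inputs)  # with two cards, each replica sees 16 samples
```

The wrapper is transparent to the caller: the forward pass is invoked exactly as on a single card, and the gathered output has the full batch size.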
Here I chose the first method, nn.DataParallel.
Code for parallel training
First, check whether the machine has multiple graphics cards:
```python
import os
import torch

USE_MULTI_GPU = True

# Check whether the machine has multiple graphics cards
if USE_MULTI_GPU and torch.cuda.device_count() > 1:
    MULTI_GPU = True
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
    device_ids = [0, 1]
else:
    MULTI_GPU = False
# device is needed in both branches, so define it unconditionally
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```
`os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"` controls which of the machine's GPUs are visible to the process and renumbers them starting from 0. Note that it must be set before CUDA is first initialized.
The next step is to build the model:
```python
net = LeNet()
if MULTI_GPU:
    net = nn.DataParallel(net, device_ids=device_ids)
net.to(device)
```
This step already differs from the single-card case in several ways.
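One practical consequence of the wrapping (a common pattern; `save_checkpoint` is an illustrative helper, not from the original post) is how checkpoints are saved:

```python
import os
import tempfile

import torch
import torch.nn as nn

# A DataParallel wrapper stores the real network under net.module, so saving
# the inner state_dict keeps checkpoint keys free of the "module." prefix
# and lets a single-card script load the file unchanged.
def save_checkpoint(net, path):
    model = net.module if isinstance(net, nn.DataParallel) else net
    torch.save(model.state_dict(), path)

net = nn.DataParallel(nn.Linear(3, 1))  # stand-in for the wrapped LeNet
ckpt_path = os.path.join(tempfile.mkdtemp(), "lenet.pt")
save_checkpoint(net, ckpt_path)
```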
Next, define the optimizer and scheduler:
```python
optimizer = optim.Adam(net.parameters(), lr=1e-3)
scheduler = StepLR(optimizer, step_size=100, gamma=0.1)
if MULTI_GPU:
    optimizer = nn.DataParallel(optimizer, device_ids=device_ids)
    scheduler = nn.DataParallel(scheduler, device_ids=device_ids)
```
Because the optimizer and scheduler are now wrapped, the way they are called later also changes.
For example, reading the current learning rate requires going through the wrapper.
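A minimal sketch of that access pattern (the `nn.Linear` model here is a stand-in for LeNet; with the optimizer wrapped in `nn.DataParallel` as above, the real optimizer is reached through its `.module` attribute):

```python
import torch
import torch.nn as nn
import torch.optim as optim

MULTI_GPU = torch.cuda.device_count() > 1  # mirrors the flag set earlier

net = nn.Linear(10, 2)  # stand-in for LeNet
optimizer = optim.Adam(net.parameters(), lr=1e-3)
if MULTI_GPU:
    optimizer = nn.DataParallel(optimizer, device_ids=[0, 1])

# The wrapped optimizer is reached through .module; a plain optimizer is
# used directly.
if MULTI_GPU:
    lr = optimizer.module.state_dict()['param_groups'][0]['lr']
else:
    lr = optimizer.state_dict()['param_groups'][0]['lr']
```

The same `.module` indirection applies to the scheduler, e.g. stepping it via `scheduler.module.step()` when `MULTI_GPU` is true.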
The full code can be found in my GitHub repository.
The training process itself is the same as on a single card. Here is the memory usage of the two cards during training:
Both cards are in use, which shows that our code works, but there is also a clear difference in their memory usage. This is the point made earlier: DataParallel parallelizes only the data, while the loss computation and other operations still happen on a single card.
If you find mistakes in this article or have suggestions, please point them out.