PyTorch – VGG network construction

Time:2022-1-16

Catalogue

VGG introduction

What's the power of VGG?

What is a receptive field?

Calculation formula

Question 1:

Question 2:

Network diagram

PyTorch builds the VGG network

🚓1. model.py

The clever part

🚓2. train.py

🚓3. predict.py

Be careful


VGG introduction

VGG was proposed in 2014 by the famous Visual Geometry Group (VGG) research group at Oxford University.

(Paper: https://arxiv.org/abs/1409.1556)

In that year's ImageNet competition it took first place in the Localization task and second place in the Classification task. (It can be said to be very powerful.)

What's the power of VGG?

By stacking several small convolution kernels in place of one large convolution kernel, VGG reduces the number of training parameters while guaranteeing the same receptive field.

What is a receptive field?

The region of the input layer that determines one element in a layer's output is called the receptive field.

Simply put, one cell of the output feature map corresponds to a region of a certain size on the input (previous) layer.

[Figure: receptive field example showing Conv1 and MaxPool1 layers]

For example, in the figure above, the receptive field of MaxPool1 is 2 (one cell of its output corresponds to a 2×2 area of the layer below).

The receptive field of Conv1 is 5.

Calculation formula

The receptive field is computed recursively:

F(i) = (F(i+1) − 1) × Stride + Ksize

where F(i+1) is the receptive field of layer i+1, Stride is the stride of layer i, and Ksize is the convolution or pooling kernel size of layer i.

Question 1:

Stacking two 3×3 convolution kernels replaces one 5×5 convolution kernel, and stacking three 3×3 kernels replaces one 7×7 kernel.

(In the VGG network, the convolution stride defaults to 1.)

Are the receptive fields the same before and after the substitution?

According to the formula:

(Layer 1) Feature map: F(1) = 1

(Layer 2) Conv3x3(3): F(2) = (F(1) − 1) × 1 + 3 = 3

(Layer 3) Conv3x3(2): F(3) = (F(2) − 1) × 1 + 3 = 5

(the receptive field of one 5×5 convolution kernel)

(Layer 4) Conv3x3(1): F(4) = (F(3) − 1) × 1 + 3 = 7

(the receptive field of one 7×7 convolution kernel)
 

Two 3×3 convolution kernels therefore have the same receptive field as one 5×5 convolution kernel.

This proves that stacking two 3×3 convolution kernels can replace one 5×5 kernel, and stacking three 3×3 kernels can replace one 7×7 kernel.
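
This recursion is easy to check in code. Here is a minimal sketch (the helper name and the (ksize, stride) list format are my own, not from the original post):

# Minimal sketch: apply F(i) = (F(i+1) - 1) * Stride + Ksize
# from the last layer back to the first.
def receptive_field(layers):
    # layers: list of (ksize, stride) pairs, ordered from input to output
    f = 1  # one cell of the final output feature map
    for ksize, stride in reversed(layers):
        f = (f - 1) * stride + ksize
    return f

print(receptive_field([(3, 1), (3, 1)]))          # 5, same as one 5x5 kernel
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7, same as one 7x7 kernel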

Question 2:

Does stacking 3×3 convolution kernels really reduce the number of training parameters?

Note: number of CNN parameters = kernel size × kernel depth × number of kernel groups = kernel size × input feature-matrix depth × output feature-matrix depth.
Assume the depth of the input feature matrix = the depth of the output feature matrix = C.

Parameters required by one 7×7 convolution kernel:

7 × 7 × C × C = 49C²

Parameters required by three stacked 3×3 convolution kernels:

3 × 3 × 3 × C × C = 27C²

Obviously 27C² is less than 49C².
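
The same count can be checked in PyTorch by comparing weight sizes directly (a small sketch; C = 64 is an arbitrary choice, and biases are ignored, as in the calculation above):

import torch.nn as nn

C = 64  # assume input depth = output depth = C
big = nn.Conv2d(C, C, kernel_size=7, bias=False)
small = nn.Sequential(*[nn.Conv2d(C, C, kernel_size=3, bias=False) for _ in range(3)])

print(big.weight.numel())                          # 49 * C * C = 200704
print(sum(p.numel() for p in small.parameters()))  # 27 * C * C = 110592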

Network diagram

The VGG network has multiple versions.

We generally use VGG16 (16 means 16 weight layers = 13 convolution layers + 3 fully connected layers).

The network structure is as follows:

[Figure: VGG configuration table (ConvNet configurations A–E) from the paper]

Looking at the figure and calculating, we can see that a 3×3 convolution (stride 1, padding 1) does not change the size of the feature matrix:

out = (in − F + 2P)/S + 1 = (in − 3 + 2×1)/1 + 1 = in

So the output is the same size as the input.
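
A quick shape check confirms this (a minimal sketch with an arbitrary 224×224 input):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)  # N x C x H x W
conv = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
print(conv(x).shape)  # torch.Size([1, 64, 224, 224]) -> H and W unchanged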

PyTorch builds the VGG network

The VGG network is divided into two modules: convolutional-layer feature extraction and fully-connected-layer classification.

🚓1. model.py

import torch.nn as nn
import torch

class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=False):
        super(VGG, self).__init__()
        self.features = features            # convolutional layers: feature extraction
        self.classifier = nn.Sequential(    # fully connected layers: classification
            nn.Dropout(p=0.5),
            nn.Linear(512*7*7, 2048),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(True),
            nn.Linear(2048, num_classes)
        )
        if init_weights:
            self._initialize_weights()  # initialize weights

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.features(x)
        # N x 512 x 7 x 7
        x = torch.flatten(x, start_dim=1)
        # N x 512*7*7
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                # nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

The clever part


# VGG network configuration lists. Each number is the number of convolution kernels; 'M' marks a max-pooling layer
cfgs = {
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'], 											#  Model a
    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'], 									#  Model B
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'], 					#  Model d
    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],  	#  Model e
}

# Convolutional layers: feature extraction
def make_features(cfg: list):  # the configuration list of a specific model is passed in
    layers = []
    in_channels = 3     # input is the original image (3 RGB channels)
    for v in cfg:
        # if 'M', add a max-pooling layer
        if v == "M":
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        # otherwise, it is a convolution layer
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            layers += [conv2d, nn.ReLU(True)]
            in_channels = v
    return nn.Sequential(*layers)  # the single asterisk (*) unpacks the list into positional arguments


def vgg(model_name="vgg16", **kwargs):  # the double asterisk (**) collects keyword arguments into a dict
    try:
        cfg = cfgs[model_name]
    except KeyError:
        print("Warning: model number {} not in cfgs dict!".format(model_name))
        exit(-1)
    model = VGG(make_features(cfg), **kwargs)  # **kwargs forwards the keyword arguments you passed in
    return model
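
As a quick sanity check of the model defined above (a minimal sketch; the dummy input tensor is mine):

# Minimal usage sketch: build VGG16 for 5 classes and run a dummy forward pass
import torch

net = vgg(model_name="vgg16", num_classes=5, init_weights=True)
x = torch.randn(1, 3, 224, 224)  # dummy batch of one RGB image
print(net(x).shape)              # torch.Size([1, 5])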

🚓2. train.py

Same as PyTorch – AlexNet – training the flower classification dataset (heart_6662's blog); the same flower dataset is used.

import os
import json

import torch
import torch.nn as nn
from torchvision import transforms, datasets
import torch.optim as optim
from tqdm import tqdm

from model import vgg


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
        "val": transforms.Compose([transforms.Resize((224, 224)),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

    data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # get data root path
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)

    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    batch_size = 32
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers every process'.format(nw))

    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=nw)

    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=batch_size, shuffle=False,
                                                  num_workers=nw)
    print("using {} images for training, {} images for validation.".format(train_num,
                                                                           val_num))

    # test_data_iter = iter(validate_loader)
    # test_image, test_label = test_data_iter.next()

    model_name = "vgg16"
    net = vgg(model_name=model_name, num_classes=5, init_weights=True)
    net.to(device)
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=0.0001)

    epochs = 30
    best_acc = 0.0
    save_path = './{}Net.pth'.format(model_name)
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
        net.train()
        running_loss = 0.0
        train_bar = tqdm(train_loader)
        for step, data in enumerate(train_bar):
            images, labels = data
            optimizer.zero_grad()
            outputs = net(images.to(device))
            loss = loss_function(outputs, labels.to(device))
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()

            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1,
                                                                     epochs,
                                                                     loss)

        # validate
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        with torch.no_grad():
            val_bar = tqdm(validate_loader)
            for val_data in val_bar:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))
                predict_y = torch.max(outputs, dim=1)[1]
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()

        val_accurate = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))

        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)

    print('Finished Training')


if __name__ == '__main__':
    main()

🚓3. predict.py

Same as PyTorch – AlexNet – training the flower classification dataset (heart_6662's blog).

import os
import json

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from model import vgg


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize((224, 224)),
         transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # load image
    img_path = "../tulip.jpg"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path)
    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)

    with open(json_path, "r") as json_file:
        class_indict = json.load(json_file)
    
    # create model
    model = vgg(model_name="vgg16", num_classes=5).to(device)
    # load model weights
    weights_path = "./vgg16Net.pth"
    assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
    model.load_state_dict(torch.load(weights_path, map_location=device))

    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                 predict[predict_cla].numpy())
    plt.title(print_res)
    for i in range(len(predict)):
        print("class: {:10}   prob: {:.3}".format(class_indict[str(i)],
                                                  predict[i].numpy()))
    plt.show()


if __name__ == '__main__':
    main()

Be careful

The VGG network is deep and needs a powerful GPU for training (and one with plenty of memory; my 3050 can't run it, and PyTorch reports an error that GPU memory is insufficient).

You can also try using a smaller batch_size.
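
For example (a sketch; the value 8 is arbitrary), change one line in train.py:

    batch_size = 8  # smaller batches use less GPU memory, at the cost of slower epochs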
