VGG introduction
VGG was proposed in 2014 by the famous Visual Geometry Group (VGG) research group at the University of Oxford.
(Paper: https://arxiv.org/abs/1409.1556)
In that year's ImageNet competition it took first place in the Localization task and second place in the Classification task. (It can be said to be very powerful.)
What makes VGG powerful?
By stacking several small convolution kernels in place of one large kernel, it reduces the number of training parameters while keeping the same receptive field.
What is a receptive field?
The region of the input layer whose values determine one element of a layer's output is called the receptive field.
(Simply put, one cell of the output feature map corresponds to an area of a certain size on the input (previous) layer.)
For example, in the figure above, the receptive field of MaxPool1 is 2 (one cell of its output corresponds to a 2×2 area of the layer below it), and the receptive field of Conv1 is 5.
Calculation formula
The receptive field is calculated as:
F(i) = (F(i + 1) - 1) × Stride + Ksize
where
F(i + 1) is the receptive field of layer i + 1 (and F(i) is the receptive field of layer i),
Stride is the stride of layer i,
Ksize is the convolution (or pooling) kernel size of layer i.
Question 1:
Stack two 3×3 convolution kernels to replace one 5×5 kernel, and stack three 3×3 kernels to replace one 7×7 kernel.
(In the VGG network, the convolution stride defaults to 1.)
Are the receptive fields the same before and after the substitution?
According to the formula:
(Layer 1) Feature map: F = 1
(Layer 2) Conv3x3(3): F = (1 - 1) × 1 + 3 = 3
(Layer 3) Conv3x3(2): F = (3 - 1) × 1 + 3 = 5
(the receptive field of a 5×5 convolution kernel)
(Layer 4) Conv3x3(1): F = (5 - 1) × 1 + 3 = 7
(the receptive field of a 7×7 convolution kernel)
Two stacked 3×3 convolution kernels have the same receptive field as one 5×5 convolution kernel.
This proves that two stacked 3×3 convolution kernels can replace one 5×5 kernel, and three stacked 3×3 kernels can replace one 7×7 kernel.
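As a quick check, here is a minimal sketch (my own illustration, not from the original post) that applies F(i) = (F(i + 1) - 1) × Stride + Ksize layer by layer:

def receptive_field(layers):
    # layers: (ksize, stride) pairs ordered from input to output
    f = 1  # start from one cell of the final feature map
    for ksize, stride in reversed(layers):
        f = (f - 1) * stride + ksize
    return f

print(receptive_field([(3, 1), (3, 1)]))          # 5 -> same as one 5x5 kernel
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7 -> same as one 7x7 kernel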
Question 2:
Does stacking 3×3 convolution kernels really reduce the number of training parameters?
Note: number of CNN parameters = kernel size × kernel depth × number of kernel groups = kernel size × input feature map depth × output feature map depth (biases ignored).
Assume the depth of the input feature map equals the depth of the output feature map, both equal to C.
Parameters required by one 7×7 convolution kernel: 7 × 7 × C × C = 49C²
Parameters required by three stacked 3×3 convolution kernels: 3 × (3 × 3 × C × C) = 27C²
Obviously 27C² is less than 49C², so the stacked version uses fewer parameters.
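We can verify the count with PyTorch; this is a minimal sketch with an assumed channel count of C = 64 (any value works), biases disabled to match the formula:

import torch.nn as nn

C = 64  # assumed channel count, for illustration only

conv7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)
conv3x3_stack = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
)

print(sum(p.numel() for p in conv7x7.parameters()))        # 49 * C * C = 200704
print(sum(p.numel() for p in conv3x3_stack.parameters()))  # 27 * C * C = 110592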
Network diagram
The VGG network has multiple versions.
We generally use VGG16 (the 16 means 16 weight layers = 13 convolutional layers + 3 fully connected layers).
The network structure is as follows:
From the diagram and the formula below, we can see that a 3×3 convolution (stride 1, padding 1) does not change the size of the feature map:
out = (in - F + 2P)/S + 1 = (in - 3 + 2)/1 + 1 = in
so the output is the same size as the input.
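A quick sketch (illustrative only, assuming a 224×224 RGB input) confirms this in PyTorch:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)  # assumed 224x224 RGB input
conv = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
print(conv(x).shape)             # torch.Size([1, 64, 224, 224]): spatial size unchanged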
Building the VGG network in PyTorch
The VGG network is divided into two modules: convolutional feature extraction and fully connected classification.
🚓1. model.py
import torch.nn as nn
import torch
class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=False):
        super(VGG, self).__init__()
        self.features = features          # convolutional feature extraction
        self.classifier = nn.Sequential(  # fully connected classifier
            nn.Dropout(p=0.5),
            nn.Linear(512*7*7, 2048),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(True),
            nn.Linear(2048, num_classes)
        )
        if init_weights:
            self._initialize_weights()    # initialize weights

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.features(x)
        # N x 512 x 7 x 7
        x = torch.flatten(x, start_dim=1)
        # N x 512*7*7
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                # nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
The clever part of the implementation
# VGG model configuration lists: a number is the number of convolution kernels, 'M' marks a max-pooling layer
cfgs = {
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],                                        # model A
    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],                               # model B
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],                # model D
    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'], # model E
}


# convolutional feature extraction
def make_features(cfg: list):  # the configuration list of a specific model is passed in
    layers = []
    in_channels = 3  # the original input image has 3 channels (RGB)
    for v in cfg:
        # if the entry is 'M', add a max-pooling layer
        if v == "M":
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        # otherwise it is a convolution layer
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            layers += [conv2d, nn.ReLU(True)]
            in_channels = v
    return nn.Sequential(*layers)  # a single asterisk (*) unpacks the list into positional arguments


def vgg(model_name="vgg16", **kwargs):  # a double asterisk (**) collects keyword arguments into a dict
    try:
        cfg = cfgs[model_name]
    except KeyError:
        print("Warning: model number {} not in cfgs dict!".format(model_name))
        exit(-1)
    model = VGG(make_features(cfg), **kwargs)  # **kwargs forwards the keyword arguments you passed in
    return model
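As a quick sanity check (illustrative only, assuming the code above is saved as model.py), we can instantiate VGG16 and confirm the output shape:

import torch
from model import vgg  # assumes the code above is saved as model.py

net = vgg(model_name="vgg16", num_classes=5, init_weights=True)
x = torch.randn(1, 3, 224, 224)  # a dummy batch with one RGB image
print(net(x).shape)              # torch.Size([1, 5])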
🚓2. train.py
The data set is the same as in the earlier post "Pytorch - AlexNet - training the flower classification data set" (heart_6662's blog): the flower data set.
import os
import json
import torch
import torch.nn as nn
from torchvision import transforms, datasets
import torch.optim as optim
from tqdm import tqdm
from model import vgg
def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
        "val": transforms.Compose([transforms.Resize((224, 224)),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

    data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # get data root path
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)

    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    batch_size = 32
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers every process'.format(nw))

    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=0)  # set num_workers=nw to enable multi-process loading
    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=batch_size, shuffle=False,
                                                  num_workers=0)
    print("using {} images for training, {} images for validation.".format(train_num,
                                                                           val_num))
    # test_data_iter = iter(validate_loader)
    # test_image, test_label = test_data_iter.next()

    model_name = "vgg16"
    net = vgg(model_name=model_name, num_classes=5, init_weights=True)
    net.to(device)
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=0.0001)

    epochs = 30
    best_acc = 0.0
    save_path = './{}Net.pth'.format(model_name)
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
        net.train()
        running_loss = 0.0
        train_bar = tqdm(train_loader)
        for step, data in enumerate(train_bar):
            images, labels = data
            optimizer.zero_grad()
            outputs = net(images.to(device))
            loss = loss_function(outputs, labels.to(device))
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1,
                                                                     epochs,
                                                                     loss)

        # validate
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        with torch.no_grad():
            val_bar = tqdm(validate_loader)
            for val_data in val_bar:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))
                predict_y = torch.max(outputs, dim=1)[1]
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()

        val_accurate = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))

        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)

    print('Finished Training')


if __name__ == '__main__':
    main()
🚓3. predict.py
Same as before, from "Pytorch - AlexNet - training the flower classification data set" (heart_6662's blog).
import os
import json
import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt
from model import vgg
def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize((224, 224)),
         transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # load image
    img_path = "../tulip.jpg"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path)
    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)
    with open(json_path, "r") as json_file:
        class_indict = json.load(json_file)

    # create model
    model = vgg(model_name="vgg16", num_classes=5).to(device)
    # load model weights
    weights_path = "./vgg16Net.pth"
    assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
    model.load_state_dict(torch.load(weights_path, map_location=device))

    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    print_res = "class: {}  prob: {:.3}".format(class_indict[str(predict_cla)],
                                                predict[predict_cla].numpy())
    plt.title(print_res)
    for i in range(len(predict)):
        print("class: {:10}  prob: {:.3}".format(class_indict[str(i)],
                                                 predict[i].numpy()))
    plt.show()


if __name__ == '__main__':
    main()
Caution
The VGG network is deep and needs a powerful GPU with plenty of memory for training. My RTX 3050 cannot run it; PyTorch reports that GPU memory is insufficient.
You can also try reducing the batch_size.
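For example, in train.py you could lower the value (an illustrative tweak, not from the original code):

batch_size = 16  # half of the original 32; try 8 or 4 if CUDA still reports out of memory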