[Technical Blog] Garbage Classification Based on the AlexNet Network

Time: 2020-11-13


AlexNet

The AlexNet model comes from the paper "ImageNet Classification with Deep Convolutional Neural Networks" by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton.
In the ImageNet LSVRC-2012 competition, AlexNet achieved the lowest top-5 error rate, 15.3%, which was 10.8 percentage points lower than that of the runner-up.

Network structure

AlexNet consists of eight layers: the first five are convolutional layers and the last three are fully connected layers. It uses the ReLU activation function, which trains better than tanh and sigmoid.

[Figure: AlexNet architecture diagram from the original paper]
The diagram in the paper is rather abstract and does not show the structure clearly, so a more intuitive structure diagram is provided below; the layer-by-layer size arithmetic is also verified in a short sketch after the list.

[Figure: a more intuitive AlexNet structure diagram]

Reference link: Netscope

  • First layer (convolutional layer)

Input data: 227 × 227 × 3
Convolution kernel: 11 × 11 × 3; stride: 4; number: 96
Data after convolution: 55 × 55 × 96 ((227 - 11) / 4 + 1 = 55)
Data after relu1: 55 × 55 × 96
Max pool1 kernel: 3 × 3; stride: 2
Data after max pool1: 27 × 27 × 96 ((55 - 3) / 2 + 1 = 27)
norm1: local_size = 5 (LRN, local response normalization)
Final output: 27 × 27 × 96

  • Second layer (convolutional layer)

Input data: 27 × 27 × 96
Convolution kernel: 5 × 5; stride: 1; number: 256
Data after convolution: 27 × 27 × 256 (same padding keeps the spatial size unchanged)
Data after relu2: 27 × 27 × 256
Max pool2 kernel: 3 × 3; stride: 2
Data after max pool2: 13 × 13 × 256 ((27 - 3) / 2 + 1 = 13)
norm2: local_size = 5 (LRN, local response normalization)
Final output: 13 × 13 × 256

  • Third layer (convolutional layer)

Input data: 13 × 13 × 256
Convolution kernel: 3 × 3; stride: 1; number: 384
Data after convolution: 13 × 13 × 384 (same padding keeps the spatial size unchanged)
Data after relu3: 13 × 13 × 384
Final output: 13 × 13 × 384
The third layer has no max pool or norm layer.

  • Fourth layer (convolutional layer)

Input data: 13 × 13 × 384
Convolution kernel: 3 × 3; stride: 1; number: 384
Data after convolution: 13 × 13 × 384 (same padding keeps the spatial size unchanged)
Data after relu4: 13 × 13 × 384
Final output: 13 × 13 × 384
The fourth layer has no max pool or norm layer.

  • Fifth layer (convolutional layer)

Input data: 13 × 13 × 384
Convolution kernel: 3 × 3; stride: 1; number: 256
Data after convolution: 13 × 13 × 256 (same padding keeps the spatial size unchanged)
Data after relu5: 13 × 13 × 256
Max pool5 kernel: 3 × 3; stride: 2
Data after max pool5: 6 × 6 × 256 ((13 - 3) / 2 + 1 = 6)
Final output: 6 × 6 × 256
The fifth layer has a max pool layer but no norm layer.

  • Sixth layer (fully connected layer)

Input data: 6 × 6 × 256
Fully connected output: 4096 × 1
Data after relu6: 4096 × 1
Data after dropout6: 4096 × 1
Final output: 4096 × 1

  • Seventh layer (fully connected layer)

Input data: 4096 × 1
Fully connected output: 4096 × 1
Data after relu7: 4096 × 1
Data after dropout7: 4096 × 1
Final output: 4096 × 1

  • Eighth layer (fully connected layer)

Input data: 4096 × 1
Fully connected output: 1000 × 1
fc8 outputs the probabilities of the 1000 categories (a 1000-way softmax).
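The size arithmetic above follows the standard output-size formula for convolution and pooling, output = (input - kernel + 2 × padding) / stride + 1. The short sketch below is not part of the original post; it simply verifies the 227 → 55 → 27 → 13 → 6 chain described in the list.

def out_size(in_size, kernel, stride=1, padding=0):
    # Standard conv/pool output-size formula (floor division).
    return (in_size - kernel + 2 * padding) // stride + 1

conv1 = out_size(227, 11, stride=4)               # 55
pool1 = out_size(conv1, 3, stride=2)              # 27
conv2 = out_size(pool1, 5, stride=1, padding=2)   # 27 (same padding)
pool2 = out_size(conv2, 3, stride=2)              # 13
conv5 = out_size(pool2, 3, stride=1, padding=1)   # 13 (conv3-conv5 keep the size)
pool5 = out_size(conv5, 3, stride=2)              # 6
print(conv1, pool1, conv2, pool2, conv5, pool5)   # 55 27 27 13 13 6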

Data set preprocessing

A total of 2307 images are used in this experiment, divided into six categories: cardboard (370), glass (457), metal (380), paper (540), plastic (445), and trash (115). The images in the dataset are three-channel pictures of size 512 × 384.
Because the dataset is small, it is expanded through data augmentation: each image is randomly flipped and a 227 × 227 crop is taken from it. To improve model accuracy, the images are also normalized before being fed into the model (ToTensor scales each pixel value into [0, 1], and Normalize then maps it to [-1, 1]).
Define a torch.utils.data.Dataset subclass to load the dataset from disk. Because random cropping is used, the GarbageDataset class reports a length 10 times the number of images, so each image is effectively sampled 10 times with different crops.

import os

from PIL import Image
from torch.utils.data import Dataset


class GarbageDataset(Dataset):

    classifications = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]

    def __init__(self, root_dir, transform=None):
        super(GarbageDataset, self).__init__()
        self.root_dir = root_dir
        self.transform = transform
        self.imgs = []
        self.read()

    def __len__(self):
        # Each image is sampled 10 times (with different random crops).
        return 10 * len(self.imgs)

    def __getitem__(self, item):
        img, label = self.imgs[item % len(self.imgs)]
        if self.transform:
            img = self.transform(img)
        return img, label

    def read(self):
        # Images are stored in <root_dir>/garbage/<class name>/.
        img_dir = os.path.join(self.root_dir, "garbage")
        for i, c in enumerate(GarbageDataset.classifications, 0):
            class_dir = os.path.join(img_dir, c)
            for img_name in os.listdir(class_dir):
                img = Image.open(os.path.join(class_dir, img_name))
                self.imgs.append((img, i))

Define the transforms, instantiate GarbageDataset to load the dataset, and split it into training, validation, and test sets in a 6:2:2 ratio.

import torch
from torchvision import transforms

dataset = GarbageDataset("data", transform=transforms.Compose([
    transforms.Resize(227),                 # resize the shorter edge to 227
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(227),             # random 227x227 crop
    transforms.RandomRotation(90),
    transforms.ToTensor(),                  # scale pixel values to [0, 1]
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))  # map to [-1, 1]
]))
dataset_size = len(dataset)
validset_size = int(dataset_size / 5)
testset_size = validset_size
trainset_size = dataset_size - validset_size - testset_size

trainset, validset, testset = torch.utils.data.random_split(
    dataset, [trainset_size, validset_size, testset_size])

Instantiate a DataLoader for each of the training, validation, and test sets.

from torch.utils.data import DataLoader

# The training set should be shuffled
trainloader = DataLoader(dataset=trainset, batch_size=128, shuffle=True)
# The validation and test sets do not need to be shuffled
validloader = DataLoader(dataset=validset, batch_size=128, shuffle=False)
testloader = DataLoader(dataset=testset, batch_size=128, shuffle=False)
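As a quick sanity check (an added sketch, not part of the original post), one batch can be drawn from trainloader to confirm that the tensors have the expected shape:

# Pull one batch; with the transforms above, images should be 3 x 227 x 227
# and the batch size 128.
images, labels = next(iter(trainloader))
print(images.shape)  # expected: torch.Size([128, 3, 227, 227])
print(labels.shape)  # expected: torch.Size([128])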

Model building

Defining the model

import torch.nn as nn
import torch.nn.functional as F


class GarbageNet(nn.Module):

    def __init__(self):
        super(GarbageNet, self).__init__()
        # Five convolutional layers; groups=2 in conv2 mirrors the two-GPU
        # split of the original AlexNet.
        self.conv1 = nn.Conv2d(3, 96, 11, 4)
        self.conv2 = nn.Conv2d(96, 256, 5, 1, padding=2, groups=2)
        self.conv3 = nn.Conv2d(256, 384, 3, 1, padding=1)
        self.conv4 = nn.Conv2d(384, 384, 3, 1, padding=1)
        self.conv5 = nn.Conv2d(384, 256, 3, 1, padding=1)
        # Three fully connected layers; the last one outputs 6 class scores.
        self.fc1 = nn.Linear(256 * 6 * 6, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, 6)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), kernel_size=3, stride=2)
        x = F.max_pool2d(F.relu(self.conv2(x)), kernel_size=3, stride=2)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = F.max_pool2d(F.relu(self.conv5(x)), kernel_size=3, stride=2)
        # Flatten the 6x6x256 feature maps before the fully connected layers.
        x = x.view(-1, 256 * 6 * 6)
        # Dropout is only applied during training.
        x = F.dropout(F.relu(self.fc1(x)), training=self.training)
        x = F.dropout(F.relu(self.fc2(x)), training=self.training)
        x = self.fc3(x)
        return x

In this garbage classification task the images fall into six categories, so unlike the original AlexNet, the output size of the last fully connected layer is 6 instead of 1000, as the quick check below illustrates.
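A minimal shape check (an added sketch, not from the original post): a random 227 × 227 input passed through the untrained network should produce one score per class.

# Hypothetical sanity check: a random 227x227 RGB image should yield 6 logits.
net = GarbageNet()
dummy = torch.randn(1, 3, 227, 227)
print(net(dummy).shape)  # expected: torch.Size([1, 6])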

Following the parameters in the AlexNet paper, the optimizer is SGD with a learning rate of 0.01, momentum of 0.9, and weight decay of 0.0005.
The loss function is CrossEntropyLoss.
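The training code that follows references a model instance, a device, and a few bookkeeping variables (net, use_gpu, GPU, EPOCH_NUMBER, and the loss/accuracy lists) that the post does not show being created. Below is a minimal setup sketch; the names are inferred from how the later code uses them, and the value of EPOCH_NUMBER is an assumption, not taken from the original.

# Assumed setup (names inferred from the training code below).
GPU = torch.device("cuda")            # target device when a GPU is available
use_gpu = torch.cuda.is_available()

net = GarbageNet()
if use_gpu:
    net = net.to(GPU)

EPOCH_NUMBER = 100                    # assumed value
train_loss, train_accuracy = [], []
val_loss, val_accuracy = [], []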

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0005)

criterion = nn.CrossEntropyLoss()

Define the training process

def train(dataloader):
    epoch_loss = 0.0
    iter_num = 0

    correct = 0
    total = 0

    for i, data in enumerate(dataloader, 0):
        inputs, labels = data
        if use_gpu:
            inputs = inputs.to(GPU)
            labels = labels.to(GPU)

        # When gradients are enabled we are training; under torch.no_grad()
        # the same function doubles as an evaluation pass.
        if torch.is_grad_enabled():
            optimizer.zero_grad()

        outputs = net(inputs)

        loss = criterion(outputs, labels)

        if torch.is_grad_enabled():
            loss.backward()
            optimizer.step()

        epoch_loss += loss.item()
        iter_num += 1

        # Count correct predictions for the epoch accuracy.
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for j in range(labels.size(0)):
            correct += c[j].item()
            total += 1

    return epoch_loss / iter_num, correct / total

Training the model

for epoch in range(0, EPOCH_NUMBER):
    t_l, t_a = train(trainloader)
    train_loss.append(t_l)
    train_accuracy.append(t_a)

    with torch.no_grad():
        v_l, v_a = train(validloader)
        
    print("Epoch %03d train loss: %.6f" % (epoch + 1, t_l))
    print("        val accuracy: %.2f%%" % (100 * v_a))
        
    val_loss.append(v_l)
    val_accuracy.append(v_a)

Visualizing the training results

import matplotlib.pyplot as plt

plt.figure(figsize=(15, 5))
plt.subplot(121)
plt.plot(range(EPOCH_NUMBER), train_accuracy, label="train")
plt.plot(range(EPOCH_NUMBER), val_accuracy, label='val')
plt.title("Accuracy", size=15)
plt.legend()
plt.grid(True)
plt.subplot(122)
plt.plot(range(EPOCH_NUMBER), train_loss, label="train")
plt.plot(range(EPOCH_NUMBER), val_loss, label="val")
plt.title("Loss", size=15)
plt.legend()
plt.grid(True)
plt.show()

[Figure: training and validation accuracy (left) and loss (right) curves]
As the figure shows, accuracy rises as training proceeds and levels off after roughly 75 epochs. The accuracy on the validation set reaches more than 95%, and the gap to the training accuracy is small, indicating that the classifier performs well and the model generalizes well.

MO Project Link