Deep learning: using RCNN to recognize variable-length arithmetic captchas (PyTorch version, LSTM + CTCLoss)

Time:2022-4-27

In the previous article I trained a model on arithmetic captchas using transfer learning with ResNet. That approach struggles when the captcha contains several digits or a variable number of characters, so here an RNN is used to recognize arithmetic captchas as variable-length sequences. The same solution also works for ordinary captchas whose length varies, for example between 4 and 6 characters. The detailed steps follow.

First, prepare the data set. Mine contains close to tens of thousands of images of about three or four types. The full set of characters (classes) is ["zero", "one", "two", "three", "four", "Five", "six", "seven", "eight", "Nine", "add", "subtract", "multiply", "divide", "wait", "Yu", "and", "?", "Yi"]. My data has no calculation questions with numbers of two or more digits; if yours does, the character set stays the same. Why use RCNN when there are no multi-digit numbers? Mainly to handle one kind of data: the multiplication and division operators each appear in two written forms, so the Chinese text of the operator varies in length, and there are not many samples of it. If you need the data set, you can message me privately. In every section of this article I post all of the code, which is detailed and clear.

1. Build data loader

The data loader built here is similar to the one in the previous article. The data processing inside it is simple; what matters is the data format and label format of the final output. Labels are brought to a fixed length of 6 characters: a shorter label such as "one plus one equals" is padded with the placeholder "_", whereas "one times one equals" already has the full length and needs no padding. If you are familiar with RNN models, this needs no further explanation. The image itself is returned as a tensor.

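import os

import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

# calc_list is the list of label characters described above; define it before using this class.


class NumberDataset(Dataset):
    def __init__(self, path: str, transform=None):
        """
        Add a train=True/False switch here if you need one.
        :param path: dataset path
        :param transform: optional torchvision transform; defaults to ToTensor()
        """
        super(NumberDataset, self).__init__()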

        if not transform:
            transform = transforms.Compose([transforms.ToTensor(), ])
        self.transform = transform
        self.path = path
        self.picture_list = list(os.walk(self.path))[0][-1]
        self.label_map = [i for i in "_" + "".join(calc_list)]

    def __len__(self):
        return len(self.picture_list)

    def __getitem__(self, item):
        """
        :param item: sample index
        :return: (picture, padded label, label length before padding)
        """
        picture_path_list = self._load_picture()
        img = Image.open(picture_path_list[item])
        img = self.transform(img)
        # The label is the part of the file name before the first '_'
        label = self.picture_list[item].split('_')[0]
        # CTC needs the true label length, so record it before padding
        label_length = len(label)
        # Pad with '_' (index 0, the CTC blank) up to the maximum length,
        # fixed at 6 for now and to be made configurable later
        for i in range(6 - len(label)):
            label += '_'
        label = [self.label_map.index(i) for i in label]
        label = torch.as_tensor(label, dtype=torch.int64)

        # padding = torch.LongTensor([0] * (4 -

        return img, label, label_length

    def _load_picture(self):
        return [self.path + '/' + i for i in self.picture_list]
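
The label handling in `__getitem__` can be tried on its own. The following is a minimal sketch, assuming a hypothetical ASCII alphabet (`demo_chars`) in place of the real `calc_list` and a made-up file name. It only illustrates the index mapping and the padding to length 6, and it also returns the unpadded length, which is the value CTC expects as the target length.

```python
# Hypothetical stand-in for calc_list; the real list holds the captcha characters.
demo_chars = list("0123456789+-*/=?")
# Index 0 is reserved for "_", which doubles as padding and as the CTC blank.
label_map = ["_"] + demo_chars

MAX_LABEL_LENGTH = 6


def encode_label(file_name):
    """Turn a file name such as '1+1=_0001.png' into (indices, true_length)."""
    text = file_name.split("_")[0]  # the label is the part before the first '_'
    true_length = len(text)
    padded = text + "_" * (MAX_LABEL_LENGTH - true_length)
    indices = [label_map.index(ch) for ch in padded]
    return indices, true_length


def decode_label(indices):
    """Map indices back to characters, dropping the padding."""
    return "".join(label_map[i] for i in indices if i != 0)


indices, length = encode_label("1+1=_0001.png")
print(indices, length)        # [2, 11, 2, 15, 0, 0] 4
print(decode_label(indices))  # 1+1=
```

Because the label is cut at the first "_" of the file name, the label text itself must not contain that character.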

2. Build training model

Here resnet18 is used as the feature extraction network. It is a modified resnet18: the final fully connected (FC) layer is replaced by an LSTM with bidirectional=True. It is that simple. Note that in the code below, forward only runs conv1, layer1 and layer2; the deeper layers are commented out.

import torch
import torch.nn as nn
import torch.nn.functional as F


class RestNetBasicBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride):
        super(RestNetBasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        output = self.conv1(x)
        output = F.relu(self.bn1(output))
        output = self.conv2(output)
        output = self.bn2(output)
        return F.relu(x + output)


class RestNetDownBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride):
        super(RestNetDownBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride[0], padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride[1], padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.extra = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride[0], padding=0),
            nn.BatchNorm2d(out_channels)
        )

    def forward(self, x):
        extra_x = self.extra(x)
        output = self.conv1(x)
        out = F.relu(self.bn1(output))

        out = self.conv2(out)
        out = self.bn2(out)
        return F.relu(extra_x + out)


class resnet18(nn.Module):
    def __init__(self):
        super(resnet18, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.bn1 = nn.BatchNorm2d(64)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = nn.Sequential(RestNetBasicBlock(64, 64, 1),
                                    RestNetBasicBlock(64, 64, 1))

        self.layer2 = nn.Sequential(RestNetDownBlock(64, 128, [2, 1]),
                                    RestNetBasicBlock(128, 128, 1))

        self.layer3 = nn.Sequential(RestNetDownBlock(128, 256, [2, 1]),
                                    RestNetBasicBlock(256, 256, 1))

        self.layer4 = nn.Sequential(RestNetDownBlock(256, 512, [2, 1]),
                                    RestNetBasicBlock(512, 512, 1))

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer1(out)
        out = self.layer2(out)
        # out = self.layer3(out)
        # out = self.layer4(out)
        return out


class LstmNet(nn.Module):
    def __init__(self, image_shape, label_map_length):
        super(LstmNet, self).__init__()
        # resnet18
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.bn1 = nn.BatchNorm2d(64)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = nn.Sequential(RestNetBasicBlock(64, 64, 1),
                                    RestNetBasicBlock(64, 64, 1))
        self.layer2 = nn.Sequential(RestNetDownBlock(64, 128, [2, 1]),
                                    RestNetBasicBlock(128, 128, 1))
        self.layer3 = nn.Sequential(RestNetDownBlock(128, 256, [2, 1]),
                                    RestNetBasicBlock(256, 256, 1))
        self.layer4 = nn.Sequential(RestNetDownBlock(256, 512, [2, 1]),
                                    RestNetBasicBlock(512, 512, 1))
        # Run a dummy input through the backbone to work out the LSTM input size
        x = torch.zeros((1, 3) + image_shape)  # e.g. [1, 3, 100, 300]
        shape = resnet18()(x).shape  # [1, 128, 25, 75] for a 100x300 input: BATCH, DIM, HEIGHT, WIDTH
        # print(shape)
        bone_output_shape = shape[1] * shape[2]
        self.lstm = nn.LSTM(bone_output_shape, bone_output_shape, num_layers=1, bidirectional=True)
        self.fc = nn.Linear(bone_output_shape * 2, label_map_length)

    def forward(self, x):
        x = self.conv1(x)
        x = self.layer1(x)
        x = self.layer2(x)  # [batch, 128, 25, 75] for a 100x300 input
        # x = self.layer3(x)
        # x = self.layer4(x)
        # print(x.shape)
        x = x.permute(3, 0, 1, 2)  # [width, batch, channels, height]
        # print(x.shape)
        w, b, c, h = x.shape
        x = x.view(w, b, c * h)  # [time_step, batch_size, input_size]: one time step per image column
        # print(x.shape)
        x, _ = self.lstm(x)
        time_step, batch_size, hidden = x.shape  # hidden is doubled by the bidirectional LSTM
        x = x.view(time_step * batch_size, hidden)
        x = self.fc(x)  # [time_step * batch_size, label_map_length]
        return x.view(time_step, batch_size, -1)  # [time_step, batch_size, label_map_length]
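
To see where the LSTM input size and the number of time steps come from, the convolution output-size formula can be applied by hand. This is a small sketch that assumes the 100 x 300 input used in the training section and only the layers that `forward` actually runs (`conv1`, `layer1`, `layer2`); `conv_out` is a helper introduced here for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


height, width = 100, 300

# conv1: kernel 7, stride 2, padding 3
height, width = conv_out(height, 7, 2, 3), conv_out(width, 7, 2, 3)
# layer1: 3x3 convolutions with stride 1 and padding 1 keep the size unchanged
# layer2: the first 3x3 convolution has stride 2, the rest have stride 1
height, width = conv_out(height, 3, 2, 1), conv_out(width, 3, 2, 1)

channels = 128                  # output channels of layer2
lstm_input = channels * height  # each image column becomes one feature vector
time_steps = width              # one time step per column

print(height, width)  # 25 75
print(lstm_input)     # 3200
print(time_steps)     # 75
```

With 75 time steps and labels of at most 6 characters, the input sequence is comfortably longer than the target, which CTC requires.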

3. Start training

Since the title promises CTCLoss, the loss function is CTCLoss, and the optimizer is Adam. I did not add a learning-rate schedule, because after enough training the accuracy of my model was already very high. The training loop is ordinary PyTorch training. Compared with fixed-length recognition, the only real differences are that an LSTM is used as the last layer and that the loss function is replaced by CTCLoss, so overall this is fairly simple. I will not explain every function used in training or model building here; you can search on Baidu or read my previous articles.

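import os

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import transforms
from tqdm import tqdm

mapping = "_" + "".join(calc_list)
device = torch.device('cuda:1')  # second GPU; change this to match your machine
model = LstmNet((100, 300), len(mapping)).to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-4)
loss_func = nn.CTCLoss()
# Resume from the last checkpoint if one exists
if os.path.exists('./models/model_rcnn.pkl'):
    model.load_state_dict(torch.load("./models/model_rcnn.pkl"))
    optimizer.load_state_dict(torch.load("./models/optimizer_rcnn.pkl"))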

transform = transforms.Compose(
    [
        transforms.Resize((100, 300), ),
        transforms.ToTensor(),  # convert to tensor
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # standardize
    ]
)

for epoch in range(30):
    train_data = NumberDataset('./datasets_rcnn', transform=transform)
    train_dataloader = DataLoader(train_data, batch_size=32, shuffle=True, drop_last=True)
    bar = tqdm(enumerate(train_dataloader), total=len(train_dataloader))  # enumerate yields (index, batch) pairs
    total_loss = []
    model.train()
    for idx, (input, label, target_lengths) in bar:
        # Reset the gradients held by the optimizer
        optimizer.zero_grad()
        # Compute the prediction
        input = input.to(device)

        label = label.to(device)
        output = model(input)
        # for i in range(output.shape[1]):
        #     output = output[:, i, :]  # [10, 37]
        #
        #     output = output.max(dim=0)  # [10]
        #     # output = output.contiguous()
        #     print(output[-1])
        #     exit()
        # Every sample in the batch has the same number of time steps
        predict_lengths = torch.IntTensor([int(output.shape[0])] * label.shape[0])
        # CTC loss; nn.CTCLoss expects log-probabilities, so apply log_softmax over the class dimension
        loss = loss_func(output.log_softmax(2), label, predict_lengths, target_lengths)
        #Back propagation
        loss.backward()
        total_loss.append(loss.item())
        #Optimizer parameter update
        optimizer.step()
        #Print data

        # if idx % 50 == 0:
        bar.set_description("epoch:{} idx:{},loss:{:.6f}".format(epoch, idx, np.mean(total_loss)))
        if idx % 200 == 0:
            torch.save(model.state_dict(), './models/model_rcnn.pkl', _use_new_zipfile_serialization=True)  # model save
            torch.save(optimizer.state_dict(), './models/optimizer_rcnn.pkl', _use_new_zipfile_serialization=True)  # optimizer save
    torch.save(model.state_dict(), './models/model_rcnn.pkl', _use_new_zipfile_serialization=True)  # model save
    torch.save(optimizer.state_dict(), './models/optimizer_rcnn.pkl', _use_new_zipfile_serialization=True)  # optimizer save

4. View accuracy

In the end, the model outputs an accurate answer for each calculation problem, and its generalization ability is quite strong. That's all.
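
The evaluation code is not shown in this article. One common way to turn the network output into text is greedy CTC decoding: take the best class at each time step, merge consecutive repeats, then drop the blanks. The sketch below assumes this approach (the original evaluation may have been done differently) and uses a hypothetical ASCII alphabet in place of `calc_list`; `greedy_ctc_decode` is a name introduced here.

```python
def greedy_ctc_decode(step_scores, label_map, blank=0):
    """Decode one sample.

    step_scores[t][k] is the score of class k at time step t, for example
    the model output for sample i converted with output[:, i, :].tolist().
    """
    # Best class per time step.
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in step_scores]

    decoded = []
    previous = blank
    for index in best:
        # Keep a class only when it starts a new run and is not the blank.
        if index != previous and index != blank:
            decoded.append(label_map[index])
        previous = index
    return "".join(decoded)


label_map = ["_"] + list("0123456789+-*/=?")  # hypothetical alphabet

# One-hot scores whose best classes per step are 2 2 0 11 2 2 0 15.
best_path = [2, 2, 0, 11, 2, 2, 0, 15]
scores = [[1.0 if k == idx else 0.0 for k in range(len(label_map))] for idx in best_path]
print(greedy_ctc_decode(scores, label_map))  # 1+1=
```

A repeated character survives only when a blank separates its two occurrences, which is why the blank class is needed in the first place.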