# Deep learning: using an RCNN to recognize arithmetic-question verification codes of variable length (PyTorch version, LSTM + CTCLoss)

Time: 2022-4-27

In the previous article I trained a ResNet with transfer learning to solve arithmetic-question verification codes, but that approach struggles when the numbers have several digits or the sequence length varies. This article therefore uses an RNN to handle arithmetic verification codes of variable sequence length. The same solution also works for ordinary verification codes whose length varies, for example between 4 and 6 characters. The detailed steps follow.

First, prepare the data set. Mine contains roughly tens of thousands of images of about three or four types. The label characters (rendered here in English) are ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "add", "subtract", "multiply", "divide", "wait", "Yu", "and", "?", "Yi"]. My data has no questions with numbers of two or more digits, but if yours does, the label characters stay the same.

Why use an RCNN (a CNN followed by an RNN) when there are no two-digit numbers? Mainly to handle one kind of data set in which the operator is written as "multiply", "multiply by", "divide" or "divide by": the Chinese wording of the operator varies in length, and there are not many samples of that kind. If you need the data set, message me privately. Each section of this article includes its full code, which is detailed and clear.

### 1. Build data loader

The data loader here is similar to the one in the previous article. How you process the data inside it is up to you; what matters is the format of the data and labels it finally outputs. Labels are padded to a fixed length of 6 characters. For example, the label for "one plus one equals" is shorter than 6 characters in Chinese, so padding characters (`_`) are added to it, while "one times one equals" already fills the length and needs no padding. If you are familiar with RNN models, this needs no further explanation. The image itself is returned as a tensor.

```python
import os

import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

# `calc_list` is the list of label characters described above; define it before use.


class NumberDataset(Dataset):
    def __init__(self, path: str, transform=None):
        """
        Add a train=True/False switch here if you need one.
        :param path: dataset path
        :param transform: torchvision transform applied to each image
        """
        super(NumberDataset, self).__init__()

        if not transform:
            transform = transforms.Compose([transforms.ToTensor()])
        self.transform = transform
        self.path = path
        # File names in the top-level directory of `path`
        self.picture_list = list(os.walk(self.path))[0][-1]
        # Index 0 is "_", which doubles as padding and as the CTC blank
        self.label_map = [i for i in "_" + "".join(calc_list)]

    def __len__(self):
        return len(self.picture_list)

    def __getitem__(self, item):
        """
        :param item: sample index
        :return: (picture, label, label length)
        """
        # Convert to RGB so the image always has the 3 channels the model expects
        img = Image.open(self.picture_path_list[item]).convert('RGB')
        img = self.transform(img)
        # Assumes file names look like "<label>_<suffix>"; the label is the first part
        label = self.picture_list[item].split('_')[0]
        # CTC needs the true label length, so record it before padding
        label_length = len(label)
        # Pad to the maximum length, temporarily fixed at 6
        label += '_' * (6 - label_length)
        label = [self.label_map.index(i) for i in label]
        label = torch.as_tensor(label, dtype=torch.int64)

        return img, label, label_length

    @property
    def picture_path_list(self):
        return [self.path + '/' + i for i in self.picture_list]
```
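To make the label format concrete, here is a minimal sketch of the encoding step on its own. The alphabet below is a hypothetical stand-in (ASCII digits and operators) for the real `calc_list`, and `encode_label` is an illustrative helper, not part of the loader. It returns the padded index list together with the unpadded length, which is the value a CTC loss needs as the target length.

```python
# Hypothetical alphabet for illustration only: the real `calc_list`
# holds the label characters listed earlier in the article.
calc_list = ["1", "2", "3", "+", "*", "="]
label_map = ["_"] + calc_list   # index 0 is the padding / CTC blank
MAX_LEN = 6                     # fixed label length used by the loader


def encode_label(text, max_len=MAX_LEN):
    """Return (padded index list, true length before padding)."""
    true_len = len(text)
    padded = text + "_" * (max_len - true_len)
    return [label_map.index(ch) for ch in padded], true_len


print(encode_label("1+1="))    # ([1, 4, 1, 6, 0, 0], 4)
print(encode_label("12*3="))   # ([1, 2, 5, 3, 6, 0], 5)
```

Because the padding character sits at index 0, it coincides with the default blank index of `nn.CTCLoss`, so padded positions never collide with a real character.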

### 2. Build training model

Here a modified resnet18 serves as the feature-extraction network: the final fully connected (FC) layer is replaced by an LSTM with `bidirectional=True`. It is that simple. Note that the forward pass only runs `conv1`, `layer1` and `layer2`; `layer3` and `layer4` are commented out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RestNetBasicBlock(nn.Module):
    """Residual block that keeps the channel count and spatial size (stride 1)."""

    def __init__(self, in_channels, out_channels, stride):
        super(RestNetBasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        output = self.conv1(x)
        output = F.relu(self.bn1(output))
        output = self.conv2(output)
        output = self.bn2(output)
        return F.relu(x + output)


class RestNetDownBlock(nn.Module):
    """Residual block that changes the channel count and downsamples; `stride` is a pair."""

    def __init__(self, in_channels, out_channels, stride):
        super(RestNetDownBlock, self).__init__()
        # stride[0] downsamples in conv1 and in the shortcut; stride[1] is used by conv2
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride[0], padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride[1], padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # 1x1 convolution so the shortcut matches the main branch in shape
        self.extra = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride[0], padding=0),
            nn.BatchNorm2d(out_channels)
        )

    def forward(self, x):
        extra_x = self.extra(x)
        output = self.conv1(x)
        out = F.relu(self.bn1(output))

        out = self.conv2(out)
        out = self.bn2(out)
        return F.relu(extra_x + out)


class resnet18(nn.Module):
    """Truncated backbone, used below only to compute the feature-map shape."""

    def __init__(self):
        super(resnet18, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.bn1 = nn.BatchNorm2d(64)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = nn.Sequential(RestNetBasicBlock(64, 64, 1),
                                    RestNetBasicBlock(64, 64, 1))

        self.layer2 = nn.Sequential(RestNetDownBlock(64, 128, [2, 1]),
                                    RestNetBasicBlock(128, 128, 1))

        self.layer3 = nn.Sequential(RestNetDownBlock(128, 256, [2, 1]),
                                    RestNetBasicBlock(256, 256, 1))

        self.layer4 = nn.Sequential(RestNetDownBlock(256, 512, [2, 1]),
                                    RestNetBasicBlock(512, 512, 1))

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer1(out)
        out = self.layer2(out)
        # out = self.layer3(out)
        # out = self.layer4(out)
        return out


class LstmNet(nn.Module):
    def __init__(self, image_shape, label_map_length):
        super(LstmNet, self).__init__()
        # resnet18 backbone
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.bn1 = nn.BatchNorm2d(64)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = nn.Sequential(RestNetBasicBlock(64, 64, 1),
                                    RestNetBasicBlock(64, 64, 1))
        self.layer2 = nn.Sequential(RestNetDownBlock(64, 128, [2, 1]),
                                    RestNetBasicBlock(128, 128, 1))
        self.layer3 = nn.Sequential(RestNetDownBlock(128, 256, [2, 1]),
                                    RestNetBasicBlock(256, 256, 1))
        self.layer4 = nn.Sequential(RestNetDownBlock(256, 512, [2, 1]),
                                    RestNetBasicBlock(512, 512, 1))
        # Run a dummy image through the backbone to get its output shape
        x = torch.zeros((1, 3) + image_shape)  # [1, 3, HEIGHT, WIDTH]
        shape = resnet18()(x).shape  # [BATCH, DIM, HEIGHT, WIDTH], e.g. [1, 128, 25, 75] for a 100x300 input
        # print(shape)
        # Each time step (one column of the feature map) has DIM * HEIGHT features
        bone_output_shape = shape[1] * shape[2]
        self.lstm = nn.LSTM(bone_output_shape, bone_output_shape, num_layers=1, bidirectional=True)
        # A bidirectional LSTM outputs 2 * hidden_size features per time step
        self.fc = nn.Linear(bone_output_shape * 2, label_map_length)

    def forward(self, x):
        x = self.conv1(x)
        x = self.layer1(x)
        x = self.layer2(x)
        # x = self.layer3(x)
        # x = self.layer4(x)
        # print(x.shape)
        x = x.permute(3, 0, 1, 2)  # [width, batch, dim, height]
        # print(x.shape)
        w, b, c, h = x.shape
        x = x.reshape(w, b, c * h)  # [time_step, batch_size, input_size]
        # print(x.shape)
        x, _ = self.lstm(x)
        time_step, batch_size, hidden = x.shape  # hidden = 2 * bone_output_shape
        x = x.reshape(time_step * batch_size, hidden)
        x = self.fc(x)  # [time_step * batch_size, label_map_length]
        return x.view(time_step, batch_size, -1)  # [time_step, batch_size, label_map_length]
```
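The shapes in the model are easier to follow with the arithmetic written out. The sketch below is illustrative only: it assumes that the stride pair `[2, 1]` of a down block means stride 2 for its first convolution and stride 1 for its second, and that only `conv1`, `layer1` and `layer2` run in the forward pass. `conv_out` and `backbone_shape` are hypothetical helper names, not part of the model code.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of one spatial dimension after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def backbone_shape(height, width):
    """Follow conv1 -> layer1 -> layer2 and return (channels, height, width)."""
    # conv1: 7x7 kernel, stride 2, padding 3
    h, w = conv_out(height, 7, 2, 3), conv_out(width, 7, 2, 3)
    # layer1: 3x3 convolutions with stride 1 and padding 1 keep the size
    # layer2: the first 3x3 convolution has stride 2, the others stride 1
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    channels = 128  # output channels of layer2
    return channels, h, w


c, h, w = backbone_shape(100, 300)
print("time steps:", w)            # 75
print("LSTM input size:", c * h)   # 3200
```

For a 100x300 input this gives 75 time steps, comfortably more than the maximum label length of 6, which CTC requires: the number of time steps must be at least the target length.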

### 3. Start training

Since the title mentions CTCLoss, the loss function is of course CTCLoss, and the optimizer is Adam. I did not add a learning-rate schedule, because the model already reaches high accuracy after enough training. The training loop is the usual PyTorch loop with nothing special. Compared with fixed-length recognition, the only real differences are that an LSTM forms the last stage of the model and that the loss function is CTCLoss, so overall this is fairly simple. I will not explain every function used in training or model building here; if something is unclear, search for it on Baidu or see my previous articles.
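For readers new to CTC, the idea behind the loss can be shown on a toy example. The sketch below is not how `nn.CTCLoss` is implemented (it uses dynamic programming); it is a brute-force illustration that sums the probability of every per-time-step path that collapses to the target after merging repeats and dropping blanks. The function names are hypothetical.

```python
from itertools import product

BLANK = 0


def collapse(path):
    """CTC collapse: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out


def ctc_probability(probs, target):
    """Sum the probability of every path that collapses to `target`.

    probs[t][k] is the probability of class k at time step t.
    Brute force over all paths, so only usable for tiny examples.
    """
    num_classes = len(probs[0])
    total = 0.0
    for path in product(range(num_classes), repeat=len(probs)):
        if collapse(path) == list(target):
            p = 1.0
            for t, s in enumerate(path):
                p *= probs[t][s]
            total += p
    return total


# Two time steps, two classes (blank and symbol 1), uniform probabilities.
probs = [[0.5, 0.5], [0.5, 0.5]]
print(ctc_probability(probs, [1]))   # 0.75
```

Three of the four possible paths (`_1`, `1_`, `11`) collapse to the target `[1]`, hence 0.75. The CTC loss is built on the negative logarithm of this probability, which is why the network can be trained without knowing where each character sits in the image.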

```python
import os

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import transforms
from tqdm import tqdm

mapping = "_" + "".join(calc_list)
device = torch.device('cuda:1')
model = LstmNet((100, 300), len(mapping)).to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-4)
loss_func = nn.CTCLoss()  # the blank index defaults to 0, i.e. "_"
os.makedirs('./models', exist_ok=True)
if os.path.exists('./models/model_rcnn.pkl'):
    # The body of this branch is missing in the original; resuming from the
    # saved checkpoints is the assumed intent.
    model.load_state_dict(torch.load('./models/model_rcnn.pkl'))
    optimizer.load_state_dict(torch.load('./models/optimizer_rcnn.pkl'))

transform = transforms.Compose(
    [
        transforms.Resize((100, 300)),
        transforms.ToTensor(),  # convert to tensor
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # standardize
    ]
)

for epoch in range(30):
    train_data = NumberDataset('./datasets_rcnn', transform=transform)
    # The DataLoader line is missing in the original; batch_size is an assumed value
    train_dataloader = DataLoader(train_data, batch_size=32, shuffle=True)
    # enumerate pairs each batch with its index
    bar = tqdm(enumerate(train_dataloader), total=len(train_dataloader))
    total_loss = []
    model.train()
    for idx, (input, label, target_lengths) in bar:
        # Reset the gradients
        optimizer.zero_grad()
        # Calculate the predicted value
        input = input.to(device)
        label = label.to(device)
        output = model(input)  # [time_step, batch_size, label_map_length]
        # Every sample uses all time steps as its input length
        predict_lengths = torch.IntTensor([int(output.shape[0])] * label.shape[0])
        # CTCLoss expects log-probabilities, so apply log_softmax over the classes
        loss = loss_func(output.log_softmax(2), label, predict_lengths, target_lengths)
        # Back propagation
        loss.backward()
        total_loss.append(loss.item())
        # Optimizer parameter update
        optimizer.step()
        # Print the running mean loss
        bar.set_description("epoch:{} idx:{},loss:{:.6f}".format(epoch, idx, np.mean(total_loss)))
        if idx % 200 == 0:
            torch.save(model.state_dict(), './models/model_rcnn.pkl', _use_new_zipfile_serialization=True)  # model save
            torch.save(optimizer.state_dict(), './models/optimizer_rcnn.pkl', _use_new_zipfile_serialization=True)  # optimizer save
    # Save again at the end of every epoch
    torch.save(model.state_dict(), './models/model_rcnn.pkl', _use_new_zipfile_serialization=True)  # model save
    torch.save(optimizer.state_dict(), './models/optimizer_rcnn.pkl', _use_new_zipfile_serialization=True)  # optimizer save
```

### 4. View accuracy

Finally, I can see that the model gives an accurate answer for each calculation problem, and its generalization ability is still very strong. That's all.
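The article does not show how the accuracy is measured, so the following is only a sketch of one common approach: greedy CTC decoding followed by exact-match accuracy. It works on plain lists of per-time-step class indices; with the model above these would presumably come from something like `output.argmax(dim=2)` on the `[time_step, batch_size, label_map_length]` tensor, rearranged to one list per sample. The alphabet and the index sequences below are made up for illustration.

```python
def greedy_decode(indices, label_map, blank=0):
    """Collapse repeats and drop blanks from per-time-step argmax indices."""
    chars = []
    prev = blank
    for idx in indices:
        if idx != blank and idx != prev:
            chars.append(label_map[idx])
        prev = idx
    return "".join(chars)


def sequence_accuracy(predictions, targets):
    """Fraction of samples whose decoded string matches the label exactly."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical alphabet and argmax outputs, for illustration only.
label_map = ["_", "1", "2", "+", "="]
argmax_per_step = [
    [1, 1, 0, 3, 3, 0, 1, 4],   # decodes to "1+1="
    [2, 0, 3, 0, 2, 2, 4, 4],   # decodes to "2+2="
]
decoded = [greedy_decode(seq, label_map) for seq in argmax_per_step]
print(decoded)                                        # ['1+1=', '2+2=']
print(sequence_accuracy(decoded, ["1+1=", "2+1="]))   # 0.5
```

Exact-match accuracy is strict: one wrong character makes the whole sample count as an error, which suits verification codes, where a partially correct answer is useless.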
