Using Raspberry Pi 4B to build a deep learning application (12): Mask

Time: 2020-11-21

Preface

In the last article, we solved both the environment and network problems. In this article, while COVID-19 is still rampant all over the world, we try to use AI to build a fun automatic mask application. OpenCV + CNN is used to extract the coordinates of key points on the face, and a mask image is then pasted over them as an overlay. When you go out for National Day and the Mid-Autumn Festival, don't forget to wear a mask to protect yourself.

The whole process has three stages:

  • Find a face in the image
  • Detect key points on the face
  • Cover the nose and mouth with a mask image

Face detection

First, we need to locate the face in the image, and the DNN module in OpenCV can do this easily. The detection model is trained in the Caffe framework, so we get a network definition file, face_detector.prototxt, and a weight file, face_detector.caffemodel.

import cv2  # OpenCV with the DNN module
import numpy as np

# defining prototxt and caffemodel paths for the face detection model
# (`args` comes from the script's argparse setup)
detector_model = args.detector_model
detector_weights = args.detector_weights

# load model
detector = cv2.dnn.readNetFromCaffe(detector_model, detector_weights)
capture = cv2.VideoCapture(0)

while True:
    # capture frame-by-frame
    success, frame = capture.read()

    # get frame's height and width
    height, width = frame.shape[:2]  # 640×480

    # resize and subtract BGR mean values, since Caffe uses BGR images for input
    blob = cv2.dnn.blobFromImage(
        frame, scalefactor=1.0, size=(300, 300), mean=(104.0, 177.0, 123.0),
    )
    # passing blob through the network to detect a face
    detector.setInput(blob)
    # detector output format:
    # [image_id, class, confidence, left, top, right, bottom]
    face_detections = detector.forward()

After inference, for each detection we get the image index, the class (face), the confidence, and the left, top, right, and bottom coordinates of the bounding box.

# loop over the detections
for i in range(0, face_detections.shape[2]):
    # extract confidence
    confidence = face_detections[0, 0, i, 2]

    # filter detections by confidence greater than the minimum threshold
    if confidence > 0.5:
        # get coordinates of the bounding box
        box = face_detections[0, 0, i, 3:7] * np.array(
            [width, height, width, height],
        )

We iterate over all detected faces, filter out targets whose confidence is below 50%, and obtain the coordinates of the face bounding box.
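If you want to see the boxes on the live video, a few extra lines (not part of the original script, sketched here for convenience) will draw each accepted detection and its confidence onto the frame:

        # not in the original script: visualize each accepted detection
        # (continuing inside the `if confidence > 0.5:` block above)
        (x1, y1, x2, y2) = box.astype("int")
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, "%.2f" % confidence, (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)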

Tip:

The input size of the face detection model is (300, 300). When converting images, note that Caffe expects input in BGR channel order.

Get face key points

We have now obtained the bounding boxes of all faces. Next, these are fed into the face key point detection model to obtain the positions of key points such as the eyes, eyebrows, nose, mouth, chin, and face contour.

1. Model overview

The model used here is HRNet, whose defining feature is that it maintains high-resolution representations throughout the network rather than recovering them from low-resolution ones. It was released by the University of Science and Technology of China and Microsoft Research Asia as a new human pose estimation model; it broke three COCO records that year and was presented at CVPR 2019.

It starts from a high-resolution subnetwork, then gradually adds high-to-low resolution subnetworks one by one and connects them in parallel. In particular, it does not rely on a single low-to-high upsampling step to roughly aggregate low-level and high-level representations; instead, it repeatedly fuses the representations of different scales throughout the whole process.

The team uses exchange units to shuttle information between the subnetworks: each subnetwork receives the representations produced by the other subnetworks. In this way, rich high-resolution representations are obtained, as the sketch below illustrates.
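To make the fusion idea concrete, here is a minimal PyTorch sketch of exchanging information between one high-resolution and one low-resolution branch. This is my own illustration of the principle, not the actual HRNet code:

import torch.nn as nn
import torch.nn.functional as F

class TwoBranchExchange(nn.Module):
    # toy exchange unit: two parallel branches trade representations
    def __init__(self, high_ch=32, low_ch=64):
        super().__init__()
        # a strided conv sends high-res features down;
        # a 1x1 conv + upsampling sends low-res features up
        self.high_to_low = nn.Conv2d(high_ch, low_ch, 3, stride=2, padding=1)
        self.low_to_high = nn.Conv2d(low_ch, high_ch, 1)

    def forward(self, high, low):
        # each branch receives the other branch's representation
        fused_high = high + F.interpolate(
            self.low_to_high(low), size=high.shape[2:], mode="bilinear",
            align_corners=False,
        )
        fused_low = low + self.high_to_low(high)
        return fused_high, fused_low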

For more details, please refer to the open-source project:

https://github.com/HRNet/HRNet-Facial-Landmark-Detection

2. Crop the face image

Pay attention to how the face crop is sized here. Because the bounding box output by the face detection model may hug the face too tightly, we cannot crop directly with the exact detection coordinates. Instead, the box is enlarged by a factor of 1.5, and a 256×256 image is then cut out around its center.

(x1, y1, x2, y2) = box.astype("int")

# crop to the detection and resize
# (`crop` is the helper from the HRNet repo's transform utilities)
resized = crop(
    frame,
    torch.Tensor([x1 + (x2 - x1) / 2, y1 + (y2 - y1) / 2]),
    1.5,
    tuple(input_size),
)
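To show the idea behind that helper, a simplified stand-in might look like this (my own sketch; the repo's crop additionally handles rotation and out-of-image padding):

# simplified stand-in for the repo's crop(): enlarge the detection box
# by `factor` and cut a square patch around its center
# (relies on cv2 imported earlier; no padding or rotation handling)
def naive_face_crop(frame, box, factor=1.5, output_size=(256, 256)):
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half = 0.5 * factor * max(x2 - x1, y2 - y1)
    xa, ya = int(max(cx - half, 0)), int(max(cy - half, 0))
    xb, yb = int(cx + half), int(cy + half)
    return cv2.resize(frame[ya:yb, xa:xb], output_size)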

3. Preprocess the HRNet input image

Convert the image format and normalize the input:

# convert from BGR to RGB since HRNet expects RGB format
resized = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
img = resized.astype(np.float32) / 255.0
# normalize landmark net input
normalized_img = (img - mean) / std
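The mean and std arrays are not shown in this excerpt. Assuming the convention of the HRNet repository, they are the standard ImageNet channel statistics:

# assumption: ImageNet channel statistics, as conventionally used by HRNet
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)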

Tip:

Note that HRNet expects input images in RGB format.

4. Model construction

from collections import OrderedDict

# init the face landmark model
model = models.get_face_alignment_net(config)

# get input size from the config
input_size = config.MODEL.IMAGE_SIZE

# load model
state_dict = torch.load(args.landmark_model, map_location=device)

# remove the `module.` prefix that torch.nn.DataParallel adds
# to pre-trained weights
new_state_dict = OrderedDict()
for key, value in state_dict.items():
    name = key[7:]  # strip the leading "module." (7 characters)
    new_state_dict[name] = value

# load weights without the prefix
model.load_state_dict(new_state_dict)
# run model on device
model = model.to(device)

5. Model inference

Feed the preprocessed image into the HRNet network to obtain 68 facial landmarks, then call the decode_preds function to undo the earlier cropping and scaling and recover the key point coordinates in the original image.

# predict face landmarks
model = model.eval()
with torch.no_grad():
    input = torch.Tensor(normalized_img.transpose([2, 0, 1]))
    input = input.to(device)
    output = model(input.unsqueeze(0))
    score_map = output.data.cpu()
    preds = decode_preds(
        score_map,
        [torch.Tensor([x1 + (x2 - x1) / 2, y1 + (y2 - y1) / 2])],
        [1.5],
        score_map.shape[2:4],
    )
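decode_preds is provided by the HRNet repository. Conceptually, it takes the peak of each landmark's heatmap and maps it back into original-image coordinates; a rough sketch of the core step (ignoring the sub-pixel refinement and the inverse center/scale transform that the real function performs) could look like this:

import torch

# rough sketch of heatmap decoding (the real decode_preds also applies
# sub-pixel refinement and maps back through the crop's center/scale)
def naive_decode(score_map):
    n, k, h, w = score_map.shape           # batch, landmarks, heatmap h, w
    flat = score_map.reshape(n, k, -1)
    idx = flat.argmax(dim=2)               # peak location of each heatmap
    xs = (idx % w).float()
    ys = (idx // w).float()
    return torch.stack([xs, ys], dim=2)    # (n, k, 2) heatmap coordinates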

Mask image binding

Now we have the key point information of the face. Since a mask usually covers from below the nose to above the chin, we choose landmarks 2 to 16, plus landmark 30 on the nose.

1. Annotate the mask image

To align the mask properly, we need to annotate the mask image. Here you can use Make Sense, an easy-to-use open-source online annotation tool:

https://www.makesense.ai/

Finally, save the annotations in CSV format.
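The parsing code in the next step reads the x coordinate from the second column and the y coordinate from the third, so the exported rows should look roughly like this (a hypothetical example; the exact label names and trailing metadata depend on the tool):

point-1,110.0,54.0,anti_covid.png,600,400
point-2,118.0,122.0,anti_covid.png,600,400
point-3,142.0,180.0,anti_covid.png,600,400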

2. Read the coordinates of key points

Landmarks 2-16 and 30 are selected here; note that landmark numbering starts from 0, which is why the code below indexes 1-15 and 29.

# get chosen landmarks 2-16, 30 as destination points
# note that landmarks numbering starts from 0
dst_pts = np.array(
    [
        landmarks[1], 
        landmarks[2],
        landmarks[3],
        landmarks[4],
        landmarks[5],
        landmarks[6],
        landmarks[7],
        landmarks[8],
        landmarks[9],
        landmarks[10],
        landmarks[11],
        landmarks[12],
        landmarks[13],
        landmarks[14],
        landmarks[15],
        landmarks[29],
    ],
    dtype="float32",
)

# load mask annotations from csv file to source points
mask_annotation = os.path.splitext(os.path.basename(args.mask_image))[0]
mask_annotation = os.path.join(
    os.path.dirname(args.mask_image), mask_annotation + ".csv",
)

with open(mask_annotation) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    src_pts = []
    for i, row in enumerate(csv_reader):
        # skip head or empty line if it's there
        try:
            src_pts.append(np.array([float(row[1]), float(row[2])]))
        except ValueError:
            continue
src_pts = np.array(src_pts, dtype="float32")

3. Bind key point coordinates

dst_pts holds the detected face landmark coordinates; src_pts holds the annotated coordinates on the mask image.

# overlay the mask only if all landmarks have positive coordinates
# (`result` here is the output frame being drawn on, e.g. a copy of `frame`)
if (landmarks > 0).all():
    # load mask image
    mask_img = cv2.imread(args.mask_image, cv2.IMREAD_UNCHANGED)
    mask_img = mask_img.astype(np.float32)
    mask_img = mask_img / 255.0

    # get the perspective transformation matrix
    M, _ = cv2.findHomography(src_pts, dst_pts)

    # transformed masked image
    transformed_mask = cv2.warpPerspective(
        mask_img,
        M,
        (result.shape[1], result.shape[0]),
        None,
        cv2.INTER_LINEAR,
        cv2.BORDER_CONSTANT,
    )

    # mask overlay
    alpha_mask = transformed_mask[:, :, 3]
    alpha_image = 1.0 - alpha_mask

    for c in range(0, 3):
        result[:, :, c] = (
            alpha_mask * transformed_mask[:, :, c]
            + alpha_image * result[:, :, c]
        )

# display the resulting frame
cv2.imshow("image with mask overlay", result)

Here the findHomography function from the OpenCV library is used to find the transformation between the matched key points, and the resulting matrix is applied with the warpPerspective function to map the mask image onto the face.

This produces a transformed mask, transformed_mask, with the same size as the original image; using the transparency channel defined by the PNG format, alpha_mask, the two images can be blended together.
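As an aside, the same matrix M can also project individual points rather than a whole image; OpenCV's perspectiveTransform does this (not used in the original script, shown here only for illustration):

# illustration only: project the annotated mask points with the same M
pts = src_pts.reshape(-1, 1, 2)               # shape OpenCV expects
projected = cv2.perspectiveTransform(pts, M)  # where each point lands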

4. Install dependencies

First install the yacs dependency, which is used to read the network structure definition file.

pip install yacs
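For reference, yacs reads a YAML file into a nested configuration object. A minimal standalone sketch (the repo defines its defaults in its own config module, so the details there differ):

# minimal yacs sketch: load the experiment file passed via --cfg
from yacs.config import CfgNode as CN

config = CN(new_allowed=True)
config.merge_from_file("experiments/300w/face_alignment_300w_hrnet_w18.yaml")
print(config.MODEL.IMAGE_SIZE)  # e.g. the input size used earlier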

5. Run the model on the Raspberry Pi

python overlay_with_mask.py \
  --cfg experiments/300w/face_alignment_300w_hrnet_w18.yaml \
  --landmark_model HR18-300W.pth \
  --mask_image masks/anti_covid.png \
  --device cpu

The parameters are as follows:

  • cfg: HRNet network configuration file
  • landmark_model: HRNet weight file
  • mask_image: mask image
  • device: inference device

The speed is relatively slow, but inference still runs normally, which shows that the Python environment we set up earlier works. The Raspberry Pi is at full load, though; it's about to smoke...

Tip:

There is a bug in OpenVINO, mainly caused by 32-bit OS portability: nGraph's i64 needs to be replaced with i32 where size_t is used. Modifying the nGraph source code and recompiling solves the problem, or you can switch to the OpenCV 4.4 build that we compiled ourselves earlier.

https://github.com/openvinotoolkit/openvino/issues/1503

6. Run the model on a GPU

python overlay_with_mask.py \
  --cfg experiments/300w/face_alignment_300w_hrnet_w18.yaml \
  --landmark_model HR18-300W.pth \
  --mask_image masks/anti_covid.png \
  --device cuda

Inference is a little faster here, on a laptop GTX 1060 graphics card, and is basically usable. There are also many other ways to optimize the running speed of the mask application, such as cropping, and so on.

Source code download

Next Preview

This one covered your face with a mask;
in the next one,
we're going to use AI as well,
to imagine the stunning beauty hidden underneath.
Coming soon…
