[Discussion] Discussion on the priority inversion problem caused by mutex usage in v4.0.4

Time: 2021-12-2

This article was originally published by RT-Thread forum user @Jay: https://club.rt-thread.org/as…
Discussion on the priority inversion problem caused by mutex usage in RT-Thread v4.0.4

Last night (October 20, 2021), RTT held an online conference presenting the new features and bug fixes in v4.0.4. In his talk, @man Jianting discussed the thread priority inversion problem caused by mutex usage, which I found very interesting.

1、 Brief introduction to mutexes

A mutex is a mechanism for synchronizing threads, also known as a mutually exclusive semaphore; it is a special kind of binary semaphore. A mutex works like a parking lot with only one parking space: when a car enters, the gate is locked and other cars wait outside; when the car inside leaves, the gate is opened and the next car can enter. (Quoted from the RTT documentation.)
2、 What problem does a mutex solve
2.1 Thread priority inversion

Suppose there are three threads, A, B and C, with priorities A > B > C, and a shared memory region M. To keep the data in M consistent, at most one thread may access M at any given time; that is, while C is reading M, neither A nor B may modify it.

This restriction can cause priority inversion:

[Figure: priority inversion timeline]

1. C becomes ready and takes control of M.
2. A becomes ready; its priority is higher than C's, so the CPU runs A first.
3. A tries to take control of M, but C is holding it, so A is suspended and waits; C continues reading M.
4. B becomes ready; its priority is higher than C's, so the CPU runs B.
5. B finishes its task and is suspended; C continues reading M.
6. C finishes reading M and releases control of it; now it is A's turn to modify M.

As the steps above show, although thread B has a lower priority than thread A, B runs first, and A is held up for as long as B keeps running. This violates the real-time requirements of the system.
2.2 How the mutex solves it

A mutex solves the priority inversion problem above with the priority inheritance protocol:

[Figure: priority inheritance timeline]

1. C becomes ready and takes control of M.
2. A becomes ready; its priority is higher than C's, so the CPU runs A first.
3. A tries to take control of M, but C is holding it, so A is suspended and waits. Under the priority inheritance protocol, C's priority is raised to equal A's, so the priority relationship becomes A = C > B; C continues reading M.
4. C finishes reading M, releases control of it, and its priority is restored. Now it is A's turn to modify M, so A is woken up.
5. A finishes its task and is suspended. B became ready between steps 3 and 4, but could not run then because its priority was lower than C's boosted priority; now its priority is higher than C's, so the CPU wakes B.
6. B finishes its task and is suspended, and C continues with the rest of its task.

3、 What problems can mutexes cause
3.1 Incorrect use of the FIFO flag

To avoid the thread priority inversion problem above, users synchronize threads with a mutex. A mutex is managed as an IPC object, so a thread that wants to take a mutex that is already held must wait in that IPC object's wait queue. IPC objects support two queuing modes:

RT_IPC_FLAG_FIFO: first in, first out. Waiting threads are queued in the order in which they started waiting.
RT_IPC_FLAG_PRIO: priority order. Waiting threads are queued by priority: a higher-priority waiter jumps ahead of lower-priority waiters.

FIFO is a non-real-time scheduling mode: once threads are in the queue, their priorities no longer matter. However, the mutex create/init functions still allowed users to pass the RT_IPC_FLAG_FIFO flag, which can lead to the following situation:

[Figure: timeline with a FIFO wait queue]

1. C becomes ready and takes control of M.
2. B becomes ready; its priority is higher than C's, so the CPU runs B first.
3. B tries to take control of M, but C is holding it, so B is suspended and waits. Under the priority inheritance protocol, C's priority is raised to equal B's, so the priority relationship becomes A > B = C. B enters the FIFO queue in first position; C continues reading M.
4. A becomes ready; its priority is higher than C's, so the CPU runs A first.
5. A tries to take control of M, but C is holding it, so A is suspended and waits. Under the priority inheritance protocol, C's priority is raised to equal A's, so the priority relationship becomes A = C > B. Following first-in, first-out order, A enters the FIFO queue in second position, behind B; C continues reading M.
6. C finishes reading M, releases control of it, and its priority is restored. According to the FIFO queue, B is next to take control of M, so B is woken up.
7. B finishes its operation on M, releases control of it, and its priority is restored. According to the FIFO queue, A is next to take control of M, so A is woken up.
8. A finishes its task and is suspended, and B continues its task.
9. B finishes its task and is suspended, and C continues its task.

So even though A has a higher priority than B, B entered the FIFO queue before A, and therefore B obtains M and runs before A. This defeats the purpose of using a mutex.

Moreover, because A is suspended waiting for the mutex, it will not wake up until B releases the mutex (unless A's wait times out). During that time, any other thread whose priority is higher than B's but lower than A's also runs ahead of A. Who can put up with that?
3.2 Correct use

The new version fixes the priority inversion problem above: when a mutex is created or initialized, the queue mode (flag) passed by the user is ignored and RT_IPC_FLAG_PRIO is always used.

[Figure from the original post]
4、 Summary

Mutexes exist to solve the priority inversion problem, but using them incorrectly makes the situation worse. This bug is also well hidden and hard to detect: beginners (like me) can easily misuse the flag, and the problem is hard to reproduce while debugging. That is why this fix matters.
5、 End

Thanks to RTT for last night's feature walkthrough and lucky draw: I finally won a prize, haha! It is only a data cable (a third prize), but it is the first time I have ever won anything in a lottery draw. I really want to complain about the lottery program.

In addition, a replay of the walkthrough will be made available (although I don't know where yet).