To solve the problem of high CPU utilization of Mongo database caused by regular expression matching ($regex)

Time:2020-10-7

One day, it was monitored that the CPU utilization rate of Mongo database was much higher. After checking, it was found that it was caused by the following statements:


db.example_collection.find({
 "idField" : 
{ "$regex" : "123456789012345678"
} ,
 "dateField" : 
{ "$regex" : "2019/10/10"
}})

Usually, when I encounter this situation, my first reaction is the lack of index of related fields, resulting in a full table scan every time this statement is executed.

However, I used the explain() statement to analyze and found that the above two fields idfield and DateField are indexed, and the statement also uses the index. The results of explain() are as follows:


mgset-11111111:PRIMARY> db.example_collection.find({ "idField" : { "$regex" : "123456789012345678"} , "dateField" : { "$regex" : "2019/10/10"}}).explain("queryPlanner")
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "example_db.example_collection",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "$and" : [
                {
                    "idField" : {
                        "$regex" : "123456789012345678"
                    }
                },
                {
                    "dateField" : {
                        "$regex" : "2019/10/10"
                    }
                }
            ]
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "filter" : {
                    "$and" : [
                        {
                            "idField" : {
                                "$regex" : "123456789012345678"
                            }
                        },
                        {
                            "dateField" : {
                                "$regex" : "2019/10/10"
                            }
                        }
                    ]
                },
                "keyPattern" : {
                    "idField" : 1,
                    "dateField" : 1
                },
                "indexName" : "idField_1_dateField_1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "idField" : [ ],
                    "dateField" : [ ]
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "idField" : [
                        "[\"\", {})",
                        "[/123456789012345678/, /123456789012345678/]"
                    ],
                    "dateField" : [
                        "[\"\", {})",
                        "[/2019/10/10/, /2019/10/10/]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "ok" : 1
}

Looking at Mongo’s log, it is found that this kind of statement takes 800 ~ 900ms to execute once, which is really slow. Unless there are a lot of CPU cores in the database, as long as the concurrency of such statements per second is slightly higher, the CPU will be occupied quickly.

After searching, it is found that there may be a regular expression problem. It turns out that although the statement does use an index, there is also a field “indexbounds” in the output of the explain() statement, which indicates the index range to be scanned when the statement is executed. To tell you the truth, in the above output, I have never understood its index range. The above statement performs regular expression matching for the fields idfield and DateField. I guess it scans the entire index tree, so the index does not actually improve the query efficiency of the statement.

I looked at the data in the database and found that there was no need for regular matching in the fields of idfield and DateField. It’s OK to do ordinary text matching. After removing the regular matching operation $regex, we analyze it again. The result is as follows:


mgset-11111111:PRIMARY> db.example_collection.find({ "idField" : "123456789012345678", "dateField" : "2019/10/10"}).explain("queryPlanner")
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "example_db.example_collection",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "$and" : [
                {
                    "idField" : {
                        "$eq" : "123456789012345678"
                    }
                },
                {
                    "dateField" : {
                        "$eq" : "2019/10/10"
                    }
                }
            ]
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "idField" : 1,
                    "dateField" : 1
                },
                "indexName" : "idField_1_dateField_1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "idField" : [ ],
                    "dateField" : [ ]
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "idField" : [
                        "[\"123456789012345678\", \"123456789012345678\"]"
                    ],
                    "dateField" : [
                        "[\"2019/10/10\", \"2019/10/10\"]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "ok" : 1
}

As you can see, the index is still used, and the index scan range is limited to one value.

Later, I confirmed with the developer that there was no need to use regular matching in this statement, so I asked him to remove the regular matching. After that, there was no more problem, and the statement did not appear in Mongo slow log.

summary

The above is what Xiaobian introduced to you to solve the problem of high CPU utilization of Mongo database caused by regular expression matching ($regex). I hope it will help you. If you have any questions, please leave me a message, and the editor will reply you in time. Thank you very much for your support to the developeppaer website!
If you think this article is helpful to you, welcome to reprint, please indicate the source, thank you!