When the master is down, pt-heartbeat keeps retrying and its memory grows slowly: the problem and a solution

Time:2020-1-15

Recently, colleagues reported that while using pt-heartbeat to monitor master-slave replication latency, if the master goes down, pt-heartbeat fails to connect but keeps retrying over and over.

Retrying is fine in itself. From the user's point of view, it is desirable for pt-heartbeat to keep trying until it reconnects to the database. However, they found that the continuous retries cause the process's memory to grow slowly.

Reproducing the problem

Environment:

pt-heartbeat v2.2.19, MySQL Community v5.6.31, Perl v5.10.1, RHEL 6.7, 500 MB of memory

To keep database startup and shutdown from affecting the memory usage measured for pt-heartbeat, MySQL and pt-heartbeat run on different hosts.

Run pt-heartbeat

# pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 -D test --create-table

Monitor the memory usage of pt-heartbeat

Get the PID


# ps -ef |grep pt-heartbeat
root 1505 1471 0 19:13 pts/0 00:00:08 perl /usr/local/bin/pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 -D test --create-table
root 1563 1545 2 19:50 pts/3 00:00:00 grep pt-heartbeat

View the memory usage of the process

# top -p 1505

After 0:15.00 of CPU time (the TIME+ column), %MEM is stable at 3.3%.

Now stop the database

# service mysqld stop

The pt-heartbeat command started earlier now keeps printing output continuously as it retries the connection.

After the same amount of CPU time again, %MEM had risen to 4.4%, an increase of about 1 percentage point. On a host with 500 MB of memory, that means the process grew by roughly 5 MB. That is not much, but because the growth never stops, it deserves attention.

At the same time, pmap shows that the RSS and Dirty values of the mapping at address 0000000001331000 also grow at about 4 KB/s.
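
The growth above was observed by watching top and pmap by hand. A minimal sketch for recording it over time follows, assuming a Linux host where `ps -o rss=` is available. The function names `rss_kib` and `sample_rss` are made up for this illustration and are not part of pt-heartbeat or Percona Toolkit.

```shell
#!/bin/sh
# Print the resident set size (RSS, in KiB) of the process with the given PID.
# Prints nothing if the process does not exist.
rss_kib() {
    ps -o rss= -p "$1" | tr -d ' '
}

# Sample the RSS of a process.
#   $1 = PID, $2 = seconds between samples (default 1), $3 = sample count (default 10)
# Prints one line per sample: "<unix-epoch-seconds> <rss-in-KiB>".
sample_rss() {
    pid=$1
    interval=${2:-1}
    count=${3:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        rss=$(rss_kib "$pid")
        if [ -z "$rss" ]; then
            echo "process $pid has exited" >&2
            return 1
        fi
        echo "$(date +%s) $rss"
        i=$((i + 1))
        # Sleep only between samples, not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

For the process in this article, `sample_rss 1505 1 60` would print one line per second for a minute. A second column that keeps increasing while the database is stopped corresponds to the growth seen in top and pmap.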

Later, while studying the pt-heartbeat source code, I found what looks like a bug in the following code:


my $tries = 2;
while ( !$dbh && $tries-- ) {
    PTDEBUG && _d($cxn_string, ' ', $user, ' ', $pass,
        join(', ', map { "$_=>$defaults->{$_}" } keys %$defaults ));
    $dbh = eval { DBI->connect($cxn_string, $user, $pass, $defaults) };
    if ( !$dbh && $EVAL_ERROR ) {
        if ( $EVAL_ERROR =~ m/locate DBD\/mysql/i ) {
            die "Cannot connect to MySQL because the Perl DBD::mysql module is "
                . "not installed or not found. Run 'perl -MDBD::mysql' to see "
                . "the directories that Perl searches for DBD::mysql. If "
                . "DBD::mysql is not installed, try:\n"
                . " Debian/Ubuntu apt-get install libdbd-mysql-perl\n"
                . " RHEL/CentOS yum install perl-DBD-MySQL\n"
                . " OpenSolaris pgk install pkg:/SUNWapu13dbd-mysql\n";
        }
        elsif ( $EVAL_ERROR =~ m/not a compiled character set|character set utf8/ ) {
            PTDEBUG && _d('Going to try again without utf8 support');
            delete $defaults->{mysql_enable_utf8};
        }
        if ( !$tries ) {
            die $EVAL_ERROR;
        }
    }
}

The code above is taken from the get_dbh function, which obtains the database connection. If the first attempt fails, it tries once more, and if that also fails it is supposed to exit by throwing an exception with die.

However, after adding the debug statements below, I found that when $tries is 0, the PTDEBUG && _d("$EVAL_ERROR") statement inside the if block does execute, but the die that follows does not throw an exception and exit the script.
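
The control flow described above (a fixed number of attempts, then a fatal error) can be restated outside Perl. The sketch below transliterates it to shell purely for illustration. `try_connect` is a hypothetical stand-in for `DBI->connect` that always fails, and nothing here is taken from pt-heartbeat itself.

```shell
#!/bin/sh
# Hypothetical stand-in for DBI->connect: always fails in this sketch.
try_connect() {
    return 1
}

# Bounded retry: make at most two attempts, then give up with an error.
connect_with_retries() {
    tries=2
    attempts=0
    while [ "$tries" -gt 0 ]; do
        tries=$((tries - 1))        # mirrors the post-decrement in `$tries--`
        attempts=$((attempts + 1))
        if try_connect; then
            echo "connected after $attempts attempt(s)"
            return 0
        fi
        if [ "$tries" -eq 0 ]; then
            echo "giving up after $attempts attempts" >&2
            return 1                # plays the role of `die $EVAL_ERROR`
        fi
    done
}
```

With `tries=2` the function makes two attempts and then returns a non-zero status, which is the behavior one would expect from `die $EVAL_ERROR` in the Perl code.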


PTDEBUG && _d($tries);
if ( !$tries ) {
    PTDEBUG && _d("$EVAL_ERROR");
    die $EVAL_ERROR;
}

I then changed the last if block of the original code as follows:


if ( !$tries ) {
    die "test:$EVAL_ERROR";
}

Test again

Start the database

# service mysqld start

Run the pt-heartbeat command

# pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 -D test --create-table

Stop the database

# service mysqld stop

This time the pt-heartbeat command exits abnormally.

The "test:" prefix in its error message is the test string added above.

Conclusion

It is strange that a plain die $EVAL_ERROR does not throw an exception and exit the script, while the modified die "test:$EVAL_ERROR" does.

Obviously, it’s a bug. I don’t know if it’s related to the Perl version.

I am still curious how a failed connection leads to steadily growing memory.

Finally, I filed a bug with Percona:

https://bugs.launchpad.net/percona-toolkit/+bug/1629164
