Monitoring best practices: Redis and business interfaces

Time: 2021-4-29

1. Background

1.1 Problem

On December 4, 2020, monitoring of the customer's Redis cluster edition instance showed the CPU utilization of the db0 node suddenly rising to 100%, and the database could no longer serve requests normally. Investigation found a big key of about 2 MB in the customer's business data, which blocked db0. The customer's clients connected to the cluster in the default proxy mode, as shown in Figure 1, so the blocked db0 also prevented the other nodes from serving requests normally. Resolution: with the customer's cooperation, the frequent calls to the service that used the big key were cut off, and request handling recovered.

Figure 1: proxy mode

1.2 Reflection

This incident seriously disrupted the customer's course registration entry point and prompted a closer look at how the service was monitored. Monitoring and alarms for Redis and related products were not complete or fine-grained enough: a later review of the business logs showed that the error rate had been rising gradually before the problem surfaced at the Redis level. To address big keys in Redis, this article describes how to analyze big keys and hot keys, and recommends improving the customer-side monitoring alarms and adding error alarms on the business log interfaces.

2. Database monitoring and analysis

2.1 Redis monitoring metrics

The cloud monitoring metrics for the Redis cluster edition are listed in the following table.

| Monitoring item | Unit | MetricName | Dimensions | Statistics |
|---|---|---|---|---|
| Average response time | us | ShardingAvgRt | userId, instanceId, nodeId | Average, Maximum |
| Connection usage | % | ShardingConnectionUsage | userId, instanceId, nodeId | Average, Maximum |
| CPU utilization | % | ShardingCpuUsage | userId, instanceId, nodeId | Average, Maximum |
| Hit rate | % | ShardingHitRate | userId, instanceId, nodeId | Average, Maximum |
| Inbound traffic | KByte/s | ShardingIntranetIn | userId, instanceId, nodeId | Average, Maximum |
| Inbound bandwidth utilization | % | ShardingIntranetInRatio | userId, instanceId, nodeId | Average, Maximum |
| Outbound traffic | KByte/s | ShardingIntranetOut | userId, instanceId, nodeId | Average, Maximum |
| Outbound bandwidth utilization | % | ShardingIntranetOutRatio | userId, instanceId, nodeId | Average, Maximum |
| Number of keys in cache | count | ShardingKeys | userId, instanceId, nodeId | Average, Maximum |
| Maximum response time | us | ShardingMaxRt | userId, instanceId, nodeId | Average, Maximum |
| Memory usage | % | ShardingMemoryUsage | userId, instanceId, nodeId | Average, Maximum |
| QPS utilization | % | ShardingQPSUsage | userId, instanceId, nodeId | Average, Maximum |
| Used connections | count | ShardingUsedConnection | userId, instanceId, nodeId | Average, Maximum |
| Used memory | Bytes | ShardingUsedMemory | userId, instanceId, nodeId | Average, Maximum, Sum |
| Average requests per second | count | ShardingUsedQPS | userId, instanceId, nodeId | Average, Maximum |

2.2 Redis big key analysis

1. In the console, select the corresponding instance and run the big key and hot key analysis.

Figure 2: instance analysis

2. Use the API to analyze big keys and hot keys.

For details on cache analysis and hot key queries, see [1].
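Big keys can also be checked roughly from the client side. The following is a minimal sketch, not the analysis performed by the console or the API in [1]. It assumes a client object that exposes `scan_iter` and `memory_usage` (redis-py's `Redis` class provides both). The function name `find_big_keys` and the 1 MB default threshold are illustrative choices. Whether SCAN and MEMORY USAGE behave as expected through a cluster proxy is an assumption that should be verified for the instance in question.

```python
from typing import List, Tuple


def find_big_keys(client, threshold_bytes: int = 1024 * 1024,
                  scan_count: int = 1000) -> List[Tuple[str, int]]:
    """Return (key, size_in_bytes) pairs at or above the threshold, largest first.

    `client` is any object with redis-py style `scan_iter(count=...)` and
    `memory_usage(key)` methods. SCAN walks the keyspace incrementally, so it
    avoids blocking the server the way KEYS would.
    """
    big = []
    for key in client.scan_iter(count=scan_count):
        size = client.memory_usage(key)
        # memory_usage returns None when the key expired between SCAN and the call.
        if size is not None and size >= threshold_bytes:
            name = key.decode() if isinstance(key, bytes) else key
            big.append((name, size))
    return sorted(big, key=lambda item: item[1], reverse=True)
```

With redis-py this could be called as `find_big_keys(redis.Redis(host=..., port=...))`, where the host and port are placeholders. Running such a scan during off-peak hours limits the extra load on the instance.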

2.3 Monitoring on the same link as the database

Group alarm rules are now created from the application group page.

2.3.1 Create an application group

Figure 3: creating application groups

2.3.2 Create alarm rules

Figure 4: creating alarm rules

Figure 5: setting alarm rules
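The idea behind such a threshold rule can be sketched as follows. This is an illustrative model only, not the cloud monitoring implementation: it assumes a rule fires when the most recent N datapoints of a metric all reach the threshold. The class `AlarmRule`, the function `should_alarm`, and the example values (80% CPU for 3 consecutive periods) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlarmRule:
    metric_name: str          # e.g. "ShardingCpuUsage" from the table in 2.1
    threshold: float          # value at or above which a datapoint counts as abnormal
    consecutive_periods: int  # how many abnormal datapoints in a row trigger the alarm


def should_alarm(rule: AlarmRule, datapoints: Sequence[float]) -> bool:
    """Return True if the latest `consecutive_periods` datapoints all reach the threshold."""
    n = rule.consecutive_periods
    if n <= 0 or len(datapoints) < n:
        return False
    return all(value >= rule.threshold for value in datapoints[-n:])


# Hypothetical rule: CPU utilization at or above 80% for 3 consecutive periods.
cpu_rule = AlarmRule("ShardingCpuUsage", threshold=80.0, consecutive_periods=3)
```

Requiring several consecutive abnormal periods trades a short delay for fewer false alarms caused by brief spikes.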

3. Log monitoring

With the customer's logs ingested into Log Service (SLS), dashboards and alarms can be set up through rules. In this solution, logs are collected with Logtail over the intranet.

3.1 Install Logtail

To install Logtail, see [2].

3.2 Create a project and Logstore

Log in to the Log Service console, then create the project and the Logstore in the corresponding region, in that order.

Figure 6: project logstore creation

3.3 Data access wizard

The customer's logs are in JSON and log4j formats.

3.3.1 JSON

Select JSON text log > select an existing machine group > corresponding Logtail configuration

Figure 7: logtail configuration

1. Set the index

For multi-level (nested) JSON logs, change the field type to json.

Figure 8: setting index

2. Query and analysis

Figure 9: query analysis
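The kind of per-interface error analysis that the JSON logs make possible can be sketched offline as follows. This is a minimal sketch under stated assumptions: the field names `interface` and `status`, and the rule that a status of 500 or higher counts as an error, are hypothetical and are not taken from the customer's actual log schema.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def error_rate_by_interface(lines: Iterable[str]) -> Dict[str, float]:
    """Compute the share of error responses per interface from JSON log lines.

    Assumed (hypothetical) schema: each line is a JSON object with an
    "interface" string and an integer "status"; status >= 500 is an error.
    Lines that are not valid JSON objects or lack "interface" are skipped.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict) or "interface" not in record:
            continue
        name = record["interface"]
        totals[name] += 1
        status = record.get("status")
        if isinstance(status, int) and status >= 500:
            errors[name] += 1
    return {name: errors[name] / totals[name] for name in totals}
```

A gradually rising value for one interface is the kind of early signal described in section 1.2, visible before the problem shows up at the Redis level.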

3.3.2 log4j

Select regular expression text log > select an existing machine group > corresponding Logtail configuration

1. First-line regular expression

Figure 10: setting up automatic generation

2. Extract fields

Figure 11: log extraction fields

3. Set the index

Note: the index takes effect only for newly written data.

Figure 12: setting index

4. Query and analysis

Figure 13: query analysis
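The first-line regular expression and the field extraction steps above can be illustrated with a small offline sketch. It assumes a log4j layout similar to `%d{yyyy-MM-dd HH:mm:ss,SSS} [%t] %-5p %c - %m%n`. The customer's actual pattern is not given in this article, so both regular expressions and the field names (`time`, `thread`, `level`, `logger`, `message`) are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional

# A line that starts with a timestamp opens a new log event; any other line
# (for example a Java stack trace line) continues the previous event.
FIRST_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s")

# Field extraction for the assumed layout; DOTALL lets the message span lines.
FIELDS = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)$",
    re.DOTALL,
)


def group_events(lines: Iterable[str]) -> List[str]:
    """Merge continuation lines into the event opened by the last first line."""
    events: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if FIRST_LINE.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


def parse_event(event: str) -> Optional[Dict[str, str]]:
    """Extract fields from one grouped event, or None if it does not match."""
    match = FIELDS.match(event)
    return match.groupdict() if match else None
```

Grouping by first line before extracting fields keeps a multi-line stack trace attached to the error that produced it, which is the purpose of the first-line regular expression in the Logtail configuration.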

3.4 Log alarms

3.4.1 Dashboard

Figure 14: dashboard information display

3.4.2 Alarm

Click Alarm in the navigation bar at the upper right of the dashboard and select Create from the drop-down menu.

Figure 15: creating alarms

Figure 16: alarm content setting

For the alarm content sent to the DingTalk robot, see the template in [3].
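For readers who send alarms to a DingTalk custom robot from their own code rather than through the SLS alarm template, the message body can be sketched as below. This is an illustrative sketch and not the SLS template syntax described in [3]: it only builds a markdown-type payload in the shape accepted by DingTalk custom robot webhooks and does not send it. The function name `build_dingtalk_alert` and the alarm wording are assumptions.

```python
from typing import Dict


def build_dingtalk_alert(interface: str, error_rate: float,
                         window_minutes: int) -> Dict:
    """Build a markdown message body for a DingTalk custom robot webhook.

    The caller is expected to POST the returned dict as JSON to the robot's
    webhook URL; the URL and its access token are deliberately not shown here.
    """
    text = (
        "### Interface error alarm\n\n"
        f"- Interface: {interface}\n"
        f"- Error rate: {error_rate:.1%}\n"
        f"- Window: last {window_minutes} min\n"
    )
    return {
        "msgtype": "markdown",
        "markdown": {"title": "Interface error alarm", "text": text},
        "at": {"isAtAll": False},
    }
```

Including the interface name, the error rate, and the time window in the message makes the alarm readable on its own, without opening the dashboard first.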

References

[1] Cache analysis and hot key query: https://help.aliyun.com/document_detail/184226.html?spm=a2c4g.11186623.6.975.255f3635R5By1i
[2] Install Logtail (Linux system): https://help.aliyun.com/document_detail/28982.html?spm=a2c4g.11186623.2.5.31a09d7cBfTtvl
[3] DingTalk robot alarm template: https://help.aliyun.com/document_detail/91785.html?spm=5176.2020520112.0.dexternal.62b334c0S2Jxx2

We are the SRE team of Alibaba Cloud Intelligence Global Technology Services. We aim to be a technology-based, service-oriented team of high-availability engineers that provides professional, systematic SRE services, helping customers make better use of the cloud, build more stable and reliable business systems on it, and improve business stability. We hope to share more technical content that helps enterprise customers move to the cloud, use it well, and run their cloud business more stably and reliably.