Flame graph: Linux performance analysis from a global perspective



In day-to-day operations we receive plenty of alert emails about high CPU utilization, and some service failures turn out to be caused by a saturated CPU. When that happens, we need to find out which process is consuming the server's CPU resources. The usual first step is top or htop, which quickly shows the processes using the most CPU, as shown in the following figure:

[Figure: top output showing the processes using the most CPU]

This demonstration uses an ordinary server. As the figure shows, the process currently using the most CPU is running the kube-apiserver command, and its PID is 25633. A server often runs several services at once; to quickly list the processes with the highest usage, you can use the following commands:

ps aux | head -1; ps aux | sort -k3nr | head -n 10   # view the top 10 most CPU-intensive processes
ps aux | head -1; ps aux | sort -k4nr | head -n 10   # view the top 10 most memory-consuming processes

However, knowing which process is consuming resources still does not tell us where inside that process the CPU time is going. To find the bottleneck, we can use perf, the Linux performance analysis tool, to identify the functions and call stacks that are consuming CPU. Rendering the data collected by perf as a flame graph then makes it clear what is actually using the system's CPU resources.

Before making the flame graph, a few words about perf. The name is short for "performance", and it is a relatively simple, easy-to-use tool that collects and analyzes system events.

The Linux performance analysis tool perf


My server runs Ubuntu 16.04.6 LTS, so perf has to be installed before it can be used. It is provided by linux-tools-common, but the kernel-specific packages it depends on must be installed as well.

~# apt install linux-tools-common linux-tools-4.4.0-142-generic linux-cloud-tools-4.4.0-142-generic -y

~# perf --version   # displays the version of perf
perf version 4.4.167

With perf installed, we can sample and analyze the process with the highest CPU utilization in the figure above, PID 25633.

First, collect the call stack information of this process:

~# sudo perf record -F 99 -p 25633 -g -- sleep 30
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.039 MB perf.data (120 samples) ]

This command can generate a large data file, depending on the process being sampled and the CPU configuration. If a server has 16 CPUs and samples 99 times per second for 30 seconds, it collects up to 47520 call stacks (16 × 99 × 30), which can amount to hundreds of thousands or even millions of lines. In the command above, perf record records samples, -F 99 sets the sampling frequency to 99 times per second, -p 25633 is the PID of the process to analyze, -g records call stacks, and sleep 30 makes the collection last 30 seconds. These parameters can be adjusted to suit the situation. The collected data is written to a file named perf.data in the current directory.
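The sample-count arithmetic above can be checked with a few lines of shell. This is only a sketch: the function name expected_samples is made up for illustration and is not part of perf. The result is an upper bound, because a CPU only contributes a sample while it is actually running the profiled process, which is why the run above captured just 120 samples.

```shell
#!/bin/sh
# expected_samples is a hypothetical helper, not part of perf.
# Upper bound on the number of samples:
#   number of CPUs x sampling frequency (Hz) x duration (seconds)
expected_samples() {
    cpus=$1; freq=$2; seconds=$3
    echo $((cpus * freq * seconds))
}

expected_samples 16 99 30   # prints 47520
```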

The perf report command shows the percentage of samples for each call stack, sorted from high to low. The result looks like this:

~# sudo perf report -n --stdio

[Figure: output of perf report]

This output is still not very intuitive or easy to read. This is where the flame graph becomes really useful.

Making a flame graph

A flame graph does not have to use a flame color theme, although the palette can help convey its meaning. The common types of flame graph are on-CPU, off-CPU, memory, hot/cold, and [differential](http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html "Differential"). The difference between on-CPU and off-CPU is which bottleneck they target: on-CPU flame graphs apply when the CPU is the performance bottleneck, and off-CPU flame graphs apply when I/O is the bottleneck. When you don't know what the performance bottleneck of the server is, you can generate both types and compare them. Normally the two flame graphs differ considerably. If they look similar, it generally means the CPU is being preempted by other processes.

Another way to identify an unknown bottleneck is to use a load-testing tool and see whether the load can saturate the CPU. If it can, use an on-CPU flame graph. If CPU utilization never rises no matter how much load is applied, the program is most likely blocked on I/O or locks, and an off-CPU flame graph is the right choice. The most commonly used load-testing tools are ab and wrk. I suggest using a more modern tool such as wrk.

If you choose ab, be sure to enable the -k option (HTTP keep-alive) to avoid running out of available ports on the system.

Brendan D. Gregg's FlameGraph project on GitHub implements a set of scripts for generating flame graphs. We can clone it and use the scripts directly.

cd && git clone https://github.com/brendangregg/FlameGraph.git

Generating a flame graph generally follows this process:

[Figure: flame graph generation process]

  • Capture stacks: use perf to capture the stack information of the running process
  • Collapse stacks: parse the system and program stacks sampled at each moment and merge identical stacks, accumulating their counts, so that the load and the critical paths become visible; the stackcollapse scripts do this
  • Generate the flame graph: parse the stack information output by stackcollapse and render it as a flame graph
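To make the collapse step concrete, here is a minimal sketch of what folding means, written with awk rather than the real stackcollapse-perf.pl. It assumes a simplified input in which every sample is already a single line of semicolon-separated frames, root first. Real perf script output is multi-line and needs the actual script. The function name collapse_stacks and the sample frames are made up for illustration.

```shell
#!/bin/sh
# collapse_stacks is a hypothetical helper: it merges identical stacks and
# appends how many times each one was seen, giving the "stack count" folded
# format that flamegraph.pl reads.
collapse_stacks() {
    awk '{ count[$0]++ } END { for (s in count) print s, count[s] }' | sort
}

# Three samples, two of which share the same stack (made-up frames).
printf '%s\n' \
    'main;handle_request;parse_json' \
    'main;handle_request;write_log' \
    'main;handle_request;parse_json' | collapse_stacks
# main;handle_request;parse_json 2
# main;handle_request;write_log 1
```

The count at the end of each folded line is what later determines the width of the corresponding frames in the flame graph.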

FlameGraph provides scripts for handling the output of different profilers, which can be used as needed. Next, we turn the captured stack information in perf.data into folded stacks. First, use perf script to dump perf.data as text:

~# perf script -i /root/perf.data &> /root/perf.unfold

Then use stackcollapse-perf.pl to collapse the symbols in perf.unfold, the text that perf script produced:

~/FlameGraph# ls
aix-perf.pl    docs                        example-perf.svg  pkgsplit-perf.pl  stackcollapse-aix.pl       stackcollapse-go.pl               stackcollapse-ljp.awk         stackcollapse-pmc.pl        stackcollapse-vsprof.pl   test.sh
demos          example-dtrace-stacks.txt   files.pl          range-perf.pl     stackcollapse-bpftrace.pl  stackcollapse-instruments.pl      stackcollapse-perf.pl         stackcollapse-recursive.pl  stackcollapse-vtune.pl
dev            example-dtrace.svg          flamegraph.pl     README.md         stackcollapse-elfutils.pl  stackcollapse-java-exceptions.pl  stackcollapse-perf-sched.awk  stackcollapse-sample.awk    stackcollapse-xdebug.php
difffolded.pl  example-perf-stacks.txt.gz  jmaps             record-test.sh    stackcollapse-gdb.pl       stackcollapse-jstack.pl           stackcollapse.pl              stackcollapse-stap.pl       test
~/FlameGraph# ./stackcollapse-perf.pl /root/perf.unfold &> /root/perf.folded

Finally, generate the flame graph:

~/FlameGraph# ./flamegraph.pl /root/perf.folded > /root/perf.svg

Of course, you can also use the pipe symbol | to simplify the whole process:

cd && perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > process.svg

Finally, open the flame graph in Google Chrome:

[Figure: the generated flame graph]

A flame graph is an SVG image generated from stack information, used to show CPU call stacks.

  • The y-axis represents the call stack, and each layer is a function. The deeper the call stack, the higher the flame. The top is the function being executed, and below it are its parent functions
  • The x-axis represents the number of samples. The wider a function is on the x-axis, the more often it was sampled, that is, the longer it took to execute. Note that the x-axis does not represent the passage of time: all call stacks are merged and arranged in alphabetical order

To read a flame graph, look for the functions at the top that occupy the greatest width. Such "plateaus" indicate that the function may have a performance problem. Color has no special meaning; because a flame graph represents how busy the CPU is, warm colors are generally chosen.
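As a rough text-only counterpart to spotting plateaus by eye, the folded file can be summarized by the function at the top of each stack. The sketch below assumes the folded format produced by stackcollapse-perf.pl (frames separated by semicolons, followed by a space and a sample count). The function name top_frames and the sample data are illustrative only.

```shell
#!/bin/sh
# top_frames is a hypothetical helper: for every folded line it takes the last
# frame (the function that was on-CPU) and sums the sample counts per function,
# then lists the functions from most to fewest samples.
top_frames() {
    awk '{
        n = $NF                      # sample count is the last field
        stack = $0
        sub(/ [0-9]+$/, "", stack)   # strip the trailing count
        depth = split(stack, frame, ";")
        samples[frame[depth]] += n
    } END {
        for (f in samples) print samples[f], f
    }' | sort -k1,1nr -k2
}

# Made-up folded stacks: parse_json is on top in two different stacks.
printf '%s\n' \
    'main;handle_request;parse_json 70' \
    'main;handle_request;write_log 20' \
    'main;gc;parse_json 10' | top_frames
# 80 parse_json
# 20 write_log
```

On real data it could be run as top_frames < /root/perf.folded | head, and the functions at the top of the list should match the widest plateaus in the SVG.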

There are two cases that a flame graph cannot analyze. Incomplete call stacks: when the call stack is too deep, some systems return only the first part of it (such as the first 10 layers). Missing function names: some functions have no name, and the compiler represents them only by memory address (such as anonymous functions). You can also collect the data and generate a flame graph with the following script:

#!/bin/bash
# Run from inside the FlameGraph directory: <script> seconds
if [ $# -ne 1 ]; then
    echo "Usage: $0 seconds"
    exit 1
fi

# Sample all CPUs with call stacks in the background
perf record -a -g -o perf.data &
PID=$!

# Let perf run for the requested time, then stop it with SIGINT
sleep "$1"
kill -s INT "$PID"
# wait until perf exits and has finished writing perf.data
wait "$PID"

perf script -i perf.data &> perf.unfold
perl stackcollapse-perf.pl perf.unfold &> perf.folded
perl flamegraph.pl perf.folded > perf.svg
