How prometheus-net.DotNetRuntime collects CLR metrics

Time:2021-11-25

Introduction to prometheus-net.DotNetRuntime

Intro

As mentioned in the previous article on integrating Prometheus, prometheus-net.DotNetRuntime can collect CLR data such as GC, ThreadPool, Contention, and JIT metrics. These metrics go a long way toward answering questions such as: does GC run frequently while the application executes, are GC pauses too long, are there deadlocks or locks held under contention for too long, and is the thread pool being starved? With these metrics, all of this is clearly visible at run time.

Let's take a look at the official introduction:

A plugin for the prometheus-net package, exposing .NET core runtime metrics including:

  • Garbage collection collection frequencies and timings by generation/ type, pause timings and GC CPU consumption ratio
  • Heap size by generation
  • Bytes allocated by small/ large object heap
  • JIT compilations and JIT CPU consumption ratio
  • Thread pool size, scheduling delays and reasons for growing/ shrinking
  • Lock contention
  • Exceptions thrown, broken down by type

These metrics are essential for understanding the performance of any non-trivial application. Even if your application is well instrumented, you're only getting half the story: what the runtime is doing completes the picture.

Supported metrics

Contention Events

A contention event is raised whenever a System.Threading.Monitor lock or a native lock is contended.

Contention occurs when a lock that a thread is waiting for is held by another thread.

Name | Description | Type
dotnet_contention_seconds_total | Total time (in seconds) spent in lock contention | Counter
dotnet_contention_total | Total number of lock contentions | Counter

Thread Pool Events

Information about the worker thread pool and the IO thread pool.

Name | Description | Type
dotnet_threadpool_num_threads | Number of active threads in the thread pool | Gauge
dotnet_threadpool_io_num_threads | Number of active threads in the IO thread pool (Windows only) | Gauge
dotnet_threadpool_adjustments_total | Total number of adjustments made to the thread pool size | Counter

Garbage Collection Events

Captures information pertaining to garbage collection, to help in diagnostics and debugging.

Name | Description | Type
dotnet_gc_collection_seconds | Time spent executing a GC collection (seconds) | Histogram
dotnet_gc_pause_seconds | Time the application was paused by a GC collection (seconds) | Histogram
dotnet_gc_collection_reasons_total | Count of the reasons that triggered garbage collections | Counter
dotnet_gc_cpu_ratio | Percentage of process CPU time spent running garbage collection | Gauge
dotnet_gc_pause_ratio | Percentage of time the process spent paused for garbage collection | Gauge
dotnet_gc_heap_size_bytes | Current size of each GC heap (updated after a garbage collection) | Gauge
dotnet_gc_allocated_bytes_total | Total number of bytes allocated on the object heaps (updated every 100 KB of allocations) | Counter
dotnet_gc_pinned_objects | Number of pinned objects | Gauge
dotnet_gc_finalization_queue_length | Number of objects waiting to be finalized | Gauge

JIT Events

Name | Description | Type
dotnet_jit_method_total | Total number of methods compiled by the JIT compiler | Counter
dotnet_jit_method_seconds_total | Total time spent in the JIT compiler (seconds) | Counter
dotnet_jit_cpu_ratio | Ratio of CPU time spent in JIT compilation | Gauge

Integration

The metrics listed above are the ones I consider most important. There are also metrics for ThreadPool scheduling and CLR exceptions, which I don't find as significant; if you need them, take a look at the source code.

There are two ways to integrate: use the default collector provided by the author, which collects all supported CLR metrics, or customize which kinds of CLR metrics are collected. For example:

Collect CLR metrics using the default collector

DotNetRuntimeStatsBuilder.Default().StartCollecting();

Collect CLR metrics using a custom collector

DotNetRuntimeStatsBuilder.Customize()
    .WithContentionStats() // contention events
    .WithGcStats() // GC metrics
    .WithThreadPoolStats() // ThreadPool metrics
    //.WithCustomCollector(null) // you can also implement a custom collector yourself
    .StartCollecting();

As mentioned above, the default collector collects all supported CLR metrics. Let's see what the source code does.

The library provides a Builder: the collector, with its fairly complex configuration, is constructed through the builder pattern, much like HostBuilder/LoggingBuilder… in .NET Core. The Default() method is very similar to Host.CreateDefaultBuilder, with some variations.

Source: https://github.com/djluck/prometheus-net.DotNetRuntime/blob/master/src/prometheus-net.DotNetRuntime/DotNetRuntimeStatsBuilder.cs

Implementation principle

How does it work, and how does it capture CLR metrics? Let's take it apart.

The project README gives a brief explanation: the implementation is based on CLR ETW events. For the ETW events the CLR supports, see: https://docs.microsoft.com/en-us/dotnet/framework/performance/clr-etw-events

ETW events are exposed through EventSource, which lets us obtain runtime information about a process from outside that process. This is also how tools such as PerfMonitor / PerfView obtain CLR information out of process. Likewise, EventPipe, the mechanism behind Microsoft's newer dotnet diagnostics tools, is also based on EventSource.

EventSource events can be consumed not only by such out-of-process tools but also inside the application, by implementing an EventListener that consumes EventSource events in-process. This is how the prometheus-net.DotNetRuntime library is implemented.
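To make the in-process approach concrete, here is a minimal sketch of an EventListener that subscribes to the runtime's GC events. It is not the library's actual DotNetEventListener: the class name GcEventListener and the EventCount property are made up for illustration, and the sketch assumes .NET Core 3.0 or later, where the runtime event source is named "Microsoft-Windows-DotNETRuntime" and the GC keyword is 0x1.

```csharp
using System;
using System.Diagnostics.Tracing;
using System.Threading;

// Hypothetical listener for illustration; not the library's own class.
sealed class GcEventListener : EventListener
{
    // Name of the CLR runtime event source and its GC keyword (assumed: .NET Core 3.0+).
    private const string RuntimeEventSourceName = "Microsoft-Windows-DotNETRuntime";
    private const EventKeywords GcKeyword = (EventKeywords)0x1;

    private int _eventCount;

    // Number of runtime events received so far.
    public int EventCount => Volatile.Read(ref _eventCount);

    // Called once for every EventSource in the process, including those created earlier.
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name == RuntimeEventSourceName)
        {
            // Only enable GC events at informational level to keep overhead low.
            EnableEvents(eventSource, EventLevel.Informational, GcKeyword);
        }
    }

    // Called for every event of an enabled source; a real collector would update metrics here.
    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        Interlocked.Increment(ref _eventCount);
        Console.WriteLine($"{eventData.EventName} (id {eventData.EventId})");
    }
}
```

Creating an instance (for example `using var listener = new GcEventListener();`) is enough to start receiving events. Runtime events are delivered asynchronously, so they may arrive shortly after the garbage collection that caused them.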

You can refer to the source code: https://github.com/djluck/prometheus-net.DotNetRuntime/blob/master/src/prometheus-net.DotNetRuntime/DotNetEventListener.cs

The specific event processing is in the corresponding collector:

https://github.com/djluck/prometheus-net.DotNetRuntime/tree/master/src/prometheus-net.DotNetRuntime/StatsCollectors

Metrics Samples

To show what these metrics look like in practice, here are some dashboard screenshots from my application.

Lock Contention

GC

The figure above clearly shows that a garbage collection occurred at this point in time. It also gives a rough view of the GC heap size and of the CPU usage and time spent on garbage collection, which is very helpful for diagnosing application problems at run time.

Thread

The thread metrics can also include the number and latency of ThreadPool scheduling operations, which are not shown here.

At present I mainly watch the number of threads in the thread pool and the reasons the thread pool is adjusted. One of those reasons is starvation, and it deserves special attention. ThreadPool starvation should be avoided. It is usually caused by improper usage, such as Task.Wait, Task.Result, or using await Task.Run() to turn a synchronous method into an asynchronous one.
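The difference between the blocking pattern and the asynchronous one can be sketched as follows. All names here (StarvationExamples, FetchValueAsync, and so on) are hypothetical, and Task.Delay merely stands in for a real I/O-bound call.

```csharp
using System.Threading.Tasks;

// Hypothetical examples for illustration only.
static class StarvationExamples
{
    // Stand-in for an I/O-bound operation such as a database or HTTP call.
    public static async Task<int> FetchValueAsync()
    {
        await Task.Delay(100);
        return 42;
    }

    // Sync-over-async: the calling thread is blocked until the task completes.
    // Under load, many blocked thread pool threads can lead to starvation.
    public static int GetValueBlocking()
    {
        return FetchValueAsync().Result;
    }

    // Asynchronous all the way: the thread is released while the operation is pending.
    public static async Task<int> GetValueAsync()
    {
        return await FetchValueAsync();
    }
}
```

Both methods return the same value. The difference is that GetValueBlocking holds on to a thread for the whole wait, while GetValueAsync does not.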

DiagnosticSource

Besides EventSource, there is also DiagnosticSource, which can help us diagnose application performance problems. Microsoft currently recommends using DiagnosticSource in class libraries, and it is the mechanism most APM tools are built on today: SkyWalking, Elastic APM, OpenTelemetry, and others all use DiagnosticSource to implement application performance diagnostics.

EventSource is recommended for out-of-process performance diagnostics, and DiagnosticSource is recommended for in-process diagnostics.

In most cases we should use DiagnosticSource; capturing its events out of process is also possible.

For guidance on choosing between the two, see this comment: https://github.com/dotnet/aspnetcore/issues/2312#issuecomment-359514074
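As a rough illustration of in-process consumption with DiagnosticSource, the sketch below writes one event to a DiagnosticListener and records it in an observer. The listener name "Demo.Component", the event name "Demo.RequestStart", and both types are invented for this example and do not come from any real library.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical observer that records the names of the events it receives.
sealed class RecordingObserver : IObserver<KeyValuePair<string, object>>
{
    public List<string> EventNames { get; } = new List<string>();

    public void OnNext(KeyValuePair<string, object> evt) => EventNames.Add(evt.Key);
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

static class DiagnosticSourceDemo
{
    // Writes one event and returns the event names the observer saw.
    public static List<string> Run()
    {
        // The instrumented component owns the source (invented name).
        using var source = new DiagnosticListener("Demo.Component");
        var observer = new RecordingObserver();

        // The consumer subscribes in-process; disposing the subscription stops delivery.
        using IDisposable subscription = source.Subscribe(observer);

        // Check IsEnabled first so the payload is only built when someone is listening.
        if (source.IsEnabled("Demo.RequestStart"))
        {
            source.Write("Demo.RequestStart", new { Path = "/hello" });
        }

        return observer.EventNames;
    }
}
```

Unlike EventSource payloads, the payload here is an ordinary in-memory object, which is part of why this mechanism suits in-process consumers.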

More

Beyond the metrics listed above, there are a few more, covering exceptions, ThreadPool scheduling, and the current dotnet environment (system version, GC type, runtime version, the program's target framework, CPU count, and so on). If you are interested, give them a try.

In my experience the exception metric is not very helpful. Exceptions that were handled or ignored are counted as well, and most of them do not affect the application. Relying on this metric can therefore cause a lot of confusion, so I think it is more appropriate for the application itself to count exceptions.

prometheus-net.DotNetRuntime is a plug-in for prometheus-net and relies on prometheus-net to write metrics. In other words, its metrics are exposed through prometheus-net.

Integration with ASP.NET Core is the same as with prometheus-net before, and the metrics path is unchanged. You can refer to my project: https://github.com/OpenReservation/ReservationServer/tree/dev/OpenReservation

Note: the author recommends .NET Core 3.0 or later. .NET Core 2.x has some bugs, which are described in the project's issues.
