From 20 Seconds to 0.5 Seconds: A Case of Using Rust to Optimize Python Performance

Time: 2021-11-27

Note: reposted from the WeChat official account “High Availability Architecture”: From 20 Seconds to 0.5 Seconds: A Case of Using Rust to Optimize Python Performance


Introduction: Python is widely used in many Internet systems, but it also has some performance problems. It is still rare, however, to replace Python with another language, Rust, in key modules, as the Sentry engineers describe here, and their post sparked heated discussion in the Python community. The High Availability Architecture editors translated and reposted the article as follows.

Sentry is a cloud service that helps online businesses monitor and analyze errors. It processes more than one billion errors every month. We have been able to scale most of our systems, but in the past few months the source map handler written in Python has become our performance bottleneck. A source map is the table that maps minified or obfuscated code back to the original code.

Last week, the infrastructure team decided to investigate the performance bottlenecks of the source map handler. Our JavaScript client has become our most popular integration, and one of the reasons is our ability to deobfuscate JavaScript through source maps. That processing is not free, however. We have to fetch, decompress, deobfuscate, and then reverse the transformation to make the JavaScript stack trace readable.

When we wrote the original processing pipeline four years ago, the source map ecosystem was just beginning to evolve. As it grew into a complex and mature mechanism, the time we spent processing source maps in Python grew as well.

As of yesterday, we have replaced our old Python source map processing module with a Rust module, which greatly reduced the processing time and CPU utilization on our machines.

To explain all this, we first need to look at the shortcomings of source maps and of Python.

Source maps in Python

As our users’ applications become more and more complex, so do their source maps. Parsing the JSON itself is fast enough in Python, because the result is just strings. The problem is deserialization. Each source map token becomes a Python object, and some of our source maps contain millions of tokens.
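To make the token explosion concrete, here is a minimal sketch of how the "mappings" string of a version 3 source map expands into tokens. It assumes the standard Base64 VLQ encoding; the names decode_vlq and decode_mappings are illustrative and this is not Sentry's parser. Segments that carry no source position are skipped for brevity.

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_vlq(segment):
    """Decode one Base64 VLQ segment into a list of signed integers."""
    values, shift, acc = [], 0, 0
    for ch in segment:
        digit = B64.index(ch)
        acc |= (digit & 31) << shift      # low 5 bits carry data
        if digit & 32:                    # bit 6 set: more digits follow
            shift += 5
        else:                             # last digit: bit 0 of acc is the sign
            sign = -1 if acc & 1 else 1
            values.append(sign * (acc >> 1))
            shift, acc = 0, 0
    return values


def decode_mappings(mappings):
    """Expand a mappings string into (dst_line, dst_col, src_id, src_line, src_col)."""
    tokens = []
    src_id = src_line = src_col = 0       # these deltas accumulate across lines
    for dst_line, line in enumerate(mappings.split(";")):
        dst_col = 0                       # the column delta resets on every line
        for segment in filter(None, line.split(",")):
            fields = decode_vlq(segment)
            dst_col += fields[0]
            if len(fields) >= 4:
                src_id += fields[1]
                src_line += fields[2]
                src_col += fields[3]
                tokens.append((dst_line, dst_col, src_id, src_line, src_col))
    return tokens


# Three segments become three tuples; a real source map can hold millions.
tokens = decode_mappings("AAAA,IAAI;AACA")
# tokens == [(0, 0, 0, 0, 0), (0, 4, 0, 0, 4), (1, 0, 0, 1, 4)]
```

Every tuple, and every integer inside it that is not a cached small integer, is a separate Python object, which is where the cost described here comes from.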

Deserializing source map tokens this way means paying the full cost of a basic Python object for each one. On top of that, all of these objects take part in reference counting and garbage collection, which adds further overhead. Processing a 30 MB source map makes a single Python process grow to about 800 MB of memory, performs millions of memory allocations, and keeps the garbage collector very busy. (Translator's note: tokens are short-lived objects, which a collector with a young generation handles much better. This is one place where Java has an advantage.)
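The per-object overhead is easy to observe. The following is a rough sketch using the standard tracemalloc module; the Token layout and the helper name bytes_per_token are assumptions made for illustration, and the exact numbers depend on the Python version and platform.

```python
import struct
import tracemalloc
from collections import namedtuple

# Hypothetical token layout; the fields of the real parser may differ.
Token = namedtuple("Token", "dst_line dst_col src_id src_line src_col name_id")

# The same six fields packed as unsigned 32-bit integers: 24 bytes per token.
PACKED = struct.Struct("<6I")


def bytes_per_token(n_tokens):
    """Approximate heap bytes per Token while n_tokens of them are alive."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    tokens = [Token(i, i + 1, 0, i + 2, i + 3, 0) for i in range(n_tokens)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del tokens
    return (after - before) / n_tokens


if __name__ == "__main__":
    print("Python objects: ~%.0f bytes per token" % bytes_per_token(100_000))
    print("Packed struct:   %d bytes per token" % PACKED.size)
```

The gap between the two figures comes from object headers, per-field pointers, and the separately allocated integers; none of that exists in a packed representation.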

Because this deserialization inherently involves object headers and the garbage collector, there is very little room for improvement at the Python level.

Source maps in Rust

Once the investigation showed that the problem was a performance limitation of Python, we decided to try the Rust source map parser that had been written for our CLI tool. Applying the Rust parser to the problematic source map showed that parsing with that library alone reduced the processing time from more than 20 seconds to less than 0.5 seconds. This means that even without any further optimization, simply replacing the Python parser with the Rust parser relieves our performance bottleneck.

After we had shown that Rust was indeed faster, we cleaned up some Sentry internal APIs so that we could replace the original implementation with a new library. This Python library, named libsourcemap, is a thin wrapper around our own Rust source map library.

Optimization results

After the library was deployed, the load on the machines dedicated to source map processing dropped sharply.


The worst-case source map processing time fell to one tenth of what it was.


More importantly, the average processing time fell to about 400 ms.


Because JavaScript is our most popular project language, this change reduced the end-to-end processing time across all events to about 300 ms.


Embedding Rust in Python

There are many ways to expose a Rust library to Python. We chose to compile the Rust code into a dylib and provide some good old C functions, which are exposed to Python through cffi and C header files. From the C header, cffi generates a small shim (a thin layer that transparently intercepts API calls and forwards them elsewhere) through which Rust can be called. This way, libsourcemap can open the dynamically shared library generated from Rust at run time.

This process has two steps. The first is to configure the cffi build module when setup.py runs.


When the module is built, the header file is run through the C preprocessor to expand macros (something cffi cannot do on its own). This step also tells cffi where to place the generated shim module. Once that is done, the module can be loaded.


The next step is to write some wrapper code that provides a Python API for the Rust objects and forwards errors as exceptions. This happens in two parts. First, we make sure that the Rust code uses the Result type wherever possible, and we handle panics so that they never cross the DLL boundary. Second, we define a helper struct that can store error information and pass it as an out parameter to every function that may fail.

On the Python side, we provide a context manager that turns these errors into exceptions.


We keep a dictionary of special_errors that maps specific errors to exception classes; if no specific error is found, a generic SourceMapError is raised.
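The error-forwarding pattern can be sketched in pure Python. This is only an illustration under stated assumptions: ErrorSlot stands in for the C error struct, fake_view_from_json stands in for a native function, and BadJson with error code 2 is a hypothetical entry in special_errors. In the real library the struct and the call would go through cffi.

```python
from contextlib import contextmanager


class SourceMapError(Exception):
    """Generic failure reported by the native library."""


class BadJson(SourceMapError):
    """Hypothetical specialised error, used only for this sketch."""


# Maps native error codes to exception classes (the code 2 is an assumption).
special_errors = {2: BadJson}


class ErrorSlot:
    """Pure-Python stand-in for the C error struct passed as an out parameter."""

    def __init__(self):
        self.code = 0
        self.message = ""


@contextmanager
def capture_err():
    """Yield an error slot and a checker that raises if a call returned nothing."""
    err = ErrorSlot()

    def check(rv):
        if rv is not None:
            return rv
        # Fall back to the generic exception when the code is not special.
        cls = special_errors.get(err.code, SourceMapError)
        raise cls(err.message)

    yield err, check


def fake_view_from_json(data, err_out):
    """Stand-in for a native call: returns a handle, or None after filling err_out."""
    if not data.lstrip().startswith(b"{"):
        err_out.code = 2
        err_out.message = "expected a JSON object"
        return None
    return object()


with capture_err() as (err_out, check):
    view = check(fake_view_from_json(b'{"version": 3}', err_out))
```

Passing the return value through check keeps every call site to a single line while still guaranteeing that a failed native call surfaces as a Python exception.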

From there, we can define the base class for a source map.


Exposing a C API from Rust

We start with a C header that declares some exported functions. How do we export them from Rust? There are two tools: the special #[no_mangle] attribute, and the std::panic module, which provides a landing pad for Rust panics. We built a few helpers for the latter: one function that notifies Python of an exception, and two landing pad helpers, a generic one and one that boxes up the return value. With these in place, wrapping a method is straightforward.


The way boxed_landingpad works is simple. It invokes the closure, catches any panic with panic::catch_unwind, unwraps the result, and boxes up the success value in a raw pointer. If an error occurs, it fills in err_out and returns a NULL pointer. In lsm_view_free, we only need to reconstruct the box from the raw pointer.

Building the extension

To actually build the extension, we have to do some less than elegant things in setuptools. Fortunately, this did not take us much time, because we already had similar tooling for it.

The most convenient part of this approach is that the source is compiled with cargo and the final dylib is installed as a binary, so end users never need to deal with the Rust toolchain.

What went well, and what did not?

I was asked on Twitter, “What alternatives would there have been to Rust?” To be honest, Rust is hard to replace here. Unless you want to rewrite the entire Python component in a better-performing language, native extensions are the only option. In that case the requirements on the language are quite strict: it must not have an invasive runtime, it must not have a GC, and it must support the C ABI. Right now, I think the languages that fit are C, C++, and Rust.

What worked well:

  • Combining Rust and Python with cffi. There are alternatives that link against libpython, but they make the build more complex.

  • Using Docker with an old CentOS version to build portable Linux wheels. The process is tedious, but the differences in stability between Linux distributions and kernels make Docker and CentOS an acceptable build solution.

  • The Rust ecosystem. We use the serde deserialization library and a Base64 library from crates.io, and both work very well. In addition, mmap support comes from another community-provided crate, memmap.

What did not work well:

  • Iteration and compilation times could really be better. We recompile the modules and headers every time we change a single character.

  • The setuptools steps are very brittle. We probably spent more time getting setuptools to work than on anything else. Fortunately, we had done this once before, so it was easier this time.

Although Rust has been very helpful for this work, there is no doubt that a lot can still be improved. In particular, the infrastructure for exporting a C ABI (and making it useful from Python) has plenty of room for improvement. Compile times are not great either. I hope incremental compilation will help.

Next steps

In fact, we still have more room for improvement. Instead of parsing JSON, we can start caching in a more efficient format, such as a set of structs stored in memory. In particular, paired with the file system cache, this could almost completely eliminate the cost of loading, because we bisect the index, which can be done very efficiently with mmap.
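As a minimal sketch of that idea, assuming a hypothetical cache file of fixed-size records sorted by generated position, a lookup can bisect directly over the memory-mapped bytes without deserializing anything up front. The record layout and the names write_index, lookup, and demo are illustrative only.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: dst_line, dst_col, src_line, src_col as uint32.
RECORD = struct.Struct("<4I")


def write_index(path, tokens):
    """Write tokens as fixed-size records sorted by (dst_line, dst_col)."""
    with open(path, "wb") as f:
        for tok in sorted(tokens):
            f.write(RECORD.pack(*tok))


def lookup(buf, line, col):
    """Bisect for the last record at or before (line, col); None if there is none."""
    lo, hi = 0, len(buf) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        rec = RECORD.unpack_from(buf, mid * RECORD.size)
        if (rec[0], rec[1]) <= (line, col):
            lo = mid + 1
        else:
            hi = mid
    if lo == 0:
        return None
    return RECORD.unpack_from(buf, (lo - 1) * RECORD.size)


def demo():
    """Build a tiny index file, map it read-only, and resolve one position."""
    tokens = [(0, 0, 10, 4), (0, 15, 11, 8), (1, 3, 20, 0)]
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_index(path, tokens)
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
                return lookup(buf, 0, 20)
    finally:
        os.remove(path)
```

Only the handful of records touched by the binary search are ever read, and the operating system's file cache keeps hot pages in memory across processes.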

Given this good result, we are likely to evaluate Rust for more CPU-intensive work in the future. For most other operations, however, the program spends more of its time waiting on I/O.

Summary

This project was a great success, and it took us very little time to implement. It reduces our processing time, and it will also help us scale horizontally. Rust has been the perfect tool for this job because it lets us move expensive operations into a native library without using C or C++, which are not well suited to a task of this complexity. Writing a source map parser in Rust was easy; doing the same in C or C++ would have meant more code and less fun.

We really like Python and contribute to many Python open source projects. Python is still our favorite language, but we believe in using the right language in the right place. Rust has proved to be the best tool for this job, and we look forward to seeing what Rust and Python will bring us in the future.

Translator’s note: readers who are not familiar with source maps can read Ruan Yifeng’s article: http://www.ruanyifeng.com/blo…