The threads said: I don't want to explode, I just blame nd4j for being useless

Time: 2021-09-14

1、 Project introduction

web_rec_comm_ctr

Background:

Last year I took over a ranking service that ranks playlists, sounds, and anchors. After taking it over I fixed a memory-overflow problem, and there were no further incidents until recently, when the project was repurposed for offline task computation; the problem appeared after that computation was scaled up.

Project background:

  1. How the project cooperates with the algorithm team: the project defines the interface contract, covering ranking-algorithm loading, automatic updates, model invocation, request-parameter parsing, and declaring the feature data the model needs (feature tables, table fields, etc.).
  2. What the project does: load the algorithm → parse the request data → fetch the feature data → call the model to rank → parse the ranking results → assemble and return the result (a hypothetical sketch of this flow follows).
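
For illustration only, here is a minimal sketch of that flow in Java; every type and method name is hypothetical and stands in for the project's real interfaces, which the article does not show.

    import java.util.List;
    import java.util.Map;

    public class RankingFlowSketch {

        public List<Long> handle(Map<String, Object> rawRequest) {
            Object model = loadOrGetAlgorithm("playlist_ctr");        // load algorithm (auto-updated)
            Map<String, Object> request = parseRequest(rawRequest);   // parse request data
            Map<String, float[]> features = fetchFeatures(request);   // obtain the declared feature data
            float[] scores = callModel(model, features);              // call the model to rank
            return assembleResult(request, scores);                   // parse scores, assemble and return
        }

        // Placeholders for the steps listed above; the real implementations live in the project.
        private Object loadOrGetAlgorithm(String name) { return new Object(); }
        private Map<String, Object> parseRequest(Map<String, Object> raw) { return raw; }
        private Map<String, float[]> fetchFeatures(Map<String, Object> request) { return Map.of(); }
        private float[] callModel(Object model, Map<String, float[]> features) { return new float[0]; }
        private List<Long> assembleResult(Map<String, Object> request, float[] scores) { return List.of(); }
    }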

2、 Problem background

1. The project's k8s container was found to be restarting.
Container configuration when the problem occurred: 4 CPUs (needed for the ranking computation); memory: 6 GB on-heap (the w2v model is loaded locally into the heap) plus 3 GB off-heap (used by the various algorithm packages for their computation).

3、 Problem conclusion:

The computation uses the nd4j framework, which relies on the OpenMP library (OpenMP is an open-source parallel-programming API supporting C/C++/Fortran; nd4j's backend is written in C++, so it uses OpenMP to improve parallel computation performance on the CPU). The library calls pthread_create directly to spawn threads and compute in parallel. Since I don't know this area well, I didn't dig deeper into tuning the computation framework; we simply abandoned the library and switched to another way of doing the calculation.

4、 Troubleshooting process

Check the monitoring system and look at the container instance's resource usage around the restarts.
[Figure 1: container instance resource monitoring around the restarts]
Note: don't be put off by how spiky this project's monitoring chart looks; the ranking service only serves offline scheduled tasks.

Observations from the monitoring data:

  • First, the thread count is clearly abnormal, peaking at nearly 8K.
  • Second, when the problem first appeared, the task's data volume was actually small; it was not a burst of heavy computation.
  • Third, in most cases the restart coincides with a thread-count peak. Consecutive restarts usually look like this: the service restarts, a flood of requests arrives, the thread count spikes, and the service restarts again.
  • Fourth, not every task run triggers a restart, and the thread chart shows threads being reclaimed, so this is unlikely to be a permanent resource leak. It felt familiar: are resources simply not released after use, and only freed passively when garbage collection kicks in?

Following up on the first point, when the next task came in I dumped the process's thread stacks with jstack <pid> and ran the dump through the thread-analysis website fastThread.
[Figure 2: fastThread analysis of the jstack dump]
My expression at this point: [Figure 3]

  • Where are the 8K threads we agreed on... did I... open it the wrong way... I was stunned for a moment. The threads in a jstack dump are the JVM-managed threads; threads created by native methods are not managed by the JVM, which is why the thread stack dumped by jstack is as small as it is.
  • Note: don't bother dumping with jstack -m. To be honest, my heart only sank further after reading that output; I simply couldn't make sense of it.
  • After checking with the ops lead: the monitoring uses the -L option of the ps command, and counting with ps -efL | grep <pid> | wc -l matches the monitoring system's statistics.
  • At this point the suspicion was that native threads were proliferating (a small sketch of the JVM-managed vs. OS-level thread-count gap follows).
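
As a minimal sketch of that gap (Linux only; the names are mine): the thread count the JVM knows about, which is roughly what jstack reflects, versus the kernel-level thread count that ps -efL reflects, which also includes threads created by native code via pthread_create.

    import java.io.IOException;
    import java.lang.management.ManagementFactory;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    public class ThreadCountGap {

        public static void main(String[] args) throws IOException {
            // Threads the JVM manages -- roughly what a jstack dump will show.
            int jvmThreads = ManagementFactory.getThreadMXBean().getThreadCount();

            // Threads the kernel sees for this process -- what `ps -efL | grep <pid> | wc -l` counts.
            // Each entry under /proc/self/task is one kernel-level thread, including those
            // created by native code via pthread_create.
            long osThreads;
            try (Stream<Path> tasks = Files.list(Paths.get("/proc/self/task"))) {
                osThreads = tasks.count();
            }

            System.out.printf("JVM-managed threads: %d, OS-level threads: %d%n", jvmThreads, osThreads);
        }
    }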

Next, locate which libraries the native threads are created by (the approach below came from Mr. Li's guidance; credit to Mr. Li):

[Figure 4: identifying which libraries created the native threads (MKL / nd4j)]

  • From the figure above we can pin it down to MKL and nd4j. To see where they are introduced, run mvn dependency:tree in the project.

[Figure 5: mvn dependency:tree output locating the nd4j dependency]

  • The library comes in through the model-calculation package provided by our algorithm colleagues. Going through the code, the library is indeed used in the calculation; below is a randomly picked fragment.

    public float[] userWord2Vec(List<String> list, Word2VEC word2vecModel, int audioVectorDim,
                                int topK, boolean norm) {
        /*
         * @Description: obtain the user's distributed representation (the sum or mean of the
         *               item vectors of the playback sequence is used as the user's vector)
         * @Parameters:  [list, topK, norm]
         * @Return:      float[]
         * @Created on:  7/23/19
         */
        // Note: the element type of `list` was stripped by the blog engine; String is assumed here.
        float[] userVec_ = new float[audioVectorDim];
        INDArray userVec = Nd4j.create(userVec_, new int[]{1, audioVectorDim});
        // ... (rest of the method omitted in the original excerpt)
  • Going through our algorithm colleagues' code, we found that the INDArray objects were never released after use. We tried modifying the code to release the used resources after the calculation (a sketch of the kind of release we tried follows). Unfortunately, even with the release code in place, the same situation still occurred.
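
A minimal sketch of the kind of cleanup we tried, assuming INDArray.close() and toFloatVector() are available in this ND4J version; close() frees the array's off-heap buffer eagerly instead of leaving it to GC:

    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    public class ReleaseAfterUseSketch {

        /** Build the user vector, use it, then release its off-heap buffer explicitly. */
        static float[] computeAndRelease(int audioVectorDim) {
            INDArray userVec = Nd4j.create(new float[audioVectorDim], new int[]{1, audioVectorDim});
            try {
                // ... the original accumulation / mean / normalisation on userVec ...
                return userVec.toFloatVector();
            } finally {
                userVec.close();   // release the off-heap memory now rather than waiting for GC
            }
        }
    }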

  • At that point the only option left was to go through the source code. Since the problem is thread proliferation, look for a setting that limits the number of threads the library uses, even at the cost of concurrency. Below is the source of the ExecutorServiceProvider class from the org.nd4j:nd4j-api:jar:1.0.0-beta4:compile dependency.

    public class ExecutorServiceProvider {

        public static final String EXEC_THREADS = "org.nd4j.parallel.threads";
        public final static String ENABLED = "org.nd4j.parallel.enabled";

        private static final int nThreads;
        private static ExecutorService executorService;
        private static ForkJoinPool forkJoinPool;

        static {
            int defaultThreads = Runtime.getRuntime().availableProcessors();
            boolean enabled = Boolean.parseBoolean(System.getProperty(ENABLED, "true"));
            if (!enabled)
                nThreads = 1;
            else
                nThreads = Integer.parseInt(System.getProperty(EXEC_THREADS, String.valueOf(defaultThreads)));
        }

        // ... (rest of the class omitted)
  • Based on the code above, I tried adding -Dorg.nd4j.parallel.enabled=false to the startup parameters to cut off the parallel computation entirely. Sadly, the threads still proliferated. This setting apparently only switches ND4J's own executor from parallel to single-threaded computation; it does not fix the fact that the thread resources are never reclaimed. (The same properties can also be set programmatically; see the sketch below.)
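
Just as a sketch, the two system properties from the ExecutorServiceProvider snippet can also be set in code, provided this runs before the first ND4J call so the static initializer picks them up:

    public class Nd4jParallelismConfig {
        public static void main(String[] args) {
            // Same effect as -Dorg.nd4j.parallel.enabled=false on the command line.
            System.setProperty("org.nd4j.parallel.enabled", "false");
            // Only consulted when parallelism stays enabled, per the static block above.
            System.setProperty("org.nd4j.parallel.threads", "1");
            // ... start the application / trigger ND4J initialization after this point ...
        }
    }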

  • With nothing left to try, I turned to Google: the Workspace Guide and the native CPU optimization guide for Deeplearning4j. I tried tweaking the thread, garbage-collection and other configurations, but there was still no improvement. (I really don't understand nd4j; the workspace concept left me confused... a rough sketch of the workspace pattern from the guide is below.)
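
For reference, this is roughly the workspace pattern the guide describes, with class and package names as I understand them for 1.0.0-beta4; treat it as a sketch rather than a drop-in fix, since it did not solve our problem:

    import org.nd4j.linalg.api.memory.MemoryWorkspace;
    import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
    import org.nd4j.linalg.api.memory.enums.AllocationPolicy;
    import org.nd4j.linalg.api.memory.enums.LearningPolicy;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    public class WorkspaceSketch {

        private static final WorkspaceConfiguration WS_CONFIG = WorkspaceConfiguration.builder()
                .policyAllocation(AllocationPolicy.STRICT)    // allocate only what the block needs
                .policyLearning(LearningPolicy.FIRST_LOOP)    // size the workspace from the first pass
                .build();

        static void rankInsideWorkspace(int dim) {
            try (MemoryWorkspace ws = Nd4j.getWorkspaceManager()
                    .getAndActivateWorkspace(WS_CONFIG, "RANKING_WS")) {
                INDArray userVec = Nd4j.create(new float[dim], new int[]{1, dim});
                // ... matrix work on userVec ...
            }   // off-heap memory scoped to this workspace is reclaimed/reused here, not left to GC
        }
    }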

5、 Solution

In the end we could only turn to the algorithm owner, to either switch to another library for the calculation or implement it ourselves. First, confirm the cost of the change and make sure it gets solved at the lowest possible cost:

  1. Most of the project's ranking algorithms have already been migrated to the "model service platform"; the remaining ones only carry a small amount of computation, so only the algorithms still used by this project need to be changed. (Well... there are only two left.)
  2. In the algorithms still in use, nd4j is only used for matrix calculation, not for complex model training or inference, so the calculation logic can quickly be replaced with another toolkit or simply hand-written (a rough sketch follows).
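
To illustrate how small that replacement can be (the names are mine, not the algorithm team's actual code): the user vector is just the sum, or mean, of the item vectors in the playback sequence, which plain float[] arithmetic handles without touching any native library:

    import java.util.List;

    public class UserVectorPlainJava {

        /** Sum (or average) the item vectors of the playback sequence into a user vector. */
        static float[] userVector(List<float[]> itemVectors, int dim, boolean mean) {
            float[] userVec = new float[dim];
            for (float[] itemVec : itemVectors) {
                for (int i = 0; i < dim; i++) {
                    userVec[i] += itemVec[i];
                }
            }
            if (mean && !itemVectors.isEmpty()) {
                for (int i = 0; i < dim; i++) {
                    userVec[i] /= itemVectors.size();
                }
            }
            return userVec;
        }
    }
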
  • After the algorithm owner made and released the change, we watched the distribution again. Thank goodness, it's finally back to normal.

[Figure 6: thread count back to normal after the change]

  • In fact, one node was quietly kept running after the release, still on the version from before the fix, with its memory raised to 15 GB: 6 GB on-heap as before and 9 GB reserved off-heap. That node was kept, with plenty of extra off-heap memory, to verify whether this change really was the fix. Sure enough, although the node never restarted after the release, its memory did at one point exceed 9 GB; without the extra headroom it would presumably have restarted again. Its thread count also spiked sharply, but the threads were reclaimed again. My guess is that this is the effect of GC: once the objects on the heap are collected, the native resources they point to are released as well (a small illustration of that guess follows the figure below). I'll dig into it later; for now I'll keep watching for further problems.

[Figure 7: memory and thread usage of the node left on the old version]
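
To illustrate that guess only (this is not ND4J's actual implementation), the pattern would look like a cleaner that frees a native resource when GC collects its heap-side owner, unless it is closed explicitly first:

    import java.lang.ref.Cleaner;

    public class GcDrivenRelease {

        private static final Cleaner CLEANER = Cleaner.create();

        static final class NativeHandle implements AutoCloseable {
            private final Cleaner.Cleanable cleanable;

            NativeHandle() {
                long fakeAddress = allocateNative();
                // Runs once this object becomes unreachable and GC notices it --
                // or earlier, if close() is called explicitly.
                this.cleanable = CLEANER.register(this, () -> freeNative(fakeAddress));
            }

            @Override
            public void close() {
                cleanable.clean();   // releasing eagerly avoids waiting for a GC cycle
            }

            private static long allocateNative() { return 42L; }             // stand-in for a real native allocation
            private static void freeNative(long address) { /* stand-in for a real native free */ }
        }
    }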

6、 Summary

For third-party libraries that call into native methods, it's best to understand how they work before using them, so that resource usage stays controllable and resources are released promptly. Although the knowledge area involved in this problem is fairly unfamiliar to me, I still try my best to understand what impact anything I introduce into my project will have.

Three things to do after reading ❤️

If you found this article helpful, I'd like to ask you for three small favors:

  1. Like, share, and comment; your "likes and comments" are the motivation for my writing.

  2. Follow the official account "Rotten pig skin", which shares original content from time to time.

  3. And look forward to the follow-up articles.

Author: onedaylin
Link: club.perfma.com/article/189…