Secrets of Node.js memory leaks

Time:2020-2-25

By Giovanny Gongora

Crazy technology house

Original: https://nodesource.com/blog/m

No reprint without permission

Tracking down memory leaks in Node.js has long been a recurring topic, and people always want to know more about it because of its complexity and range of causes.

Not all memory leaks are immediately obvious. Once we identify a pattern, however, we must look for correlations between memory usage, the objects held in memory, and response time. When examining objects, look at how many of them are collected and whether those numbers are normal for the framework or technique you use to serve content, such as server-side rendering. Hopefully, by the end of this article, you will understand how to find a strategy for debugging the memory consumption of a Node.js program.

Garbage collection mechanism in node.js

JavaScript is a garbage-collected language. Google’s V8 is a JavaScript engine originally created for Google Chrome, but it can also be used as a stand-alone runtime in many cases, and it is the engine behind Node.js. Two important operations of the garbage collector in Node.js are:

  1. Identify useful or useless objects, and
  2. Reclaim or reuse the memory occupied by useless objects.

The important thing to remember: when the garbage collector runs, it pauses your program completely until it finishes. You therefore want to minimize its work by taking care of the references to your objects.
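To get a feel for these pauses, Node.js can report garbage collection events through its built-in perf_hooks module. The following is a minimal sketch and not part of the original article; the function name watchGcPauses and the callback shape are illustrative assumptions.

```javascript
const { PerformanceObserver } = require('perf_hooks')

// Hypothetical helper: calls onPause(durationMs) for every GC event
// that Node.js reports through the 'gc' performance entry type.
function watchGcPauses(onPause) {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      onPause(entry.duration) // duration is reported in milliseconds
    }
  })
  observer.observe({ entryTypes: ['gc'] })
  return observer // call observer.disconnect() to stop watching
}

// Example usage (commented out so that loading this file has no side effects):
// watchGcPauses((ms) => console.log(`GC took ${ms.toFixed(2)} ms`))
```

Logging every event is noisy in a busy process, so in practice you would probably aggregate the durations instead of printing each one.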

The V8 JavaScript engine automatically allocates and deallocates all memory used by the node.js process. Let’s see what the actual situation is.

If you think of memory as a tree (more precisely, a graph), you can imagine V8 keeping track of every variable in the program starting from a “root node”. This may be the window object in a browser, or the global object in a Node.js module, commonly known as the dominator. One thing to keep in mind is that you have no control over how the root node is deallocated.

Next, you will find object nodes; a node with no child references is usually called a leaf. Finally, the values stored in this structure fall into four broad kinds: Boolean, string, number, and object. (JavaScript also has undefined, null, symbol, and bigint values.)

V8 traverses the tree and tries to identify groups of data that can no longer be reached from the root node. If data is not reachable from the root node, V8 assumes that it is no longer used and frees its memory. Remember: an object is live only if it can be reached through a chain of pointers from an object that is live by definition, such as the root; everything else is considered garbage.

In short, the garbage collector has two main tasks:

  1. Trace references between objects, and
  2. Count references between objects.

This gets tricky when you need to track remote references from another process, but a Node.js program usually runs as a single process, which makes things easier for us.

Memory scheme of V8

V8 uses a scheme similar to the Java Virtual Machine and divides memory into segments. The concept that wraps this scheme is called the “resident set”: the portion of memory occupied by a process that is held in RAM.

In the resident set, you will find:

  • Code segment: The actual code being executed.
  • Stack: Contains local variables and all value types, with pointers that reference objects on the heap or define the control flow of the program.
  • Heap: A memory segment dedicated to storing reference types such as objects, strings, and closures.

There are two other important points to remember:

  • Shallow size of an object: The amount of memory required to hold the object itself.
  • Retained size of an object: The amount of memory freed when the object and its dependent objects are deleted.
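The difference between the two sizes can be illustrated on a toy object graph. This is only a sketch under stated assumptions: the graph, the node names, and the byte sizes are made up, and real heap profilers compute retained sizes with far more efficient algorithms than this brute-force comparison.

```javascript
// Toy heap (hypothetical data): each node has a shallow size in bytes
// and a list of the nodes it references.
const heapGraph = {
  root: { size: 0, refs: ['cache', 'config'] },
  cache: { size: 40, refs: ['entryA', 'entryB'] },
  entryA: { size: 100, refs: ['shared'] },
  entryB: { size: 100, refs: [] },
  config: { size: 20, refs: ['shared'] },
  shared: { size: 50, refs: [] },
}

// Collect every node reachable from `start`, pretending `removed` no longer exists.
function reachable(start, removed) {
  const seen = new Set()
  const stack = [start]
  while (stack.length > 0) {
    const name = stack.pop()
    if (name === removed || seen.has(name)) continue
    seen.add(name)
    stack.push(...heapGraph[name].refs)
  }
  return seen
}

// Shallow size: the memory held by the node itself.
function shallowSize(name) {
  return heapGraph[name].size
}

// Retained size: the memory that becomes unreachable once the node is deleted.
function retainedSize(name) {
  const before = reachable('root', null)
  const after = reachable('root', name)
  let total = 0
  for (const node of before) {
    if (!after.has(node)) total += heapGraph[node].size
  }
  return total
}

console.log(shallowSize('cache')) // 40
console.log(retainedSize('cache')) // 240: cache plus entryA and entryB
console.log(retainedSize('entryA')) // 100: shared is still reachable through config
```

Note that shared is not counted against cache or entryA, because config keeps it alive independently.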

Node.js provides process.memoryUsage(), which returns an object describing the memory usage of the Node.js process in bytes. Inside the object, you will find:

  • rss: The resident set size.
  • heapTotal and heapUsed: The memory usage of V8.
  • external: The memory usage of C++ objects bound to JavaScript objects managed by V8.
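As a quick illustration, the fields listed above can be read and printed in megabytes. This is a minimal sketch; the helper name memoryReport and the rounding to two decimals are assumptions made for readability.

```javascript
// Hypothetical helper: returns the main process.memoryUsage() fields in megabytes.
function memoryReport() {
  const usage = process.memoryUsage() // all values are in bytes
  const toMB = (bytes) => Math.round((bytes / 1024 / 1024) * 100) / 100
  return {
    rss: toMB(usage.rss),
    heapTotal: toMB(usage.heapTotal),
    heapUsed: toMB(usage.heapUsed),
    external: toMB(usage.external),
  }
}

console.log(memoryReport())
```

Calling this before and after a load test gives a rough first indication of whether heapUsed keeps growing.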

Finding the leak

Chrome DevTools is a great tool for remotely debugging and diagnosing memory leaks in a Node.js program, and other tools offer similar functionality. Remember, however, that profiling is a CPU-intensive task and may have a negative impact on your program. Be sure to keep this in mind!

The Node.js program we will profile is a simple HTTP API server, which has multiple endpoints and returns different information to whoever consumes the service. You can clone the repository of this program.

const http = require('http')

// Global array that is never cleared: this is the intentional leak.
const leak = []

function requestListener(req, res) {

  if (req.url === '/now') {
    let resp = JSON.stringify({ now: new Date() })
    // Every request appends an object that stays reachable forever.
    leak.push(JSON.parse(resp))
    res.writeHead(200, { 'Content-Type': 'application/json' })
    res.write(resp)
    res.end()
  } else if (req.url === '/getSushi') {
    function importantMath() {
      let endTime = Date.now() + (5 * 1000);
      while (Date.now() < endTime) {
        Math.random();
      }
    }

    function theSushiTable() {
      return new Promise(resolve => {
        resolve('🍣');
      });
    }

    async function getSushi() {
      let sushi = await theSushiTable();
      res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' })
      res.write(`Enjoy! ${sushi}`);
      res.end()
    }

    getSushi()
    importantMath()
  } else {
    res.end('Invalid request')
  }
}

const server = http.createServer(requestListener)
server.listen(process.env.PORT || 3000)

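Start the Node.js application with the inspector enabled, for example by passing the --inspect flag to node along with the server’s entry file.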

We have been using the 3S (3 Snapshot) method to diagnose and identify possible memory problems. Interestingly, we found that Loreena Lee of the Gmail team had long used this method to solve memory problems. The steps of this method are:

  1. Open Chrome DevTools and visit chrome://inspect.
  2. In the Remote Target section at the bottom, click the inspect button.

Note: Make sure that the inspector is attached to the Node.js program you want to analyze. You can also use ndb to connect to Chrome DevTools.

When the application is running, you will see a “Debugger Connected” message in the console output.

  3. Go to Chrome DevTools > Memory.
  4. Take a heap snapshot.

In this case, we take the first snapshot without any load or processing on the service. This is a tip for certain use cases: it is fine as long as we are sure the program does not need to warm up before accepting requests or doing some processing. Sometimes it makes sense to warm up before taking the first heap snapshot, because the program may lazily initialize global variables on the first call.
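Heap snapshots can also be written from code instead of through the DevTools button, using v8.writeHeapSnapshot() from the built-in v8 module (available since Node.js 11.13). The sketch below is not from the original article; the helper name takeSnapshot, the label argument, and the choice of the temporary directory are assumptions.

```javascript
const v8 = require('v8')
const os = require('os')
const path = require('path')

// Hypothetical helper: writes a heap snapshot and returns the file path.
// The call is synchronous and blocks the process while the snapshot is written.
function takeSnapshot(label) {
  const file = path.join(
    os.tmpdir(),
    `${label}-${process.pid}-${Date.now()}.heapsnapshot`
  )
  return v8.writeHeapSnapshot(file)
}

// Example usage (commented out because writing a snapshot is expensive):
// console.log('Snapshot written to', takeSnapshot('before-load'))
```

The resulting .heapsnapshot files can be loaded into the Memory tab of Chrome DevTools and compared there in the same way as snapshots taken interactively.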

  5. Perform the action that you think is causing a memory leak in your program.

In this case, we will run npm run load-mem. This starts ab to simulate traffic or load in the Node.js application.

  6. Take a heap snapshot.

  7. Again, perform the action that you think is causing a memory leak in your program.
  8. Take the final heap snapshot.

  9. Select the most recent snapshot.
  10. At the top of the window, find the drop-down list that says “All objects” and switch it to “Objects allocated between snapshots 1 and 2”. (You can do the same for snapshots 2 and 3 if you need to.) This greatly reduces the number of objects you see.

The Comparison view can also help you identify those objects.

In this view, you will see a list of leaked objects: the top-level entries (one row per constructor), the distance from the object to the GC root, the number of object instances, the shallow size, and the retained size. Select a row to view its contents. A good rule of thumb is to ignore the items in parentheses at first, because they are built-in structures. The number after the @ character is the object’s unique ID, which lets you compare objects across heap snapshots.

A typical memory leak comes from accidentally storing a reference to an object that should live for only one request cycle in a global object that can never be garbage collected.

This example leaks memory on purpose. On each request it generates an object containing a date timestamp, imitating an application object that might be returned from an API query, and stores it in a global array. By looking at a few of the retained objects, you can see examples of the leaked data, which you can use to track down the leak in an application.
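Once the global array has been identified as the culprit, one possible fix is to stop the array from growing without bound. The following is a sketch rather than the article’s own solution: it assumes that a short history of recent responses is actually wanted, and the names MAX_RECENT, recent, and remember are hypothetical. If no history is needed, the simplest fix is to drop the push entirely. The /getSushi endpoint is omitted for brevity.

```javascript
const http = require('http')

const MAX_RECENT = 100 // hypothetical cap on how many entries are kept
const recent = []

// Keep only the newest MAX_RECENT entries so that older ones can be collected.
function remember(entry) {
  recent.push(entry)
  if (recent.length > MAX_RECENT) recent.shift()
}

function requestListener(req, res) {
  if (req.url === '/now') {
    const resp = JSON.stringify({ now: new Date() })
    remember(JSON.parse(resp))
    res.writeHead(200, { 'Content-Type': 'application/json' })
    res.end(resp)
  } else {
    res.end('Invalid request')
  }
}

const server = http.createServer(requestListener)
// Call server.listen(process.env.PORT || 3000) to start serving.
```

Repeating the three-snapshot procedure against this version should show the number of retained objects levelling off instead of growing with every load run.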

N|Solid is a great fit for this type of use case because it gives you a good view of how memory grows with each task or load test you perform. If you are curious, you can also see in real time how each profiling action affects the CPU.

In a real project, you cannot always keep an eye on the tools that monitor the program. One of the advantages of N|Solid is that it can set thresholds and limits for different metrics of the application. For example, you can configure N|Solid to take a heap snapshot when memory usage exceeds a given amount, or when memory has not recovered from a high-consumption peak within a given time. Sounds good, doesn’t it?
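The idea behind such a threshold can be sketched in plain Node.js. This is not N|Solid’s API or configuration; it is a hand-rolled illustration, and the names createHeapWatcher, limitBytes, and graceMs are assumptions. The watcher reports true only when heap usage has stayed above the limit for the whole grace period.

```javascript
// Hypothetical helper: returns a check(heapUsed, now) function that reports
// true once heapUsed has stayed above limitBytes for at least graceMs.
function createHeapWatcher(limitBytes, graceMs) {
  let aboveSince = null // timestamp of the first sample above the limit
  return function check(heapUsed, now) {
    if (heapUsed <= limitBytes) {
      aboveSince = null // memory recovered, so reset the timer
      return false
    }
    if (aboveSince === null) aboveSince = now
    return now - aboveSince >= graceMs
  }
}

// Example wiring (commented out so that loading this file starts no timers):
// const check = createHeapWatcher(500 * 1024 * 1024, 60 * 1000)
// setInterval(() => {
//   if (check(process.memoryUsage().heapUsed, Date.now())) {
//     require('v8').writeHeapSnapshot()
//   }
// }, 5000).unref()
```

A real setup would also limit how many snapshots are written, since each one blocks the process while it is being produced.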

Mark and sweep

V8’s garbage collector is based mainly on the Mark-Sweep collection algorithm. This is a tracing garbage collection: it marks reachable objects, then sweeps over memory and recycles the unmarked objects (which must be unreachable) by putting them on a free list. The collector is also generational: objects may move within the young generation, from the young generation to the old generation, and within the old generation.

Moving objects is expensive because you need to copy the underlying memory of the objects to a new location, and the pointers to those objects need to be updated.

In plain words:

V8 recursively looks for all objects’ reference paths to the root node. For example, in JavaScript, the “window” object is a global variable that can act as a root. The window object always exists, so the garbage collector can assume that it and all of its children always exist (that is, they are not garbage). If an object has no reference path to the root node, which the collector discovers as it recursively looks for unreferenced objects, it is marked as garbage and later swept to free the memory and return it to the operating system.

Modern garbage collectors have improved on this algorithm in different ways, but the essence is the same: reachable memory is marked as such, and the rest is regarded as garbage.
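The essence of the algorithm fits in a few lines when the heap is modeled as a plain map of references. This is a toy sketch and not V8’s implementation; the object names and the Map-based heap are assumptions chosen for illustration.

```javascript
// Toy mark-and-sweep over a heap modeled as Map<id, ids it references>.
function markAndSweep(heap, roots) {
  // Mark phase: everything reachable from the roots is live.
  const marked = new Set()
  const pending = [...roots]
  while (pending.length > 0) {
    const id = pending.pop()
    if (marked.has(id)) continue
    marked.add(id)
    pending.push(...(heap.get(id) || []))
  }
  // Sweep phase: everything that was not marked is garbage.
  const freed = []
  for (const id of heap.keys()) {
    if (!marked.has(id)) {
      heap.delete(id)
      freed.push(id)
    }
  }
  return freed
}

// Hypothetical heap: 'temp' and 'tempChild' reference each other
// but nothing reachable from the root references them.
const toyHeap = new Map([
  ['global', ['server', 'leak']],
  ['server', []],
  ['leak', ['entry1']],
  ['entry1', []],
  ['temp', ['tempChild']],
  ['tempChild', ['temp']],
])

console.log(markAndSweep(toyHeap, ['global'])) // [ 'temp', 'tempChild' ]
```

The cycle between temp and tempChild is collected because reachability from the root is what matters, not whether an object still has incoming references. Conversely, entry1 survives only because the global leak array still points to it.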

Keep in mind that anything reachable from a root is not considered garbage. Unwanted references are variables kept somewhere in the code that will never be used again yet still point to memory that could otherwise be freed. Therefore, to understand the most common leaks in JavaScript, we need to understand the ways in which references are commonly forgotten.
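One common way to forget a reference is to register a listener on a long-lived event emitter and never remove it. The sketch below is an illustration, not code from the original article; the emitter bus, the 'flush' event, and both handler functions are hypothetical.

```javascript
const { EventEmitter } = require('events')

// Long-lived emitter: reachable from module scope for the life of the process.
const bus = new EventEmitter()

// Leaky: the closure captures `payload`, and the listener is never removed,
// so every payload stays reachable through `bus`.
function handleRequestLeaky(payload) {
  bus.on('flush', () => payload.length)
}

// Fixed: the listener removes itself after the first 'flush', and the
// returned function allows explicit cleanup if 'flush' never fires.
function handleRequestFixed(payload) {
  const onFlush = () => payload.length
  bus.once('flush', onFlush)
  return () => bus.off('flush', onFlush)
}
```

In a heap snapshot, the leaky version shows up as a growing number of closures retained by the emitter’s listener list.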

Orinoco garbage collector

Orinoco is the code name of the latest GC project. It uses the latest incremental and concurrent techniques for garbage collection and is able to free up the main thread. One of the important metrics of Orinoco’s performance is how often and for how long the main thread pauses while the garbage collector runs. For classic stop-the-world collectors, these pauses affect the user experience of the program through delays, poor rendering quality, and increased response time.

V8 distributes the garbage collection work in young-generation memory (scavenging) among helper threads. Each thread receives a set of pointers and moves all live objects into “to-space”.

When moving objects into “to-space”, threads need to synchronize through atomic read, write, and compare-and-swap operations, so that another thread that finds the same object through a different path does not try to move it as well.
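The compare-and-swap idea can be illustrated with JavaScript’s own Atomics API. This is only an analogy and not V8’s internal code: the claim table, the worker IDs, and the function names are assumptions. Each object gets one slot, and only the thread whose compare-and-swap succeeds is allowed to move that object.

```javascript
const UNCLAIMED = 0

// One Int32 slot per object, backed by memory that threads could share.
function createClaimTable(objectCount) {
  const buffer = new SharedArrayBuffer(objectCount * Int32Array.BYTES_PER_ELEMENT)
  return new Int32Array(buffer)
}

// Returns true if `workerId` (any non-zero integer) won the right to move
// the object at `index`. The swap happens only if the slot is still UNCLAIMED.
function tryClaim(table, index, workerId) {
  const previous = Atomics.compareExchange(table, index, UNCLAIMED, workerId)
  return previous === UNCLAIMED
}

const claimTable = createClaimTable(4)
console.log(tryClaim(claimTable, 2, 1)) // true: worker 1 claims object 2
console.log(tryClaim(claimTable, 2, 2)) // false: worker 2 arrives too late
```

Because the check and the write happen as one atomic step, two threads that reach the same object through different paths can never both believe they own it.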

Quoting the V8 website:

Adding parallel, incremental, and concurrent techniques to the existing GC was a multi-year effort, but it has paid off, moving a lot of work to background tasks. It has drastically improved pause times, latency, and page load, making animation, scrolling, and user interaction much smoother. The parallel Scavenger has reduced the main thread young generation garbage collection total time by about 20%–50%, depending on the workload. Idle-time GC can reduce Gmail’s JavaScript heap memory by 45% when it is idle. Concurrent marking and sweeping has reduced pause times in heavy WebGL games by up to 50%.

The Mark-Evacuate collector consists of three phases: marking, copying, and updating pointers. To avoid sweeping pages in the young generation to maintain free lists, the young generation is still maintained with a semi-space that is always kept compact by copying live objects into “to-space” during garbage collection. The advantage of being parallel is that exact liveness information is available. This information can be used to avoid copying by just moving and relinking pages that contain mostly live objects, which the full Mark-Sweep-Compact collector also does. It works by marking live objects in the heap in the same way as the mark-sweep algorithm, which means the heap will often be fragmented. V8 currently ships with the parallel Scavenger, which reduces the main thread young generation garbage collection total time by about 20%–50% across a large set of benchmarks.

Everything related to main thread pauses, response time, and page load has improved significantly, which makes animation, scrolling, and user interaction on the page smoother. Depending on the workload, the parallel collector can reduce the total processing time of young-generation memory by 20%–50%. But the work is not over: reducing pauses is still an important task, and we will continue to look for more advanced techniques to achieve this goal.

Summary

Most developers do not need to think about GC when writing JavaScript programs, but knowing some of the internals can help you reason about memory usage and useful programming patterns. For example, given the generational heap structure in V8, short-lived objects are actually quite cheap from the GC’s perspective, because we pay mainly for the objects that survive. This pattern is not specific to JavaScript; it applies to many languages with garbage collection.

Main points:

  • Do not use outdated or deprecated packages (for example, node-memwatch, node-inspector, or v8-profiler) to inspect memory. Everything you need is already integrated into the Node.js binary (especially the Node.js inspector and debugger). If you need more specialized tools, you can use N|Solid, Chrome DevTools, or other well-known software.
  • Consider when and where you trigger heap snapshots and CPU profiles. Because taking a snapshot in production requires intensive CPU work, you will want to trigger both mostly in testing. Also, be sure how many heap dumps can safely be written before the process shuts down and causes a cold restart.
  • No single tool solves every problem. Test, measure, decide, and resolve according to the specifics of your program. Choose the best tool for your architecture, the one that provides the most useful data for solving the problem.

This article was first published on the WeChat official account Front-end Pioneer.
