Tomcat high concurrency and performance tuning

Time:2020-9-24

The previous article took a bird's-eye view of Tomcat's architecture and the design ideas behind its components. Now it is time to come down to earth and look at how each component is actually implemented. Moving from far to near, the architecture gives you the big picture and the details show the craftsmanship. Follow “Code Byte” for more in-depth content. Are you ready?

Last time, “Code Byte” took a bird's-eye view of the Tomcat architecture: we analyzed how Tomcat starts and stops, and how two core components, the connector and the container, together accept a request and produce a response. The connector handles external communication and socket connections; the container handles the internal side, loading servlets and processing the actual request and response. For details, see the earlier article, Tomcat architecture analysis.

Prerequisites for dissecting high concurrency

This time we dissect Tomcat again, focusing on its high-concurrency design and performance tuning, to gain a deeper understanding of the whole architecture. The design of each component is worth studying as a reference: how to abstract components from actual requirements, how to design classes with a single responsibility, how to keep similar functions highly cohesive and loosely coupled, and how to apply design patterns well.

It mainly involves the I/O model and the basics of thread pools.

Before starting, make sure you are comfortable with the following topics. Many of them have been covered in earlier “Code Byte” articles, which you can go back and review. With this background, dissecting Tomcat yields twice the result with half the effort; without it, it is easy to get lost.

Let's look at how Tomcat handles concurrent connections and tasks. Performance optimization means letting each component do its own job well; the goal is to use as little memory as possible and run as fast as possible.

Design patterns

Template method pattern: an abstract class fixes the overall algorithm and encapsulates both its invariant steps and its points of variation. The variable steps are deferred to subclasses, which achieves code reuse and follows the open-closed principle.

Observer pattern: for scenarios where different components must react differently to the same event, it decouples the publisher from its subscribers and notifies downstream listeners flexibly.

Chain of responsibility pattern: objects are linked into a chain and the request is passed along it. The Valve in Tomcat is an application of this pattern.

For more design patterns, see the earlier design patterns series from “Code Byte”.

I/O model

You need to understand the concepts of synchronous blocking, synchronous non-blocking, I/O multiplexing, and asynchronous I/O, as well as how the Java NIO package is used. This article focuses on how Tomcat uses I/O to handle a high number of concurrent connections, and it should leave you with a solid understanding of the I/O models.

Java Concurrent Programming

High concurrency takes more than an elegant overall design, sensible design patterns, and a suitable I/O model; it also needs a thread model and efficient concurrent programming. Under high concurrency, multiple threads inevitably access shared variables, which normally requires locking, so the question becomes how to reduce lock contention. As programmers we should consciously avoid locks where possible, for example by using atomic classes (CAS) or concurrent collections instead. If a lock is unavoidable, keep its scope and granularity as small as possible.

As for the basics of concurrency, “Code Byte” has already published a series on the topic, which you can find among the earlier articles. It mainly covers how concurrency is implemented, memory visibility, the Java memory model (JMM), read-write locks, and related topics.
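As a minimal sketch of replacing a lock with an atomic class, the counter below is incremented by several threads without any synchronized block. The class CasCounterSketch and its methods are hypothetical names made up for this illustration; only AtomicLong comes from the JDK.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class, not taken from Tomcat.
class CasCounterSketch {
    private final AtomicLong count = new AtomicLong();

    // incrementAndGet() performs an atomic update, so callers never
    // block on a lock.
    long increment() {
        return count.incrementAndGet();
    }

    long get() {
        return count.get();
    }

    // Starts `threads` workers that each increment the shared counter
    // `perThread` times, waits for all of them, and returns the final value.
    static long runConcurrently(int threads, int perThread) {
        CasCounterSketch counter = new CasCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(8, 10_000)); // prints 80000
    }
}
```

With a plain long field and count++ instead of AtomicLong, the same test would usually lose updates, which is exactly the conflict a lock or CAS has to prevent.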

Overall framework of Tomcat

Let's review the overall architecture of Tomcat once more. The Connector handles TCP/IP connections, and the Container, as the servlet container, processes the actual business requests. Tomcat abstracts these two components, one facing outward and one facing inward, to make the design extensible.

  • A Tomcat instance has one Service by default, and a Service can contain multiple Connectors. A Connector mainly consists of a ProtocolHandler and an Adapter, which together provide its core functionality.
  • The ProtocolHandler is mainly made up of the Acceptor and the SocketProcessor. It reads the TCP/IP-level socket and converts the data into TomcatRequest and TomcatResponse. A Processor chosen according to the protocol (HTTP or AJP) parses the application-layer protocol, and the Adapter then converts TomcatRequest and TomcatResponse into the standard ServletRequest and ServletResponse. The request is passed to the container through getAdapter().service(request, response);
  • adapter.service() forwards the request to the container. The implementation is org.apache.catalina.connector.CoyoteAdapter:
// Calling the container
connector.getService().getContainer().getPipeline().getFirst().invoke(
                        request, response);

This call triggers the chain of responsibility built from the Pipeline, and the request travels into the container step by step. Every container has a Pipeline; the chain starts at the first valve, ends at the basic valve, and then enters the child container held inside the current container, until it finally reaches the servlet. This is a classic application of the chain of responsibility pattern: the Pipeline forms the request chain, and each node in the chain is a Valve. “Code Byte” explained this in detail in the previous article on the Tomcat architecture. The figure below shows the important components of the whole architecture. Try to fix this global map firmly in your mind; with the big picture in place, the details are much easier to analyze.
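The way a request walks from the first valve to the basic valve and then into the child container's pipeline can be sketched as follows. SketchValve, NamedValve, BasicValve, SketchPipeline and PipelineSketch are hypothetical, heavily simplified stand-ins written for this illustration; they are not Tomcat's classes, and the request is reduced to a list that records which valves it passed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a valve: each valve knows its successor.
abstract class SketchValve {
    private SketchValve next;

    SketchValve getNext() { return next; }

    void setNext(SketchValve next) { this.next = next; }

    abstract void invoke(List<String> trace);
}

// An ordinary valve: does its own work, then passes the request along.
class NamedValve extends SketchValve {
    private final String name;

    NamedValve(String name) { this.name = name; }

    @Override
    void invoke(List<String> trace) {
        trace.add(name);
        getNext().invoke(trace);
    }
}

// The basic valve ends one pipeline and hands over to the child pipeline.
class BasicValve extends SketchValve {
    private final String name;
    private final SketchPipeline child; // null for the innermost container

    BasicValve(String name, SketchPipeline child) {
        this.name = name;
        this.child = child;
    }

    @Override
    void invoke(List<String> trace) {
        trace.add(name);
        if (child != null) {
            child.getFirst().invoke(trace);
        }
    }
}

class SketchPipeline {
    private final SketchValve basic;
    private SketchValve first;

    SketchPipeline(SketchValve basic) { this.basic = basic; }

    // Appends a valve just before the basic valve.
    void addValve(SketchValve valve) {
        if (first == null) {
            first = valve;
        } else {
            SketchValve current = first;
            while (current.getNext() != basic) {
                current = current.getNext();
            }
            current.setNext(valve);
        }
        valve.setNext(basic);
    }

    SketchValve getFirst() { return first != null ? first : basic; }
}

class PipelineSketch {
    static List<String> run() {
        SketchPipeline host = new SketchPipeline(new BasicValve("host-basic", null));
        host.addValve(new NamedValve("host-log"));
        SketchPipeline engine = new SketchPipeline(new BasicValve("engine-basic", host));
        engine.addValve(new NamedValve("engine-log"));

        List<String> trace = new ArrayList<>();
        engine.getFirst().invoke(trace);
        return trace;
    }

    public static void main(String[] args) {
        // prints [engine-log, engine-basic, host-log, host-basic]
        System.out.println(run());
    }
}
```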

Startup process: what does the startup.sh script do?

  • Tomcat is a Java program, so the startup.sh script starts a JVM that runs Tomcat's Bootstrap class.
  • Bootstrap: instantiates Catalina and initializes Tomcat's custom class loaders, which are what make hot loading and hot deployment possible.
  • Catalina: parses server.xml, creates the Server component, and calls Server.start().
  • Server: manages the Service components and calls the start() method of each Service.
  • Service: manages the connectors and the top-level container Engine, and calls the start() methods of the Connector and the Engine.

The Engine container mainly uses the composite pattern: containers are linked by parent-child relationships, and every container extends Lifecycle, which handles its initialization and startup. Lifecycle defines init(), start(), and stop() to control the life cycle of every container component, so the whole server can be started and stopped with a single call.

This is interface-oriented design with single responsibilities. Container uses the composite pattern to manage child containers, and the abstract class LifecycleBase implements Lifecycle to manage the life cycle of every major component, which here means initialization and startup. LifecycleBase uses the template method pattern to separate what varies between components from what stays the same, deferring component-specific initialization to concrete subclasses. It also uses the observer pattern to publish lifecycle events, which decouples startup from the code that reacts to it.

The init and start processes are shown in the swimlane diagrams below. These are the notes I took while debugging the source code. Don't worry that taking notes costs time; following along in a debugger will give you a much deeper understanding.
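The template method idea behind LifecycleBase can be illustrated with a small sketch. SketchLifecycleBase, SketchConnector and LifecycleSketch are hypothetical names; state checking and event publication, which the real class also performs, are left out on purpose.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified analogue of a lifecycle base class.
abstract class SketchLifecycleBase {
    protected final List<String> log = new ArrayList<>();
    private boolean initialized;

    // Template method: the skeleton is final, so subclasses cannot
    // change the order of the steps.
    final void init() {
        log.add("before init");
        initInternal(); // variable step supplied by the subclass
        initialized = true;
        log.add("after init");
    }

    final void start() {
        if (!initialized) {
            init(); // a component that was never initialized is initialized first
        }
        log.add("before start");
        startInternal(); // variable step supplied by the subclass
        log.add("after start");
    }

    protected abstract void initInternal();

    protected abstract void startInternal();
}

// A concrete component only fills in the variable steps.
class SketchConnector extends SketchLifecycleBase {
    @Override
    protected void initInternal() {
        log.add("connector: bind port");
    }

    @Override
    protected void startInternal() {
        log.add("connector: accept connections");
    }
}

class LifecycleSketch {
    static List<String> run() {
        SketchConnector connector = new SketchConnector();
        connector.start();
        return connector.log;
    }

    public static void main(String[] args) {
        // prints [before init, connector: bind port, after init,
        //         before start, connector: accept connections, after start]
        System.out.println(run());
    }
}
```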

Init process

Start process

Using these two articles, pick out the main components and debug along that main line, then read the source code with the swimlane diagrams at hand. You will get more done with less effort. While reading the source, do not dive into details first. Abstract each component and understand its responsibility; only once you understand the responsibilities and design philosophy of every component should you dig into how each one is implemented. Don't start by studying a single leaf.

I have marked every core class in the architecture diagram and the swimlane diagrams. Next, “Code Byte” shares some experience on how to read source code efficiently and stay interested in learning.

How to read the source code correctly

Don't get lost in details and lose sight of the big picture: if you stare at the leaves before knowing what the forest looks like, you cannot see the whole picture or the overall design. So when reading source code, do not start with the details; start with the overall architecture and the relationships between modules.

1. Before reading the source code, build up the necessary technical background

For example, you should know the common design patterns, especially template method, strategy, singleton, factory, observer, dynamic proxy, adapter, chain of responsibility, and decorator. The earlier “Code Byte” articles on design patterns are a good foundation.

2. Be able to use the framework or library, and be fluent in its various usages

The devil is in the details. If you have never seen a particular usage, you may understand what the code does but not why it was written that way.

3. Find books and other material to understand the overall design of the software

Take the bird's-eye view and sort out the core architecture first: forest first, then leaves. What modules are there? How are they related to each other?

You may not understand everything at once, but build an overall picture, like a map, to keep yourself from getting lost.

While reading the source code, check from time to time where you are on that map. “Code Byte” has already laid out the Tomcat architecture for you; following it in a debugger yourself is far more effective.

4. Build the system and run the source code

Debugging is a very important tool. Trying to understand a system just by reading it, without running it, does not work. Make good use of the call stack to observe the context of each call.

5. Take notes

Taking notes (and then rewriting them) is very important. Draw the class diagram of the system yourself instead of relying on the IDE to generate it, and record the main function calls for later reference.

Documentation matters because the code is complex and the human brain has limited capacity; we cannot remember every detail. Notes help you remember the key points, recall your reasoning, and pick up quickly where you left off.

Otherwise, what you read today may be forgotten tomorrow. So bookmark this article, reread it, download the source code, and debug it repeatedly.

Wrong approaches

  • Getting lost in details and ignoring the big picture: staring at the leaves before knowing what the forest looks like hides the overall design. When reading source code, do not start with the details; start with the overall architecture and the relationships between modules.
  • Studying how it is designed before learning how to use it: frameworks almost always rely on design patterns, so at the very least you should know the common ones, even if that means memorizing them. When learning a technology, I recommend reading the official documentation first to see what modules exist and what the overall design is, then downloading and running the samples, and only then reading the source code.
  • Digging into details as soon as you open the source: when reading the source of a specific module, resist the urge to go deep. What matters is the design idea, not how a particular method implements its logic. Go into the details only if you need to do secondary development on the source, and even then only after you understand the overall architecture.

Component design – implement single responsibility, interface oriented thinking

When we receive a feature request, the most important step is abstract design: break the feature into its core components, then find what varies and what stays the same. Group similar functions together so they are cohesive and loosely coupled from the rest, open for extension from outside and closed for modification inside. Implementing a requirement takes the ability to abstract it into separate components rather than throwing every function into one class, or even one method. In code like that, any change ripples through everything; it cannot be extended and is hard to read and maintain.

With these questions in mind, let's analyze how Tomcat designs its components to manage connections and containers.

Let's see how Tomcat starts up, accepts requests, and forwards them to our servlets.

Catalina

Catalina's main task is to create the Server. It does not simply instantiate it; it parses server.xml, creates every component configured in that file, and then calls the Server's init() and start() methods. The startup journey begins here. Failure cases must be handled too. For example, Tomcat has to shut down gracefully and release the resources created during startup, so it registers a shutdown hook with the JVM. I have annotated the source code below and omitted some irrelevant parts. Catalina also uses await() to listen for the stop command that shuts Tomcat down.

/**
     * Start a new server instance.
     */
    public void start() {
        // If the Server has not been created yet, parse server.xml to create it
        if (getServer() == null) {
            load();
        }
        // If creation failed, log the error and abort startup
        if (getServer() == null) {
            log.fatal("Cannot start server. Server instance is not configured.");
            return;
        }

        //Start the server
        try {
            getServer().start();
        } catch (LifecycleException e) {
            log.fatal(sm.getString("catalina.serverStartFail"), e);
            try {
                // Startup failed, so release the resources
                getServer().destroy();
            } catch (LifecycleException e1) {
                log.debug("destroy() failed for failed Server ", e1);
            }
            return;
        }

        //Create and register a JVM close hook
        if (useShutdownHook) {
            if (shutdownHook == null) {
                shutdownHook = new CatalinaShutdownHook();
            }
            Runtime.getRuntime().addShutdownHook(shutdownHook);
        }
        // Wait for the stop request in await()
        if (await) {
            await();
            stop();
        }
    }

A shutdown hook lets you do cleanup when the JVM shuts down, such as releasing thread pools, deleting temporary files, or flushing in-memory data to disk.

A shutdown hook is essentially a thread that the JVM tries to run before it exits. Let's look at what CatalinaShutdownHook does.

/**
     * Shutdown hook which will perform a clean shutdown of Catalina if needed.
     */
    protected class CatalinaShutdownHook extends Thread {

        @Override
        public void run() {
            try {
                if (getServer() != null) {
                    Catalina.this.stop();
                }
            } catch (Throwable ex) {
                ...
            }
        }
    }

    /**
     * Stop the Server instance that has been created
     */
    public void stop() {

        try {
            // Remove the ShutdownHook first so that server.stop()
            // doesn't get invoked twice
            if (useShutdownHook) {
                Runtime.getRuntime().removeShutdownHook(shutdownHook);
            }
        } catch (Throwable t) {
            ......
        }

        //Shut down the server
        try {
            Server s = getServer();
            LifecycleState state = s.getState();
            // If the server is already stopping or stopped, do nothing
            if (LifecycleState.STOPPING_PREP.compareTo(state) <= 0
                    && LifecycleState.DESTROYED.compareTo(state) >= 0) {
                // Nothing to do. stop() was already called
            } else {
                s.stop();
                s.destroy();
            }
        } catch (LifecycleException e) {
            log.error("Catalina.stop", e);
        }

    }

In effect, the hook runs the Server's stop method, which releases and cleans up all resources.
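To try the shutdown hook mechanism in isolation, here is a minimal sketch. ShutdownHookSketch is a hypothetical demo class; the only real APIs used are Runtime.addShutdownHook and Runtime.removeShutdownHook from the JDK.

```java
// Hypothetical demo class; only Runtime.addShutdownHook and
// Runtime.removeShutdownHook are real JDK calls.
class ShutdownHookSketch {

    // Registers a hook and removes it again, similar to how Catalina.stop()
    // removes its hook so that the cleanup does not run twice.
    static boolean registerAndRemove() {
        Thread hook = new Thread(
                () -> System.out.println("not printed: the hook was removed"));
        Runtime.getRuntime().addShutdownHook(hook);
        // Returns true when the hook was registered and is now de-registered.
        return Runtime.getRuntime().removeShutdownHook(hook);
    }

    public static void main(String[] args) {
        // The hook thread is not started here; the JVM starts it on exit.
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> System.out.println("hook: releasing resources")));
        System.out.println("main: done, the JVM is about to exit");
    }
}
```

Running main prints the line from main first and the line from the hook afterwards, because the JVM only starts hook threads once it begins shutting down.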

Server components

Now let's appreciate the interface design and see how Tomcat designs its components and interfaces. The Server component is abstracted as an interface, and because it needs life cycle management, it extends Lifecycle so that it can be started and stopped with a single call.

Its concrete implementation class is StandardServer, as shown in the figure below. The main methods of Lifecycle cover component initialization, start, stop, and destruction, plus listener management. This is the observer pattern: when an event is triggered, it is published to the listeners, each of which performs its own processing. It is a good example of how to decouple.

Server is responsible for managing service components.

Next, let's look at the concrete implementation class of the Server component and the classes it is associated with.

When reading source code, pay particular attention to interfaces and abstract classes. Interfaces are the abstraction of the overall component design. Abstract classes are mostly applications of the template method pattern: they fix the overall algorithm, hand the points of variation to subclasses, and reuse the invariant parts.

StandardServer extends LifecycleBase, so its life cycle is managed in a uniform way. Its child components are Services, so it must also manage their life cycles: it calls the start method of each Service when starting and the stop method when stopping. Internally the Server keeps its Service components in an array. How does it add a Service to that array?

/**
     * Add a Service to the services array.
     *
     * @param service The Service to be added
     */
    @Override
    public void addService(Service service) {

        service.setServer(this);

        synchronized (servicesLock) {
            // Create a new array that is one element longer than the current one
            Service results[] = new Service[services.length + 1];
            // Copy the old entries into the new array
            System.arraycopy(services, 0, results, 0, services.length);
            results[services.length] = service;
            services = results;
            // Start the service if the server is already running
            if (getState().isAvailable()) {
                try {
                    service.start();
                } catch (LifecycleException e) {
                    // Ignore
                }
            }

            // Notify property change listeners (observer pattern)
            support.firePropertyChange("service", null, service);
        }

    }

As the code shows, the Server does not allocate a long array up front. Each time a Service is added, it creates a new array that is one element longer and copies the old contents into it. The purpose is to save memory, which is worth keeping in mind in everyday development too.

The Server has one more important job: the last step of Catalina's start method is to call the Server's await method.

The await method listens on the shutdown port. It creates a socket listening on port 8005 and accepts connections in an endless loop. When a new connection arrives, it reads data from the socket; if the data is the stop command “SHUTDOWN”, it exits the loop and enters the stop process.
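The loop described above can be sketched in a few lines. This is not Tomcat's implementation: ShutdownListenerSketch is a hypothetical class, it reads one text line per connection as a simplification, and it binds to an ephemeral loopback port instead of 8005 so that it can run anywhere.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a shutdown-port listener.
class ShutdownListenerSketch {

    // Accepts connections until one of them sends the shutdown command.
    // Returns the number of connections handled, including the last one.
    static int await(ServerSocket serverSocket, String shutdownCommand) throws IOException {
        int connections = 0;
        while (true) {
            try (Socket socket = serverSocket.accept();
                 BufferedReader reader = new BufferedReader(new InputStreamReader(
                         socket.getInputStream(), StandardCharsets.US_ASCII))) {
                connections++;
                String line = reader.readLine();
                if (shutdownCommand.equals(line)) {
                    return connections; // leave the loop and start stopping
                }
                // Any other input is ignored and the loop keeps listening.
            }
        }
    }

    // Client side: connects, sends one command line, and disconnects.
    static void send(int port, String command) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             OutputStream out = socket.getOutputStream()) {
            out.write((command + "\n").getBytes(StandardCharsets.US_ASCII));
            out.flush();
        }
    }

    static int demo() {
        try (ServerSocket serverSocket =
                     new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            serverSocket.setSoTimeout(5000); // do not hang forever if the client fails
            int port = serverSocket.getLocalPort();
            Thread client = new Thread(() -> {
                try {
                    send(port, "hello");    // ignored by the listener
                    send(port, "SHUTDOWN"); // ends the await loop
                } catch (IOException e) {
                    throw new IllegalStateException(e);
                }
            });
            client.start();
            int handled = await(serverSocket, "SHUTDOWN");
            client.join();
            return handled;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2
    }
}
```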

Service

Service is also designed around an interface. Its concrete implementation class is StandardService, and it likewise extends Lifecycle for life cycle management. No diagram is needed here; let's look at the main methods defined by the Service interface. The interface alone tells you the core functionality. When reading source code, pay attention to the relationships between interfaces and do not rush into the implementation classes.

public interface Service extends Lifecycle {

  //--- primary member variables

    //The top level container engine contained by the service component
    public Engine getContainer();

    //Set the engine container for the service
    public void setContainer(Engine engine);

    //The server component to which the service belongs
    public Server getServer();

    // --------------------------------------------------------- Public Methods

   //Add service associated connector
    public void addConnector(Connector connector);

    public Connector[] findConnectors();

   //Custom thread pool
    public void addExecutor(Executor ex);

   // Mapper locates the component that should process a request, based on its URL
    Mapper getMapper();
}

Next, take a closer look at the implementation class of service

public class StandardService extends LifecycleBase implements Service {
    //Name
    private String name = null;

    //Server instance
    private Server server = null;

    //Connector array
    protected Connector connectors[] = new Connector[0];
    private final Object connectorsLock = new Object();

    //Corresponding engine container
    private Engine engine = null;

    // Mapper and its listener, another application of the observer pattern
    protected final Mapper mapper = new Mapper();
    protected final MapperListener mapperListener = new MapperListener(this);
}

StandardService extends the abstract class LifecycleBase, which defines final template methods such as init(), start(), and stop() for the life cycle. Each of them calls an abstract method at the point of variation, so that every component can supply its own logic. This is worth learning from: use the template method pattern to separate what changes from what does not.

StandardService also contains some familiar components: Server, Connector, Engine, and Mapper.

So why is there a MapperListener? Because Tomcat supports hot deployment: when the deployment of a web application changes, the mapping information in the Mapper must change too. MapperListener is a listener that watches for container changes and updates the Mapper accordingly. This is a typical observer pattern: downstream components react in their own way to actions of an upstream component, and a single event can be handled by multiple listeners. The event publisher does not call each downstream component directly; it notifies its observers, which keeps the two sides decoupled.

Service manages the connectors and the top-level container Engine, so let's step into its startInternal method, which is the abstract method declared by the LifecycleBase template. See the order in which it starts each component.
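A minimal sketch of this observer relationship follows. All types here (SketchContainerListener, SketchHost, SketchMapperListener, ObserverSketch) and the event names are hypothetical, and the mapping table is reduced to a set of context paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical listener interface for container changes.
interface SketchContainerListener {
    void containerEvent(String type, String contextPath);
}

// The publisher: it knows nothing about what its listeners do.
class SketchHost {
    private final List<SketchContainerListener> listeners = new ArrayList<>();

    void addContainerListener(SketchContainerListener listener) {
        listeners.add(listener);
    }

    void deploy(String contextPath) {
        fire("addChild", contextPath);
    }

    void undeploy(String contextPath) {
        fire("removeChild", contextPath);
    }

    private void fire(String type, String contextPath) {
        for (SketchContainerListener listener : listeners) {
            listener.containerEvent(type, contextPath);
        }
    }
}

// The observer: keeps the mapping table in sync with deployments.
class SketchMapperListener implements SketchContainerListener {
    private final Set<String> mappedPaths = new TreeSet<>();

    @Override
    public void containerEvent(String type, String contextPath) {
        if ("addChild".equals(type)) {
            mappedPaths.add(contextPath);
        } else if ("removeChild".equals(type)) {
            mappedPaths.remove(contextPath);
        }
    }

    Set<String> mappedPaths() {
        return mappedPaths;
    }
}

class ObserverSketch {
    static String run() {
        SketchHost host = new SketchHost();
        SketchMapperListener listener = new SketchMapperListener();
        host.addContainerListener(listener);

        host.deploy("/shop");
        host.deploy("/blog");
        host.undeploy("/shop"); // hot undeploy: the mapping disappears too

        return listener.mappedPaths().toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [/blog]
    }
}
```

Adding a second listener, for example one that logs deployments, needs no change to SketchHost, which is the decoupling the paragraph above describes.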

protected void startInternal() throws LifecycleException {

    // 1. Set the STARTING state, which notifies the lifecycle listeners
    setState(LifecycleState.STARTING);

    //2. Start the engine first, and the engine will start its sub container. Because the combination mode is used, each layer of container will start its own sub container first.
    if (engine != null) {
        synchronized (engine) {
            engine.start();
        }
    }

    // 3. Then start the MapperListener
    mapperListener.start();

    //4. Finally, start the connector. The connector will start its sub components, such as endpoint
    synchronized (connectorsLock) {
        for (Connector connector: connectors) {
            if (connector.getState() != LifecycleState.FAILED) {
                connector.start();
            }
        }
    }
}

Service starts the Engine first, then the MapperListener, and finally the Connectors. This makes sense: the inner components must be running before the outer Connector can offer service to the outside world. The Mapper also depends on the container components and can only listen for their changes once they have started, so the Mapper and MapperListener start after the containers. Components stop in the reverse of their start order, which again follows from their dependencies.

Engine

As the top-level container component, Engine is itself a container and extends ContainerBase; once again the template method pattern is applied through an abstract class. ContainerBase holds the child containers of each component in the member variable HashMap<String, Container> children = new HashMap<>();. It also holds protected final Pipeline pipeline = new StandardPipeline(this);, a pipeline built with the chain of responsibility pattern that handles the requests coming from the connector.

 public class StandardEngine extends ContainerBase implements Engine {
 }

The child containers of an Engine are Hosts, so children holds Host instances.

Let's see what ContainerBase does:

  • initInternal: defines container initialization and creates a thread pool dedicated to starting and stopping child containers.
  • startInternal: the default startup implementation. The parent-child relationships are built with the composite pattern, so the container first gets its own child containers and then starts them through startStopExecutor.
public abstract class ContainerBase extends LifecycleMBeanBase
        implements Container {

   //Provides default initialization logic
    @Override
    protected void initInternal() throws LifecycleException {
        BlockingQueue<Runnable> startStopQueue = new LinkedBlockingQueue<>();
       //Create thread pool to start or stop container
        startStopExecutor = new ThreadPoolExecutor(
                getStartStopThreadsInternal(),
                getStartStopThreadsInternal(), 10, TimeUnit.SECONDS,
                startStopQueue,
                new StartStopThreadFactory(getName() + "-startStop-"));
        startStopExecutor.allowCoreThreadTimeOut(true);
        super.initInternal();
    }

  //Container startup
    @Override
    protected synchronized void startInternal() throws LifecycleException {

        //Get the child container and submit it to the thread pool to start
        Container children[] = findChildren();
        List<Future<Void>> results = new ArrayList<>();
        for (Container child : children) {
            results.add(startStopExecutor.submit(new StartChild(child)));
        }
        MultiThrowable multiThrowable = null;
        //Get startup results
        for (Future<Void> result : results) {
            try {
                result.get();
            } catch (Throwable e) {
                log.error(sm.getString("containerBase.threadedStartFailed"), e);
                if (multiThrowable == null) {
                    multiThrowable = new MultiThrowable();
                }
                multiThrowable.add(e);
            }

        }
       ......

        //Start the pipeline pipe to process requests passed by the connector
        if (pipeline instanceof Lifecycle) {
            ((Lifecycle) pipeline).start();
        }
        // Publish the start event
        setState(LifecycleState.STARTING);
        // Start our thread
        threadStart();
    }


}

ContainerBase extends LifecycleMBeanBase, so it also takes part in life cycle management. It provides the default way of starting child containers, as well as methods to add, find, and remove them.

The Engine starts its Host containers through ContainerBase's startInternal method. What else does Engine do?

Look at the constructor: it calls setBasic on the pipeline with a new StandardEngineValve.
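The combination of the composite pattern with a start/stop thread pool can be tried out with the sketch below. SketchContainer and CompositeStartSketch are hypothetical; as a simplifying assumption, each container creates a private pool sized to its number of children, so a parent waiting for its children can never starve them of threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, simplified analogue of a container base class.
class SketchContainer {
    private final String name;
    private final List<SketchContainer> children = new ArrayList<>();

    SketchContainer(String name) {
        this.name = name;
    }

    SketchContainer addChild(SketchContainer child) {
        children.add(child);
        return this;
    }

    // Starts every child on a container-private pool, waits for all of
    // them, and only then records this container as started.
    void start(List<String> started) {
        if (!children.isEmpty()) {
            ExecutorService startStopExecutor =
                    Executors.newFixedThreadPool(children.size());
            try {
                List<Future<?>> results = new ArrayList<>();
                for (SketchContainer child : children) {
                    results.add(startStopExecutor.submit(() -> child.start(started)));
                }
                for (Future<?> result : results) {
                    result.get(); // blocks until this child has started
                }
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException("child container failed to start", e);
            } finally {
                startStopExecutor.shutdown();
            }
        }
        started.add(name);
    }
}

class CompositeStartSketch {
    static List<String> run() {
        SketchContainer engine = new SketchContainer("engine")
                .addChild(new SketchContainer("host-a")
                        .addChild(new SketchContainer("context-1"))
                        .addChild(new SketchContainer("context-2")))
                .addChild(new SketchContainer("host-b"));
        List<String> started = new CopyOnWriteArrayList<>(); // written by several threads
        engine.start(started);
        return started;
    }

    public static void main(String[] args) {
        // Sibling order can vary between runs, but "engine" is always last.
        System.out.println(run());
    }
}
```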

/**
     * Create a new StandardEngine component with the default basic Valve.
     */
    public StandardEngine() {

        super();
        pipeline.setBasic(new StandardEngineValve());
        .....

    }

A container's main job is to process requests. The Engine forwards each request to one of its Host child containers, and it does so through a Valve. Every container component has a Pipeline that forms a chain of responsibility for passing requests along. The basic valve in the Engine's pipeline is defined as follows:

final class StandardEngineValve extends ValveBase {
    @Override
    public final void invoke(Request request, Response response)
        throws IOException, ServletException {

        //Select a suitable host to process the request and get the appropriate host through mapper component
        Host host = request.getHost();
        if (host == null) {
            response.sendError
                (HttpServletResponse.SC_BAD_REQUEST,
                 sm.getString("standardEngine.noHost",
                              request.getServerName()));
            return;
        }
        if (request.isAsyncSupported()) {
            request.setAsyncSupported(host.getPipeline().isAsyncSupported());
        }

        //Get the pipeline first valve of the host container and forward the request to the host
        host.getPipeline().getFirst().invoke(request, response);
    }
}

The basic valve's implementation is very simple: it forwards the request to the Host container, which it obtains from the request object. How does the request object come to hold a Host? Before the request reaches the Engine container, the Mapper component has already routed it: the Mapper locates the matching containers from the request URL and stores them in the request object.

Component design summary

Have you noticed that Tomcat's design is almost entirely interface-oriented? Isolating functions behind interfaces is single responsibility in practice: each interface abstracts a different component, and abstract classes define the execution flow the components share. Along the way we have seen the design philosophy behind the observer, template method, composite, and chain of responsibility patterns, and how to design component abstractions around interfaces.
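The division of labour between the mapper and the basic valve can be sketched as follows. SketchRequest, SketchMapper and MapperSketch are hypothetical names, hosts are reduced to plain strings, and the fallback to a default host for unknown names is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical request object with only the fields this sketch needs.
class SketchRequest {
    final String serverName;
    String host; // set by the mapper before the engine sees the request

    SketchRequest(String serverName) {
        this.serverName = serverName;
    }
}

// Hypothetical mapper: resolves a server name to a registered host.
class SketchMapper {
    private final Map<String, String> hosts = new HashMap<>();
    private final String defaultHost;

    SketchMapper(String defaultHost) {
        this.defaultHost = defaultHost;
        addHost(defaultHost);
    }

    void addHost(String name) {
        hosts.put(name.toLowerCase(Locale.ROOT), name);
    }

    // Stores the matching host in the request. Unknown names fall back
    // to the default host (an assumption of this sketch).
    void map(SketchRequest request) {
        String key = request.serverName.toLowerCase(Locale.ROOT);
        request.host = hosts.getOrDefault(key, defaultHost);
    }
}

class MapperSketch {
    // Plays the role of the engine's basic valve: it does no routing of
    // its own and only reads what the mapper stored in the request.
    static String engineInvoke(SketchRequest request) {
        if (request.host == null) {
            return "400 no host";
        }
        return "forward to " + request.host;
    }

    static String route(String serverName) {
        SketchMapper mapper = new SketchMapper("localhost");
        mapper.addHost("www.example.com");
        SketchRequest request = new SketchRequest(serverName);
        mapper.map(request); // routing happens before the engine is invoked
        return engineInvoke(request);
    }

    public static void main(String[] args) {
        System.out.println(route("WWW.Example.com")); // prints forward to www.example.com
        System.out.println(route("unknown.test"));    // prints forward to localhost
    }
}
```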

I/O model of the connector and design of the thread pool

The main job of the connector is to accept TCP/IP connections, limit the number of connections, read the data, and finally forward the request to the Container. I/O programming is therefore unavoidable here. Let's analyze how Tomcat uses its I/O model to achieve high concurrency.

There are five main I/O models: synchronous blocking, synchronous non-blocking, I/O multiplexing, signal-driven, and asynchronous I/O. The names are familiar, but can you really tell them apart?

I/O is the process of copying data between computer memory and external devices.

The CPU first reads the data from the external device into memory and then processes it. Consider this scenario: when a program issues a read instruction to an external device through the CPU, it usually takes a while for the data to be copied from the device into memory. During that time the CPU has nothing to do. Should the program voluntarily give up the CPU to others, or should it keep the CPU busy checking again and again whether the data has arrived?

This is the problem that I/O models solve. This section covers the differences between the I/O models and then focuses on how Tomcat's NioEndpoint component implements non-blocking I/O.

I/O model

A network I/O operation, such as reading data from the network, involves two parties: the user thread that calls the I/O operation and the operating system kernel. The address space of a process is divided into user space and kernel space, and user threads cannot access kernel space directly.

There are two steps in network reading

  • The user thread waits for the kernel to copy the data from the NIC to the kernel space.
  • The kernel copies data from kernel space to user space.

Similarly, sending data to the network is the same process, copying data from user threads to kernel space, and kernel space copying data to network card.

The difference between different I / O models: the two steps are implemented in different ways.

  • Synchronous versus asynchronous: whether the application has to wait for the result after calling a method, or the call returns immediately without waiting.
  • Blocking versus non-blocking: mainly whether the read or write call blocks and waits while the data is copied between kernel space and user space.

Synchronous blocking I/O

When the user thread issues a read call, it blocks and gives up the CPU. The kernel waits for data to arrive at the NIC, copies it from the NIC into kernel space, then copies it into user space and wakes up the blocked user thread. The thread is blocked during both steps.
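A minimal sketch of a blocking read in Java. To keep the example self-contained it uses an in-process `java.nio.channels.Pipe` as a stand-in for a network socket, which is an assumption of this sketch; `BlockingReadDemo` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

final class BlockingReadDemo {
    /** Reads once from a blocking channel whose data arrives about 200 ms later. */
    static String readOnce() throws Exception {
        Pipe pipe = Pipe.open(); // channels start out in blocking mode
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(200); // the "network" data shows up late
                pipe.sink().write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        });
        writer.start();

        ByteBuffer buffer = ByteBuffer.allocate(16);
        // The calling thread is parked inside read() until data exists and has been copied into buffer.
        int bytesRead = pipe.source().read(buffer);
        writer.join();
        pipe.sink().close();
        pipe.source().close();
        return new String(buffer.array(), 0, bytesRead, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readOnce());
    }
}
```

While the thread waits inside `read`, it can do nothing else, which is why this model needs one thread per connection.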

Synchronous non-blocking I/O

The user thread keeps calling read. While the data has not yet reached kernel space, each call returns a failure immediately, so the polling calls themselves do not block. Once the data is in kernel space, the read call blocks the thread while the data is copied from kernel space to user space, and the thread is woken up when the data has reached user space.
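The polling loop can be sketched as follows. As before, an in-process `Pipe` stands in for a socket (an assumption of this sketch), and `NonBlockingReadDemo` and `PollResult` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

final class NonBlockingReadDemo {
    /** Result holder: how many polls found nothing, and the text that was finally read. */
    static final class PollResult {
        final int emptyPolls;
        final String text;

        PollResult(int emptyPolls, String text) {
            this.emptyPolls = emptyPolls;
            this.text = text;
        }
    }

    static PollResult poll() throws Exception {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false); // switch the read side to non-blocking mode
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(200); // data is not available at first
                pipe.sink().write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        });
        writer.start();

        ByteBuffer buffer = ByteBuffer.allocate(16);
        int emptyPolls = 0;
        // In non-blocking mode read() returns 0 at once when no data is available.
        while (pipe.source().read(buffer) == 0) {
            emptyPolls++;
            Thread.sleep(10); // without a pause this loop would spin and burn CPU
        }
        writer.join();
        pipe.sink().close();
        pipe.source().close();
        String text = new String(buffer.array(), 0, buffer.position(), StandardCharsets.UTF_8);
        return new PollResult(emptyPolls, text);
    }

    public static void main(String[] args) throws Exception {
        PollResult result = poll();
        System.out.println(result.text + " after " + result.emptyPolls + " empty polls");
    }
}
```

The thread is never parked while waiting for data, but it pays for that with repeated calls that find nothing.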

I/O multiplexing

The read operation of the user thread is split into two steps:

  1. The user thread first issues a select call to ask the kernel whether the data is ready. When the kernel has the data ready, the second step is performed.
  2. The user thread then issues a read call. The thread that issued the read is blocked while the kernel copies the data from kernel space to user space.

Why is it called I/O multiplexing? Because a single select call can query the kernel about the status of multiple data channels at once.
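The two steps can be sketched with `java.nio.channels.Selector`. Two in-process pipes stand in for two client connections (an assumption of this sketch); `SelectDemo` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class SelectDemo {
    /** Watches two channels with one selector and reads from whichever becomes ready. */
    static List<String> readAll() throws Exception {
        Pipe a = Pipe.open();
        Pipe b = Pipe.open();
        List<String> seen = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            a.source().configureBlocking(false);
            b.source().configureBlocking(false);
            a.source().register(selector, SelectionKey.OP_READ, "a");
            b.source().register(selector, SelectionKey.OP_READ, "b");

            a.sink().write(ByteBuffer.wrap("x".getBytes(StandardCharsets.UTF_8)));
            b.sink().write(ByteBuffer.wrap("y".getBytes(StandardCharsets.UTF_8)));

            while (seen.size() < 2) {
                selector.select(); // step 1: one call asks about both channels
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    ByteBuffer buffer = ByteBuffer.allocate(16);
                    // step 2: read from a channel that is known to be ready
                    int bytesRead = ((Pipe.SourceChannel) key.channel()).read(buffer);
                    if (bytesRead > 0) {
                        String text = new String(buffer.array(), 0, bytesRead, StandardCharsets.UTF_8);
                        seen.add(key.attachment() + ":" + text);
                    }
                }
            }
        } finally {
            a.sink().close();
            a.source().close();
            b.sink().close();
            b.source().close();
        }
        Collections.sort(seen); // readiness order is not guaranteed
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readAll());
    }
}
```

One thread blocked in `select` can watch any number of channels, so idle connections no longer cost a thread each.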

Asynchronous I/O

When the user thread issues a read call, it registers a callback function. The read call returns immediately and does not block the thread. Once the kernel has the data ready, it invokes the registered callback to process the data. The user thread is never blocked during the whole process.
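In Java this model is exposed through the NIO.2 asynchronous channels. The sketch below registers a `CompletionHandler` on an `AsynchronousSocketChannel`; it assumes a loopback connection inside one process, and `AsyncReadDemo` is a hypothetical name. The timeouts exist only so that the demo cannot hang.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class AsyncReadDemo {
    static String readOnce() throws Exception {
        try (AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open();
             AsynchronousSocketChannel client = AsynchronousSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            Future<AsynchronousSocketChannel> accepted = server.accept();
            client.connect(server.getLocalAddress()).get(5, TimeUnit.SECONDS);

            try (AsynchronousSocketChannel peer = accepted.get(5, TimeUnit.SECONDS)) {
                CompletableFuture<String> result = new CompletableFuture<>();
                ByteBuffer buffer = ByteBuffer.allocate(16);
                // Register the callback; this call returns immediately.
                peer.read(buffer, null, new CompletionHandler<Integer, Void>() {
                    @Override
                    public void completed(Integer bytesRead, Void attachment) {
                        int length = Math.max(bytesRead, 0);
                        result.complete(new String(buffer.array(), 0, length, StandardCharsets.UTF_8));
                    }

                    @Override
                    public void failed(Throwable cause, Void attachment) {
                        result.completeExceptionally(cause);
                    }
                });
                // read() has already returned although nothing has been sent yet.
                client.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)))
                      .get(5, TimeUnit.SECONDS);
                return result.get(5, TimeUnit.SECONDS); // wait only to collect the demo result
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readOnce());
    }
}
```

The callback runs on a thread owned by the channel group, not on the thread that called `read`.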

Tomcat NioEndpoint

Tomcat's NioEndpoint component implements the I/O multiplexing model, and its concurrency capability is already excellent. Let's look at the design of NioEndpoint.

Using the Java multiplexer takes no more than two steps:

  1. Create a Selector, register the events of interest on it, then call the select method and wait for one of those events to occur.
  2. When an event of interest occurs, for example a channel becomes readable, a new thread is created to read data from the channel.

The implementation of NioEndpoint is complex, but the basic principle is these two steps. It contains five components: LimitLatch, Acceptor, Poller, SocketProcessor and Executor. Their workflow is shown in the following figure:

[Figure: workflow of LimitLatch, Acceptor, Poller, SocketProcessor and Executor in NioEndpoint]

Because I/O multiplexing is used, the Poller essentially holds a Java Selector and uses it to detect I/O events on the channels. When data is readable or writable, it creates a SocketProcessor task and hands it to the thread pool for execution. In other words, a small number of threads listen for read and write events, and a dedicated thread pool performs the actual reads and writes, which improves performance.
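The division of labor between the Poller and the thread pool can be sketched roughly as follows. This is not Tomcat's code: `MiniPoller` is a hypothetical name, in-process pipes stand in for client sockets, and a plain fixed thread pool stands in for Tomcat's Executor. Interest in the key is cleared before the hand-off so that the selector does not report the same event again while a worker is still reading.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class MiniPoller {
    /** One polling thread detects readiness; worker threads perform the actual reads. */
    static List<String> serve(int channelCount) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch pending = new CountDownLatch(channelCount);
        List<Pipe> pipes = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < channelCount; i++) {
                Pipe pipe = Pipe.open();
                pipes.add(pipe);
                pipe.source().configureBlocking(false);
                pipe.source().register(selector, SelectionKey.OP_READ);
                pipe.sink().write(ByteBuffer.wrap(("req-" + i).getBytes(StandardCharsets.UTF_8)));
            }
            while (pending.getCount() > 0) {
                selector.select(100);
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    key.interestOps(0); // stop watching this channel while a worker handles it
                    Pipe.SourceChannel channel = (Pipe.SourceChannel) key.channel();
                    workers.execute(() -> { // plays the role of a SocketProcessor task
                        try {
                            ByteBuffer buffer = ByteBuffer.allocate(32);
                            int bytesRead = Math.max(channel.read(buffer), 0);
                            results.add(new String(buffer.array(), 0, bytesRead, StandardCharsets.UTF_8));
                        } catch (IOException e) {
                            results.add("error: " + e.getMessage());
                        } finally {
                            pending.countDown();
                        }
                    });
                }
            }
        } finally {
            workers.shutdown();
            for (Pipe pipe : pipes) {
                pipe.sink().close();
                pipe.source().close();
            }
        }
        List<String> sorted = new ArrayList<>(results);
        Collections.sort(sorted); // workers finish in no particular order
        return sorted;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serve(3));
    }
}
```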

Custom thread pool model

To improve processing capacity and concurrency, web containers usually process requests in a thread pool. Tomcat extends the JDK's native thread pool to meet its concurrency requirements. Before going into the Tomcat thread pool, let's review how the Java thread pool works.

Java thread pool

In short, a Java thread pool maintains a set of worker threads and a task queue. When tasks arrive faster than the threads can process them, they are put into the queue and processed later.

ThreadPoolExecutor

We need to understand the function of each parameter to understand the working principle of thread pool.

    public ThreadPoolExecutor(int corePoolSize,
                              int maximumPoolSize,
                              long keepAliveTime,
                              TimeUnit unit,
                              BlockingQueue<Runnable> workQueue,
                              ThreadFactory threadFactory,
                              RejectedExecutionHandler handler) {
        ......
    }
  • corePoolSize: the number of threads kept in the pool. They are not closed even when idle, unless allowCoreThreadTimeOut is set.
  • maximumPoolSize: the maximum number of threads allowed in the pool; threads beyond corePoolSize are only created when the queue is full.
  • keepAliveTime, unit: when the number of threads is greater than corePoolSize, this is the longest time the excess idle threads survive before they are destroyed; unit is the time unit of keepAliveTime. When allowCoreThreadTimeOut(true) is set, threads within corePoolSize are also reclaimed once their idle time reaches keepAliveTime.
  • workQueue: once the number of threads reaches corePoolSize, new tasks are placed in the work queue workQueue, and the threads in the pool keep trying to fetch tasks from workQueue, that is, they call its poll method.
  • threadFactory: the factory that creates the threads; it can set, for example, whether they are daemon threads and what their names are.
  • handler (RejectedExecutionHandler): the rejection policy that is executed when both the thread limit and the queue capacity have been reached. You can define your own policy by implementing RejectedExecutionHandler. The default policy is AbortPolicy, which rejects the task and throws RejectedExecutionException; CallerRunsPolicy runs the task on the thread that submitted it.

The relationship between the parameters is as follows:

When a new task is submitted and the number of threads in the pool is less than corePoolSize, a new thread is created to execute the task. Once the number of threads equals corePoolSize, new tasks are put into the work queue workQueue, and the threads in the pool fetch tasks from the queue and execute them.

If there are so many tasks that workQueue is full and the current number of threads is less than maximumPoolSize, a temporary thread is created to execute the task. If the number of threads has already reached maximumPoolSize, no more threads are created and the rejection policy is executed. Two further built-in policies are DiscardPolicy, which does nothing and silently discards the task, and DiscardOldestPolicy, which discards the oldest unprocessed task in the queue.
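The order "core threads, then queue, then temporary threads, then rejection" can be observed with a deliberately tiny pool. The sizes here (corePoolSize 1, maximumPoolSize 2, queue capacity 1) are chosen only for this demo, and `PoolGrowthDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolGrowthDemo {
    /** Submits four tasks that never finish on their own and records the pool state after each. */
    static List<String> run() throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await(); // keep the worker busy until the demo is over
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // core = 1, max = 2, queue capacity = 1, default rejection policy (AbortPolicy)
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
        List<String> log = new ArrayList<>();
        try {
            for (int task = 1; task <= 4; task++) {
                try {
                    pool.execute(blocker);
                    log.add("task" + task + ": threads=" + pool.getPoolSize()
                            + " queued=" + pool.getQueue().size());
                } catch (RejectedExecutionException e) {
                    log.add("task" + task + ": rejected");
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        run().forEach(System.out::println);
    }
}
```

The first task starts the core thread, the second waits in the queue, the third forces a temporary thread because the queue is full, and the fourth is rejected.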

The specific implementation process is shown in the following figure:

[Figure: task submission flow of ThreadPoolExecutor]

Tomcat thread pool

Tomcat's custom ThreadPoolExecutor extends java.util.concurrent.ThreadPoolExecutor. A thread pool has two key parameters:

  • Number of threads.
  • Queue length.

Tomcat must limit both parameters; otherwise, under high concurrency, CPU and memory resources may be exhausted. The Tomcat class inherits from java.util.concurrent.ThreadPoolExecutor and is used in the same way, but its implementation is more efficient.

The constructor is shown below; its parameters are the same as those of the JDK version:

public ThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime, TimeUnit unit, BlockingQueue<Runnable> workQueue, RejectedExecutionHandler handler) {
        super(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue, handler);
        prestartAllCoreThreads();
    }

The component that controls the thread pool in Tomcat is StandardThreadExecutor, which also implements the Lifecycle interface. The following code starts the thread pool:

@Override
    protected void startInternal() throws LifecycleException {
        //Custom task queue
        taskqueue = new TaskQueue(maxQueueSize);
        //Custom thread factory
        TaskThreadFactory tf = new TaskThreadFactory(namePrefix,daemon,getThreadPriority());
       //Creating a custom thread pool
        executor = new ThreadPoolExecutor(getMinSpareThreads(), getMaxThreads(), maxIdleTime, TimeUnit.MILLISECONDS,taskqueue, tf);
        executor.setThreadRenewalDelay(threadRenewalDelay);
        if (prestartminSpareThreads) {
            executor.prestartAllCoreThreads();
        }
        taskqueue.setParent(executor);
        //Observer mode, publish start event
        setState(LifecycleState.STARTING);
    }

The key points are as follows:

  1. Tomcat has its own custom task queue and thread factory, and it can limit the length of the task queue; the maximum length is maxQueueSize.
  2. Tomcat also limits the number of threads by setting the number of core threads (minSpareThreads) and the maximum number of threads (maxThreads).

In addition, Tomcat redefines the task processing flow on top of the native one. As described above, the native flow is:

  • For the first corePoolSize tasks, a new thread is created for each task.
  • Further tasks are placed in the queue. If the queue is full but the maximum number of threads has not been reached, a temporary thread is created to help out.
  • When the total number of threads reaches maximumPoolSize, the rejection policy is executed directly.

The Tomcat thread pool extends the native ThreadPoolExecutor and implements its own task processing logic by overriding the execute method:

  • For the first corePoolSize tasks, a new thread is created for each task.
  • If the queue is full but the maximum number of threads has not been reached, a temporary thread is created to help out.
  • When the total number of threads reaches maximumPoolSize, Tomcat tries once more to put the task into the queue. Only if the queue is still full and the insertion fails is the rejection policy executed.

The biggest difference is that Tomcat does not execute the rejection policy as soon as the total number of threads reaches the maximum. Instead, it tries to add the task to the queue again and executes the rejection policy only if that also fails.

The code is as follows:

public void execute(Runnable command, long timeout, TimeUnit unit) {
        // Increment the number of submitted tasks
        submittedCount.incrementAndGet();
        try {
            // Let the native thread pool execute the task; it runs its rejection policy when it cannot accept the task
            super.execute(command);
        } catch (RejectedExecutionException rx) {
            // The total number of threads has reached maximumPoolSize, so the native pool has rejected the task
            if (super.getQueue() instanceof TaskQueue) {
                final TaskQueue queue = (TaskQueue)super.getQueue();
                try {
                    //Try to put the task in the queue
                    if (!queue.force(command, timeout, unit)) {
                        submittedCount.decrementAndGet();
                      //The queue is still full. If the insertion fails, the reject policy will be executed
                        throw new RejectedExecutionException("Queue capacity is full.");
                    }
                } catch (InterruptedException x) {
                    submittedCount.decrementAndGet();
                    throw new RejectedExecutionException(x);
                }
            } else {
                // Decrement the number of submitted tasks
                submittedCount.decrementAndGet();
                throw rx;
            }

        }
    }

The Tomcat thread pool uses submittedCount to track the number of tasks that have been submitted but not yet completed. This is related to Tomcat's custom task queue. Tomcat's TaskQueue extends Java's LinkedBlockingQueue. The length of a LinkedBlockingQueue is unlimited by default unless it is given a capacity, so the constructor of TaskQueue takes an integer parameter capacity and passes it to the constructor of its parent class LinkedBlockingQueue, which prevents memory from being exhausted by tasks being added without limit. However, the capacity is unlimited by default. With an unbounded queue, once the current number of threads reaches the number of core threads, the thread pool adds every new task to the queue and the insertion always succeeds, so there is never a chance to create a new thread.

To solve this problem, TaskQueue overrides the offer method of LinkedBlockingQueue so that it returns false at the appropriate moment. Returning false signals that adding the task failed, and the thread pool then creates a new thread.

public class TaskQueue extends LinkedBlockingQueue<Runnable> {

  ...
  // By the time the thread pool calls offer, the number of threads has already reached the number of core threads
  @Override
  public boolean offer(Runnable o) {

      // If the number of threads has reached the maximum, no new thread can be created; just add the task to the queue
      if (parent.getPoolSize() == parent.getMaximumPoolSize())
          return super.offer(o);

      // Reaching this point means the current number of threads is at least the number of core threads
      // and less than the maximum, so a new thread could be created. Should it be? There are two cases

      // 1. The number of submitted tasks does not exceed the current number of threads: there are idle threads, so no new thread is needed
      if (parent.getSubmittedCount()<=(parent.getPoolSize()))
          return super.offer(o);

      // 2. The number of submitted tasks exceeds the current number of threads: there are not enough threads, so return false to make the pool create one
      if (parent.getPoolSize()<parent.getMaximumPoolSize())
          return false;

      // By default, always add the task to the queue
      return super.offer(o);
  }

}

A new thread is created only when the current number of threads is at least the number of core threads and less than the maximum number of threads, and the number of submitted tasks is greater than the current number of threads; in other words, when there are not enough threads but the limit has not been reached. That is why Tomcat maintains the submitted task count: it gives the thread pool a chance to create new threads even when the length of the task queue is unlimited. You can limit the length of the task queue with the maxQueueSize parameter.
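The effect of this offer trick can be reproduced outside Tomcat with a much reduced sketch. `EagerQueue` and `EagerQueueDemo` are hypothetical names, and an `AtomicInteger` maintained by the demo stands in for Tomcat's submittedCount; the real TaskQueue contains more logic than this.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical simplification of the TaskQueue idea: refuse to queue while the pool can still grow. */
final class EagerQueue extends LinkedBlockingQueue<Runnable> {
    private static final long serialVersionUID = 1L;
    private final AtomicInteger submitted; // stand-in for Tomcat's submittedCount
    private transient volatile ThreadPoolExecutor parent;

    EagerQueue(AtomicInteger submitted) { this.submitted = submitted; }

    void setParent(ThreadPoolExecutor parent) { this.parent = parent; }

    @Override
    public boolean offer(Runnable task) {
        ThreadPoolExecutor pool = parent;
        if (pool == null || pool.getPoolSize() == pool.getMaximumPoolSize()) {
            return super.offer(task); // the pool cannot grow, so queue the task
        }
        if (submitted.get() <= pool.getPoolSize()) {
            return super.offer(task); // an idle thread will pick the task up
        }
        return false; // pretend the queue is full so that the pool adds a thread
    }
}

final class EagerQueueDemo {
    /** Submits four blocking tasks to a pool with core=1 and max=3; returns {threads, queued}. */
    static int[] submitFour(boolean eager) throws InterruptedException {
        AtomicInteger submitted = new AtomicInteger();
        LinkedBlockingQueue<Runnable> queue =
                eager ? new EagerQueue(submitted) : new LinkedBlockingQueue<>();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 3, 60, TimeUnit.SECONDS, queue);
        if (eager) {
            ((EagerQueue) queue).setParent(pool);
        }
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < 4; i++) {
                submitted.incrementAndGet();
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        submitted.decrementAndGet();
                    }
                });
            }
            return new int[] {pool.getPoolSize(), pool.getQueue().size()};
        } finally {
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] plain = submitFour(false);
        int[] eager = submitFour(true);
        System.out.println("plain queue: threads=" + plain[0] + " queued=" + plain[1]);
        System.out.println("eager queue: threads=" + eager[0] + " queued=" + eager[1]);
    }
}
```

With the plain unbounded queue the pool stays at one thread and three tasks wait; with the eager queue the pool grows to its maximum of three threads before anything is queued.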

Performance optimization

Thread pool tuning

The thread pool is closely related to the I/O model. Tuning the thread pool means setting reasonable thread pool parameters. Let's start with the key parameters of the Tomcat thread pool:

Parameter                 Details
threadPriority            Thread priority; the default is 5
daemon                    Whether the threads are daemon (background) threads; the default is true
namePrefix                Thread name prefix
maxThreads                Maximum number of threads; the default is 200
minSpareThreads           Minimum number of threads kept alive (threads beyond this number are reclaimed after being idle for maxIdleTime); the default is 25
maxIdleTime               Maximum idle time of a thread; threads idle for longer are reclaimed until only minSpareThreads remain. The default is 1 minute
maxQueueSize              Maximum length of the task queue
prestartminSpareThreads   Whether minSpareThreads threads are created when the thread pool starts; the default is false

The core question is how to choose maxThreads. If it is set too small, Tomcat suffers from thread starvation and requests pile up in the queue, which increases response time. If it is set too large, there are problems as well: the server has a limited number of CPU cores, and with too many threads the CPU spends a lot of time switching back and forth between them.

Thread I/O time and CPU time

This gives a formula for the thread pool size. Assuming a single-core server:

Thread pool size = (thread I/O blocking time + thread CPU time) / thread CPU time

where thread I/O blocking time + thread CPU time = average request processing time.
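As a worked example of the formula, the sketch below plugs in made-up numbers: a request that blocks on I/O for 90 ms and uses 10 ms of CPU. `PoolSizing` is a hypothetical helper, and scaling the result linearly by the number of cores is an extra assumption that the single-core formula itself does not state.

```java
final class PoolSizing {
    /** (I/O blocking time + CPU time) / CPU time, rounded up, for a single core. */
    static int threadsForOneCore(double ioBlockingMillis, double cpuMillis) {
        if (cpuMillis <= 0) {
            throw new IllegalArgumentException("cpuMillis must be positive");
        }
        return (int) Math.ceil((ioBlockingMillis + cpuMillis) / cpuMillis);
    }

    /** Assumption beyond the single-core formula: scale linearly with the number of cores. */
    static int threadsFor(int cores, double ioBlockingMillis, double cpuMillis) {
        return cores * threadsForOneCore(ioBlockingMillis, cpuMillis);
    }

    public static void main(String[] args) {
        // (90 + 10) / 10 = 10 threads keep one core busy
        System.out.println(threadsForOneCore(90, 10));
        System.out.println(threadsFor(4, 90, 10));
    }
}
```

The more time a request spends blocked on I/O relative to CPU work, the more threads are needed to keep the CPU busy. Treat the result as a starting point for load testing rather than a final value for maxThreads.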

Analysis and optimization of memory overflow in Tomcat

When the JVM throws java.lang.OutOfMemoryError, it prints a stack trace in addition to a one-line description, and we can use this information to find the cause. Before looking for the cause, let's review the factors that can lead to OutOfMemoryError. A memory leak is one of the most common.

Tuning is always about finding the system bottleneck. Suppose the system responds slowly, CPU utilization is low, and memory usage keeps growing. Analysis of a heap dump shows that a large number of requests are piling up in the thread pool's queue. What should be done? The request processing time is probably too long, so check whether access to the database or to external services is being delayed.

java.lang.OutOfMemoryError: Java heap space

The JVM throws this error when it cannot allocate an object in the heap. The usual reasons are:

  1. Memory leak: objects that should have been reclaimed are still referenced and therefore cannot be collected. Typical examples are ThreadLocal values, object pools and memory pools used together with thread pools. To find the leak, generate a heap dump with the jmap tool and analyze it with MAT: jmap -dump:live,format=b,file=filename.bin pid
  2. Insufficient memory: the configured heap is too small for the application. Adjust the heap size through JVM parameters, for example -Xms256m -Xmx2048m.
  3. Overuse of the finalize method. If some logic has to run before an instance is garbage collected, such as releasing resources held by the object, a class can define a finalize method. The GC then does not reclaim such instances immediately; it adds them to a queue called "java.lang.ref.Finalizer.ReferenceQueue" and reclaims them only after their finalize methods have run. The Finalizer thread competes with the main thread for CPU, but because of its low priority it cannot keep up with the rate at which the main thread creates objects. The number of objects in the ReferenceQueue keeps growing, and eventually OutOfMemoryError is thrown. The solution is to avoid defining a finalize method wherever possible.
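The ThreadLocal case from the first item is easy to reproduce with a pooled thread. In this small sketch `ThreadLocalLeakDemo` and `CACHE` are hypothetical names: a value set by one task stays attached to the worker thread and is still visible to the next task unless it is removed.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalLeakDemo {
    private static final ThreadLocal<byte[]> CACHE = new ThreadLocal<>();

    /** Returns {valueSurvivedBetweenTasks, valueGoneAfterRemove}. */
    static boolean[] run() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1); // a single reused worker thread
        try {
            // Task 1 sets a value and forgets to clean up.
            pool.submit(() -> CACHE.set(new byte[1024])).get();
            // Task 2 runs on the same thread and still sees the value from task 1.
            boolean leaked = pool.submit(() -> CACHE.get() != null).get();

            // Task 3 cleans up in a finally block.
            pool.submit(() -> {
                try {
                    CACHE.set(new byte[1024]);
                } finally {
                    CACHE.remove();
                }
            }).get();
            boolean cleaned = pool.submit(() -> CACHE.get() == null).get();
            return new boolean[] {leaked, cleaned};
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] result = run();
        System.out.println("leaked=" + result[0] + " cleaned=" + result[1]);
    }
}
```

Because pooled threads live as long as the pool, whatever they hold in a ThreadLocal lives that long too. Calling `remove()` in a finally block is the usual way to avoid this.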

java.lang.OutOfMemoryError: GC overhead limit exceeded

The garbage collector runs continuously but is very inefficient and reclaims almost no memory. For example, if the Java process spends more than 98% of its CPU time on GC, reclaims less than 2% of the heap, and this happens for five consecutive GCs, OutOfMemoryError is thrown.

To solve this problem, check the GC log or generate a heap dump and first confirm whether there is a memory leak. If there is none, try increasing the heap size. The following JVM startup parameters print GC logs:

-verbose:gc // print GC information on the console
-XX:+PrintGCDetails // print detailed GC information on the console
-Xloggc:filepath // write the GC log to the specified file

For example, java -verbose:gc -Xloggc:gc.log -XX:+PrintGCDetails -jar xxx.jar records the GC log; the generated gc.log can then be opened in the GCViewer tool to analyze garbage collection.

java.lang.OutOfMemoryError: Requested array size exceeds VM limit

This error means that the application tried to allocate an array larger than the JVM allows. For example, the program tries to allocate a 128 MB array while the maximum heap size is 100 MB. This is usually a configuration problem: the JVM heap may be set too small. It may also be a bug in the program that makes it try to create an extremely large array.

java.lang.OutOfMemoryError: Metaspace

Metaspace is allocated in native memory, but its size is limited by the MaxMetaspaceSize parameter. When the Metaspace size exceeds MaxMetaspaceSize, the JVM throws an OutOfMemoryError with the word Metaspace. The solution is to increase MaxMetaspaceSize.

java.lang.OutOfMemoryError: request size bytes for reason. Out of swap space?

When an allocation from the native heap fails, or native memory is about to run out, the Java HotSpot VM throws this error and triggers its fatal error handling mechanism, which generates a fatal error log file with useful information about the thread, process and operating system at the time of the crash. If you run into this kind of OutOfMemoryError, diagnose it from the error information written by the JVM, or use a tool provided by the operating system, such as DTrace, to trace system calls and find out which code keeps allocating native memory.

java.lang.OutOfMemoryError: unable to create new native thread

This error arises as follows:

  1. The Java program asks the JVM to create a new Java thread.
  2. The JVM's native code delegates the request and creates an operating-system-level native thread by calling the operating system API.
  3. When the operating system creates the native thread, it also has to allocate memory for it. Each native thread has a thread stack whose size is determined by the JVM parameter -Xss.
  4. The operating system may fail to create the new thread for a variety of reasons.
  5. The JVM throws the "java.lang.OutOfMemoryError: unable to create new native thread" error.

This is only an overview of the scenarios. Troubleshooting of production incidents will be covered in follow-up articles and is not expanded here due to space constraints.

Summary

This article reviewed the architecture of Tomcat and took apart in detail how Tomcat is designed to handle highly concurrent connections. It also shared how to read the source code of an open source framework effectively. Design patterns and the fundamentals of concurrent programming are the top priority; readers can find earlier "Code Byte" articles on both topics.

We took apart the core components of Tomcat and saw the design philosophy behind its interfaces and how they realize single responsibility. We then summarized the I/O models involved in the connector and explained each of them in detail. Next, we looked at how Tomcat implements NIO and how it customizes the thread pool and queue to achieve high concurrency. Finally, we briefly went through common OOM scenarios and their solutions; due to space constraints they are not expanded in detail, and more troubleshooting and tuning approaches will be shared in future articles.
