Optimize Spring Boot this way, and startup speed will fly!


Microservices are fun at first, but living with them is another matter. Problems with service splitting in particular — business boundaries that are poorly drawn, granularity that is too coarse — leave some Spring Boot applications starting far too slowly. You may have run into this yourself. Here we explore several angles for optimizing Spring Boot startup speed.

Startup time analysis

IntelliJ IDEA ships with an integrated async-profiler, so we can see problems in the startup process more intuitively through a flame graph. In the example below, the flame graph shows that a lot of time is spent on bean loading and initialization.

The figure comes from IDEA's integrated async-profiler. You can find the custom Java Profiler configuration by searching in Preferences, then start the application via "Run with Profiler".

The y-axis represents the call stack; each layer is a function. The deeper the call stack, the taller the flame. The top is the function executing at sample time, and below it are its parent functions.

The x-axis represents the number of samples. The wider a function is on the x-axis, the more often it was sampled, which means it took longer to execute.

Startup optimization

Reduce business initialization

Most of the time is usually spent because the application is large or contains a lot of initialization logic, such as establishing database connections, Redis connections, and various connection pools. The advice for the business side is to minimize unnecessary dependencies, and to perform initialization asynchronously wherever possible.

Lazy initialization

Spring Boot 2.2 introduced the spring.main.lazy-initialization property. Setting it to true means all beans will be initialized lazily.

The startup speed can be improved to some extent, but the first access may be slow.
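A minimal sketch of the switch in application.yml (the same property can go in application.properties as spring.main.lazy-initialization=true):

```yaml
spring:
  main:
    # defer creation of all beans until they are first needed
    lazy-initialization: true
```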


Spring Context Indexer

Spring 5 and later provide spring-context-indexer. Its main purpose is to avoid slow component scanning when there are too many classes to scan.

Usage is very simple: import the dependency and mark the startup class with the @Indexed annotation. A META-INF/spring.components file is then generated when the program is compiled and packaged, and when ComponentScan runs, the index file is read instead of scanning, improving scanning speed.
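As a sketch, the Maven dependency looks like the following (the version number is illustrative; match it to the Spring version your Spring Boot release pulls in), after which you annotate the startup class with @Indexed from org.springframework.stereotype:

```xml
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-context-indexer</artifactId>
    <version>5.3.23</version>
    <optional>true</optional>
</dependency>
```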


Turn off JMX

JMX is enabled by default in Spring Boot versions before 2.2 (since 2.2 it is off by default) and can be viewed using jconsole. If we don't need this monitoring, we can turn it off manually.
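A one-line sketch of the switch in application.yml:

```yaml
spring:
  jmx:
    # skip registering beans with the MBean server at startup
    enabled: false
```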


Turn off tiered compilation

Tiered compilation is enabled by default from Java 8 onward. The relevant thresholds can be viewed with the following command:

java -XX:+PrintFlagsFinal -version | grep CompileThreshold

Tier 3 is C1 and tier 4 is C2. Roughly speaking, a method that has been interpreted about 2000 times is compiled by C1, and after about 15000 further executions of the C1-compiled code it is compiled by C2.

We can stop at the C1 compiler via a JVM flag, skipping C2's optimization phase, which improves startup speed. This can be combined with -Xverify:none / -noverify to turn off bytecode verification, but try not to use that in a production environment.

-XX:TieredStopAtLevel=1 -noverify

Another idea

The above introduces some optimizations from the business level and startup parameters. Now let’s see what ways can be optimized based on Java applications.

Before that, let's recall how Java creates objects: classes are loaded first, then objects are instantiated. Once an object exists we can call its methods, which is where JIT comes in: JIT improves the performance of Java programs by compiling bytecode into native machine code at runtime.

The techniques below each target one of the steps above.

JAR Index

A jar package is essentially a zip file. When loading a class, the class loader traverses the jar packages to find the corresponding class file, which is then verified, prepared, resolved, and initialized before objects are instantiated.

JAR Index is actually a very old technology, introduced as early as JDK 1.3 to solve the performance problem of traversing jars during class loading.

Suppose we want to find a class among three jar packages A.jar, B.jar, and C.jar. If we could infer the specific jar directly from the class name, e.g. com.C, we could avoid traversing the jars altogether.




With JAR index technology, a corresponding index file, INDEX.LIST, can be generated:

com/A --> A.jar
com/B --> B.jar
com/C --> C.jar

However, for current projects, jar index is difficult to apply:

  1. The index file generated by jar -i is based on the Class-Path entry in META-INF/MANIFEST.MF, which most of our current projects do not use, so extra processing is needed to generate the index file ourselves
  2. Only URLClassLoader is supported, so we would need to customize the class loading logic ourselves


App CDS

The full name of AppCDS is Application Class-Data Sharing; it is mainly used to accelerate startup and save memory. CDS was actually introduced as early as JDK 1.5 and has been continuously optimized and upgraded in subsequent releases; since JDK 12 a default CDS archive ships with the JDK. Early CDS only supported the bootstrap class loader; AppCDS, introduced in JDK 8, supports the application class loader and custom class loaders.

We all know that class loading involves parsing and verification. CDS stores the data structures produced by this process in an archive file that is reused on the next run. This archive file is called a shared archive and uses .jsa as its file suffix.

When used, the .jsa file is memory-mapped, and the type pointers in object headers point directly into the mapped memory.

Let’s see how to use it.

First, we need to generate the list of classes to share between runs, i.e. the .lst file. For Oracle JDK, the -XX:+UnlockCommercialFeatures flag must be added to enable the commercial feature; OpenJDK does not require this parameter. In JDK 13, steps 1 and 2 are merged into one, but lower versions still need both.

java -XX:DumpLoadedClassList=test.lst

After obtaining the .lst class list, dump it into a .jsa archive file suitable for memory mapping.

java -Xshare:dump -XX:SharedClassListFile=test.lst -XX:SharedArchiveFile=test.jsa

Finally, add running parameters to specify archive files at startup.

-Xshare:on -XX:SharedArchiveFile=test.jsa

Note that AppCDS only takes effect with a fat jar that contains all class files directly; Spring Boot's nested-jar structure does not work. You need the Maven Shade plugin to create a shaded jar instead.

              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                  <resource>META-INF/spring.handlers</resource>
                </transformer>
                <transformer implementation="org.springframework.boot.maven.PropertiesMergingResourceTransformer">
                  <resource>META-INF/spring.factories</resource>
                </transformer>
                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                  <resource>META-INF/spring.schemas</resource>
                </transformer>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>${start-class}</mainClass>
                </transformer>
              </transformers>

With that in place, it can be used following the steps above. However, if the project is too large and the number of entries exceeds 65535, an error is reported at startup:

Caused by: java.lang.IllegalStateException: Zip64 archives are not supported

The source code is as follows:

public int getNumberOfRecords() {
  long numberOfRecords = Bytes.littleEndianValue(this.block, this.offset + 10, 2);
  if (numberOfRecords == 0xFFFF) {
    throw new IllegalStateException("Zip64 archives are not supported");
  }
  return (int) numberOfRecords;
}

This problem has been fixed in Spring Boot 2.2 and above, so use a newer version to avoid it.

Heap Archive

Heap Archive was introduced in JDK 9 and put to practical use in JDK 12. We can think of heap archive as an extension of AppCDS.

AppCDS persists the data generated by verification and parsing during class loading, while heap archive persists the heap data related to class initialization (initialization meaning the execution of static blocks via <clinit>).

Simply put, heap archive memory-maps some static fields persisted at class initialization, avoiding calls to class initializers and obtaining initialized classes in advance, which improves startup speed.

AOT compilation

As we said, JIT compiles bytecode into native machine code at runtime and executes it directly when needed, which reduces interpretation time and improves the program's running speed.

The three techniques above for improving application startup speed all target the class loading phase. When we actually create object instances and execute methods, the code may not yet be JIT-compiled, and execution in interpreted mode is very slow. Hence AOT compilation.

AOT (ahead-of-time) compilation happens before the program runs. Its effect is equivalent to preheating: code is compiled into machine code in advance, reducing interpretation time.

Spring Native, for example, works this way: the application is statically compiled ahead of time into an executable that does not rely on the JVM, so startup is very fast.

However, AOT in Java never matured. It was shipped as an experimental feature from JDK 9 onward, disabled by default and requiring manual enablement:

java -XX:+UnlockExperimentalVMOptions -XX:AOTLibrary=

Due to long-term lack of maintenance and tuning, it was removed in JDK 16, so we won't go into it further here.

Offline time optimization

Graceful shutdown

Spring Boot added a graceful shutdown feature in version 2.3, supporting Jetty, Reactor Netty, Tomcat, and Undertow. Usage:

server:
  shutdown: graceful

spring:
  lifecycle:
    # Maximum waiting time
    timeout-per-shutdown-phase: 30s

If you are below version 2.3, the Spring team also provided an implementation for lower versions, and the new version follows basically the same logic: first pause external requests, then shut down the thread pool and process the remaining tasks.

@RestController
@SpringBootApplication
public class Gh4657Application {

    public static void main(String[] args) {
        SpringApplication.run(Gh4657Application.class, args);
    }

    @RequestMapping("/pause")
    public String pause() throws InterruptedException {
        Thread.sleep(10000);
        return "Pause complete";
    }

    @Bean
    public GracefulShutdown gracefulShutdown() {
        return new GracefulShutdown();
    }

    @Bean
    public EmbeddedServletContainerCustomizer tomcatCustomizer() {
        return new EmbeddedServletContainerCustomizer() {

            @Override
            public void customize(ConfigurableEmbeddedServletContainer container) {
                if (container instanceof TomcatEmbeddedServletContainerFactory) {
                    ((TomcatEmbeddedServletContainerFactory) container)
                            .addConnectorCustomizers(gracefulShutdown());
                }
            }

        };
    }

    private static class GracefulShutdown implements TomcatConnectorCustomizer,
            ApplicationListener<ContextClosedEvent> {

        private static final Logger log = LoggerFactory.getLogger(GracefulShutdown.class);

        private volatile Connector connector;

        @Override
        public void customize(Connector connector) {
            this.connector = connector;
        }

        @Override
        public void onApplicationEvent(ContextClosedEvent event) {
            // Stop accepting new requests
            this.connector.pause();
            Executor executor = this.connector.getProtocolHandler().getExecutor();
            if (executor instanceof ThreadPoolExecutor) {
                try {
                    // Let in-flight tasks finish, waiting up to 30 seconds
                    ThreadPoolExecutor threadPoolExecutor = (ThreadPoolExecutor) executor;
                    threadPoolExecutor.shutdown();
                    if (!threadPoolExecutor.awaitTermination(30, TimeUnit.SECONDS)) {
                        log.warn("Tomcat thread pool did not shut down gracefully within "
                                + "30 seconds. Proceeding with forceful shutdown");
                    }
                }
                catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            }
        }

    }

}

Eureka service offline time

In addition, in a previous article I mentioned the problem of how long it takes clients to perceive that a server instance has gone offline.

Eureka uses a three-level cache to store the instance information of the service.

When a service registers, it keeps a heartbeat with the server at a 30-second interval. After registration, the client's instance information is saved in the registry, and the registry synchronizes it to the readWriteCacheMap immediately.

But when a client wants to discover the service, it reads from the readOnlyCacheMap, and the read-only cache only syncs from the readWriteCacheMap every 30 seconds.

Both the Eureka client and the Ribbon load balancer maintain a local cache, each synchronized every 30 seconds.

Given the above, let's calculate the worst case for a client to perceive that a service has gone offline.

  1. The client sends a heartbeat to the server every 30 seconds
  2. The registry holds all registered instance information and keeps it synchronized to the readWriteCacheMap in real time, while the readWriteCacheMap and readOnlyCacheMap sync every 30 seconds
  3. The client syncs the instance information from the readOnlyCacheMap every 30 seconds
  4. If Ribbon is used for load balancing, it adds another cache layer that also syncs every 30 seconds

If a service goes offline normally, the worst case is 30 + 30 + 30 + 30, roughly 120 seconds.

If the service goes offline abnormally, detection also relies on a cleanup thread that runs every 60 seconds and evicts instances whose heartbeat has been missing for more than 90 seconds. In the extreme case this can take three 60-second cycles to detect, i.e. 180 seconds.

The longest cumulative perception time is therefore 180 + 120 = 300 seconds, i.e. 5 minutes.
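The worst-case arithmetic above can be sketched as a quick shell calculation (the 3 × 60 s term follows the article's eviction-cycle estimate for abnormal offline):

```shell
# Worst-case propagation of an offline event with Eureka defaults (seconds)
heartbeat=30       # client heartbeat interval
rw_to_ro=30        # readWriteCacheMap -> readOnlyCacheMap sync
client_fetch=30    # client registry fetch interval
ribbon=30          # Ribbon local cache refresh

normal=$((heartbeat + rw_to_ro + client_fetch + ribbon))
abnormal_detect=$((3 * 60))   # up to three 60s eviction cycles
total=$((abnormal_detect + normal))

echo "normal offline:   ${normal}s"
echo "abnormal detect:  ${abnormal_detect}s"
echo "worst case total: ${total}s"
```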

The solution, of course, is to shorten these intervals.

Modify the Ribbon cache sync interval to 3 seconds: ribbon.ServerListRefreshInterval = 3000

Modify the client cache sync interval to 3 seconds: eureka.client.registry-fetch-interval-seconds = 3

Modify the heartbeat interval to 3 seconds: eureka.instance.lease-renewal-interval-in-seconds = 3

Modify the timeout eviction threshold to 9 seconds: eureka.instance.lease-expiration-duration-in-seconds = 9

Modify the cleanup thread to run every 5 seconds: eureka.server.eviction-interval-timer-in-ms = 5000

Modify the read-only cache sync interval to 3 seconds: eureka.server.response-cache-update-interval-ms = 3000
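Putting the settings above together as a properties sketch (note they span multiple sides: ribbon.*, eureka.client.*, and eureka.instance.* belong in the consuming/providing services, eureka.server.* in the Eureka server — apply each where appropriate):

```properties
# consumer side: Ribbon server list refresh
ribbon.ServerListRefreshInterval=3000
# client side: registry fetch and heartbeat
eureka.client.registry-fetch-interval-seconds=3
eureka.instance.lease-renewal-interval-in-seconds=3
eureka.instance.lease-expiration-duration-in-seconds=9
# server side: eviction thread and read-only cache refresh
eureka.server.eviction-interval-timer-in-ms=5000
eureka.server.response-cache-update-interval-ms=3000
```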

With these parameter settings, let's recalculate the maximum time it may take to perceive a service going offline:

Normal offline becomes 3 + 3 + 3 + 3 = 12 seconds, and abnormal offline adds roughly 15 seconds, for 27 seconds.


OK, that's all for optimizing Spring Boot startup and offline time. That said, if services are split sensibly and the code is well written, these problems may never become problems in the first place.