Optimize once and for all: a concurrent RPC call utility

Date: 2022-05-13

Preface

Performance optimization is a road every programmer must walk, and it may also be the deepest rabbit hole. It requires not only a thorough understanding of various tools, but also optimization schemes tailored to specific business scenarios. (Of course, you could also hide a Thread.sleep in your code and shave off a few milliseconds whenever an "optimization" is called for.) The topic is so vast that there is no book on the market that covers it comprehensively, and even within each subdivided field the available techniques are rich and dazzling.

This article will not attempt to cover every optimization technique. It only gives a general solution for one scenario I ran into during recent project development: concurrent downstream calls. You can package it up, or copy and paste it straight into your project. Comments and additional scenarios to optimize are welcome.

Background

I wonder whether you have run into this scenario during development: we first call service A, then call service B, and then, after assembling the data, call service C. (If you have never encountered this in a microservice system, either the granularity of your system is too coarse, or you are lucky enough to work on a bottom-layer system with no downstream dependencies.)

The total time for this chain is duration(A) + duration(B) + duration(C) + other operations. Experience says most of the time goes to downstream service processing and network I/O; the CPU time spent inside our own application is basically negligible. But once we know that the calls to A and B have no dependency on each other, we can issue them simultaneously and cut out the synchronous waiting, reducing the chain time to max(duration(A), duration(B)) + duration(C) + other operations.
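The first scenario can be sketched with plain CompletableFuture before we build anything fancier. This is a minimal sketch: callA, callB and callC are hypothetical stand-ins for the real RPC clients, with Thread.sleep simulating downstream processing and network I/O.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrentCallSketch {

    // Hypothetical stand-ins for the RPC calls to services A, B and C;
    // the sleeps simulate downstream processing time and network I/O.
    static String callA() { sleep(100); return "a"; }
    static String callB() { sleep(100); return "b"; }
    static String callC(String assembled) { sleep(100); return assembled + "-c"; }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    /** Calls A and B concurrently, then calls C with the assembled result. */
    public static String orchestrate() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> a = CompletableFuture.supplyAsync(ConcurrentCallSketch::callA, pool);
            CompletableFuture<String> b = CompletableFuture.supplyAsync(ConcurrentCallSketch::callB, pool);
            // Waiting on both futures costs max(duration(A), duration(B)), not the sum
            return callC(a.join() + b.join());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        String result = orchestrate();
        // With 100ms per call this takes roughly 200ms instead of 300ms
        System.out.println(result + " took ~" + (System.currentTimeMillis() - start) + "ms");
    }
}
```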

Another example: sometimes we need to call a downstream service in batches, such as querying user information in bulk. To protect itself, the downstream query interface often limits how many items a single call may fetch, say at most 100 users per query. We therefore have to split the request and query multiple times, and the cost becomes N * duration(A) + other operations. With the same concurrent-request optimization, the cost can be reduced to roughly max(duration(A_i)) + other operations.
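The splitting step itself is straightforward. A minimal plain-Java sketch (the 100-item limit is the hypothetical downstream restriction from the example above; Guava's Lists.partition does the same job):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitSketch {

    /**
     * Splits a request list into chunks of at most batchSize items, so each
     * chunk can be sent as one downstream call (concurrently, per the article).
     */
    public static <T> List<List<T>> partition(List<T> requests, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < requests.size(); i += batchSize) {
            batches.add(requests.subList(i, Math.min(i + batchSize, requests.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> userIds = new ArrayList<>();
        for (int i = 0; i < 250; i++) userIds.add(i);
        // 250 user IDs with a 100-per-call limit yields 3 batches: 100 + 100 + 50
        System.out.println(partition(userIds, 100).size());
    }
}
```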

The code for the two scenarios is essentially the same, so this article walks through the idea and a complete implementation for the second one.

A first attempt

The overall implementation class diagram of concurrent RPC calls is as follows:

First, we need a thread pool for concurrent execution. Since a program usually has other scenarios that also use thread pools, and we want RPC calls to have a separate pool of their own, we encapsulate the pools behind a factory here.

@Configuration
public class ThreadPoolExecutorFactory {

    @Resource
    private Map<String, AsyncTaskExecutor> executorMap;

    /**
     * Default thread pool
     */
    @Bean(name = ThreadPoolName.DEFAULT_EXECUTOR)
    public AsyncTaskExecutor baseExecutorService() {
        //Subsequently, support the customization of various service parameters
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        //Setting thread pool parameter information
        taskExecutor.setCorePoolSize(10);
        taskExecutor.setMaxPoolSize(50);
        taskExecutor.setQueueCapacity(200);
        taskExecutor.setKeepAliveSeconds(60);
        taskExecutor.setThreadNamePrefix(ThreadPoolName.DEFAULT_EXECUTOR + "--");
        taskExecutor.setWaitForTasksToCompleteOnShutdown(true);
        taskExecutor.setAwaitTerminationSeconds(60);
        taskExecutor.setDaemon(Boolean.TRUE);
        //Modify the reject policy to execute with the current thread
        taskExecutor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        //Initialize thread pool
        taskExecutor.initialize();

        return taskExecutor;
    }

    /**
     * Separate thread pool for concurrent calls
     */
    @Bean(name = ThreadPoolName.RPC_EXECUTOR)
    public AsyncTaskExecutor rpcExecutorService() {
        //Subsequently, support the customization of various service parameters
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        //Setting thread pool parameter information
        taskExecutor.setCorePoolSize(20);
        taskExecutor.setMaxPoolSize(100);
        taskExecutor.setQueueCapacity(200);
        taskExecutor.setKeepAliveSeconds(60);
        taskExecutor.setThreadNamePrefix(ThreadPoolName.RPC_EXECUTOR + "--");
        taskExecutor.setWaitForTasksToCompleteOnShutdown(true);
        taskExecutor.setAwaitTerminationSeconds(60);
        taskExecutor.setDaemon(Boolean.TRUE);
        //Modify the reject policy to execute with the current thread
        taskExecutor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        //Initialize thread pool
        taskExecutor.initialize();

        return taskExecutor;
    }
    /**
     * Get a thread pool by name.
     * @param name thread pool name
     * @return the thread pool
     * @throws RuntimeException if no thread pool with this name exists
     */
    public AsyncTaskExecutor fetchAsyncTaskExecutor(String name) {
        AsyncTaskExecutor executor = executorMap.get(name);
        if (executor == null) {
            throw new RuntimeException("no executor name " + name);
        }
        return executor;
    }
}

public class ThreadPoolName {

    /**
     * Default thread pool
     */
    public static final String DEFAULT_EXECUTOR = "defaultExecutor";

    /**
     * Thread pool used by concurrent calls
     */
    public static final String RPC_EXECUTOR = "rpcExecutor";
}

As the code shows, we declare two Spring thread pools (AsyncTaskExecutor beans): a default pool and a pool for RPC calls, and have Spring inject them into a map keyed by bean name. A caller uses the fetchAsyncTaskExecutor method with a pool name to pick the pool that runs its tasks. One more detail: the RPC pool has noticeably more threads than the default one, because RPC calls are not CPU-intensive and spend most of their time waiting, so increasing the thread count effectively improves concurrency.

@Component
public class TracedExecutorService {

    @Resource
    private ThreadPoolExecutorFactory threadPoolExecutorFactory;


    /**
     * Submit an asynchronous task to the named thread pool.
     * @param executorName thread pool name
     * @param tracedCallable asynchronous task
     * @param <T> return type
     * @return a Future for the task result
     */
    public <T> Future<T> submit(String executorName, Callable<T> tracedCallable) {
        return threadPoolExecutorFactory.fetchAsyncTaskExecutor(executorName).submit(tracedCallable);
    }
}

The submit method encapsulates fetching the thread pool and submitting the asynchronous task. The Callable + Future combination is used here so that the caller can collect the result produced by the asynchronous thread.
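The Callable + Future mechanics can be shown in isolation with plain java.util.concurrent, independent of the Spring wrapper above (a minimal sketch; the computed value is arbitrary):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CallableFutureSketch {

    /** Submits a value-returning Callable and collects the result via its Future. */
    public static Integer submitAndGet() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            // Unlike Runnable, a Callable returns a value (and may throw);
            // the Future is the handle through which the caller collects it.
            Future<Integer> future = pool.submit(() -> 21 * 2);
            return future.get(1, TimeUnit.SECONDS); // blocks until done, or times out
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(submitAndGet()); // 42
    }
}
```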

When the thread pool is ready, we need to declare an interface for submitting concurrent calls to the service:

public interface BatchOperateService {

    /**
     * Concurrent batch operation.
     * @param function logic to execute for each request
     * @param requests all requests
     * @param config configuration
     * @return all responses
     */
    <T, R> List<R> batchOperate(Function<T, R> function, List<T> requests, BatchOperateConfig config);
}

@Data
public class BatchOperateConfig {

    /**
     * Timeout
     */
    private Long timeout;

    /**
     * Timeout unit
     */
    private TimeUnit timeoutUnit;

    /**
     * Whether all calls must succeed
     */
    private Boolean needAllSuccess;

}

The function object passed to batchOperate is the code to be executed concurrently, and requests holds all the requests; the implementation iterates over the requests and submits each one to an asynchronous thread. The config object tunes the batch call: for example, the timeout of the concurrent query, and whether the whole batch should keep going when some individual calls fail.

Next, take a look at the implementation class:

@Service
@Slf4j
public class BatchOperateServiceImpl implements BatchOperateService {

    @Resource
    private TracedExecutorService tracedExecutorService;

    @Override
    public <T, R> List<R> batchOperate(Function<T, R> function, List<T> requests, BatchOperateConfig config) {
        log.info("batchOperate start function:{} request:{} config:{}", function, JSON.toJSONString(requests), JSON.toJSONString(config));

        //Current time
        long startTime = System.currentTimeMillis();

        //Initialize
        int numberOfRequests = CollectionUtils.size(requests);

        //Execution results of all asynchronous threads
        List<Future<R>> futures = Lists.newArrayListWithExpectedSize(numberOfRequests);
        //Concurrent call management using countdownlatch
        CountDownLatch countDownLatch = new CountDownLatch(numberOfRequests);
        List<BatchOperateCallable<T, R>> callables = Lists.newArrayListWithExpectedSize(numberOfRequests);

        //Submit asynchronous thread execution separately
        for (T request : requests) {
            BatchOperateCallable<T, R> batchOperateCallable = new BatchOperateCallable<>(countDownLatch, function, request);
            callables.add(batchOperateCallable);

            //Commit asynchronous thread execution
            Future<R> future = tracedExecutorService.submit(ThreadPoolName.RPC_EXECUTOR, batchOperateCallable);
            futures.add(future);
        }

        try {
            //Wait for all execution to complete. If it times out and all calls are required to succeed, an exception will be thrown
            boolean allFinish = countDownLatch.await(config.getTimeout(), config.getTimeoutUnit());
            if (!allFinish && config.getNeedAllSuccess()) {
                throw new RuntimeException("batchOperate timeout and need all success");
            }
            //Traverse the execution results. If some execution fails and all calls are required to be successful, an exception will be thrown
            boolean allSuccess = callables.stream().map(BatchOperateCallable::isSuccess).allMatch(BooleanUtils::isTrue);
            if (!allSuccess && config.getNeedAllSuccess()) {
                throw new RuntimeException("some batchOperate have failed and need all success");
            }

            //Get all asynchronous call results and return
            List<R> result = Lists.newArrayList();
            for (Future<R> future : futures) {
                R r = future.get();
                if (Objects.nonNull(r)) {
                    result.add(r);
                }
            }
            return result;
        } catch (Exception e) {
            // Wrap with the original exception as the cause so the stack trace is not lost
            throw new RuntimeException(e);
        } finally {
            double duration = (System.currentTimeMillis() - startTime) / 1000.0;
            log.info("batchOperate finish duration:{}s function:{} request:{} config:{}", duration, function, JSON.toJSONString(requests), JSON.toJSONString(config));

        }
    }
}

Usually, after submitting to the thread pool, we would simply traverse the Futures and wait for the results. Here we use a CountDownLatch instead, for more unified timeout management. Take a look at the implementation of BatchOperateCallable:

public class BatchOperateCallable<T, R> implements Callable<R> {

    private final CountDownLatch countDownLatch;

    private final Function<T, R> function;

    private final T request;

    /**
     * Whether this task completed successfully.
     * Declared volatile because it is written by the worker thread and read by the caller.
     */
    private volatile boolean success;

    public BatchOperateCallable(CountDownLatch countDownLatch, Function<T, R> function, T request) {
        this.countDownLatch = countDownLatch;
        this.function = function;
        this.request = request;
    }

    @Override
    public R call() {
        try {
            success = false;
            R result = function.apply(request);
            success = true;
            return result;
        } finally {
            countDownLatch.countDown();
        }
    }

    public boolean isSuccess() {
        return success;
    }
}

Whether the call succeeds or throws, we decrement the counter in the finally block. When the counter reaches zero, all concurrent calls have finished. If it fails to reach zero within the configured time, the concurrent call is considered to have timed out, and an exception is thrown.
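The timeout semantics of CountDownLatch.await that the implementation relies on can be demonstrated in a few lines (a minimal sketch simulating a batch of two calls where only one finishes in time):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchTimeoutSketch {

    /** Simulates a batch of two calls where only one finishes within the timeout. */
    public static boolean awaitBatch() throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(2);
        new Thread(latch::countDown).start(); // the fast call counts down immediately
        // The second call never counts down, so await returns false after the
        // timeout elapses: exactly the condition batchOperate treats as "not all finished".
        return latch.await(200, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(awaitBatch()); // false: one call is still outstanding
    }
}
```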

Potential problems

One problem with concurrent calls is that we amplify the traffic hitting downstream interfaces, in extreme cases by a factor of hundreds. If the downstream service has no defensive measures such as rate limiting, we may well take it down (failures caused this way are common). The whole concurrent call therefore needs flow control, and there are two approaches. First, if the microservices run in mesh mode, the QPS of RPC calls can be configured in the sidecar, controlling access to the downstream service globally. (Whether to choose per-instance or cluster-wide limiting depends on what modes the sidecar supports and on the service's traffic volume; generally, if average traffic is small, per-instance limiting is preferable, because cluster-wide limiting tends to fluctuate more, and very small traffic leads to misjudgment.) Second, if mesh is not enabled, you need to implement a rate limiter in code. Guava's RateLimiter class is recommended here, although it only supports single-machine limiting; cluster-wide limiting pushes the complexity of the scheme up considerably.
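For illustration only, the idea behind single-machine limiting can be sketched in plain JDK code as a naive fixed-window counter. This is not a substitute for Guava's RateLimiter, whose token-bucket design smooths bursts much better; it only shows the shape of the check a caller would perform before each concurrent RPC.

```java
public class SimpleQpsLimiter {

    private final int permitsPerSecond;
    private long windowStart = System.currentTimeMillis();
    private int used = 0;

    public SimpleQpsLimiter(int permitsPerSecond) {
        this.permitsPerSecond = permitsPerSecond;
    }

    /** @return true if the call may proceed within the current one-second window */
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= 1000) { // a new window begins: reset the counter
            windowStart = now;
            used = 0;
        }
        return ++used <= permitsPerSecond;
    }

    public static void main(String[] args) {
        SimpleQpsLimiter limiter = new SimpleQpsLimiter(100);
        int allowed = 0;
        for (int i = 0; i < 150; i++) {
            if (limiter.tryAcquire()) allowed++;
        }
        // Of a burst of 150 requests in one window, only 100 pass
        System.out.println(allowed);
    }
}
```

A caller would invoke tryAcquire() (or block on an acquire-style method) before each submitted request, degrading or queueing the excess instead of forwarding the full amplified burst downstream.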

Summary

Abstracting the scenarios you meet during project development and distilling general solutions from them is not only an important path of growth for every developer, but also a sharp tool for improving code reusability and stability. Concurrent RPC calling is one such common solution; I hope the implementation in this article is helpful to you.