Asynchronous and parallel computing in backend requests

Time:2021-10-13

Note: This article is a set of study notes compiled while learning; it is not based on hands-on experience. For further study, see the reference links.

In back-end development we often run into user requests that involve time-consuming work, such as:

  • Connect to the mail server and send a complex HTML email
  • Start a crawler task after receiving the request

When we face problems like these, we do not finish the computation while handling the user's request, which would make the user wait a long time. Instead, we use techniques such as asynchronous computing and distributed computing: hand the task off to the background asynchronously, or split a time-consuming computation across multiple servers that work on it at the same time.

Asynchronous computing

Thread pool asynchronous execution

Thread asynchrony means that the caller sends a request to the callee and does not have to wait for the result to come back. Tasks executed asynchronously are generally ones that take a long time.
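As a minimal sketch of this idea, the example below uses Python's standard concurrent.futures.ThreadPoolExecutor. The names send_email and handle_request, the pool size, and the simulated delay are assumptions made for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a slow job such as sending an HTML email.
def send_email(recipient):
    time.sleep(0.1)  # simulate slow network I/O
    return f"sent to {recipient}"

# A small shared pool; the size 4 is an arbitrary choice for this sketch.
executor = ThreadPoolExecutor(max_workers=4)

def handle_request(recipient):
    # submit() returns a Future right away; the work runs on a pool thread.
    future = executor.submit(send_email, recipient)
    return "accepted", future

status, future = handle_request("user@example.com")
print(status)           # the caller was not blocked by send_email
print(future.result())  # blocks only if and when the result is needed
```

In a real request handler the Future would usually not be waited on at all; result() is called here only to show that the background task did complete.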

Distributed message queue

The classic use of a message queue is to pass messages between producers and consumers through a message pipeline. Producers and consumers are different processes: producers write messages to the pipeline, and consumers read messages from it.

The main feature is asynchronous processing, and the main purposes are reducing request response time and decoupling. The typical use case is therefore to put operations that are time-consuming and do not need to return a result synchronously into the message queue as messages. In addition, as long as the message format stays unchanged, the sender and receiver do not need to know about or be affected by each other, which is what decoupling means.
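The producer and consumer roles can be sketched with Python's standard queue.Queue. This is only an in-process stand-in: in a real deployment the pipeline would be a message broker and the two sides would be separate processes. The message fields and function names are hypothetical.

```python
import queue
import threading

# In-process stand-in for the message pipeline (assumption: a real system
# would use a broker, with producer and consumer in separate processes).
pipeline = queue.Queue()
processed = []

def consumer():
    while True:
        message = pipeline.get()
        if message is None:  # sentinel: no more messages
            break
        processed.append(f"handled {message['type']} for {message['to']}")

def producer():
    # The producer only writes messages; it never waits for a result.
    for user in ["alice", "bob"]:
        pipeline.put({"type": "welcome_email", "to": user})
    pipeline.put(None)

worker = threading.Thread(target=consumer)
worker.start()
producer()
worker.join()
print(processed)
```

The two sides share nothing except the message format (a dict with "type" and "to"), so either side could be replaced without touching the other.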

Benefits:

  • Decoupling
  • Faster responses
  • Broadcasting
  • Peak shaving

Conditions for using message queuing:

  1. Producers do not need feedback from consumers
  2. Allow transient inconsistencies

Gearman

A distributed job framework that supports calls across programming languages and is well suited to running work tasks in the background.

MemcacheQ

A lightweight, high-performance distributed message queue service developed in China, based on the Memcache protocol and using BDB for persistent data storage.

Features:

  1. It is simple and efficient, and because it is based on the Memcache protocol, any client that supports the Memcache protocol can use it.
  2. Queue data is stored in BDB and persisted.
  3. Good concurrency performance.
  4. Supports multiple queues.

Kafka

A distributed streaming platform with three main capabilities:

  • Publishing and subscribing to message streams, similar to a message queue, which is why Kafka is classified as a message queue framework
  • Recording message streams in a fault-tolerant manner; Kafka stores message streams in files
  • Processing messages as they are published

Parallel computing

Map/Reduce

MapReduce is a programming model that follows the idea of divide and conquer. Its core consists of two steps, map and reduce. In the map step, the input is split into fragments and each fragment is processed by a separate machine. In the reduce step, the results computed by each machine are combined to obtain the final result.
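A word count is the usual illustration of the two steps. The sketch below runs on one machine and uses a thread pool in place of separate servers; the fragment contents and function names are made up for the example, and it does not use any MapReduce framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical file fragments; in a real cluster each would be
# processed on a different machine.
fragments = [
    "the quick brown fox",
    "the lazy dog",
    "the fox jumps",
]

def map_fragment(text):
    # Map step: count words in one fragment, independently of the others.
    return Counter(text.split())

def reduce_counts(left, right):
    # Reduce step: merge two partial results into one.
    return left + right

with ThreadPoolExecutor() as pool:
    partial_counts = list(pool.map(map_fragment, fragments))

total = reduce(reduce_counts, partial_counts, Counter())
print(total["the"], total["fox"])  # 3 2
```

Because each map call looks at only its own fragment, the fragments can be processed in any order and on any worker; only the reduce step needs to see all partial results.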

Hadoop

Hadoop is an open-source framework written in Java that stores massive amounts of data on clusters of servers and runs distributed analysis applications over it. Its core components are HDFS (data storage) and MapReduce (distributed computing).

HDFS is a distributed file system. It introduces a namenode server, which stores file metadata, and datanode servers, which store the actual data, so that data is stored and read in a distributed manner.
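The split between metadata and data can be pictured with a toy in-memory model. Everything in the sketch below (the names, the tiny block size, the round-robin placement) is hypothetical and is not the HDFS API; real HDFS also keeps several replicas of each block, which is omitted here.

```python
# Toy in-memory model; all names are hypothetical, not the HDFS API.
BLOCK_SIZE = 4  # bytes per block, tiny on purpose

datanodes = [{}, {}, {}]  # each datanode stores block_id -> bytes
namenode = {}             # file name -> ordered list of (node index, block_id)

def write_file(name, data):
    locations = []
    for offset in range(0, len(data), BLOCK_SIZE):
        index = offset // BLOCK_SIZE
        block_id = f"{name}#{index}"
        node = index % len(datanodes)  # round-robin placement
        datanodes[node][block_id] = data[offset:offset + BLOCK_SIZE]
        locations.append((node, block_id))
    namenode[name] = locations  # the namenode keeps metadata only

def read_file(name):
    # Ask the namenode where the blocks are, then fetch them from the datanodes.
    return b"".join(datanodes[node][block_id] for node, block_id in namenode[name])

write_file("log.txt", b"hello hadoop")
print(namenode["log.txt"])
print(read_file("log.txt"))
```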

What Hadoop is used for:

  • Big data storage: distributed storage
  • Log processing: good at log analysis
  • ETL: extracting data into Oracle, MySQL, DB2, MongoDB and other mainstream databases
  • Machine learning: such as the Apache Mahout project
  • Search engine: Hadoop + Lucene implementation
  • Data mining: for example the currently popular ad recommendation and personalized ad recommendation

References

  1. Building high-performance web sites
  2. What is Hadoop for?
  3. What is the usage scenario of message queue?
  4. www.cnblogs.com/JimmyZheng/archive…

This work is released under a CC license. Reprints must credit the author and link to this article.