This series comes from the reading activity run on the Knowledge Planet "Manon turned over" and hosted by @My UDP does not lose packets. In this round we set aside traditional books and took up the top graduate course 6.824, taught at MIT by Robert Morris, the creator of the Morris worm many years ago. The course is taught through videos + lab experiments (in Go) + papers. Everything is in English, which makes it fairly challenging.
What counts as a distributed system
- Multiple cooperating computers
- Storage for big web sites, MapReduce, peer-to-peer sharing
- Lots of critical infrastructure is distributed
MapReduce: a system for computing over large data sets. For example, summing the numbers from 1 to 100 billion can be done on a single computer, or, with this technique, the work can be split across many computers and the partial results combined, which greatly improves efficiency.
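To make the split-then-combine idea concrete, here is a minimal single-process sketch in Go. Goroutines stand in for separate machines, and the names `sumRange` and `parallelSum` are hypothetical; a real MapReduce also distributes work across machines, writes intermediate files, and handles worker failures. Note that the sum of 1 to 100 billion (about 5 × 10^21) overflows `int64`, so the demo uses a smaller n.

```go
package main

import (
	"fmt"
	"sync"
)

// sumRange is the "map" step: one worker sums its own slice [lo, hi].
func sumRange(lo, hi int64) int64 {
	var s int64
	for i := lo; i <= hi; i++ {
		s += i
	}
	return s
}

// parallelSum splits [1, n] into `workers` chunks, sums each chunk in its own
// goroutine, then adds up the partial results (the "reduce" step).
func parallelSum(n int64, workers int) int64 {
	partials := make([]int64, workers)
	chunk := n / int64(workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := int64(w)*chunk + 1
		hi := lo + chunk - 1
		if w == workers-1 {
			hi = n // the last worker also picks up any remainder
		}
		wg.Add(1)
		go func(w int, lo, hi int64) {
			defer wg.Done()
			partials[w] = sumRange(lo, hi)
		}(w, lo, hi)
	}
	wg.Wait()

	var total int64
	for _, p := range partials {
		total += p
	}
	return total
}

func main() {
	n := int64(100_000_000)
	fmt.Println(parallelSum(n, 8), n*(n+1)/2) // both numbers should match
}
```

Each goroutine writes only to its own slot in `partials`, so no lock is needed until the final combine, which runs after `wg.Wait()`.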
Why build distributed systems
- To increase capacity via parallelism
- To tolerate faults via replication
- To place computing physically close to external entities
- To achieve security via isolation
Fault tolerance: there are two main aspects, availability and recoverability.
In a distributed system, it is rare for all servers to fail at the same time, so both service availability and data safety are better protected than with a single server.
Challenges of distributed systems
- Concurrent programming needs extra care, and the skill demanded of developers rises sharply
- The interactions within the system are very complex
- Unexpected failures: partial failure
- Actual performance often falls short of expectations
Partial failure: suppose each machine has a one-in-a-thousand chance of failing on any given day. A single machine may run for a long time without trouble, but a distributed system has far more machines, so some machine may fail every day. This is the so-called partial failure: some components break while the rest keep running. It is hard to troubleshoot and almost unavoidable.
Here is a comparison between a single-machine application and a distributed application; the picture is from Geek Time · Listening to the Wind in the Left Ear.
Solutions for distributed systems
We need to design a set of abstractions that hide the complexity of distributed systems.
Why set this goal?
Because distributed systems are complex enough on their own, they must be simplified.
What does simplification have to do with abstraction?
The best abstraction I know of is the file.
“UNIX files are essentially a bag of bytes.” – The Art of UNIX Programming
In UNIX, every device that does I/O, whether a file, a socket, or a driver, gets a file descriptor once it is opened. UNIX reduces reading and writing all of these devices to read/write: you just pass the open file descriptor to these two functions, and the kernel uses the descriptor to locate the specific device. The details of reading and writing each kind of device are hidden inside the kernel and are transparent to users. All you need to do is open the device to get an fd and then perform the corresponding operations.
- Implementation: RPC (remote procedure calls), threads, and concurrency control
- Performance: usually we want the system to provide scalable performance.
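Returning to the implementation topic (RPC, threads, concurrency control), the three pieces can be sketched together with Go's standard `net/rpc` package. A server exposes a hypothetical `Counter` service, many goroutines call it concurrently through one client, and a `sync.Mutex` keeps the shared count correct. The names are made up for illustration and this is not the 6.824 lab code.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
	"sync"
)

// Counter is an RPC service whose state is shared by all concurrent calls,
// so a mutex provides the concurrency control.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Incr adds delta and returns the new value in *reply. net/rpc requires an
// exported method of the form (args, *reply) error.
func (c *Counter) Incr(delta int, reply *int) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n += delta
	*reply = c.n
	return nil
}

// runDemo starts a server on a random local port, issues `calls` concurrent
// Incr RPCs from separate goroutines, and returns the final count.
func runDemo(calls int) (int, error) {
	srv := rpc.NewServer()
	counter := new(Counter)
	if err := srv.Register(counter); err != nil {
		return 0, err
	}
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	go srv.Accept(l) // serve connections in the background

	client, err := rpc.Dial("tcp", l.Addr().String())
	if err != nil {
		return 0, err
	}
	defer client.Close()

	var wg sync.WaitGroup
	for i := 0; i < calls; i++ {
		wg.Add(1)
		go func() { // each goroutine is a concurrent caller
			defer wg.Done()
			var reply int
			if err := client.Call("Counter.Incr", 1, &reply); err != nil {
				log.Println("rpc error:", err)
			}
		}()
	}
	wg.Wait()

	counter.mu.Lock()
	defer counter.mu.Unlock()
	return counter.n, nil
}

func main() {
	n, err := runDemo(100)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("final count:", n)
}
```

Without the mutex, concurrent `Incr` calls could race on `c.n` and lose updates; `rpc.Client` itself is documented as safe for use by multiple goroutines.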
Simply adding computers increases parallelism, so part of the system's performance can scale with the number of machines:
- This works well when the machines do not need complex interaction
- You don’t have to hire expensive programmers to redesign the system.
Simply increasing the number of computers in the system does not always increase the system performance:
- As the number of computers grows, load imbalance, uneven per-machine performance, code that cannot run in parallel, and initialization and interaction overhead all reduce system performance.
- Shared resources, such as network communication or a database, can also become performance bottlenecks
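The effect of code that cannot run in parallel is commonly quantified with Amdahl's law, a standard model that the notes above do not name. This sketch ignores load imbalance and shared-resource bottlenecks, so real scaling is usually worse; the 95% parallel fraction is an illustrative assumption. Even then, speedup can never exceed 1 / 0.05 = 20, however many machines are added.

```go
package main

import "fmt"

// amdahlSpeedup is the ideal speedup on n machines when a fraction
// `parallel` of the work can run in parallel and the rest is serial:
// S(n) = 1 / ((1 - parallel) + parallel/n).
func amdahlSpeedup(parallel float64, n int) float64 {
	return 1 / ((1 - parallel) + parallel/float64(n))
}

func main() {
	// Assume 95% of the job parallelizes (illustrative number).
	for _, n := range []int{1, 10, 100, 1000} {
		fmt.Printf("n=%4d  speedup=%.2f\n", n, amdahlSpeedup(0.95, n))
	}
}
```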
In addition, some kinds of performance cannot be achieved by adding computers at all:
- For example, fast response time for a single user request
- For example, when all users want to update the same data
- Often these situations require better programming rather than more computers.
- Fault tolerance: a large system with many servers means something is always failing
- We need to hide these failures from the application
We usually want the system to have availability and recoverability
- Availability: the system keeps running even when some failures occur
- Recoverability: once the failure is repaired, the system can resume operation
- Fault tolerance is usually improved with a standby (backup) server
Building a system that behaves correctly is often difficult:
- Keeping a server consistent with its backup server is difficult and expensive
- The client may fail halfway through a request
- The server may crash after processing a request but before replying
- A poor network may make healthy servers unable to provide service
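The case where the server processes a request but the reply never arrives is commonly handled by having clients retry with a unique request ID and having the server remember which IDs it has already executed. The sketch below shows this duplicate detection in a single process with hypothetical names; it does not survive a crash of the server itself, since the `seen` table would need to be persisted or replicated.

```go
package main

import (
	"fmt"
	"sync"
)

// BankServer is hypothetical: it caches the reply for every request ID it
// has executed, so a retried request is answered but not applied twice.
type BankServer struct {
	mu      sync.Mutex
	balance int
	seen    map[int64]int // request ID -> reply already produced
}

func NewBankServer() *BankServer {
	return &BankServer{seen: make(map[int64]int)}
}

// Deposit applies amount at most once per request ID and returns the balance.
func (s *BankServer) Deposit(reqID int64, amount int) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	if reply, ok := s.seen[reqID]; ok {
		return reply // duplicate: return the cached reply, do not re-apply
	}
	s.balance += amount
	s.seen[reqID] = s.balance
	return s.balance
}

func main() {
	s := NewBankServer()
	// Suppose the reply to request 1 is lost, so the client retries it.
	s.Deposit(1, 100)
	fmt.Println(s.Deposit(1, 100)) // retry detected: balance stays 100
	fmt.Println(s.Deposit(2, 50))  // new request: balance becomes 150
}
```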
Consistency and performance are often contradictory:
- Strong consistency requires a lot of communication between the various pieces of infrastructure
- Many designs are forced to provide only weak consistency in order to improve performance
Consistency: consistency seems to be the hardest problem to solve, because it touches many factors at once, including performance, fault tolerance, and data consistency.
As mentioned earlier, fault tolerance and disaster recovery require data backups. In a distributed system, if service A modifies a value in database A, should the value in database B be updated immediately or later? And what should be done if a synchronous update fails, or if an asynchronous update fails?
In the end, the industry has no perfect solution to these problems, so the mainstream approach is eventual consistency: data is allowed to be inconsistent for a short time, while performance and data safety are preserved by guaranteeing that all copies eventually converge.
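The trade-off between synchronous and asynchronous updates can be sketched in a single process, with a map standing in for each database and all names hypothetical. Synchronous mode keeps both copies identical at the cost of updating the backup before returning; asynchronous mode returns immediately and lets the backup lag until `Flush` runs, which is eventual consistency in miniature. Network delays, failures during replication, and conflicting writes, the problems raised above, are deliberately left out.

```go
package main

import "fmt"

// Replica is a hypothetical key/value store on one machine.
type Replica map[string]string

// Primary replicates writes to a backup, synchronously or asynchronously.
type Primary struct {
	local, backup Replica
	async         bool
	pending       []update // async updates not yet shipped to the backup
}

type update struct{ key, val string }

// Put writes locally. In sync mode it also updates the backup before
// returning; in async mode it only queues the update, so the backup is
// briefly stale.
func (p *Primary) Put(key, val string) {
	p.local[key] = val
	if p.async {
		p.pending = append(p.pending, update{key, val})
		return
	}
	p.backup[key] = val
}

// Flush ships queued updates to the backup, after which the replicas agree.
func (p *Primary) Flush() {
	for _, u := range p.pending {
		p.backup[u.key] = u.val
	}
	p.pending = nil
}

func main() {
	p := &Primary{local: Replica{}, backup: Replica{}, async: true}
	p.Put("a", "1")
	fmt.Printf("before flush: backup[a]=%q\n", p.backup["a"])
	p.Flush()
	fmt.Printf("after flush:  backup[a]=%q\n", p.backup["a"])
}
```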
Mind map of the series
Shared file: https://www.processon.com/vie…
Next chapter
In the next chapter, we will work on Lab 1 of 6.824: implementing a simple MapReduce system in Go.
Go has been one of the most popular languages in recent years; personally, I think it is worth even more than the hugely popular Python.
Goals for this chapter
- Understand the origins and challenges of distributed systems
- Understand the distributed system solutions covered in the 6.824 course
- Set up a Go environment and write a HelloWorld program (Go syntax and the MapReduce implementation will be covered in the next chapter)
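For the last goal, a minimal HelloWorld might look like this. Save it under any name, for example `hello.go`, and run it with `go run hello.go` once Go is installed. The `greeting` helper is an illustrative addition that simply makes the message easy to check.

```go
package main

import "fmt"

// greeting returns the message so it can be checked without capturing stdout.
func greeting() string {
	return "Hello, World!"
}

func main() {
	fmt.Println(greeting())
}
```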