Memcached for multi-structured data management
1. Introduction to memcached
- Memcached is a high-performance distributed memory object caching system, used by dynamic web applications to reduce database load.
- Basic feature: it caches data and objects in memory so that dynamic, database-driven websites respond faster, which reduces the number of database reads and the disk overhead.
- Distributed cache: it can be accessed by multiple users on different hosts at the same time, which removes the limits of a single-machine application.
- Uses its own page block allocator.
- Uses a hash table (HashMap) to look up items.
- Provides no redundancy (such as replicating hash table entries). When a server S stops running or crashes, all key-value pairs stored on S are lost.
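The basic feature described above (answering repeated reads from memory instead of the database) is commonly applied as a cache-aside read. The following is a minimal sketch of that idea only: a plain dict stands in for memcached, and `FakeDatabase` and `get_with_cache` are hypothetical names introduced for illustration, not part of any memcached client.

```python
class FakeDatabase:
    """Hypothetical stand-in for a slow database; counts how often it is read."""

    def __init__(self, rows):
        self.rows = rows
        self.reads = 0

    def query(self, key):
        self.reads += 1
        return self.rows.get(key)


def get_with_cache(key, cache, db):
    """Cache-aside read: try the cache first, fall back to the database."""
    value = cache.get(key)
    if value is None:
        value = db.query(key)
        if value is not None:
            cache[key] = value  # store the result so the next read is a hit
    return value


db = FakeDatabase({"user:1": {"name": "alice"}})
cache = {}  # plays the role of memcached in this sketch

first = get_with_cache("user:1", cache, db)   # miss: reads the database
second = get_with_cache("user:1", cache, db)  # hit: served from memory
```

After the two calls, the database has been read only once. Because memcached provides no redundancy, the database must remain the source of truth: a lost cache entry only costs one extra database read.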
2. Memcached application rules
- Frequently accessed tables: user, user_details
- Lifetime: set how long each value lives in memcached
- Active user information: preloaded into memcached
- Memcached service deployment: start the service on multiple machines
- Monitoring the memcached service: write a corresponding monitoring script
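The lifetime and preloading rules above can be illustrated with a toy in-process cache. This is a sketch under stated assumptions, not memcached's own expiry logic: `TTLCache` and `preload` are hypothetical names, and the clock is injectable only so that the example can simulate time passing.

```python
import time


class TTLCache:
    """Toy in-process cache whose entries expire after a lifetime in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expiry time)

    def set(self, key, value, ttl):
        self._items[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: behave like a cache miss
            return None
        return value


def preload(cache, active_users, ttl):
    """Load active users into the cache before any request arrives."""
    for user_id, details in active_users.items():
        cache.set(f"user:{user_id}", details, ttl)


now = [0.0]  # simulated clock, in seconds
cache = TTLCache(clock=lambda: now[0])
preload(cache, {1: "alice", 2: "bob"}, ttl=60)

fresh = cache.get("user:1")  # within the lifetime: returns the value
now[0] = 61.0                # advance the simulated clock past the lifetime
stale = cache.get("user:1")  # expired: returns None
```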
3. Memcached operation principle
Although memcached is called a distributed cache server, the server side has no distributed functionality.
Memcached instances do not communicate with each other or share information.
How data is distributed depends entirely on the client implementation.
Memcached uses libevent as its underlying network processing component.
Libevent tutorial link: https://blog.csdn.net/Lemon_tea666/article/details/92637297
Libevent GitHub link:
Libevent is an asynchronous event handling library that wraps the event mechanisms of Linux epoll and BSD kqueue behind a unified interface.
All registered I/O and signal events are stored in doubly linked lists, and a min-heap (min_heap) manages timeout events.
The main loop continuously checks the registered events. When an event occurs, it is placed on the ready list and its callback function is called to carry out the business logic.
The libevent interface handles three kinds of events in a unified way:
- I/O events on a file descriptor
- Signal events
- Timer events
Libevent runs a callback function when its event occurs and is meant to replace the event loop in an event-driven network server. The program only needs to call event_dispatch() and can then add or delete events dynamically.
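The register-then-dispatch pattern described above can be sketched with Python's standard `selectors` module, which also wraps epoll, kqueue, and similar mechanisms behind one interface. This is an analogy only and does not use the libevent API; `on_readable` and `dispatch_once` are hypothetical names.

```python
import selectors
import socket

selector = selectors.DefaultSelector()  # picks epoll, kqueue, etc. per platform
received = []


def on_readable(conn):
    """Callback run by the loop when conn has data to read."""
    received.append(conn.recv(1024))


def dispatch_once(timeout=1.0):
    """One pass of the event loop: wait for ready events, run their callbacks."""
    for key, _mask in selector.select(timeout):
        callback = key.data
        callback(key.fileobj)


# A connected socket pair stands in for a client connection.
reader, writer = socket.socketpair()
selector.register(reader, selectors.EVENT_READ, data=on_readable)

writer.sendall(b"ping")
dispatch_once()  # the loop sees reader is ready and calls on_readable

selector.unregister(reader)
reader.close()
writer.close()
selector.close()
```

The application code only registers callbacks; the loop decides when to run them. A real server would call the dispatch step repeatedly instead of once.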
4. Memcached memory allocation
Early versions of memcached allocated memory for every record with malloc and free. This approach had two drawbacks:
- It easily produces memory fragmentation.
- It increases the burden on the operating system's memory manager.
Improvement: by default, memcached now uses the Slab Allocator mechanism to allocate and manage memory.
Basic principles of the Slab Allocator:
- Chunk: the allocated memory is divided into blocks of specific lengths according to predetermined sizes.
- Slab class: a group of chunks of the same size.
- Allocated memory is never released; it is reused.
The Slab Allocator solves the original memory fragmentation problem but creates a new one: because memory is allocated in fixed-length chunks, the allocated memory may not be fully used. For example, caching 100 bytes of data in a 128-byte chunk wastes the remaining 28 bytes.
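The waste described above follows from always picking the smallest chunk that fits. Below is a minimal sketch of that selection; the chunk sizes in `CHUNK_SIZES` are hypothetical values chosen to match the 128-byte example, not memcached's actual slab class sizes.

```python
import bisect

# Hypothetical chunk sizes, one per slab class (illustrative values only).
CHUNK_SIZES = [64, 128, 256, 512, 1024]


def pick_chunk_size(item_size, chunk_sizes=CHUNK_SIZES):
    """Return the smallest chunk size that can hold item_size bytes."""
    index = bisect.bisect_left(chunk_sizes, item_size)
    if index == len(chunk_sizes):
        raise ValueError(f"item of {item_size} bytes exceeds the largest chunk")
    return chunk_sizes[index]


def wasted_bytes(item_size, chunk_sizes=CHUNK_SIZES):
    """Bytes left unused in the chunk chosen for an item of item_size bytes."""
    return pick_chunk_size(item_size, chunk_sizes) - item_size


chunk = pick_chunk_size(100)  # 128
waste = wasted_bytes(100)     # 28
```

Sizing the chunks to fit the items actually stored reduces this waste, at the cost of needing more slab classes.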
5. Memcached distributed storage processing
Memcached achieves distribution by storing different keys on different servers. As the number of servers grows, the keys are spread across them. Even if one memcached server fails, the other cache nodes are not affected and the system keeps running.
The standard memcached distribution method (keys are distributed by the remainder after dividing by the number of servers):
1) Compute the integer hash value of the key.
2) Divide it by the number of servers and select the server according to the remainder.
3) If the selected server cannot be reached, rehash: append the number of connection attempts to the key, compute the hash value again, and try to connect.
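Advantages: the method is simple and generally distributes data well.
Disadvantages: cache reorganization is costly when servers are added or removed, because the remainder changes for most keys and their cached data can no longer be found on the original server.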
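The remainder method and its cost can be sketched as follows. This is an illustration under stated assumptions rather than any particular client's implementation: the hash is the first 4 bytes of MD5, the retry appends the attempt number to the key as one reading of step 3, and `key_hash`, `pick_server`, and `is_alive` are hypothetical names.

```python
import hashlib


def key_hash(key):
    """Integer hash of a key (first 4 bytes of MD5; an illustrative choice)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


def pick_server(key, servers, is_alive=lambda server: True, max_tries=5):
    """Choose servers[hash % n]; on connection failure, rehash and retry."""
    for attempt in range(max_tries):
        # Attempt 0 hashes the key itself; retries append the attempt count.
        candidate = key if attempt == 0 else f"{key}{attempt}"
        server = servers[key_hash(candidate) % len(servers)]
        if is_alive(server):
            return server
    raise ConnectionError("no reachable server found")


keys = [f"key{i}" for i in range(1000)]
three = ["node1", "node2", "node3"]
four = three + ["node4"]

# Count how many keys land on a different server after adding node4.
moved = sum(pick_server(k, three) != pick_server(k, four) for k in keys)
moved_fraction = moved / len(keys)
```

Going from 3 to 4 servers, a key keeps its server only when its hash has the same remainder modulo 3 and modulo 4, which holds for about a quarter of all hash values. The remaining keys, roughly three quarters, become cache misses at once.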
Improved distribution method (consistent hashing):
1) Compute the hash value of each server node and place it on a circle from 0 to 2^32.
2) Compute the hash value of the key of the data to store in the same way and map it onto the circle.
3) Search clockwise from the position the data maps to and save the data on the first server found.
4) If no server is found before passing 2^32, save the data on the first server on the circle.
With an ordinary hash function, the servers' positions on the circle may be unevenly distributed.
To counter this, each physical node (server) is assigned 100 to 200 points (virtual nodes) on the ring. This evens out the distribution and minimizes cache redistribution when servers are added or removed.
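The ring lookup and the virtual nodes described above can be sketched in a few lines. This is a minimal illustration, not a specific client library's algorithm: the hash is the first 4 bytes of MD5 so that points fall in the range 0 to 2^32, 150 points per server is one value from the 100 to 200 range, and `HashRing` and `ring_hash` are hypothetical names.

```python
import bisect
import hashlib


def ring_hash(value):
    """Map a string to a point on the ring [0, 2**32)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class HashRing:
    def __init__(self, servers, points_per_server=150):
        # Each server is placed on the ring many times (virtual nodes).
        self._ring = sorted(
            (ring_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._points = [point for point, _ in self._ring]

    def get_server(self, key):
        # First point clockwise from the key; wrap to the start past the end.
        index = bisect.bisect_left(self._points, ring_hash(key))
        if index == len(self._ring):
            index = 0
        return self._ring[index][1]


keys = [f"key{i}" for i in range(1000)]
before = HashRing(["node1", "node2", "node3"])
after = HashRing(["node1", "node2", "node3", "node4"])

# Keys whose server changes after node4 joins the ring.
moved_keys = [k for k in keys if before.get_server(k) != after.get_server(k)]
moved_fraction = len(moved_keys) / len(keys)
```

When a server joins, the only keys that move are those that now fall just before one of the new server's points, so every moved key goes to the new server and the rest stay where they were. With four equally weighted servers the moved share is expected to be around one quarter, compared with about three quarters for the remainder method.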
6. Memcached architecture example
If there are about 200 memcached servers and each server has a capacity of 3 GB, the system as a whole provides a huge in-memory data store of nearly 600 GB.
Source: PPT by Mr. Pan Peng