[Winning Over the Interviewer] series: Redis Fundamentals



With the rapid growth of Internet e-commerce, traditional relational databases (MySQL, Oracle) can no longer cope with demanding scenarios such as high concurrency and flash sales. NoSQL (non-relational) databases emerged to fill that gap, and Redis is one of the brightest stars among them, which also makes it an all but unavoidable topic in interviews at large tech companies.

Interview start

>Hello. Could you start with a brief self-introduction?

Hello, interviewer. My name is young Xia Lufei,… I am proficient in Redis, RocketMQ, and other middleware. I look forward to joining your department.

>OK, I noticed you mentioned Redis in your self-introduction. Why did you use Redis in your project, and in which scenarios?

What? I can't help cursing silently. What kind of question is that? Everyone uses it, so I use it too. But we certainly can't say what we really think. An answer needs some substance.

So you reply seriously: Hello, esteemed interviewer. The traditional MySQL database no longer fits every scenario, such as flash sales and peak shaving under heavy traffic, where instantaneous traffic can reach tens or even hundreds of thousands of requests. If all of those requests hit MySQL, the database cannot bear the load and may well collapse and take the service down, so caching middleware is introduced. Redis and Memcached are the two leaders in caching middleware, but Redis supports more data types, so after weighing the options we chose Redis.

>Well, that's a good answer. You just mentioned data types. Can you introduce Redis's data types and their usage scenarios?

Redis has five basic data types: string, hash, list, set, and zset (sorted set).

| Data type | Characteristics | Usage scenarios |
|---|---|---|
| string | key-value | 1. Caching: Redis serves as the cache layer and MySQL as the storage layer, speeding up reads and writes and reducing back-end pressure; 2. Counters, such as video plays or article views |
| hash | key-field-value | Storing an object with many attributes. With plain strings you either need many keys to describe one object, or you must fetch the whole object, modify it, and write it back just to change one value. A hash lets you read or update a single field |
| list | An ordered list per key; elements can be added at the head or the tail | Posts, replies, likes, and checking the likes on a post in friends' feeds, microblogs, and other social platforms |
| set | Unordered, with no duplicate members | Sets support union and intersection, so they are widely used for features such as mutual friends and second-degree friend recommendations |
| zset | Ordered; each member carries a score and members can be sorted by score | Leaderboards on microblogs and similar sites |

Explaining these five basic data types and analyzing their use cases in detail is no reason to feel pleased with yourself, because it only earns a passing grade.
To stand out from thousands of competitors you need something more. Add these three data structures: <font color="#e96900">HyperLogLog, GEO, Pub/Sub</font>

Here I'll illustrate HyperLogLog with a real scenario. In e-commerce you often need to count the PV and UV of a page (page views and unique visitors, respectively). UV counting must deduplicate: no matter how many times a user clicks in a day, that user is counted once. This is where HyperLogLog shows what it can do. HyperLogLog is a cardinality estimator built on hashing: a given user ID always produces the same hash value and therefore lands at the same position in the hash buckets, so repeated visits do not change the result. The count is approximate, but it needs only a small, fixed amount of memory. Of course, HyperLogLog is not the only option: a set, or strings written with `setnx`, would also work. No rush; the interviewer will ask about that later.
Note that for daily UV statistics, the relevant keys must be set to expire on the same day; otherwise counts carry over and every day's UV figure will look abnormal.
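The daily UV counter described above could be sketched roughly as follows. This is a minimal sketch, assuming a redis-py style client is passed in as `client`; the `uv:<page>:<date>` key scheme and the helper names are hypothetical, chosen only for illustration.

```python
from datetime import date, datetime, timedelta


def uv_key(page: str, day: date) -> str:
    """Build a per-day key (hypothetical naming scheme), e.g. uv:home:2024-01-31."""
    return f"uv:{page}:{day.isoformat()}"


def seconds_until_midnight(now: datetime) -> int:
    """Seconds remaining until the next midnight, used as the key's TTL."""
    next_midnight = datetime.combine(now.date() + timedelta(days=1), datetime.min.time())
    return max(1, int((next_midnight - now).total_seconds()))


def record_visit(client, page: str, user_id: str, now: datetime) -> None:
    """Count one visit; repeated visits by the same user do not raise the UV."""
    key = uv_key(page, now.date())
    client.pfadd(key, user_id)                        # PFADD: add the user to the HyperLogLog
    client.expire(key, seconds_until_midnight(now))   # let the key expire at the end of the day


def daily_uv(client, page: str, day: date) -> int:
    """Return the approximate number of unique visitors for that day."""
    return client.pfcount(uv_key(page, day))          # PFCOUNT: estimated cardinality
```

Putting the date into the key and also setting an expiry is a belt-and-braces choice: either one alone keeps one day's visitors from leaking into the next day's count.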

>Well, the young man has a good grasp of Redis's data structures (he is already giving you a mental thumbs-up). You just mentioned `setnx`. Can you briefly explain how it works and how it is used?

<font color="#e96900">Never put yourself in a completely passive position during an interview. You may have noticed that the interviewer's questions about Redis and `setnx` were both things I mentioned first in my own answers. The interviewer is free to ask anything, but you can deliberately mention certain topics in your answers to steer the next question and turn a passive position into an active one. That way you control the pace of the interview as much as possible.</font>

Back to the interviewer's question: `setnx` is often used to implement distributed locks. The core idea is that clients compete for the lock with `setnx`; whoever grabs it runs the relevant logic, and must not forget to use `expire` to set an expiration time on the lock so that it cannot be held forever.

>The interviewer then starts the next round of attack: what if the process crashes unexpectedly, or the service is restarted for maintenance, after `setnx` but before `expire`?

Respond immediately: in that case the lock taken with `setnx` is never released, which is dangerous.
Then think for a moment and offer a solution: Redis's `set` command accepts a rich set of options (such as `NX` and `EX`), so the two commands can be merged into a single command that executes as one atomic operation.
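A minimal sketch of that single-command lock, assuming a redis-py style client is passed in as `client`. The function names and the random token are illustrative assumptions rather than anything from the interview answer; the release step uses the common compare-and-delete Lua script so that a client only deletes a lock it still owns.

```python
import uuid

# Delete the lock only if it still holds our token (runs atomically inside Redis).
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""


def acquire_lock(client, name: str, ttl_seconds: int = 10):
    """Try to take the lock; return an ownership token, or None if it is already held."""
    token = uuid.uuid4().hex
    # SET name token NX EX ttl: set-if-absent and expiry in one atomic command,
    # so a crash can never leave a lock without an expiration time.
    if client.set(name, token, nx=True, ex=ttl_seconds):
        return token
    return None


def release_lock(client, name: str, token: str) -> bool:
    """Release the lock only if this client still owns it."""
    return client.eval(RELEASE_SCRIPT, 1, name, token) == 1
```

The token matters when the business logic outlives the TTL: by then another client may hold the lock, and an unconditional `del` would release someone else's lock.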

>By now the interviewer knows the person in front of him is no pushover, so he raises the difficulty: suppose Redis holds one billion keys, 100,000 of which start with a fixed, known prefix. How do you find them all?

Use the `keys` command, which lists all keys matching a given pattern.
This is a chance to shine a little more.
You then add: because Redis uses a single-threaded model, running `keys` while Redis is serving production traffic blocks that thread for a while and stalls the online service. Use the `scan` command instead: `scan` walks the keys matching the pattern incrementally without blocking the server, but it may return the same key more than once, so the client has to deduplicate the results.
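A rough sketch of the `scan`-based approach with client-side deduplication, assuming a redis-py style client whose `scan_iter` wraps the `SCAN` cursor loop. The function name and the `batch_hint` parameter are illustrative; the prefix is assumed to contain no glob characters such as `*` or `?`.

```python
def find_keys(client, prefix: str, batch_hint: int = 1000):
    """Yield each key that starts with `prefix` exactly once."""
    seen = set()
    # scan_iter issues SCAN repeatedly; `count` is only a hint for how many
    # keys the server examines per call, not a limit on the results.
    for key in client.scan_iter(match=f"{prefix}*", count=batch_hint):
        if key not in seen:      # SCAN may report a key more than once
            seen.add(key)
            yield key
```

Keeping roughly 100,000 keys in a Python set is cheap; for a far larger result the deduplication could be pushed to wherever the keys end up being stored.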

>The interviewer can no longer contain his excitement. The candidate is even answering before being asked; this young man is talented. Since you just brought up Redis's thread model, can you elaborate on it?

Don't panic, and answer steadily: Redis executes commands on a single thread, and underneath it relies on I/O multiplexing to serve many client connections (I will cover this in detail when discussing nginx's event-driven module).

>The interviewer senses that this candidate is different and keeps going: Redis is an in-memory database. How is the data persisted?

There are two mechanisms: RDB, which persists a full snapshot, and AOF, which persists incrementally. Because producing an RDB snapshot takes a long time, it cannot be done often enough to be real-time, and a crash loses everything written since the last snapshot, so AOF is used alongside it. Conceptually, when a Redis instance restarts, the RDB file rebuilds memory and the AOF then replays the most recent operations to fully restore the state before the restart.
An easy way to picture this: RDB is the full contents of a table, and AOF is the log of every operation. On restart, the full table is loaded into memory first, but it may be out of date; replaying the log then makes the data complete. Redis's own loading rule, however, is this: when AOF persistence is enabled and an AOF file exists, the AOF file is loaded first; when AOF is disabled or no AOF file exists, the RDB file is loaded; once the AOF/RDB file has been loaded, Redis starts successfully; if the AOF/RDB file contains errors, Redis fails to start and prints an error message.

>What happens when there is a sudden power failure during persistence?

Data can be lost, but how much depends on the AOF sync setting (`appendfsync`). If performance is not a concern, syncing to disk on every write command means no data is lost. Under high-performance requirements, however, syncing on every write is unrealistic, so a timed sync is generally used, such as once per second (`appendfsync everysec`), in which case roughly the last second of data is lost at most.

>The interviewer decides to keep pressing you and asks: how does RDB work?

There are two key ingredients: fork and COW. Fork means that Redis performs the RDB operation in a child process it creates. COW means copy-on-write: after the child is created, parent and child share the same memory pages. The parent keeps serving reads and writes, and the pages it modifies (dirty pages) are copied and gradually diverge from what the child sees.
Note: if you can also explain the advantages and disadvantages of AOF and RDB when answering this question, the interviewer will likely be impressed. I will add that in a later blog post.

>Since this kind of question doesn't trouble you, the clever interviewer tries another direction: have you ever used Redis as an asynchronous queue, and how?

The interviewer tries every means to stump us, and we defuse one crisis after another with experience. Stay composed and keep going!
You reply seriously: Redis generally uses the list data structure as an asynchronous queue, with `rpush` producing messages and `lpop` consuming them. When `lpop` finds no message, the consumer sleeps for a while and then retries.
Then you add: besides `sleep`, the list type has another command, `blpop`, which blocks when there is no message until a new one arrives.
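The list-based queue can be sketched as follows, assuming a redis-py style client passed in as `client`. The JSON encoding, the function names, and the timeout value are illustrative assumptions.

```python
import json


def produce(client, queue: str, message: dict) -> None:
    """Append a message to the tail of the list (RPUSH)."""
    client.rpush(queue, json.dumps(message))


def consume_one(client, queue: str, timeout_seconds: int = 5):
    """Block until a message arrives or the timeout expires (BLPOP).

    Returns the decoded message, or None if nothing arrived in time.
    """
    item = client.blpop(queue, timeout=timeout_seconds)
    if item is None:
        return None
    _queue_name, payload = item      # BLPOP returns a (key, value) pair
    return json.loads(payload)
```

Compared with an `lpop` plus `sleep` loop, the blocking call avoids both wasted polling and the extra latency of waiting out a sleep interval.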

>By now the interviewer thinks you have real insight into Redis and has already hired you a hundred times over in his head. Still, he keeps a calm face; he is, after all, a veteran engineer. The Q&A continues: with the pattern just described, once a message has been taken by `blpop`, no other consumer can consume it. How do you produce once and consume many times?

With the pub/sub (publish/subscribe) model you can implement a 1:N message queue.

>What are the drawbacks of the pub/sub model?

The problem with this model is that messages produced while a consumer is offline are lost. For that we have to use a dedicated message queue such as RocketMQ or Kafka.
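A small sketch of both sides of pub/sub, assuming a redis-py style client passed in as `client`. The function names and the `max_messages` stop condition are illustrative assumptions.

```python
def publish(client, channel: str, payload: str) -> int:
    """Publish a message and return how many subscribers received it.

    A return value of 0 means nobody was listening: the message is simply gone.
    """
    return client.publish(channel, payload)


def listen(client, channel: str, handle, max_messages: int) -> None:
    """Subscribe and pass each incoming payload to `handle`, then unsubscribe."""
    pubsub = client.pubsub()
    pubsub.subscribe(channel)
    handled = 0
    for event in pubsub.listen():
        if event["type"] != "message":   # skip the subscribe confirmation event
            continue
        handle(event["data"])
        handled += 1
        if handled >= max_messages:
            break
    pubsub.unsubscribe(channel)
```

Every subscriber connected at publish time gets its own copy, which gives the 1:N fan-out; nothing is stored for subscribers that connect later, which is exactly the drawback mentioned above.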

>The interviewer doesn't let up: how would you implement a delay queue with Redis?

After this barrage of questions, even someone as patient as you probably wants to punch the interviewer: is this over yet? The epidemic is so serious that I don't dare eat out, so I have to go home and cook for myself. People have to work tomorrow!
But for the sake of the big-company offer, you hold back and answer calmly: use a sorted set (zset), with the message content as the member and a timestamp as the score, and call `zadd` to produce messages. The consumer polls with `zrangebyscore` to fetch the messages whose timestamps are at least N seconds old and processes them.
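A minimal sketch of such a delay queue, assuming a redis-py style client passed in as `client`. As a small variation on the answer above, this version stores the due time (`now + delay`) as the score, so the consumer simply asks for everything with a score up to the current time. The function names, the `limit`, and the injectable `now` parameter are illustrative assumptions.

```python
import time


def schedule(client, queue: str, message: str, delay_seconds: float, now=None) -> None:
    """Add a message that becomes visible after `delay_seconds` (ZADD)."""
    now = time.time() if now is None else now
    client.zadd(queue, {message: now + delay_seconds})   # score = due timestamp


def pop_due(client, queue: str, now=None, limit: int = 10) -> list:
    """Return the messages that are due and that this consumer managed to claim."""
    now = time.time() if now is None else now
    due = client.zrangebyscore(queue, 0, now, start=0, num=limit)
    claimed = []
    for message in due:
        # ZREM returns 1 only for the consumer that actually removed the member,
        # so two consumers polling at once never both process the same message.
        if client.zrem(queue, message) == 1:
            claimed.append(message)
    return claimed
```

A consumer would call `pop_due` in a loop with a short sleep between polls. Because the message content is the member, two identical messages collapse into one entry, so a unique ID inside the payload is worth considering.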

>With that answer, the interviewer has given you a thumbs-up and silently awarded you a plus, but he still refuses to stop. Whether he is selecting talent no longer matters; mostly he just wants to keep asking. So he sounds the horn for the next round: do you know pipeline?

Pipelining reduces multiple network round trips to one, provided that the pipelined commands have no causal dependency on one another (no command needs the result of an earlier one).

Note: when stress testing with redis-benchmark, you will find that the number of commands per pipeline batch is an important factor in Redis's peak QPS.
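A rough sketch of batching independent writes through a pipeline, assuming a redis-py style client passed in as `client`. The function name and the batch size are illustrative assumptions, and `transaction=False` is chosen here because only round-trip savings are wanted, not MULTI/EXEC semantics.

```python
def set_many(client, items: dict, batch_size: int = 500) -> int:
    """Write many independent key/value pairs, one round trip per batch.

    Returns the number of commands the server replied to.
    """
    pipe = client.pipeline(transaction=False)
    sent = 0
    for i, (key, value) in enumerate(items.items(), start=1):
        pipe.set(key, value)              # queued locally, nothing is sent yet
        if i % batch_size == 0:
            sent += len(pipe.execute())   # flush one full batch in a single round trip
    sent += len(pipe.execute())           # flush whatever is left
    return sent
```

These `set` calls qualify for pipelining because none of them needs the reply of another. Capping the batch size keeps any single request, and the replies the server must buffer for it, from growing without bound.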

>Do you understand Redis's synchronization mechanism?

Redis supports master-slave synchronization and slave-slave (cascaded) synchronization. On the first synchronization, the master runs a bgsave and records subsequent modifications in an in-memory buffer. When the snapshot is finished, the master sends the full RDB file to the replica, which loads the RDB image into memory after receiving it. Once loading is done, the replica notifies the master, which sends the operations buffered in the meantime for the replica to replay, completing the synchronization. Subsequent incremental changes are then propagated as a stream of write commands, much like the AOF log and somewhat similar to a database binlog.

>Have you used Redis in a clustered setup? How do you ensure high availability of the cluster?

Redis Sentinel (which I will cover in a separate blog post later) focuses on high availability. When the master goes down, it automatically promotes a slave to master so that service continues.

Redis Cluster focuses on scalability: when a single Redis instance runs out of memory, Cluster shards the data across nodes.

End of interview

Not bad, young man. I'm very satisfied, and our department is short of talent like you. Why don't we go through the onboarding formalities today?
At this point you must stay steady and hold back your excitement: that's rather sudden. The epidemic is serious and it's hard to find an apartment. How about next Monday?
The interviewer hears this and thinks: alas, this young man must have a pile of offers in hand. How can we let such talent slip away? Where's the HR manager? Raise the offer!!!
Having answered these questions one by one, don't you feel great?


In a technical interview, whether the topic is Redis or anything else, citing real problems and results from your own development work leaves a strong impression on the interviewer. Your answers should also be logical: don't jump from one point to another at random, or you will confuse both the interviewer and yourself.
In addition, an interview should not be a bare question-and-answer exchange. If you can extend your answer to related knowledge beyond the question, the interviewer will see that you are more than someone who can write code: your reasoning is clear, and you have your own understanding of technology selection, middleware, and projects. The praise will follow naturally.
