Cache principles and automatic cache management in microservices

Time: 2021-07-25

Why cache?

Let’s start with an old question: how does our program work?

  1. Programs are stored on disk
  2. Programs run in RAM, which is what we call main memory
  3. A program's computational logic is executed in the CPU

Let's take the simplest example: a = a + 1

  1. load: read a from RAM into register x0
  2. x0 = x0 + 1
  3. store: write x0 back from the register to RAM


Three storage media appear above. As we all know, their read/write speeds are inversely proportional to their cost, so to bridge the speed gap we need a middle layer. This middle layer has to be fast to access while keeping the cost acceptable, and that is why the Cache was introduced.


In computer systems, there are two default caches:

  • The last level cache in the CPU, i.e. the LLC, which caches data from main memory
  • The page cache in memory, i.e. page cache, which caches data from disk

Cache read/write policies

Now that the Cache has been introduced, let's look at what can happen when we operate on it. Because the access speeds differ so much, delays or program failures while operating on data can leave the cache inconsistent with the actual storage layer.

Let's take the standard Cache + DB setup and walk through the classic read/write strategies and their application scenarios.

Cache Aside

Let's consider the simplest business scenario, such as a user table: userid (user ID), phone (user phone number), avatar (user avatar URL). In the cache we use phone as the key to store the user's avatar. What should we do when the user modifies the avatar URL?

  1. Update the DB data, then update the Cache data
  2. Update the DB data, then delete the Cache data

In option 1, changing the database and changing the cache are two independent operations, and we apply no concurrency control to them. When two threads update them concurrently, the data become inconsistent because the writes land in different orders.

So option 2 is the better solution:

  • When updating data, do not update the cache; delete it directly
  • A subsequent read request finds the cache missing, queries the DB again, and loads the result into the cache


This is the most common strategy for using a cache: Cache Aside. With this strategy the database is the source of truth and cache data is loaded on demand; it is split into a read policy and a write policy.
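
To make the read and write paths concrete, here is a minimal Cache Aside sketch in Go. The Cache and DB interfaces, the key format, and the 10-minute TTL are assumptions for illustration only, not go-zero APIs.

package cacheaside

import (
	"errors"
	"time"
)

// Cache and DB are hypothetical interfaces standing in for redis and MySQL.
type Cache interface {
	Get(key string) (string, error) // returns ErrCacheMiss when the key is absent
	Set(key, val string, ttl time.Duration) error
	Del(key string) error
}

type DB interface {
	QueryAvatar(phone string) (string, error)
	UpdateAvatar(phone, avatar string) error
}

var ErrCacheMiss = errors.New("cache miss")

// ReadAvatar: read policy — try the cache first, fall back to the DB and backfill the cache.
func ReadAvatar(c Cache, db DB, phone string) (string, error) {
	if avatar, err := c.Get(phone); err == nil {
		return avatar, nil
	} else if !errors.Is(err, ErrCacheMiss) {
		return "", err
	}
	avatar, err := db.QueryAvatar(phone)
	if err != nil {
		return "", err
	}
	// Load the cache on demand; a short TTL limits the impact of stale data.
	_ = c.Set(phone, avatar, 10*time.Minute)
	return avatar, nil
}

// UpdateAvatar: write policy — update the DB first, then delete (not update) the cache.
func UpdateAvatar(c Cache, db DB, phone, avatar string) error {
	if err := db.UpdateAvatar(phone, avatar); err != nil {
		return err
	}
	return c.Del(phone)
}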

But a visible problem arises: frequent writes interleaved with reads cause the Cache to be replaced repeatedly, which lowers the cache hit rate. If the business has monitoring and alerting on the hit rate, the following options can be considered:

  1. When updating the data, update the cache at the same time, but take a distributed lock before updating the cache. Only one thread then operates on the cache at a time, which solves the concurrency problem, and subsequent read requests see the latest cache, which solves the inconsistency problem.
  2. When updating the data, update the cache at the same time, but give the cached entry a short TTL.

Of course, in addition to this strategy, there are several other classic caching strategies in the computer system, and they also have their own applicable use scenarios.

Write Through

On a write, first check whether the key hits the cache. If it hits, update the cache and have the cache component synchronize the data to the DB; if it misses, a Write Miss is triggered.

There are generally two ways to handle a Write Miss:

  • Write Allocate: directly allocate a Cache Line on write
  • No-write allocate: write directly to the DB without populating the cache, then return

Write Through generally uses No-write allocate. Either way the data ends up persisted to the DB, so skipping the cache write saves a step and improves write performance; the cache is then populated by the Read Through path on reads.


The core principle of this strategy is: the user deals only with the cache, and the cache component talks to the DB to write or read the data. This strategy is worth considering for some local, in-process cache components.
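
A minimal sketch of this idea, using an in-process map as the cache and a hypothetical Store interface as the persistent backend; it is illustrative only, not any particular library's API.

package writethrough

import "sync"

// Store is a hypothetical persistent backend (e.g. a DB client).
type Store interface {
	Write(key, val string) error
}

// WriteThroughCache keeps an in-process map in front of a Store.
// The caller only talks to the cache; the cache talks to the store.
type WriteThroughCache struct {
	mu    sync.Mutex
	data  map[string]string
	store Store
}

func New(store Store) *WriteThroughCache {
	return &WriteThroughCache{data: make(map[string]string), store: store}
}

// Write synchronously persists to the store. With no-write allocate,
// a miss is written to the store only and does not populate the cache.
func (c *WriteThroughCache) Write(key, val string) error {
	c.mu.Lock()
	if _, hit := c.data[key]; hit {
		c.data[key] = val // keep the cached copy consistent
	}
	c.mu.Unlock()
	return c.store.Write(key, val) // always persist synchronously
}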

Write Back

You can probably see the drawback of the scheme above: the cache and the database are written synchronously on every write, yet the speed difference between the two storage media is several orders of magnitude, which hurts write performance badly. So can we update the database asynchronously?

Write Back means that a write only updates the data in the corresponding Cache Line and marks the line as Dirty. The Dirty data is written to storage only when it is read again, or when the replacement policy evicts it because the cache is full.

It should be noted that on a Write Miss, Write Back uses Write Allocate: the data is written to storage and to the cache at the same time, so subsequent write requests only need to update the cache.
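
A minimal sketch of the dirty-flag flow described above, with a hypothetical Store interface; the Write Miss handling is simplified to allocating the line directly in the cache.

package writeback

// Store is a hypothetical persistent backend.
type Store interface {
	Write(key, val string) error
}

// line is a cached entry with a dirty flag, mimicking a Cache Line.
type line struct {
	val   string
	dirty bool
}

type WriteBackCache struct {
	data  map[string]line
	store Store
}

func New(store Store) *WriteBackCache {
	return &WriteBackCache{data: make(map[string]line), store: store}
}

// Write only touches the cache and marks the line dirty.
func (c *WriteBackCache) Write(key, val string) {
	c.data[key] = line{val: val, dirty: true}
}

// Evict flushes the line to the store only if it is dirty, then drops it.
func (c *WriteBackCache) Evict(key string) error {
	if l, ok := c.data[key]; ok && l.dirty {
		if err := c.store.Write(key, l.val); err != nil {
			return err
		}
	}
	delete(c.data, key)
	return nil
}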


Concepts like async flush exist elsewhere in computer systems. Flushing dirty pages in MySQL is essentially the same idea: avoid random writes as much as possible and batch the moments at which data is written to disk.

Redis

Redis is a standalone piece of system software, separate from the business programs we write. Once we deploy a Redis instance, it just waits passively for clients to send requests and then processes them. So if an application wants to use Redis as a cache, we have to add the corresponding cache-operation code to the program ourselves. That is why Redis is also called a bypass cache: reading the cache, reading the database, and updating the cache all have to be done inside the application.

And as a cache, Redis also has to face the usual problems:

  • Cache capacity is limited, after all
  • The impact of concurrent upstream requests
  • Data consistency between the cache and the backing store

Replacement strategy

Generally speaking, a cache either directly deletes evicted data or writes it back to the database, depending on whether it is clean or dirty. In Redis, however, evicted data is simply deleted whether it is clean or not. So be especially careful when using Redis as a cache: once data has been modified and become dirty, the corresponding data in the database must be updated as well.

Therefore, no matter which replacement strategy is used, dirty data can be lost during swap-in and swap-out. So when data becomes dirty we should delete the cache rather than update it, and the database should always be the source of truth. This is also easy to reason about: cache writes should be done by read requests, while write requests ensure data consistency as far as possible.

As for the replacement strategies themselves, there are plenty of articles online summarising their pros and cons, so they will not be repeated here.

SharedCalls

In concurrent scenarios, multiple threads (goroutines) may request the same resource at the same time. If every request has to go through the full resource-fetching process, it is not only inefficient but also puts concurrent pressure on the resource service.

SharedCalls in go-zero lets multiple concurrent requests obtain the result with only a single call, while the other requests simply "share the fruits". This design effectively reduces the concurrency pressure on the resource service and is an effective guard against cache breakdown.

To prevent interface requests from putting an instantaneous high load on downstream services, wrap the query in your function:

fn := func() (interface{}, error) {
  // business query
}
data, err := g.Do(apiKey, fn)
// data is obtained; subsequent methods or logic can use it
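
For context, a runnable usage example is sketched below. It assumes the syncx.NewSharedCalls constructor from the go-zero version this article is based on (later versions expose the same component as SingleFlight); the key and the query body are made up for illustration.

package main

import (
	"fmt"
	"sync"

	"github.com/tal-tech/go-zero/core/syncx"
)

func main() {
	// NewSharedCalls is the constructor in the go-zero version this article targets;
	// later versions expose the same component as syncx.NewSingleFlight.
	g := syncx.NewSharedCalls()

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// All ten goroutines share the same key, so the query below is
			// typically executed only once; the other callers reuse its result.
			val, err := g.Do("query:user:42", func() (interface{}, error) {
				// Hypothetical expensive query against the DB or a downstream service.
				return "user-42-profile", nil
			})
			fmt.Println(val, err)
		}()
	}
	wg.Wait()
}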

In fact, the principle is also very simple:

func (g *sharedGroup) Do(key string, fn func() (interface{}, error)) (interface{}, error) {
  //Done: false to execute the following business logic; If it is true, the previously obtained data will be returned directly
  c, done := g.createCall(key)
  if done {
    return c.val, c.err
  }

  //Execute the business logic passed in by the caller
  g.makeCall(c, key, fn)
  return c.val, c.err
}

func (g *sharedGroup) createCall(key string) (c *call, done bool) {
  //Let only one request come in for operation
  g.lock.Lock()
  //If the key carrying a series of requests already exists in the call map,
  //Then unlock and wait for the previous request to obtain data, and return
  if c, ok := g.calls[key]; ok {
    g.lock.Unlock()
    c.wg.Wait()
    return c, true
  }

  //Explain that this request is the first request
  c = new(call)
  c.wg.Add(1)
  //Mark the request, because it holds the lock, you don't have to worry about concurrency
  g.calls[key] = c
  g.lock.Unlock()

  return c, false
}
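
The makeCall half is not shown above; roughly, based on the same source file, it runs the business function once, publishes the result to the shared call, and then releases the waiters:

func (g *sharedGroup) makeCall(c *call, key string, fn func() (interface{}, error)) {
	defer func() {
		// Remove the key so later requests start a fresh call,
		// then wake up the waiters blocked in createCall.
		g.lock.Lock()
		delete(g.calls, key)
		g.lock.Unlock()
		c.wg.Done()
	}()

	// Run the caller's business logic exactly once and stash the result
	// on the shared call for every waiter to read.
	c.val, c.err = fn()
}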

This pattern of using a map plus a lock to store and deduplicate in-flight requests is similar to singleflight in groupcache; both are sharp tools for preventing cache breakdown.

Source address: sharedcalls.go

Cache and store update order

This is a question that often causes head-scratching during development: should we delete the cache first, or update the storage first?

Scenario 1: delete the cache first, then update the storage:

  • A deletes the cache, then hits network latency before updating the storage
  • B issues a read request, finds the cache missing, and reads the storage, getting the old data at this point

This causes two problems:

  • B reads the old value
  • B's read request also writes the old value back into the cache, so subsequent read requests keep reading the old value

Since the cache may hold an old value anyway, delete it regardless. There is an elegant solution: after the write request updates the stored value, sleep() for a short period of time and then delete the cache again.

The sleep is there to make sure the read request has finished, so that the write request can delete the dirty cache entry left behind by that read. The time spent on Redis master-slave synchronization should also be factored in. In the end, though, it depends on the actual business.

Because this scheme deletes the cached value a second time after the first deletion, it is called delayed double deletion.
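
A minimal sketch of delayed double deletion, again with hypothetical Cache and DB interfaces; the 500ms delay is only a placeholder and should be tuned to the read path and replication lag of your own system.

package delayeddelete

import "time"

// Cache and DB are hypothetical interfaces for illustration.
type Cache interface{ Del(key string) error }
type DB interface{ UpdateAvatar(phone, avatar string) error }

// UpdateWithDoubleDelete: delete the cache, update the DB, then delete the cache
// again after a short sleep, so that any stale value written back by a concurrent
// read request is also removed.
func UpdateWithDoubleDelete(c Cache, db DB, phone, avatar string) error {
	if err := c.Del(phone); err != nil {
		return err
	}
	if err := db.UpdateAvatar(phone, avatar); err != nil {
		return err
	}
	go func() {
		// The delay should cover a read request's DB round trip plus
		// redis master-slave replication lag; tune it for your business.
		time.Sleep(500 * time.Millisecond)
		_ = c.Del(phone)
	}()
	return nil
}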

Scenario 2: update the database value first, then delete the cached value:

  • A updates the stored value, then hits network latency before deleting the cache
  • B's read request hits the cache and directly returns the old value

This situation has little impact on the business, and most cache components adopt this update order; it satisfies eventual-consistency requirements.

Scenario 3: a newly registered user is written directly to the database, while the cache certainly holds no entry for that user yet. If the program then reads from a slave, the user data cannot be found because of master-slave replication lag.

This situation calls for special handling of Insert: when inserting new data into the database, also write it to the cache, so that subsequent read requests can read it straight from the cache. And since the data has only just been inserted, it is unlikely to be modified for a while.
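
A minimal sketch of this Insert handling with hypothetical Cache and DB interfaces; the key choice and the 30-minute TTL are illustrative assumptions.

package register

import "time"

// Cache and DB are hypothetical interfaces for illustration.
type Cache interface {
	Set(key string, val interface{}, ttl time.Duration) error
}
type DB interface {
	InsertUser(u User) error
}

type User struct {
	ID    int64
	Phone string
}

// RegisterUser handles the Insert case: write the new row to the DB and
// populate the cache in the same request, so that follow-up reads hit the
// cache instead of a possibly lagging slave.
func RegisterUser(c Cache, db DB, u User) error {
	if err := db.InsertUser(u); err != nil {
		return err
	}
	// New data is unlikely to change right away, so caching it is safe;
	// the TTL is an assumption, pick one that fits the business.
	return c.Set(u.Phone, u, 30*time.Minute)
}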

All of the schemes above have potential problems to varying degrees in complex situations, and need to be adapted to the business.

How to design a good cache operation layer?

Having said all that, let's go back to our perspective as developers: if we have to consider all of these problems every time, it is obviously too much trouble. So how do we encapsulate these cache strategies and replacement strategies to simplify development?

Two things to make clear:

  • Separate cache operations from the business logic, so developers only have to write the business logic
  • Cache operations must take care of traffic impact, cache strategy, and other such concerns

Let's look at how go-zero encapsulates this, from the read and write perspectives.

QueryRow

// res: query result
// cacheKey: redis key
err := m.QueryRow(&res, cacheKey, func(conn sqlx.SqlConn, v interface{}) error {
  querySQL := `select * from your_table where campus_id = ? and student_id = ?`
  return conn.QueryRow(v, querySQL, campusId, studentId)
})

The query business logic written by the developer is encapsulated in func(conn sqlx.SqlConn, v interface{}) error. The caller does not have to worry about writing the cache; they only pass in the cacheKey, and the query result is returned through res.

How are the cache operations encapsulated internally? Let's look inside the function:

func (c cacheNode) QueryRow(v interface{}, key string, query func(conn sqlx.SqlConn, v interface{}) error) error {
	cacheVal := func(v interface{}) error {
		return c.SetCache(key, v)
	}
	// 1. cache hit -> return
	// 2. cache miss -> err
	if err := c.doGetCache(key, v); err != nil {
		// 2.1 err is the "not found" placeholder val {*} -> return errNotFound
		if err == errPlaceholder {
			return c.errNotFound
		} else if err != c.errNotFound {
			return err
		}
		// 2.2 cache miss -> query db
		// 2.2.1 query db returns err {NotFound} -> set the placeholder val (see 2.1)
		if err = query(c.db, v); err == c.errNotFound {
			if err = c.setCacheWithNotFound(key); err != nil {
				logx.Error(err)
			}

			return c.errNotFound
		} else if err != nil {
			c.stat.IncrementDbFails()
			return err
		}
		// 2.3 query db success -> set val to cache
		if err = cacheVal(v); err != nil {
			logx.Error(err)
			return err
		}
	}
	// 1.1 cache hit -> IncrementHit
	c.stat.IncrementHit()

	return nil
}


In terms of flow, this corresponds exactly to the Read Through cache policy.

Source address: cachedsql.go

Exec

The write request uses the Cache Aside policy: write the database first, then delete the cache.

_, err := m.Exec(func(conn sqlx.SqlConn) (result sql.Result, err error) {
  execSQL := fmt.Sprintf("update %s set %s where 1=1", m.table, AuthRows)
  return conn.Exec(execSQL, data.RangeId, data.AuthContentId)
}, keys...)

func (cc CachedConn) Exec(exec ExecFn, keys ...string) (sql.Result, error) {
 res, err := exec(cc.db)
 if err != nil {
  return nil, err
 }

 if err := cc.DelCache(keys...); err != nil {
  return nil, err
 }

 return res, nil
}

As with QueryRow, the caller is only responsible for the business logic; cache writes and deletions are transparent to the caller.

Source address: cachedsql.go

Online cache

As the saying at the beginning goes: technology divorced from the business is just hooliganism. Everything above analyses cache patterns, but does the cache actually play its accelerating role in the real business? The most intuitive measure is the cache hit rate, and how do we observe a service's cache hit rate? That brings us to monitoring.

The following figure shows the cache record of a service in our online environment:

[figure: cache statistics of an online service]

Remember that in QueryRow above, when the query hits the cache, c.stat.IncrementHit() is called. stat is the monitoring indicator that keeps computing the hit rate and failure rate.


Source address: cachestat.go
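
As a rough idea of how such counters can be kept, here is a minimal sketch of a hit/miss statistic that logs the hit ratio periodically; it is not the cachestat.go implementation, just the general shape of it.

package cachestat

import (
	"log"
	"sync/atomic"
	"time"
)

// Stat keeps running hit/miss counters and logs the hit ratio periodically.
type Stat struct {
	hits   uint64
	misses uint64
}

func NewStat(interval time.Duration) *Stat {
	s := &Stat{}
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for range ticker.C {
			// Swap the counters to zero so each log line covers one interval.
			hits := atomic.SwapUint64(&s.hits, 0)
			misses := atomic.SwapUint64(&s.misses, 0)
			total := hits + misses
			if total == 0 {
				continue
			}
			log.Printf("cache stat: requests: %d, hit ratio: %.1f%%",
				total, float64(hits)/float64(total)*100)
		}
	}()
	return s
}

func (s *Stat) IncrementHit()  { atomic.AddUint64(&s.hits, 1) }
func (s *Stat) IncrementMiss() { atomic.AddUint64(&s.misses, 1) }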

In other business scenarios, such as browsing the information on a home page, a large number of requests is inevitable, so caching the home-page information is particularly important to the user experience. Unlike the single keys mentioned earlier, however, a large amount of data may be involved, and other caching techniques are needed:

  1. Split the cache: the data can be split by message ID; query messages by their IDs, cache them individually, and cache the list of message IDs (see the sketch below).
  2. Message expiration: set an expiration time on the messages so they are not cached for too long.
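
A minimal sketch of this split-plus-expiration idea with a hypothetical Cache interface; the key prefixes and the TTL values are illustrative assumptions.

package feed

import (
	"strconv"
	"time"
)

// Cache is a hypothetical cache client for illustration.
type Cache interface {
	Set(key string, val interface{}, ttl time.Duration) error
	Get(key string, out interface{}) error
}

type Message struct {
	ID      int64
	Content string
}

const (
	listTTL    = 30 * time.Second // the ID list changes often, keep it short
	messageTTL = 10 * time.Minute // individual messages change rarely
)

// CacheHomePage splits the cache: one key stores only the ordered message IDs,
// and every message is cached separately under its own ID, each with an
// expiration so stale entries fall out on their own.
func CacheHomePage(c Cache, userID int64, msgs []Message) error {
	ids := make([]int64, 0, len(msgs))
	for _, m := range msgs {
		ids = append(ids, m.ID)
		if err := c.Set(messageKey(m.ID), m, messageTTL); err != nil {
			return err
		}
	}
	return c.Set(listKey(userID), ids, listTTL)
}

func listKey(userID int64) string { return "home:list:" + strconv.FormatInt(userID, 10) }
func messageKey(id int64) string  { return "home:msg:" + strconv.FormatInt(id, 10) }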

Here are the best practices for caching:

  • Do not allow caches that never expire (this one is especially important)
  • Use a distributed cache that is easy to scale
  • Generate the cache code automatically, with statistics built in

Summary

This article covered why caches are introduced, the common cache read/write strategies, how to guarantee eventual consistency of data, and how to encapsulate a usable cache operation layer, and it also showed what an online cache looks like together with its monitoring. For all the cache details mentioned above, you can refer to the go-zero source code, see core/stores in the go-zero repository.

Project address

github.com/tal-tech/go-zero

You are welcome to use go-zero and star it to encourage us!

This work is licensed under a CC agreement; reprints must credit the author and link to this article.