Saving the programmer about to be sacrificed to the heavens: the event sourcing pattern

Time: 2021-11-25

1、 Beforehand

Believe it or not, there was a time when I hardly ever received a properly specified requirement from the product team.

Requirements came in a sentence or two, and the technical side was left to guesswork.

The requirement that always looked simple

Once, I received this requirement from the product team:

Users of the system are graded. Users of different levels have different benefits.

As usual, there was no mockup and no document, just that one sentence. I knew what it meant: one sentence from them, five days of analysis work for me. For the sustainable development of the project, I had to analyze it myself.

From a business point of view, the current user object has no level, so let's start by adding a level attribute to the user object. And because levels differ, users enjoy different benefits. For example, users who reach level 3 can enjoy a 9.5% discount on shopping, free shipping, priority response from customer service, and so on.

Therefore, I made the following design:

First, I put the benefits of each level of users into a list. This is used for the front-end to display the benefits currently available to users.

Then, for each benefit, I set the minimum level required to enjoy it. Users can enjoy a benefit only once their level reaches that minimum. For example, I only need to wrap the 9.5% discount as a payment benefit in the payment service and give it a minimum level.
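
The design above can be sketched roughly as follows. This is only a minimal illustration, not the original code: the names Benefit, BENEFITS and benefits_for, and the benefit entries themselves, are assumptions made up for the example, and it assumes that reaching the minimum level is enough to unlock a benefit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benefit:
    name: str
    min_level: int  # lowest user level that may enjoy this benefit


# Hypothetical benefit list; the entries are illustrative only.
BENEFITS = [
    Benefit("basic coupon", min_level=1),
    Benefit("shopping discount", min_level=3),
    Benefit("free shipping", min_level=3),
    Benefit("priority customer service", min_level=3),
]


def benefits_for(level: int) -> list[str]:
    """Return the names of all benefits a user at `level` may enjoy."""
    return [b.name for b in BENEFITS if level >= b.min_level]


# The front end can show this list to the user.
print(benefits_for(3))
```

With this shape, upgrading a user only means changing one number; the benefit checks stay where they are.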

It seems so simple, so the implementation scheme is nothing special. Every time a user upgrades, I just need to update the user level.

At this stage the requirement was fairly basic and not demanding. Once the upgrade conditions were met, the user had to actively click to upgrade and, at the same time, fill in some information and apply for some exclusive benefits.

Great: design, development, and release, all in one go!

The requirement turns into a pit

After a while, our operations staff, bold in exploring and diligent in expanding, went out and traded for a lot of resources. When I heard about it, I had a bad feeling.

Sure enough, within two days our product manager happily informed me that because sibling teams were willing to cooperate with our project, user benefits would be greatly enriched, and the richer benefits would be provided by those sibling teams.

So would I please "simply" integrate with these partners to further improve the stickiness of our system.

As usual, there is still no document, and I can only analyze it myself.

By now, from my rich experience of being jerked around, I knew there were pits ahead. To integrate with the partners, each of them needed me to pass in specific user IDs so that both sides could share users.

The requirement was getting complicated, but fortunately I only had to change some code. What a relief.

Great: design, development, and release, all in one go!

Unfortunately, our business people are like swarms of bees: you never know what kind of flowers they will bring back.

Before long, the product manager told me that several sibling teams wanted to hold a big joint campaign with us. Everything went dark before my eyes.

No documents, no product prototypes, just the usual back-and-forth on WeChat.

I knew that this time I had to think hard. Requirements can run wild, and can I stop the business from running wild? No. So I had to come up with a flexible solution that could cope with these ever-changing, fast-flying requirements.

2、 First sight

The beginning of hidden danger

Let's take a look at this damn campaign.

First of all, by design, if partners are going to join in with us, we have to tell them about user upgrades so that they can verify a user's level and provide the matching benefits. So every time one of our users upgrades, I need to synchronize that fact to our partners.

And because we work with multiple sibling teams, such as the logistics team and the payment team, the interaction logic for different partners is spread across different services.

At this point, I have two options:

1. In the user service, when a user upgrades, immediately call the relevant logic in the other services through their interfaces to synchronize the upgrade to the partners. This option has a big problem: calling other services' interfaces couples the services together, so even minor changes later on may force us to change code.

2. In a microservice architecture, message queues are highly recommended. When a user upgrades, I just send a message to the message queue and let the relevant services subscribe to it. With this option, the message queue decouples the services from one another.

Because the purpose of microservice itself is decoupling and flexibility, and the second scheme is compatible with our architecture, I chose the second scheme.

In the second scheme, because messages can decouple services, when users upgrade, I only need to operate the user table in the user service database to upgrade, wrap the upgrade into messages and throw them into the message queue.

I can even wrap "update the user table" and "send the upgrade message to the queue" into a single transaction.
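
How the two steps end up in one transaction is not spelled out here. One common way to get that effect, and it is only an assumption that something similar was used, is an "outbox" table: the message is written to the same database as the user row, in the same transaction, and a separate relay pushes it to the real queue later. The sketch below uses Python's built-in sqlite3 module; the table names users and outbox, the message shape, and the publish callback are all hypothetical.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a users table and an outbox table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, level INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    return conn


def upgrade_user(conn: sqlite3.Connection, user_id: int, new_level: int) -> None:
    """Update the level and record the upgrade message in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE users SET level = ? WHERE id = ?", (new_level, user_id)
        )
        if cur.rowcount != 1:
            raise LookupError(f"unknown user {user_id}")
        message = {"type": "UserUpgraded", "user_id": user_id, "new_level": new_level}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(message),))


def relay_outbox(conn: sqlite3.Connection, publish) -> int:
    """Hand pending messages to `publish` (a stand-in for the real queue client)."""
    rows = conn.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # may repeat if we crash before the delete
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
    return len(rows)
```

Note that the relay can deliver a message twice if it stops between publishing and deleting, so subscribers still need to tolerate duplicates.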

Great: design, development, and release, all in one go!

Is this a technical solution that can cope with later changes? The facts proved that it is not: it was thoroughly defeated by the changing requirements.

The outbreak of the problem

Times change. Requirements pour in like water, and our technical design is like a dam that is never sturdy enough, no matter how it is reinforced.

After several rounds of requirement changes, user upgrades became automatic once the conditions were met; the number of sibling teams we cooperated with kept growing; our services were split ever more finely… Amid these gurgling changes, the problem lay like a crocodile lurking at the bottom of the water, about to climb ashore and hunt a few programmers to sacrifice to the heavens.

The first signs of trouble appeared in the user upgrade data. At that time, we were swamped by one question after another from the operations staff.

Some operators found that some users upgraded too quickly, and the upgrade speed of users has far exceeded the speed estimated in the original design.

This rapid upgrade not only makes the operators unable to conceive and design the follow-up operation activities in time, but also makes our operation costs rise rapidly, which has brought certain losses to the company’s operation.

Of course, as always, the business side is never wrong and the technical side always is. Sure enough, the cause of the problem had already been neatly assigned to us:

There is probably a bug in the program: some technical fault keeps users from upgrading level by level, so they skip levels.

While tracking down the problem, we suddenly discovered a defect in this technical solution: because we had not anticipated how important user upgrades would be, many of the upgrade-related logs were not enabled, and no user upgrade history was stored.

In an instant this became a muddled account, and I had no way to argue my case.

To make matters worse, users complained that the system froze at certain times. We investigated again and found a database problem caused by user upgrades.

The earliest design updated the database table directly whenever a user upgraded, but it overlooked two things:

  • The number of users could rise sharply.
  • Upgrading is not difficult for new users in the early stages, so they upgrade very frequently.

Ignoring these two factors left our database barely able to withstand such frequent updates.

Moreover, while investigating these problems, we also dug up complaints that some users had made in the past. For example, some benefits were not granted to users after they upgraded. Sadly, hardly any traces of these cases had been kept.

One muddled account after another added up to a bad debt.

Ah, am I about to be sacrificed to the heavens?

With a stamp of the foot, IQ retakes the high ground

Now let’s look at the problems we have to face.

The first problem is that user upgrades cannot be traced. Every time a user upgrades, we need to notify the relevant services and then make sure each of them processes the upgrade successfully; only then has the upgrade really succeeded. So, to clear the engineers' names and keep this from becoming a bad debt, we must record every user upgrade and also record how each relevant service handled the upgrade event.

The next problem to solve is the database updates. What should I do about them? Update a cache first and synchronize to the database afterwards? What if the cache update itself goes wrong? Verify it! But how? Check the history on every upgrade?

At this time, my head began to enter a state of chaos. I don’t know what to do.

I’m a little worried. What should I do? I have to go and see if there is any plan on the Internet that can provide some ideas.

In the end, this led to my first encounter with the event sourcing pattern.

When I saw event sourcing, I stamped my foot and felt my IQ come back.

Event sourcing saves me from being sacrificed to the heavens

First, let's take a look at event sourcing.

Taking the user upgrade we have just built as an example, let's walk through the event sourcing pattern:

When a user upgrades, we only need to pass the upgrade to the payment service, the logistics service, and other related services through a piece of middleware, the event store. Then, after handling the event delivered to it, each of those services also creates an event object of its own and puts it into the event store.

The event store here is mainly used to do two things:

  • Deliver events
  • Store event history
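
To make the two jobs of the event store concrete, here is a minimal in-memory sketch. It is an illustration under assumptions, not a real middleware: the class EventStore, the event type names such as "UserUpgraded", and the two handler functions are invented for the example, and delivery is a plain synchronous function call.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    type: str   # e.g. "UserUpgraded" (hypothetical name)
    data: dict


class EventStore:
    """Toy event store: keeps the history and delivers events to subscribers."""

    def __init__(self) -> None:
        self._history: list[Event] = []
        self._subscribers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[event_type].append(handler)

    def append(self, event: Event) -> None:
        self._history.append(event)                 # job 2: store event history
        for handler in list(self._subscribers[event.type]):
            handler(event)                          # job 1: deliver the event

    def history(self) -> list[Event]:
        return list(self._history)


store = EventStore()


def payment_service(event: Event) -> None:
    # After handling the upgrade, record the outcome as a new event.
    store.append(Event("PaymentBenefitGranted", {"user_id": event.data["user_id"]}))


def logistics_service(event: Event) -> None:
    store.append(Event("LogisticsBenefitGranted", {"user_id": event.data["user_id"]}))


store.subscribe("UserUpgraded", payment_service)
store.subscribe("UserUpgraded", logistics_service)
store.append(Event("UserUpgraded", {"user_id": 42, "new_level": 3}))

for e in store.history():
    print(e.type, e.data)
```

The history now holds the upgrade and each service's response to it, which is exactly the chain that was missing when the complaints came in.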

So, how does event sourcing solve the problems I faced?

First, if we want traceability, we need to store both the user upgrade and how the related services processed it afterwards, forming a complete business chain. Only with this chain can we trace things back to the source.

And event sourcing tells you exactly that: these things have to be stored!

Second, once the events are stored, do we still need to update the level in real time when a user upgrades?

Let's analyze: what is the real purpose of user upgrades? From a business perspective, it is to increase user activity by providing various benefits. Does that need to be real-time? It doesn't seem so, because it is almost impossible for users to use the corresponding benefits immediately after upgrading.

Well, if the level does not have to be updated in real time, we can avoid real-time database updates altogether.

Since we now store the event history, we can bring each user's level up to the correct value in the early morning, based on that user's upgrade events.
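
A rough sketch of such an early-morning job is shown below. It assumes each stored upgrade event carries a sequence number, a user ID and the new level; the names UserUpgraded and settle_levels are hypothetical. The point is that many events per user collapse into a single write per user.

```python
from typing import Iterable, NamedTuple


class UserUpgraded(NamedTuple):
    seq: int        # position of the event in the store (assumed field)
    user_id: int
    new_level: int


def settle_levels(current: dict[int, int], events: Iterable[UserUpgraded]) -> dict[int, int]:
    """Replay the day's upgrade events in order and return the final levels."""
    levels = dict(current)
    for event in sorted(events, key=lambda e: e.seq):
        levels[event.user_id] = event.new_level
    return levels


todays_events = [
    UserUpgraded(seq=1, user_id=1, new_level=2),
    UserUpgraded(seq=2, user_id=2, new_level=2),
    UserUpgraded(seq=3, user_id=1, new_level=3),
]
final = settle_levels({1: 1, 2: 1, 3: 5}, todays_events)
# One database write per user would follow here, instead of one per event.
print(final)
```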

So event sourcing solved both of my problems.

This was my first encounter with the event sourcing pattern. It would stay with me for the rest of my technical career.

3、 Getting acquainted

Truly understanding the event sourcing pattern

In fact, event sourcing has only two core features:

1. Package the cause that triggers a change in business data as an event object. Viewed abstractly, any change in the business that deserves attention can be packaged as an event.

2. Persist these events in a dedicated place, in the order in which they occurred. The ordering deserves emphasis: in event sourcing, persisting events in the order they occurred is essential. A pattern whose events are not persisted strictly in order can hardly be called a proper event sourcing pattern.

Therefore, the event sourcing pattern does two things:

  • Define what business logic can be defined as events;
  • Record the defined events in sequence after they occur.
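
As a small illustration of "record in sequence", the sketch below appends events to a JSON Lines file and numbers them as they arrive. It is a single-process toy under stated assumptions: the class FileEventLog and the record fields seq, type and data are invented here, and a real system would need locking or a database to keep the order correct across processes.

```python
import json
import os
import tempfile


class FileEventLog:
    """Append-only event log stored as one JSON object per line."""

    def __init__(self, path: str) -> None:
        self.path = path
        # Continue numbering after whatever is already in the file.
        self._next_seq = sum(1 for _ in self.read_all()) + 1

    def append(self, event_type: str, data: dict) -> int:
        record = {"seq": self._next_seq, "type": event_type, "data": data}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")  # never rewrite earlier lines
        self._next_seq += 1
        return record["seq"]

    def read_all(self):
        """Yield the stored events in the order they were appended."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    log = FileEventLog(os.path.join(tmp, "events.jsonl"))
    log.append("UserUpgraded", {"user_id": 1, "new_level": 2})
    log.append("UserUpgraded", {"user_id": 1, "new_level": 3})
    for record in log.read_all():
        print(record)
```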

Event sourcing, my constant companion

After recognizing the core features of event sourcing, I used this pattern again and again in my later development career to solve problems in specific scenarios across different businesses, for example updating order status, or the performance of flash-sale (seckill) campaigns.

While using event sourcing, I summarized some scenarios that call for it. Whenever I run into a similar scenario, event sourcing is the first thing I try.

These scenarios are:

  • You want to know the intention, reason, or purpose behind a change to key data;

  • Updating data has a real performance problem that cannot be solved by upgrading hardware or by large-scale clustering;

  • It is very important to reconstruct the scene of an incident, or to reproduce the production environment by replaying data.

Experience has shown that event sourcing lives up to my expectations in these scenarios, and it also brings many additional benefits:

1. Because events are stored in sequence, they can be persisted by appending. This append-based persistence can be done in the foreground, where user experience or performance requirements are high, without making the foreground stutter. Meanwhile, the events can flow like water into background tasks and be processed at leisure.

2. Events are themselves a record of what happened. So you can use them to rebuild or reproduce a business state whenever and wherever it suits your situation.

3. The event store itself can serve as an audit log. As long as the recorded information is complete, the stored events naturally become reliable and safe audit data.

4. Event sourcing integrates well with event-driven systems, which makes it very suitable for extending to and connecting with all kinds of event-driven applications and systems.

5. Event sourcing does not add complexity to business objects that are already very complex. Take an order object: when designing the order table for it, you may have to add a comment field to store notes about updates; you may also have to add a last-updated time to record when the latest update occurred; and because of the complexity of the order's business states, you may even have to split the status into several different status fields. With event sourcing, that information can live in the events instead.
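
The second and fifth points can be illustrated with order status. In the sketch below, which is an assumption-laden toy rather than a real order system, the reason for each change lives on the event (so the order needs no comment field), and the status at any earlier moment is recovered by replaying events up to that point. OrderEvent, status_at and the status names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrderEvent:
    seq: int          # position in the event store
    order_id: str
    new_status: str
    reason: str       # why the status changed; kept on the event, not the order


def status_at(events: list[OrderEvent], order_id: str,
              up_to_seq: Optional[int] = None) -> Optional[str]:
    """Replay events in order and return the order's status at `up_to_seq`."""
    status = None
    for event in sorted(events, key=lambda e: e.seq):
        if up_to_seq is not None and event.seq > up_to_seq:
            break
        if event.order_id == order_id:
            status = event.new_status
    return status


events = [
    OrderEvent(1, "A-1", "created", "customer placed order"),
    OrderEvent(2, "A-1", "paid", "payment confirmed"),
    OrderEvent(3, "B-7", "created", "customer placed order"),
    OrderEvent(4, "A-1", "cancelled", "customer requested refund"),
]

print(status_at(events, "A-1"))               # latest status
print(status_at(events, "A-1", up_to_seq=2))  # status as of event 2
```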

In short, as my understanding of event sourcing gradually deepened, I felt I was beginning to have the air of a microservice expert.

4、 Discontent

Of course, there is nothing new under the sun, and introducing anything new always brings some shortcomings. The more I used the event sourcing pattern, the more aware I became of its shortcomings.

1. There is so much event data to store that another pattern, Command Query Responsibility Segregation (CQRS), has to be introduced to handle most queries.

2. In event sourcing, storing events in order is very important. So with multiple threads, multiple processes, or a cluster, you must strictly guarantee that events are stored in the correct order. Generally, you have to put a timestamp on the event object, and you may have to introduce a globally unique identifier generator to produce event IDs.

3. Because an event is itself a business object, it will evolve. So we must also consider how old and new versions coexist. At the very least, the event structure generally needs a version field that identifies the version of the event object.

4. Events are mostly stored in append-only form, which makes them hard to query: you can only query by event identifier and event time, and what you get back is really an event stream. To reproduce and analyze the state of a business object, you have to replay the entire event stream.

5. Event sourcing in effect deliberately relaxes business consistency requirements, but the consistency the business needs still has to be handled separately. For example, suppose we build an e-commerce website, implement inventory quantity updates with event sourcing, and happen to model the various reasons for an inventory reduction as different events. Then, when inventory is reduced for a reason other than a customer order while customers are placing orders, we have to resolve the conflict separately to keep the state consistent.

6. Events may need to be transmitted for business reasons. Whatever method is used to deliver them, nobody can guarantee that an event will not be delivered more than once. So we have to make event handling idempotent. This is another piece of trouble that event sourcing brings.
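
Points 2, 3 and 6 often show up together in the shape of the event object itself. The sketch below is one possible arrangement, not a prescription: every event carries a unique ID, a timestamp and a version; old versions are converted on read; and the handler remembers which IDs it has already processed. The field names, the v1-to-v2 change (level renamed to new_level) and the class IdempotentHandler are all assumptions for illustration. Note that random IDs and wall-clock timestamps alone do not give a strict global order; a sequence number from the store is usually still needed for that.

```python
import time
import uuid
from dataclasses import dataclass, field, replace

CURRENT_VERSION = 2


@dataclass(frozen=True)
class Event:
    type: str
    data: dict
    version: int = CURRENT_VERSION
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)


def upcast(event: Event) -> Event:
    """Convert an old-version event to the current shape (hypothetical change)."""
    if event.type == "UserUpgraded" and event.version == 1:
        data = {"user_id": event.data["user_id"], "new_level": event.data["level"]}
        return replace(event, data=data, version=CURRENT_VERSION)
    return event


class IdempotentHandler:
    """Applies each event at most once, keyed by event_id."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.levels: dict[int, int] = {}

    def handle(self, event: Event) -> bool:
        if event.event_id in self.seen:
            return False                      # duplicate delivery: ignore
        event = upcast(event)
        self.levels[event.data["user_id"]] = event.data["new_level"]
        self.seen.add(event.event_id)
        return True


handler = IdempotentHandler()
old = Event("UserUpgraded", {"user_id": 1, "level": 3}, version=1)
handler.handle(old)
handler.handle(old)   # delivered twice, applied once
print(handler.levels)
```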

5、 End

Although the event sourcing pattern solved many of my problems, introducing it also added a lot of work. Nothing is perfect, after all.

Perhaps there is no event sourcing pattern in this world at all, only the helpless need to avoid taking the blame.


Hello, I’m four apes.

I am the technical director of a listed company and manage a technical team of more than 100 people.

I went from a non-computer-science graduate to a programmer, working hard and growing all the way.

I turn my own growth story into articles, and boring technical articles into stories.
