Design of customer label system (2) – Design of label ID



Last year I wrote an article introducing the design of a customer label system. Because of length limits, I could only give a rough overview there. Today I will add a section on one small piece of that design: the tag ID.


Recall the basic assumption from last year: there are 10 million users, and the operation team has defined 100 tags. Not every user will carry all 100 tags, but in the worst case the system needs to store 1 billion user-tag records. That is a very large amount of data. Applying tags also involves a large number of rule evaluations. How can we get better performance during rule evaluation?


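One trick is to put some design into the tag ID. Each tag consists of a tag key plus a tag value, and every tag ID is uniformly 32 bits wide: a 16-bit tag key ID, a data-type field for the tag value (the remaining 4 bits), and a 12-bit tag value ID.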

[Figure: Design of label ID]

  1. The 16 bits representing the tag key can come from a database auto-increment ID or a similar auto-increment mechanism, which guarantees that each tag key is uniquely identified.
  2. The data type of the tag value (Boolean, varchar, integer interval, floating-point interval, date-time interval, rule class) is encoded in the tag ID itself. This improves processing performance and makes cache lookups and memory allocation easier.
  3. A tag value must be unique within its key. Across different tag keys, the 12-bit tag value may repeat.
  4. For Boolean tags, such as the various blacklists and whitelists, the 12-bit tag value is fixed at 0 and 1, representing false and true respectively.
  5. All tag IDs (tag key ID + tag value data type + tag value ID) are generated with the ID scheme described above.
  6. A company generally has no more than 1,000 valid tag keys, a number that tracks the size of its operation team. One person can only keep the meaning and use of a limited number of tags in mind, so a label system that defines too many tags loses its operational value. No operator can remember the definitions of thousands of tags and apply them in daily work.
  7. If a tag has more than 4K (4,096) enumerable values, you should rethink its definition. It probably should not be a tag.
  8. A tag ID designed this way supports 65,536 tag keys, each with up to 4K (4,096) enumerable values. Even allowing for multiple operation teams and for tags that are deleted or invalidated, 65,536 tag keys are enough for most companies.
  9. Labels go deeper and are more concise than indicators: a label is qualitative, not quantitative.
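
The layout described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article gives 16 bits for the key and 12 bits for the value, so the 4-bit type field, the field order (key in the high bits, then type, then value), the numeric type codes, and the names `pack_tag_id` and `unpack_tag_id` are all hypothetical.

```python
from enum import IntEnum
from typing import NamedTuple

# Field widths. KEY_BITS and VALUE_BITS come from the article;
# TYPE_BITS = 32 - 16 - 12 is an assumption.
KEY_BITS, TYPE_BITS, VALUE_BITS = 16, 4, 12


class ValueType(IntEnum):
    """Data types listed in the article; the numeric codes are illustrative."""
    BOOLEAN = 0
    VARCHAR = 1
    INT_INTERVAL = 2
    FLOAT_INTERVAL = 3
    DATETIME_INTERVAL = 4
    RULE = 5


class TagId(NamedTuple):
    key: int
    value_type: ValueType
    value: int


def pack_tag_id(key: int, value_type: ValueType, value: int) -> int:
    """Combine the three fields into one 32-bit integer."""
    if not 0 <= key < (1 << KEY_BITS):
        raise ValueError("tag key ID does not fit in 16 bits")
    if not 0 <= value < (1 << VALUE_BITS):
        raise ValueError("tag value ID does not fit in 12 bits")
    return (key << (TYPE_BITS + VALUE_BITS)) | (int(value_type) << VALUE_BITS) | value


def unpack_tag_id(tag_id: int) -> TagId:
    """Split a 32-bit tag ID back into key, data type and value."""
    value = tag_id & ((1 << VALUE_BITS) - 1)
    value_type = ValueType((tag_id >> VALUE_BITS) & ((1 << TYPE_BITS) - 1))
    key = (tag_id >> (TYPE_BITS + VALUE_BITS)) & ((1 << KEY_BITS) - 1)
    return TagId(key, value_type, value)


# A Boolean tag (for example a blacklist flag) with key 7 and value "true".
blacklisted = pack_tag_id(7, ValueType.BOOLEAN, 1)
print(hex(blacklisted))            # 0x70001
print(unpack_tag_id(blacklisted))
```

Because the data type sits inside the ID, a program can decide how to treat the value with a shift and a mask, without looking anything up first.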

Performance significance

Even though tagging 10 million users may produce 1 billion tag records, the operation team has defined only 100 tags, so the tag definition table is small and can be cached in full on the application server. The 1 billion records are just simple associations between a customer ID and a tag ID, and they can be sharded across databases and tables in many ways.

  1. A visit normally comes from one specific user, who carries a few dozen tags. By parsing those few dozen tag IDs directly, you can map them straight to the tag objects in the cache. There is no need to join in the database or query the tag table.
  2. When configuring marketing rules, tag IDs can be written directly into the rule expression as the numeric form of its variables. That way the tag ID does not need to be resolved during rule evaluation, and the logical operations run on it directly.
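
Both points can be illustrated with a small Python sketch. Everything concrete here is made up for illustration: the cache contents, the tag names, the constants `GOLD_MEMBER` and `BLACKLISTED`, and the assumed bit layout (key in the top 16 bits, a 4-bit type field, value in the low 12 bits).

```python
from typing import Dict, Iterable, NamedTuple, Set


class TagDef(NamedTuple):
    name: str
    values: Dict[int, str]  # tag value ID -> human-readable value


# The small tag definition table, cached in the application server
# and keyed by the 16-bit tag key ID (hypothetical contents).
TAG_CACHE: Dict[int, TagDef] = {
    1: TagDef("blacklisted", {0: "false", 1: "true"}),
    2: TagDef("member_level", {0: "bronze", 1: "silver", 2: "gold"}),
}


def key_of(tag_id: int) -> int:
    return tag_id >> 16      # top 16 bits (assumed layout)


def value_of(tag_id: int) -> int:
    return tag_id & 0xFFF    # low 12 bits


def describe(tag_ids: Iterable[int]) -> Dict[str, str]:
    """Resolve a user's tag IDs against the cache, with no database access."""
    result = {}
    for tag_id in tag_ids:
        tag_def = TAG_CACHE[key_of(tag_id)]
        result[tag_def.name] = tag_def.values[value_of(tag_id)]
    return result


# Tag IDs written directly into a marketing rule as numeric constants.
GOLD_MEMBER = 0x00021002   # key 2, type 1 (varchar), value 2
BLACKLISTED = 0x00010001   # key 1, type 0 (Boolean), value 1


def rule_matches(user_tags: Set[int]) -> bool:
    """Rule: gold member AND NOT blacklisted, evaluated on raw integers."""
    return GOLD_MEMBER in user_tags and BLACKLISTED not in user_tags


user_tags = {0x00021002, 0x00010000}   # gold member, blacklisted = false
print(describe(user_tags))
print(rule_matches(user_tags))         # True
```

The rule check never touches the cache or the database: it is a pair of integer set-membership tests, which is the kind of direct logical operation the second point describes.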