What is data desensitization
Let's first look at what data desensitization is. Data desensitization, also called data de-privacy or data masking, is a technique that transforms or modifies sensitive data, such as mobile phone numbers and bank card numbers, according to given desensitization rules and strategies, so that the sensitive data cannot be used directly in an untrusted environment.
Government agencies, the medical industry, financial institutions and mobile operators were among the first to apply data desensitization, because they hold users' core private data, and the consequences of a leak would be severe.
Data desensitization is also common in everyday life. For example, in the order details on Taobao, the merchant's account information is masked with * characters so that the merchant's privacy is not disclosed. This is one form of data desensitization.
Data desensitization is divided into static data desensitization (SDM) and dynamic data desensitization (DDM).
Static data desensitization
Static data desensitization (SDM) applies when data is extracted from the production environment, desensitized, and then distributed to testing, development, training, data analysis and similar scenarios.
Sometimes we need to copy data from the production environment to a test or development database for troubleshooting or data analysis, but for security reasons sensitive data must not be stored in a non-production environment. In that case, the sensitive data is desensitized as it leaves the production environment and only then used in the non-production environment.
In this way, the desensitized data is isolated from the production environment, which not only meets the business needs, but also ensures the safety of production data.
As shown in the figure above, the user's real bank card number is desensitized through symmetric encryption and other schemes.
Dynamic data desensitization
Dynamic data desensitization (DDM) is generally used in the production environment. Desensitization is performed in real time when sensitive data is accessed, because the same sensitive data sometimes needs different levels of desensitization depending on the situation. For example, different roles and different permissions can trigger different desensitization schemes.
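The role-dependent behavior described above can be sketched roughly as follows. This is only an illustration: the role names, the rules attached to them, and the function names are assumptions made for the example, not part of any particular desensitization product.

```python
from typing import Callable, Dict


def last_four(value: str) -> str:
    """Hide everything except the last four characters."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def hide_all(value: str) -> str:
    """Hide every character."""
    return "*" * len(value)


# Hypothetical roles, each mapped to the rule applied when the value is read.
RULES: Dict[str, Callable[[str], str]] = {
    "admin": lambda v: v,    # sees the real value
    "support": last_four,    # sees only the last four characters
}


def read_sensitive(stored: str, role: str) -> str:
    """Desensitize at read time; the stored value itself is never modified.

    Roles without a configured rule fall back to hiding everything.
    """
    return RULES.get(role, hide_all)(stored)
```

The stored value stays intact, and the rule is chosen per request, which is what distinguishes this from the static approach, where the data is rewritten once before it is handed out.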
Note: while erasing the sensitive content in the data, we also need to preserve the original data characteristics, business rules and relationships between data, so that development, testing and data analysis are not affected by desensitization and the data stays consistent and valid before and after desensitization. In short: desensitize it however you like, as long as it does not affect my use.
Data desensitization scheme
A data desensitization system lets you define and write desensitization rules for different business scenarios. It can desensitize a given sensitive field of a database table in transit, without first landing the data in intermediate storage.
There are many ways to desensitize data. The schemes below are demonstrated one by one, using the data in the following figure as the example.
1. Invalidation
The invalidation scheme desensitizes sensitive data by hiding the field value, or by similar means, so that it no longer has any value. Special characters (such as *) are generally used in place of the true value. This way of hiding sensitive data is simple, but its disadvantage is that the user cannot know the format of the original data. To obtain the complete information, the user has to authorize the query.
For example, we replace part of the real ID number with * characters, turning it into "220724 * * * * * 3523", which is very simple.
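A minimal sketch of this kind of masking, assuming we keep the first six and last four characters and hide the rest. The function name, the default lengths and the sample ID number used below are made up for illustration.

```python
def mask_middle(value: str, keep_head: int = 6, keep_tail: int = 4) -> str:
    """Replace the characters between the kept head and tail with '*'.

    Values too short to leave anything hidden are masked completely.
    """
    hidden = len(value) - keep_head - keep_tail
    if hidden <= 0:
        return "*" * len(value)
    return value[:keep_head] + "*" * hidden + value[len(value) - keep_tail:]


# Made-up 18-character ID number, not a real one.
masked = mask_middle("220724199001013523")
```

With the defaults, the made-up value above keeps `220724` and `3523` and hides the eight characters in between.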
2. Random value
Random value replacement changes sensitive data by replacing letters with random letters, digits with random digits, and words with random words. The advantage of this scheme is that it preserves the original data format to some extent, so the desensitization is often hard for users to notice.
The idnumber field is desensitized by randomization. Randomizing names (given names and surnames) is slightly special, because it needs the support of a corresponding surname dictionary.
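The idea can be sketched as below. The function names and the tiny surname list are assumptions for illustration; a real system would use a much larger dictionary and, usually, a non-predictable random source.

```python
import random
import string


def randomize(value: str, rng: random.Random) -> str:
    """Replace each digit with a random digit and each ASCII letter with a
    random letter of the same case; all other characters are kept as-is,
    so the overall format of the value is preserved."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical surname dictionary used when randomizing names.
SURNAMES = ["Zhang", "Wang", "Li", "Zhao"]


def random_surname(rng: random.Random) -> str:
    """Pick a replacement surname from the dictionary."""
    return rng.choice(SURNAMES)
```

Passing the random generator in explicitly keeps the sketch testable; the output has the same length and character classes as the input, but no fixed relation to it.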
3. Data replacement
The data replacement method is similar to the invalidation method above, except that instead of masking with special characters it replaces the true value with a preset dummy value. For example, we uniformly set the mobile phone number to "13651300000".
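As a rough sketch, data replacement amounts to a lookup of preset values per field. The dummy phone number is the one from the text; the field names and the sample record are hypothetical.

```python
# Preset dummy values per field (field names are hypothetical).
REPLACEMENTS = {"phone": "13651300000"}


def replace_fields(record: dict) -> dict:
    """Return a copy of the record with configured fields set to dummy values."""
    return {key: REPLACEMENTS.get(key, value) for key, value in record.items()}


desensitized = replace_fields({"name": "Alice", "phone": "13912345678"})
```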
4. Symmetric encryption
Symmetric encryption is a special, reversible desensitization method. Sensitive data is encrypted with an encryption key and algorithm, the ciphertext keeps a format consistent with the logical rules of the original data, and the original data can be recovered by decrypting with the key. The key itself must be kept secure.
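To illustrate the two properties mentioned here (reversible with a key, and ciphertext in the same format as the input), below is a toy sketch that encrypts a string of digits into another string of digits. It is an assumption-laden illustration, not a vetted cipher: the construction, function names and `tweak` parameter are made up for this example, and reusing the same key and tweak for many values would leak information. A real system should rely on a reviewed algorithm and library.

```python
import hashlib
import hmac


def _keystream_digits(key: bytes, tweak: bytes, n: int) -> list:
    """Derive n pseudo-random digits (0-9) from the key using HMAC-SHA256."""
    digits = []
    counter = 0
    while len(digits) < n:
        block = hmac.new(
            key, tweak + counter.to_bytes(4, "big"), hashlib.sha256
        ).digest()
        # Skip bytes >= 250 so that every digit is equally likely.
        digits.extend(b % 10 for b in block if b < 250)
        counter += 1
    return digits[:n]


def encrypt_digits(plain: str, key: bytes, tweak: bytes = b"") -> str:
    """Add one keystream digit to each input digit, modulo 10."""
    stream = _keystream_digits(key, tweak, len(plain))
    return "".join(str((int(c) + k) % 10) for c, k in zip(plain, stream))


def decrypt_digits(cipher: str, key: bytes, tweak: bytes = b"") -> str:
    """Subtract the same keystream digits, modulo 10, to recover the input."""
    stream = _keystream_digits(key, tweak, len(cipher))
    return "".join(str((int(c) - k) % 10) for c, k in zip(cipher, stream))
```

Because encryption and decryption derive the same keystream from the same key, anyone holding the key can undo the desensitization, which is why the key needs the same protection as the data itself.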
5. Average value
The average value scheme is often used in statistical scenarios. For numerical data, we first calculate the mean, and then let the desensitized values be randomly distributed around the mean while keeping the sum of the data unchanged.
After the price field is processed with the average value scheme, the total of the field remains unchanged, but the desensitized values all lie around the mean of 60.
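One possible way to do this is sketched below: draw random noise, re-centre it so that it sums to zero, and add it to the mean. The function name, the `spread` parameter and the sample prices (chosen so that the mean is 60) are assumptions for illustration.

```python
import random


def average_desensitize(values, spread, rng):
    """Return values scattered around the mean whose sum equals the original
    sum (up to floating-point rounding)."""
    n = len(values)
    mean = sum(values) / n
    noise = [rng.uniform(-spread, spread) for _ in range(n)]
    shift = sum(noise) / n  # re-centre the noise so that it sums to zero
    return [mean + x - shift for x in noise]


# Hypothetical prices with a mean of 60.
prices = [20, 40, 60, 80, 100]
desensitized_prices = average_desensitize(prices, spread=5, rng=random.Random(1))
```

Each output differs from the mean by at most twice `spread`, so the individual prices are no longer recoverable while totals and averages computed on the column stay the same.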
6. Offset and rounding
This scheme changes numeric data through a random shift. Offsetting and rounding keeps the data secure while keeping its range approximately realistic. The result is closer to the real data than with the previous schemes, which is valuable in big data analysis scenarios.
For example, the date field below
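A minimal sketch of offset and rounding applied to a timestamp, assuming a random shift of whole days followed by rounding down to the hour. The function name, the shift range and the sample timestamp are made up for illustration and are not the values from the original example.

```python
import random
from datetime import datetime, timedelta


def offset_and_round(ts: datetime, max_days: int, rng: random.Random) -> datetime:
    """Shift a timestamp by a random number of whole days within
    [-max_days, max_days], then round it down to the hour."""
    shifted = ts + timedelta(days=rng.randint(-max_days, max_days))
    return shifted.replace(minute=0, second=0, microsecond=0)


# Made-up timestamp for illustration.
original = datetime(2021, 3, 14, 9, 26, 53)
desensitized_ts = offset_and_round(original, max_days=30, rng=random.Random(0))
```

The result stays within a month of the original and keeps the hour of day, so time-of-day and seasonal analyses remain roughly valid even though the exact moment is hidden.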
In practice, desensitization rules often combine several of these schemes to achieve a higher level of security.
Whether static or dynamic, desensitization ultimately aims to prevent the abuse of private data inside the organization and to prevent private data from leaving the organization without being desensitized. For a programmer, not leaking data is the most basic professional ethic.