Recently I keep getting strange phone calls at home: "Hello, sir, are you XXX? We are XXX, a high-end private men's club..." Holy cow. I was stunned at first, then cursed them out. Then I turned my head with a slightly flattering smile: "Honey, listen to me, I really didn't do anything, you have to believe me!"
After rubbing my face and thinking it over, I concluded that some unscrupulous website must have sold my personal information. People today are practically running naked on the Internet: personal information no longer belongs to the individual, and this kind of thing hardly surprises anyone anymore. Such leaks have many causes, and insiders are one of them.
As developers, what we can do is try our best to keep user data from leaking. Today, let's talk about one of the internal measures Internet companies use to keep private data from leaking: data desensitization.
What is data desensitization? Data desensitization, also called data de-privatization or data masking, is a technique that transforms or modifies sensitive data, such as bank card numbers, according to given desensitization rules and policies, so that the sensitive data cannot be used directly in an untrusted environment.
Government agencies, the medical industry, financial institutions and mobile operators adopted data desensitization early, because they hold users' most sensitive private data and the consequences of a leak would be immeasurable.
Data desensitization is common in everyday life. For example, in the details of a Taobao order, part of the merchant's account information is replaced with "*" characters. This is one form of data desensitization.
Data desensitization is divided into static data desensitization (SDM) and dynamic data desensitization (DDM).
Static data desensitization (SDM) is suited to scenarios where data is extracted from the production environment, desensitized, and then distributed for testing, development, training, data analysis and similar uses.
Sometimes we need to copy data from the production environment into a test or development database to troubleshoot problems or analyze data. For security reasons, however, sensitive data must not be stored in non-production environments. In that case, the sensitive data is desensitized on its way out of production, and only the desensitized copy is used elsewhere.
This isolates the desensitized data from the production environment, meeting business needs while keeping production data safe.
As shown in the figure above, the user's real bank card number is desensitized with techniques such as shuffling and symmetric encryption.
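Shuffling can be sketched as follows, assuming rows are plain Python dicts and using a hypothetical helper named shuffle_column. The values of one column are permuted across rows, so every card number in the test copy is well-formed but is no longer attached to its original owner. The card numbers are fictitious.

```python
import random

def shuffle_column(rows, field, seed=None):
    """Return copies of `rows` with the values of `field` permuted across rows.

    `seed` only makes the demo reproducible; omit it in real use.
    """
    rng = random.Random(seed)
    values = [row[field] for row in rows]
    rng.shuffle(values)
    # Build new dicts so the production rows are left untouched.
    return [{**row, field: value} for row, value in zip(rows, values)]


production = [
    {"name": "Alice", "bank_card": "6222020200112233445"},
    {"name": "Bob", "bank_card": "6228480402564890018"},
    {"name": "Carol", "bank_card": "6217000010001234567"},
]
test_copy = shuffle_column(production, "bank_card", seed=7)
for row in test_copy:
    print(row)
```

On a small table a random permutation can leave some values in place, and shuffling alone still exposes real card numbers, which is why it is usually combined with other schemes.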
Dynamic data desensitization (DDM) is generally used in the production environment to desensitize sensitive data in real time as it is accessed. The same sensitive data sometimes needs different levels of desensitization depending on the situation: for example, users with different roles and permissions get different desensitization schemes.
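A minimal sketch of role-based dynamic masking follows. The role names ("admin", "customer_service") and the policy table are assumptions for illustration, not the API of any particular DDM product. The key point is that masking happens at read time and the stored value is never changed.

```python
from typing import Callable, Dict

# Hypothetical masking functions used by the policy below.
def show_all(value: str) -> str:
    return value

def hide_middle(value: str) -> str:
    # Keep the first 3 and last 4 characters visible.
    if len(value) <= 7:
        return "*" * len(value)
    return value[:3] + "*" * (len(value) - 7) + value[-4:]

def hide_all(value: str) -> str:
    return "*" * len(value)

# Hypothetical role -> masking rule table.
POLICY: Dict[str, Callable[[str], str]] = {
    "admin": show_all,
    "customer_service": hide_middle,
}

def read_phone(stored_value: str, role: str) -> str:
    """Mask at read time; the stored value itself is never modified."""
    # Roles not listed in the policy fall back to the strictest rule.
    return POLICY.get(role, hide_all)(stored_value)

for role in ("admin", "customer_service", "intern"):
    print(role, read_phone("13812345678", role))
```

Falling back to full masking for unknown roles is a deliberate choice: a missing policy entry then fails safe instead of exposing data.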
Note: while removing the sensitive content from the data, we also need to preserve the original data's characteristics, business rules and relationships, so that development, testing and data analysis are not affected and the data stays consistent and valid before and after desensitization. In a word: mask it however you like, just don't break my use of it.
A data desensitization system lets you define desensitization rules for different business scenarios, and can desensitize a sensitive field in a database table without the data ever being persisted to intermediate storage.
There are many data desensitization techniques. Below, the data in the following figure is used to demonstrate each one in turn.
The invalidation scheme desensitizes sensitive data by hiding field values (or by similar means) so that they no longer have any usable value. Special characters, usually "*", generally take the place of the real value. Hiding sensitive data this way is simple, but the drawback is that users cannot tell the format of the original data. To obtain the complete information, the user must be authorized to query it.
For example, we can replace the middle digits of a real ID card number with "*", turning it into something like "220724****3523". Very simple.
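A minimal sketch of invalidation by masking, assuming the first six and last four characters stay visible (a hypothetical helper named mask_middle). This version writes one "*" per hidden character, which preserves the field length; using a fixed number of "*" is an equally valid choice. The ID number is fictitious.

```python
def mask_middle(value: str, keep_head: int = 6, keep_tail: int = 4, mask_char: str = "*") -> str:
    """Hide everything except the first `keep_head` and last `keep_tail` characters."""
    if len(value) <= keep_head + keep_tail:
        # Too short to leave anything visible safely: hide it all.
        return mask_char * len(value)
    hidden = len(value) - keep_head - keep_tail
    # Slice the tail by index so keep_tail=0 does not return the whole string.
    return value[:keep_head] + mask_char * hidden + value[len(value) - keep_tail:]

print(mask_middle("220724199001013523"))  # prints 220724********3523
```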
Random value replacement changes sensitive data by replacing letters with random letters, digits with random digits and text with random text. Its advantage is that it preserves the original data format to a certain extent, so users often cannot tell that the data has been altered.
In the example, the idnumber field is desensitized by randomization. Randomizing names and surnames is a bit special, since it requires a corresponding surname dictionary.
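A sketch of format-preserving random replacement, under these assumptions: only ASCII digits and letters are randomized (each with a random character of the same class), everything else is kept, and names are drawn from tiny hypothetical dictionaries (SURNAMES, GIVEN_NAMES) standing in for a real surname dictionary. Check digits, such as the last character of an ID number, are not recomputed.

```python
import random
import string

# Tiny illustrative dictionaries; a real system would load much larger ones.
SURNAMES = ["Wang", "Li", "Zhang", "Liu", "Chen"]
GIVEN_NAMES = ["Wei", "Fang", "Min", "Jing", "Lei"]

def randomize_preserving_format(value: str, rng: random.Random) -> str:
    """Replace each ASCII digit/letter with a random one of the same class."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # keep separators and other characters as-is
    return "".join(out)

def random_name(rng: random.Random) -> str:
    """Names cannot be randomized character by character; draw from dictionaries."""
    return f"{rng.choice(SURNAMES)} {rng.choice(GIVEN_NAMES)}"

rng = random.Random(2021)
print(randomize_preserving_format("220724199001013523", rng))
print(random_name(rng))
```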
Data replacement is similar to invalidation. The difference is that instead of masking with special characters, it replaces the real value with a fictitious one. For example, we can set every mobile phone number to "13651300000".
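Two variants of data replacement are sketched below. The first replaces every value with the single fictitious number from the example. The second, a hypothetical addition, maps each real value deterministically into a pool of fictitious values (the pool contents are made up), which keeps joins between tables consistent, one of the properties the note on preserving data relationships asks for.

```python
import hashlib
from typing import Sequence

FAKE_PHONE = "13651300000"  # the fixed fictitious value from the example

def replace_with_constant(_value: str) -> str:
    """Every real value becomes the same fictitious value."""
    return FAKE_PHONE

def replace_from_pool(value: str, pool: Sequence[str]) -> str:
    """Deterministically map a real value to one of several fictitious values.

    The same input always yields the same output, so joins on this column
    still line up across tables; distinct inputs may share an output.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return pool[int.from_bytes(digest[:8], "big") % len(pool)]

fake_pool = ["13651300000", "13651300001", "13651300002"]  # hypothetical pool
print(replace_with_constant("13812345678"))
print(replace_from_pool("13812345678", fake_pool))
```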
Symmetric encryption is a special, reversible desensitization method. Sensitive data is encrypted with a key and an algorithm, and the ciphertext follows the same format rules as the original data. The original data can be recovered by decrypting with the key, so the key itself must be kept secure.
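To illustrate only the reversible, format-preserving idea, here is a toy digit cipher using nothing but the Python standard library. It is not a secure format-preserving encryption scheme: it adds HMAC-SHA256-derived offsets (mod 10) to each digit, so reusing the same key and tweak for two values leaks their digit-wise differences. Production systems use standardized modes such as FF1 or FF3-1 (NIST SP 800-38G). The key, the tweak value and the card number are all hypothetical.

```python
import hashlib
import hmac
from typing import List

DIGITS = "0123456789"

def _digit_offsets(key: bytes, tweak: bytes, n: int) -> List[int]:
    """Derive n pseudo-random offsets in 0..9 from HMAC-SHA256(key, tweak || counter)."""
    offsets: List[int] = []
    counter = 0
    while len(offsets) < n:
        block = hmac.new(key, tweak + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        offsets.extend(b % 10 for b in block)  # slightly biased; acceptable for a toy
        counter += 1
    return offsets[:n]

def _check_digits(s: str) -> None:
    if not all(c in DIGITS for c in s):
        raise ValueError("only ASCII digits are supported")

def encrypt_digits(plain: str, key: bytes, tweak: bytes = b"") -> str:
    _check_digits(plain)
    offsets = _digit_offsets(key, tweak, len(plain))
    return "".join(DIGITS[(int(d) + o) % 10] for d, o in zip(plain, offsets))

def decrypt_digits(cipher: str, key: bytes, tweak: bytes = b"") -> str:
    _check_digits(cipher)
    offsets = _digit_offsets(key, tweak, len(cipher))
    return "".join(DIGITS[(int(d) - o) % 10] for d, o in zip(cipher, offsets))

key = b"demo-key-keep-me-out-of-source-control"  # hypothetical key
card = "6222020200112233445"  # fictitious card number
token = encrypt_digits(card, key, tweak=b"user:1001")
print(token)
assert decrypt_digits(token, key, tweak=b"user:1001") == card
```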
The averaging scheme is often used in statistical scenarios. For numeric data, we first compute the mean, then distribute the desensitized values randomly around it so that the sum of the data stays unchanged.
For example, after the price field is processed by averaging, the field's total stays the same, but all desensitized values lie around the mean of 60.
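A sketch of the averaging scheme, assuming uniform noise of width `spread` around the mean (the prices are hypothetical, chosen so the mean is 60). The noise is centered so it sums to zero, which keeps the column total equal to the original total up to floating-point error.

```python
import random
from typing import List, Optional

def mean_preserving_noise(values: List[float], spread: float,
                          seed: Optional[int] = None) -> List[float]:
    """Replace each value with mean + noise, where the noise sums to zero."""
    if not values:
        return []
    rng = random.Random(seed)
    mean = sum(values) / len(values)
    noise = [rng.uniform(-spread, spread) for _ in values]
    noise_mean = sum(noise) / len(noise)
    # Subtracting the noise mean centers the noise, so the outputs keep the
    # original total; each output lies within 2 * spread of the mean.
    return [mean + (n - noise_mean) for n in noise]

prices = [30.0, 45.0, 60.0, 75.0, 90.0]  # hypothetical prices, mean 60
masked = mean_preserving_noise(prices, spread=5.0, seed=42)
print([round(p, 2) for p in masked], sum(masked))
```

If the masked prices must be rounded to cents, the total can drift by a few cents; one option is to absorb the rounding difference into the last value.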
Offset rounding changes numeric data by a random shift, and rounding after the shift keeps the data secure while keeping its range roughly realistic. Compared with the previous schemes, it stays closer to the real data, which makes it valuable in big data analysis scenarios.
The date field below is an example.
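A sketch of offset rounding for numbers and timestamps. The parameters are assumptions for illustration: numbers get a uniform offset and are rounded to a multiple of `step`; timestamps are shifted by up to `max_hours` whole hours and then truncated to the hour.

```python
import random
from datetime import datetime, timedelta

def shift_and_round(value: float, max_shift: float, step: float,
                    rng: random.Random) -> float:
    """Add a random offset in [-max_shift, max_shift], then round to a multiple of `step`."""
    shifted = value + rng.uniform(-max_shift, max_shift)
    return round(shifted / step) * step

def shift_datetime(ts: datetime, max_hours: int, rng: random.Random) -> datetime:
    """Shift by a random whole number of hours, then truncate to the hour."""
    shifted = ts + timedelta(hours=rng.randint(-max_hours, max_hours))
    return shifted.replace(minute=0, second=0, microsecond=0)

rng = random.Random(1)
print(shift_and_round(1234.0, max_shift=50, step=10, rng=rng))
print(shift_datetime(datetime(2021, 5, 20, 14, 37, 12), max_hours=48, rng=rng))
```

With a fractional `step` such as 0.1, the result can carry floating-point noise (for example 0.30000000000000004), so decimal values are often rounded with `round(x, ndigits)` instead.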
In practice, desensitization rules often combine several of these schemes to achieve a higher level of security.
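A sketch of a per-field rule table that combines schemes, including one chained rule. The field names, the "Test User" placeholder and the chain helper are hypothetical. For bank_card, digits are first randomized and then masked, so even the visible head and tail are fake.

```python
import random
from typing import Callable, Dict

Rule = Callable[[str], str]
_rng = random.Random(42)  # fixed seed only to make the demo repeatable

def mask_middle(value: str, head: int = 3, tail: int = 4) -> str:
    if len(value) <= head + tail:
        return "*" * len(value)
    return value[:head] + "*" * (len(value) - head - tail) + value[len(value) - tail:]

def randomize_digits(value: str) -> str:
    return "".join(_rng.choice("0123456789") if c in "0123456789" else c for c in value)

def chain(*steps: Rule) -> Rule:
    """Apply several rules in order."""
    def run(value: str) -> str:
        for step in steps:
            value = step(value)
        return value
    return run

RULES: Dict[str, Rule] = {
    "name": lambda v: "Test User",                      # data replacement
    "phone": mask_middle,                               # invalidation
    "bank_card": chain(randomize_digits, mask_middle),  # random values, then masking
}

def desensitize(record: Dict[str, str]) -> Dict[str, str]:
    # Fields without a rule are copied unchanged.
    return {k: RULES[k](v) if k in RULES else v for k, v in record.items()}

print(desensitize({"name": "Alice", "phone": "13812345678",
                   "bank_card": "6222020200112233445", "city": "Beijing"}))
```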
Whether static or dynamic, the ultimate goal of desensitization is to keep private data from being abused inside the organization and from leaving it without being desensitized. So, as programmers, not leaking data is the bare minimum of professional integrity.
This work is licensed under a CC license. Reprints must credit the author and link to this article.