Introduction to the author
Pan Juan, senior DBA of Jingdong digital science, is mainly responsible for the development of database middleware, database operation and maintenance automation platform, and production database operation and maintenance. Participated in the escort work of Jingdong’s 6.18 and 11.11 promotion activities for many times. He was responsible for the design and development of the database automation platform of Jingdong digital science, and now focuses on the development of Apache shardingsphere distributed database middleware. Willing to study and explore in database, automation, distributed, middleware and other related fields.
Detailed explanation of Apache shardingsphere data desensitization solution (Part 1) mainly introduces the specific implementation of internal desensitization function of shardingsphere. This sharing continues the previous part, combines the specific implementation with the actual scene, and provides a detailed introduction to the desensitization transformation of the existing online business and the desensitization use of the new online business.
2、 Solution details
After understanding the desensitization process of shardingsphere, the desensitization configuration and process can be combined with the actual scene. All the design and development are to solve the pain points of business scenarios. So, in the face of the business scenario requirements mentioned before, how to use shardingsphere to meet the business requirements?
New online business
Business scenario analysis: new online business is relatively simple because everything starts from scratch and there is no problem of historical data cleaning.
Solution description: after selecting an appropriate encryptor, such as AES, you only need to configure logical columns (write SQL for users) and ciphertext columns (store ciphertext data in data table),Logical columns and ciphertext columns can be the same or different。 The recommended configuration is as follows (displayed in yaml format)
aes_encryptor: type: aes props: aes.key.value: 123456abc
t_user: columns: pwd: cipherColumn: pwd encryptor: aes_encryptor
With this configuration, encrypt JDBC only needs to convert logiccolumn and ciphercolumn. The underlying data table does not store plaintext, but only ciphertext, which is also the requirement of security audit. If you want to store plaintext and column in the database, you can add them to the database. The overall treatment process is shown in the figure below
Online business transformation
Business scenario analysis: as the business has been running online, there must be a large number of plaintext historical data in the database. Now the problem is how to encrypt and clean the historical data, how to encrypt the incremental data, and how to make the business seamless and transparent migration between the new and old data systems.
Solution description: before providing solutions, let’s brainstorm: first of all, since the old business needs to be desensitized, it must store very important and sensitive information. The information is valuable and the business foundation is relatively important. If there’s a mistake, we’ll see you again. Therefore, it is impossible to stop business as soon as it comes up, prohibit new data writing, find an encryptor to encrypt and clean all historical data, and then deploy the previously reconstructed code online, so that it can encrypt and decrypt the stock and incremental data online. It must be cool to talk about such a simple and crude way according to historical experience.
As like as two peas, the other way is to rebuild a set of pre production environment that is exactly the same as the production environment, and then store the original data of the production environment through the relevant migration and cleaning tools to store the original data, then add the new data to the database of the preprocessing environment through the MySQL master slave replica system and the business development tools. The code can be deployed to the encryption and decryption environment. In this way, the production environment is a setQuery modification based on plaintextThe pre release environment is a set ofEncryption and decryption query modification based on ciphertextIt’s a good environment. After comparison for a period of time, the production flow can be switched to the pre delivery environment by night operation. This scheme is relatively safe and reliable, but the time, manpower, capital and cost are relatively high, mainly including: pre release environment construction, production code rectification, development of relevant auxiliary tools, etc. Unless there is no way to go, business developers usually go from getting started to giving up.
What business developers want most are:It is better not to modify the business code, and can migrate the system safely and smoothly。 Therefore, the desensitization function module of shardingsphere was applied. It can be divided into three steps
Before system migration
Suppose that the system needs to_ The PWD field of the user is desensitized, and the business side uses encrypt JDBC to replace the standardized JDBC interface, which basically does not need additional modification (we also provide access methods such as springboot, springnamespace, yaml, etc., to meet the needs of different business sides). In addition, a set of desensitization configuration rules is provided as follows:
aes_encryptor: type: aes props: aes.key.value: 123456abc
t_user: columns: pwd: plainColumn: pwd cipherColumn: pwd_cipher encryptor: aes_encryptor
According to the above desensitization rules, the first step is to set the desensitization rules in the database table t_ A new field in user is called PWD_ Cipher, that is, ciphercolumn, is used to store ciphertext data. At the same time, we set plancolumn to PWD to store plaintext data, and logiccolumn to PWD. Because the previous code SQL is written with PWD, that is, SQL is written for logical columns, so the business code does not need to be changed. Through encrypt JDBC, the plaintext will be written to PWD column for the new data, and the plaintext will be encrypted and stored in PWD at the same time_ Cipher column. At this time, due to query.with.cipher . column is set to false. For business applications, the plaintext column pwd is still used for query storage, but it is stored in the underlying database table PWD_ Ciphertext data of new data is additionally stored in cipher, and its processing flow is shown in the following figure:
When the new data is inserted, it is encrypted as ciphertext data through encrypt JDBC and stored in ciphercolumn. Now we need to deal with historical data.Because Apache shardingsphere does not provide relevant migration tools at present, the business party needs to encrypt the plaintext data in PWD and store it in PWD_ cipher。
System migration in progress
The newly added data has been stored in the ciphertext column and the plaintext in the plaintext column by encrypt JDBC; after the historical data is encrypted and cleaned by the business party, the ciphertext is also stored in the ciphertext column. That is to say, both plaintext and ciphertext are stored in the current database, only because of the query.with.cipher . column = false, so ciphertext has never been used. Now, in order to enable the system to query ciphertext data, we need to change thequery.with.cipher . column is set to true。 After restarting the system, we found that the business of the system is normal, but encrypt JDBC has begun to extract the data of the ciphertext column from the database, decrypt it and return it to the user; for the user’s needs of adding, deleting and modifying, the original data will still be stored in the plaintext column, and the encrypted data will be stored in the ciphertext column.
Although the business system takes out the data in the ciphertext column and decrypts it and returns it, it still stores a copy of the original data in the plaintext column when storing it. Why? The answer is: to be able to roll back the system.As long as ciphertext and plaintext always exist at the same time, we can switch business query to ciphercolumn or plaincolumn freely through switch configuration.That is to say, if the system is switched to the ciphertext column for query, it is found that the system reports an error and needs to be rolled back. So just put query.with.cipher The query will be restored, that is, the query will start again with encrypt = false. The processing flow is shown in the figure below
After system migration
Due to the requirements of the security audit department, it is generally impossible for the business system to keep the plaintext column and ciphertext column of the database permanently synchronously. We need to delete the plaintext column data after the system is stable. That is, we need to delete the plancolumn, PWD, after the system is migrated. That’s the problem. Now the business code is written in SQL for PWD. The PWD in the underlying data table is deleted and replaced by PWD_ Cipher decrypts the original data, doesn’t it mean that the business side needs to rectify all SQL so as not to use the PWD column that will be deleted? Remember the core of our encryption JDBC?
This is also the core meaning of encrypt JDBC, that is, according to the desensitization rules provided by users, the user SQL is separated from the underlying database table structure, so that the user’s SQL writing is no longer dependent on the real database table structure. The connection, mapping and transformation between users and underlying database are handled by shardingsphere.
Yes, because of the existence of logiccolumn, users write SQL for this virtual column, and encrypt JDBC can map this logical column to the ciphertext column in the underlying data table. So the desensitization configuration after migration is as follows:
aes_encryptor: type: aes props: aes.key.value: 123456abc
t_user: columns: PWD: PWD and PWD_ Transformation mapping of cipher cipherColumn: pwd_cipher encryptor: aes_encryptor
The treatment process is as follows
So far, the online business desensitization rectification solutions are all described. We provide Java, yaml, springboot and springnamespace for users to choose to access, and strive to meet the different access needs of services. At present, the solution has been continuously launched in Jingdong Digital Technology Co., Ltd., providing internal basic service support.
3、 Middleware desensitization service advantages
1. Automatic & transparent data desensitization process, users do not need to pay attention to the details of desensitization.
2. Provide a variety of built-in, third-party (aks) desensitization strategies, users only need simple configuration to use.
3. Provide desensitization strategy API interface, users can implement the interface, so as to use user-defined desensitization strategy for data desensitization.
4. Support switching different desensitization strategies.
5. For the online business, it can realize the synchronous storage of plaintext data and ciphertext data, and decide whether to use plaintext column or ciphertext column for query through configuration. Without changing the business query SQL, the online system can safely and transparently migrate the data before and after encryption.
4、 Applicable scenario description
1. User project is programmed in Java language.
It is the back-end database of SQL server, MySQL and SQL server.
3. Users need to desensitize one or more columns in the database table (data encryption & decryption).
4. Compatible with all common SQL;
1. Users need to process the original stock data and wash quantity in the database by themselves.
2. Use desensitization function + sub database and sub table function, some special SQL does not support, please refer to the SQL usage specification.
3. Desensitization fields cannot support comparison operations, such as greater than or less than, order by, between, like, etc
4. Desensitization field cannot support calculation operation, such as AVG, sum and calculation expression
6、 Follow up
This article introduces how to use encryption JDBC, one of shardingsphere products, to access. Spring boot and spring namespace can also be used as the access form. This form of access terminal is mainly oriented to Java isomorphism and deployed in the production environment together with business code. For heterogeneous languages, shardingsphere also provides an encrypt proxy client.Encrypt proxy is a server-side product that implements the binary protocol of MySQL and PostgreSQLUsers can independently deploy the encrypt proxy service, and access the desensitized virtual database server by using the third-party database management tools such as Navicat, Java connection pool and command line, just like using ordinary MySQL and PostgreSQL databases.
Desensitization function belongs to the function category of Apache shardingsphere distributed governance.In fact, the Apache shardingsphere ecosystem has other more powerful capabilities, such as data fragmentation, read-write separation, distributed transactions, monitoring and governance, etc. You can even select any number of function modules for superpositionFor example, desensitization + data fragmentation + monitoring, or data fragmentation + data fragmentation at the same time. In addition to the superposition selection at the functional level, shardingsphere also provides various access terminal forms, such as sharding JDBC or sharding proxy, to meet the needs of different scenarios
7、 Write at the end
Shardingsphere only supports database and table splitting at the beginning, and now it has formed an ecosystem with core functions such as data fragmentation, distributed governance and distributed transaction. This also indicates that it is not only a distributed database middleware, not only has the ability to divide database and table, but also forms a comprehensive solution ecosystem with data fragmentation, distributed governance and distributed transaction as the core. You are welcome to learn more about it on the official website and pay attention to us in GitHub ☺！
invite to one ‘s side men of wisdom and valor
• end •