As Internet businesses continue to grow, many institutions have accumulated large amounts of online data. Making full use of this data for analysis, feature mining, and algorithm modeling has become a key development direction for each of them. In most industries and enterprises, however, data exists in isolated islands. Because of industry competition, privacy and security concerns, complex administrative procedures, and similar issues, even integrating data between departments of the same company faces many obstacles. In practice, integrating data scattered across different places and institutions is either nearly impossible or prohibitively expensive.
Meanwhile, as big data develops further, attention to data privacy and security has become a worldwide trend, which poses unprecedented challenges for artificial intelligence. How to design a machine learning framework that meets security and regulatory requirements while letting AI systems use each party's own data more efficiently and accurately is an important topic in the development of the field.
In the past two years, federated learning has offered a new solution for cross-team data cooperation and for breaking down “data islands”.
Federated learning is an emerging foundational AI technology. It was first proposed by Google in 2016, originally to solve the problem of Android device users updating models locally. Its design goal is to carry out efficient machine learning among multiple participants or computing nodes while ensuring information security, protecting the privacy of terminal and personal data, and keeping big data exchange legal and compliant. The machine learning algorithms that can be used in federated learning are not limited to neural networks; they also include important algorithms such as random forests. Federated learning is expected to become the foundation of the next generation of collaborative AI algorithms and networks.
1、 Architecture of the JD Zhilian Cloud federated learning platform
The JD Zhilian Cloud federated learning platform aims to build federated learning models over distributed data sets. During training, model information is exchanged between institutions in encrypted form, the exchange does not expose any institution's private data, and the trained model is shared among the institutions.
Not long ago, on the strength of its scheduling management, data processing, algorithm implementation, effectiveness and performance, and security, the JD Zhilian Cloud federated learning platform passed the “big data product capability assessment” of the China Academy of Information and Communications Technology (CAICT) and was awarded the special assessment certificate for basic federated learning capability, a recognition from an industry authority.
The platform addresses the data islands between government bodies and enterprises, unlocks the potential of AI applications, and enables multi-party joint modeling while keeping private data secure (see Figure 1).
Figure 1 JD Zhilian Cloud federated learning platform
Why does the JD Zhilian Cloud federated learning platform have these characteristics?
The quality and quantity of data set the upper limit on what machine learning can achieve. For a model such as a neural network to perform better, it may need to be fed more data. Large amounts of data consume more storage and computing power, so machine learning has to rely on distributed methods to provide sufficient compute, storage, and sensible task scheduling. The same is true of federated learning. As the distributed federated learning architecture of JD Zhilian Cloud in Figure 2 shows, it is in essence an encrypted distributed machine learning technology.
Figure 2 JD Zhilian Cloud distributed federated learning architecture
The JD Zhilian Cloud federated learning platform can break down the data islands between partners: with each party's data kept in an isolated environment, the parties build a virtual common model, fully release the potential of AI, and achieve “common prosperity”.
Figure 3 Federated learning application scenarios
As Figure 3 shows, the platform can break through the data barrier between JD's own data and that of its partners, model in a data-isolated environment, and build a common model empowered by JD's data, enabling deeper mining and innovation in application scenarios.
2、 Main capabilities of the JD Zhilian Cloud federated learning platform
1. Information encryption
The JD Zhilian Cloud federated learning platform consists of the federated learning client and the JD Zhilian Cloud gateway. The client is mainly responsible for data encryption and scientific computing; the gateway is responsible for passing the necessary encrypted parameters between the participants' clients.
The client is delivered to each participant as an image. Developers on each side do not need to worry about operating system versions or the development software environment: they load the image, start the federated learning platform inside it, and can then begin federated training.
The gateway's main tasks are to authenticate federated learning clients and to pass the necessary encrypted parameters to each participant. To protect each participant's network, the platform adopts a one-way network transmission strategy: each participant can send network requests to the gateway, but the gateway cannot send network requests to any participant. Under this strategy, an enterprise needs to open only uplink network permissions and can keep downlink permissions closed, which effectively eases some participants' concerns about network security.
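One way to picture the one-way strategy is a mailbox-style relay that clients push to and poll from, so the gateway never opens a connection itself. The sketch below is an in-memory illustration only; the `Gateway` class, its `send` and `poll` methods, and the token scheme are hypothetical and are not the platform's actual interface.

```python
from collections import defaultdict, deque


class Gateway:
    """Hypothetical relay: it only answers requests and never dials out to a client."""

    def __init__(self, tokens):
        self._tokens = dict(tokens)            # client_id -> shared secret (assumed scheme)
        self._mailboxes = defaultdict(deque)   # recipient -> queued messages

    def _authenticate(self, client_id, token):
        if self._tokens.get(client_id) != token:
            raise PermissionError(f"unknown client or bad token: {client_id}")

    def send(self, client_id, token, recipient, payload):
        """Uplink request: queue an already-encrypted payload for another client."""
        self._authenticate(client_id, token)
        if recipient not in self._tokens:
            raise KeyError(f"unknown recipient: {recipient}")
        self._mailboxes[recipient].append((client_id, payload))

    def poll(self, client_id, token):
        """Uplink request: the client fetches whatever is waiting for it."""
        self._authenticate(client_id, token)
        box = self._mailboxes[client_id]
        messages = list(box)
        box.clear()
        return messages


gateway = Gateway({"party_a": "secret-a", "party_b": "secret-b"})
gateway.send("party_a", "secret-a", "party_b", b"<encrypted parameters>")
inbox = gateway.poll("party_b", "secret-b")
```

Because both `send` and `poll` are initiated by the client, a participant in this sketch never has to accept an inbound connection.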
The platform also supports two sample alignment methods: federated encrypted alignment and MD5 alignment. Federated encrypted alignment uses the RSA algorithm combined with random noise to help two participants find the user IDs they have in common while ensuring that IDs not shared by both are never leaked to the other party.
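The article does not give the protocol details, but "RSA combined with random noise" is commonly realized as an RSA blind-signature intersection, and the sketch below assumes that reading. The toy key built from two Mersenne primes, the SHA-256 hashing, and the helper names (`h`, `blind`, `sign`, `unblind`, `align`) are all illustrative assumptions, and the key is far too small for real use.

```python
import hashlib
import math
import secrets

# Toy RSA key from two Mersenne primes (illustration only, not secure).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known only to party A


def h(user_id: str) -> int:
    """Hash an ID into the RSA group."""
    return int.from_bytes(hashlib.sha256(user_id.encode()).digest(), "big") % N


def fingerprint(signature: int) -> str:
    """Second hash, so raw signatures are never exchanged."""
    return hashlib.sha256(str(signature).encode()).hexdigest()


def blind(user_id: str):
    """Party B hides H(id) behind random noise r: y = H(id) * r^E mod N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return h(user_id) * pow(r, E, N) % N, r


def sign(value: int) -> int:
    """Party A signs whatever it receives with its private key."""
    return pow(value, D, N)


def unblind(signed: int, r: int) -> int:
    """Party B strips its noise, leaving H(id)^D mod N."""
    return signed * pow(r, -1, N) % N


def align(ids_a, ids_b):
    blinded = [blind(u) for u in ids_b]                    # B -> A: blinded values only
    signed = [sign(y) for y, _ in blinded]                 # A -> B: signatures on noise
    a_prints = {fingerprint(sign(h(v))) for v in ids_a}    # A -> B: hashed signatures
    return [u for u, (_, r), z in zip(ids_b, blinded, signed)
            if fingerprint(unblind(z, r)) in a_prints]     # B learns only the overlap


common = align(["u1", "u2", "u3"], ["u2", "u3", "u9"])
```

In this sketch party A sees only blinded values, and party B sees only fingerprints it cannot reproduce without A's private key, which is what keeps non-shared IDs hidden. Plain MD5 alignment, by contrast, would simply compare hashes of the IDs.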
2. Federated algorithm
JD Zhilian Cloud has developed its own gradient information protection scheme. All parties involved in training update their model parameters locally. Before an encrypted gradient is sent for decryption, sufficient noise is added to it, so what the decrypting party obtains is only a noised gradient from which the real gradient cannot be recovered. The party that added the noise then subtracts it to restore the real gradient and updates its model parameters. This design protects the gradient information while preserving model accuracy.
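The masking idea can be sketched with any additively homomorphic scheme. The example below assumes Paillier encryption with a toy key, fixed-point encoding with a hypothetical `SCALE`, and 64-bit integer noise; none of these choices come from the article, and the key size is for illustration only.

```python
import math
import secrets

# Toy Paillier key from two Mersenne primes; far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private key, held by party A
MU = pow(LAM, -1, N)                                # private key, held by party A
SCALE = 10**6                                       # assumed fixed-point scale


def encrypt(m: int) -> int:
    """Encrypt an integer with the public key N (generator g = N + 1)."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def add(c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return c1 * c2 % N2


def decrypt(c: int) -> int:
    """Decrypt with the private key and map back to a signed integer."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


true_gradient = [0.125, -0.5, 0.03125]

# Party B holds the gradient only in encrypted form (encrypted here for the demo).
enc_gradient = [encrypt(round(g * SCALE)) for g in true_gradient]

# Party B masks each entry with its own random noise before asking A to decrypt.
noise = [secrets.randbelow(2**64) for _ in enc_gradient]
masked = [add(c, encrypt(r)) for c, r in zip(enc_gradient, noise)]

# Party A decrypts and sees only gradient + noise.
seen_by_a = [decrypt(c) for c in masked]

# Party B subtracts its noise to recover the real gradient.
recovered = [(v - r) / SCALE for v, r in zip(seen_by_a, noise)]
```

Because the noise is added and removed as exact integers, the recovered gradient matches the original up to the fixed-point rounding, which is consistent with the claim that accuracy is preserved.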
In addition, by analyzing how sparse data is stored and using homomorphic encryption's support for addition and multiplication, JD Zhilian Cloud federated learning implements matrix multiplication between dense encrypted data and sparse data so that the cost depends only on the number of non-zero elements.
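A minimal sketch of why the cost tracks the non-zero count: if the sparse matrix is stored as per-row `{column: value}` dictionaries, each non-zero entry triggers exactly one ciphertext-by-scalar multiplication. `MockCipher` below is a stand-in that only counts operations and performs no encryption; it and `sparse_transpose_times_encrypted` are hypothetical names, and the storage layout is an assumption.

```python
class MockCipher:
    """Stand-in for an additively homomorphic ciphertext (NOT real encryption)."""

    mul_count = 0  # number of ciphertext-by-plaintext multiplications performed

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return MockCipher(self.value + other.value)

    def __mul__(self, scalar):
        MockCipher.mul_count += 1
        return MockCipher(self.value * scalar)


def sparse_transpose_times_encrypted(rows, n_cols, enc_vec):
    """Compute X^T * [[v]], with X given as one {column: value} dict per row.

    Only stored (non-zero) entries are visited. Columns that are entirely
    zero stay None; a caller would substitute an encryption of zero.
    """
    out = [None] * n_cols
    for i, row in enumerate(rows):
        for j, x in row.items():
            term = enc_vec[i] * x
            out[j] = term if out[j] is None else out[j] + term
    return out


# A 4 x 3 sparse feature matrix with 4 non-zero entries.
X = [{0: 2, 2: 1}, {}, {1: 3}, {0: -1}]
enc_v = [MockCipher(v) for v in [1, 5, 2, 4]]  # e.g. encrypted residuals

result = sparse_transpose_times_encrypted(X, 3, enc_v)
plain = [c.value for c in result]
```

Here the 12-cell matrix costs only 4 ciphertext multiplications, one per non-zero entry, regardless of how many zero cells surround them.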
JD Zhilian Cloud federated learning also provides federated algorithms such as logistic regression, XGBoost, and DNN. It supports feature analysis algorithms such as Pearson, Spearman, WOE (weight of evidence), and IV (information value), and provides feature processing tools such as outlier filling, normalization, feature bucketing, count_encoding, and one-hot encoding.
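As a reminder of what WOE and IV measure, here is a small sketch using one common convention, WOE = ln(share of goods / share of bads) per bucket and IV as the share-weighted sum. The function name `woe_iv` and the input layout are assumptions, sign conventions vary between tools, and the sketch assumes every bucket contains at least one good and one bad sample.

```python
import math


def woe_iv(buckets):
    """Return (per-bucket WOE list, total IV) for (good_count, bad_count) buckets."""
    total_good = sum(good for good, _ in buckets)
    total_bad = sum(bad for _, bad in buckets)
    woes, iv = [], 0.0
    for good, bad in buckets:
        good_share = good / total_good   # share of all goods in this bucket
        bad_share = bad / total_bad      # share of all bads in this bucket
        woe = math.log(good_share / bad_share)
        woes.append(woe)
        iv += (good_share - bad_share) * woe
    return woes, iv


# Two buckets of one feature: the first is mostly good, the second mostly bad.
woes, iv = woe_iv([(80, 20), (20, 80)])
```

A feature whose buckets all have the same good-to-bad mix gets an IV of zero, while a strongly separating feature like the one above gets a large IV.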
3. Based on the latest deep learning framework
The JD Zhilian Cloud federated learning platform does not rely on third-party frameworks such as Spark, horn, or k8s. The whole network is built on TensorFlow 2.0, released by Google, and its high-level API tf.keras. On top of a two-tower network, users can define the DNN structure of each tower themselves. Compared with TensorFlow 1.x, models in the new TensorFlow are easier to debug and the API is clearer, and TensorFlow 2.x is likely to be the direction of the future.
FATE model training uses the Sequential API in TensorFlow, which cannot smoothly connect the computation of the bottom network with that of the interactive network. During training, the forward propagation result of the bottom network is not recorded for backpropagation, so a second forward propagation is needed during the backward pass. Two forward propagations increase the running time, and if the network contains random numbers they are also likely to produce wrong results. The JD Zhilian Cloud federated learning platform uses the more flexible Subclassing API, which needs only one forward propagation per training step and so reduces both the running time and the instability caused by random numbers.
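The single-forward-pass idea can be illustrated without TensorFlow: the bottom network caches what it computed, random mask included, and later applies the gradient it receives from the interactive layer to that cache. The `BottomTower` class, the dropout-style mask, and the squared-error stand-in for the interactive layer are all hypothetical. In TensorFlow 2.x, a comparable pattern would be to record the forward pass under `tf.GradientTape` and pass the received gradient through the `output_gradients` argument of `tape.gradient`.

```python
import random


class BottomTower:
    """One party's bottom network: a linear layer with a random dropout-style mask."""

    def __init__(self, weights):
        self.w = list(weights)
        self._cache = None  # inputs and mask from the single forward pass

    def forward(self, x, rng):
        mask = [1.0 if rng.random() > 0.5 else 0.0 for _ in x]
        self._cache = (list(x), mask)
        return sum(w * xi * m for w, xi, m in zip(self.w, x, mask))

    def backward(self, grad_out, lr=0.1):
        """Apply the gradient sent back by the interactive layer, using the
        cached inputs and mask instead of running forward again."""
        x, mask = self._cache
        grads = [grad_out * xi * m for xi, m in zip(x, mask)]
        self.w = [w - lr * g for w, g in zip(self.w, grads)]
        return grads


rng = random.Random(0)
tower = BottomTower([0.5, -0.2, 0.1])
x = [1.0, 2.0, 3.0]

out = tower.forward(x, rng)      # the only forward pass
_, used_mask = tower._cache
grad_out = out - 1.0             # stand-in for d(loss)/d(out) from the interactive layer
grads = tower.backward(grad_out)
```

Re-running `forward` before the backward step would draw a fresh mask, so the gradient would no longer correspond to the output that was actually sent, which is the instability the paragraph describes.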
4. Online prediction
Depending on security requirements, the platform supports online prediction through a SaaS-based API and real-time federated prediction on the client side. The former is faster; the latter is more secure.
3、 Use cases
At present, the JD Zhilian Cloud federated learning platform has been widely applied in industries such as retail, automotive, education, and risk management. In the automotive industry, after 2 weeks of modeling and training, the model effect improved significantly, by 17%, raising both customer conversion rate and ROI and driving the enterprise's full-link digital and intelligent transformation.
Several offline 4S dealerships of an automobile brand integrated online and offline data through the federated learning platform and modeled jointly with machine learning. The model effectively predicts who will buy a car in store and which models users prefer. It also scores each user's probability of visiting the store and their model preference, and works with SMS and phone outreach to reach high-potential customers, which greatly improves sales efficiency and the conversion rate for different models.
In terms of deployment, the platform can be deployed and debugged in three days and put into use within a week. It also supports visual feature analysis: feature correlation analysis can be done by selecting and clicking on the page, with no hand-written code.