Introduction and implementation of vertical federated learning

Time: 2021-01-26

Case introduction

An internet company A and a bank B have reached an enterprise-level cooperation. A and B have a large number of overlapping users. A holds feature information such as customers' online behavior, while B holds feature information such as customers' deposits and loans, as well as the label information: whether a customer repays their loans (y). B hopes to combine its own features with A's to train a more powerful model for identifying customer credit risk. However, because of cross-industry administrative procedures, user data privacy, security requirements, and other factors, A and B cannot exchange data directly. Federated learning emerged to address exactly this situation.

Overview of federated learning

Definition of federated learning

Federated learning aims to build a joint model based on distributed datasets. During model training, information related to the model (possibly in encrypted form) can be exchanged between the parties, but the raw data cannot. The exchange does not expose any protected private part of the data held at any site. The trained federated model can be deployed at each participant of the federated learning system, or shared among multiple parties.
Suppose there are $N$ participants $\{F_1, \ldots, F_N\}$ who want to collaboratively train a machine learning model using their respective datasets $\{D_1, \ldots, D_N\}$. The traditional approach is to gather all the data in one place, $D = D_1 \cup \cdots \cup D_N$, for example on a cloud data server, and train a model $M_{SUM}$ on the centralized dataset. Under this approach, every participant exposes its data to the server and possibly to the other participants. Federated learning instead collaboratively trains a model $M_{FED}$ without any participant having to hand over its raw data $D_i$.
Let $V_{SUM}$ and $V_{FED}$ denote the performance measures (for example, the accuracy) of the centralized model $M_{SUM}$ and the federated model $M_{FED}$. When using secure federated learning to build a machine learning model on distributed data sources, we allow the federated model to perform slightly worse than the centralized one in exchange for protecting user privacy:

$$|V_{FED} - V_{SUM}| < \delta$$

where $\delta$ is the allowable performance loss.

Classification of federated learning

According to how the data used in federated learning is distributed among the participants, we can divide federated learning into three categories: horizontal federated learning (HFL), vertical federated learning (VFL), and federated transfer learning (FTL). The data distributions of the three categories differ as follows (a toy partitioning example in code follows the list):

  • Horizontal federated learning: the data of different participants overlap heavily in features (the horizontal dimension) but only slightly in samples (the vertical dimension). For example, the participants may be two banks that serve different regional markets: the customer groups they serve barely intersect, but because of similar business models their customers' features largely overlap.
  • Vertical federated learning: the data samples of different participants overlap heavily, but the sample features overlap only slightly. For example, two companies (a bank and an e-commerce company) provide different services to customers and hold different aspects of customer data, but the customer groups they serve overlap greatly.


  • Federated transfer learning: the data of different participants do not overlap very much in the feature and sample dimensions.

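As a toy illustration (hypothetical data, not part of the algorithm below), horizontal partitioning splits a dataset by rows (samples), while vertical partitioning splits it by columns (features):

import numpy as np

#A hypothetical dataset: 6 samples, 4 features
X = np.arange(24).reshape(6, 4)
#Horizontal FL: parties hold different samples with the same features
X_bank_1, X_bank_2 = X[:3, :], X[3:, :]
#Vertical FL: parties hold different features of the same samples
X_company_A, X_bank_B = X[:, :2], X[:, 2:]
print(X_bank_1.shape, X_bank_2.shape)        #(3, 4) (3, 4)
print(X_company_A.shape, X_bank_B.shape)     #(6, 2) (6, 2)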

Vertical federated learning algorithm

Vertical federated learning lets enterprises cooperate by combining their distinct data to jointly build a more powerful model. This article focuses on a vertical federated learning algorithm based on additively homomorphic encryption.

Application scenarios

Let us refine the case from the beginning of the article. Enterprise B has feature X3 and the label Y, so it can build a model on its own. Enterprise A has features X1 and X2 but lacks Y, so it cannot model independently. Now A and B cooperate to build a joint model; obviously the result will surpass what B can achieve with its unilateral data.
But how can the two parties train a model together? Take logistic regression as an example; the loss function and gradient of classical (L2-regularized) logistic regression are:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y_i \log h_\theta(x_i) + (1-y_i)\log\big(1-h_\theta(x_i)\big)\Big] + \frac{\lambda}{2m}\lVert\theta\rVert^2, \qquad h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}$$

$$\frac{\partial J}{\partial \theta} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x_i) - y_i\big)\,x_i + \frac{\lambda}{m}\theta$$
As the formulas show, computing the gradient requires both the feature data (x) and the label data (y). The most direct form of interaction would therefore be for one party to send its data to the other in plaintext, and for the other party to compute the gradients and send them back. But this kind of interaction leaks information: one party ends up holding all the data, which is clearly unacceptable.
Since plaintext transmission is out, one solution is to send the required data as ciphertext, which raises another problem: if one party cannot decrypt the other party's ciphertext, how can it compute anything with it? This is where homomorphic encryption comes in.

Introduction to homomorphic encryption

Due to space limitations, we only introduce what a homomorphic encryption algorithm does, not how it works internally.
Homomorphic encryption is a special kind of encryption that allows computation directly on ciphertexts: the result is still encrypted, and decrypting it gives the same result as performing the computation on the plaintexts. From the perspective of abstract algebra, the homomorphism is preserved.
Suppose there are two numbers x and y, and let op(x, y) denote some operation on them (addition, subtraction, multiplication, division, exponentiation, ...). Let E(x) be the encryption of x and D(c) the decryption of c. An encryption algorithm satisfies homomorphism with respect to an operation op when:

$$E(x)\ \mathrm{op}\ E(y) = E\big(x\ \mathrm{op}\ y\big)$$

or equivalently

$$D\big(E(x)\ \mathrm{op}\ E(y)\big) = x\ \mathrm{op}\ y$$
According to the range and number of operations supported, homomorphic encryption schemes are divided into partially homomorphic encryption (PHE), somewhat homomorphic encryption (SHE), and fully homomorphic encryption (FHE); the supported range and number of operations expand in that order. The vertical federated learning algorithm later in this article is implemented on top of the Paillier algorithm, a partially homomorphic scheme that supports addition between ciphertexts and multiplication of a ciphertext by a plaintext constant. Next, I will demonstrate what the Paillier algorithm can do, using the Python phe library.

#The phe library needs to be installed first: pip install phe
from phe import paillier
#Generate a public/private key pair
public_key, private_key = paillier.generate_paillier_keypair()
#Data to be encrypted
secret_number_list = [3.141592653, 300, -4.6e-12]
#Encrypt with the public key
encrypted_number_list = [public_key.encrypt(x) for x in secret_number_list]
#Decrypt with the private key
print([private_key.decrypt(x) for x in encrypted_number_list])

Paillier ciphertexts support addition and subtraction between ciphertexts, as well as multiplication and division by plaintext constants:

a, b, c = encrypted_number_list
a_plus_5 = a + 5                    #= a + 5
print("a + 5 =", private_key.decrypt(a_plus_5))
a_plus_b = a + b                    #= a + b
print("a + b =", private_key.decrypt(a_plus_b))
a_times_3_5 = a * 3.5               #= a * 3.5
print("a * 3.5 =", private_key.decrypt(a_times_3_5))
a_minus_1 = a - 1                   #= a + (-1)
print("a - 1 =", private_key.decrypt(a_minus_1))
a_div_minus_3_1 = a / -3.1          #= a * (-1/3.1)
print("a / -3.1 =", private_key.decrypt(a_div_minus_3_1))
a_minus_b = a - b                   #= a + (b * -1)
print("a - b =", private_key.decrypt(a_minus_b))

Functions whose internal logic consists only of additions and multiplications by constants (for example np.mean, or np.dot with a plaintext vector) can also be applied to ciphertexts directly.

import numpy as np
enc_mean = np.mean(encrypted_number_list)
enc_dot = np.dot(encrypted_number_list, [2, -400.1, 5318008])
print("enc_mean:", private_key.decrypt(enc_mean))
print("enc_dot:", private_key.decrypt(enc_dot))

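What Paillier does not support is multiplying two ciphertexts together; that would require a fully homomorphic scheme. A quick sketch of the expected failure (the exact exception type may depend on the phe version):

try:
    a * b    #ciphertext * ciphertext is not an additively homomorphic operation
except Exception as e:
    print("ciphertext * ciphertext failed:", e)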

Algorithm flow

The loss and gradient formulas of logistic regression contain exponential operations, so before the Paillier algorithm can be applied we must transform them so that they can be expressed using only addition and multiplication. A common way to do this is to approximate the exponential terms with a Taylor expansion.
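Concretely, with labels $y_i \in \{0, 1\}$ and $z_i = \theta^T x_i$, the second-order Taylor expansion $\log(1+e^{-z}) \approx \log 2 - \frac{1}{2}z + \frac{1}{8}z^2$ turns the per-sample loss into (matching the quantities computed in the code below):

$$\ell_i \approx \log 2 + \Big(\frac{1}{2}-y_i\Big) z_i + \frac{1}{8} z_i^2$$

and the gradient of the approximate loss into

$$\frac{\partial J}{\partial \theta} \approx \frac{1}{m}\sum_{i=1}^{m}\Big(\frac{1}{4} z_i + \frac{1}{2} - y_i\Big)\, x_i + \frac{\lambda}{m}\theta$$

Splitting the features and parameters between the parties, $x_i = (x_i^A, x_i^B)$ and $\theta = (\theta_A, \theta_B)$, gives $z_i = \theta_A^T x_i^A + \theta_B^T x_i^B$; in the code below the two residual shares are exactly $u_a = 0.25\,z_a$ and $u_b = 0.25\,z_b + 0.5 - y$.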
In the final transformed gradient vector, the upper block is the gradient Party A needs to update its parameters (including the regularization term), and the lower block is Party B's. Our goal is for participants A and B to compute as much as possible locally and to obtain their gradients only through the exchange of encrypted intermediate results, so the computation is divided up as follows.
In each round of parameter updating, the participants perform the following computations and interactions in order:

  1. Party A and Party B initialize their model parameters; Party C generates a Paillier key pair and distributes the public key to A and B.
  2. Party A computes u_a = 0.25 * z_a (together with z_a^2, which is needed for the loss), encrypts them with the public key, and sends them to B. Party B computes u_b = 0.25 * z_b + 0.5 - y, encrypts it, and sends it to A.
  3. A and B can now each compute their encrypted gradients [[dJ_a]] and [[dJ_b]] locally ([[x]] denotes the homomorphic encryption form of x).
  4. A and B need C to decrypt the encrypted gradients, but to keep C from seeing the gradients directly, A and B first add random masks R_a and R_b and send [[dJ_a + R_a]] and [[dJ_b + R_b]] to C.
  5. C decrypts the masked gradients and returns them to A and B respectively.
  6. A and B subtract their own masks to recover the true gradients and update their parameters.


Code implementation

Next, we implement the whole algorithm flow in Python. To show the algorithm flow more clearly, the implementation of the interaction between parties is greatly simplified.

Import required modules

import math
import numpy as np
from phe import paillier
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

Definition of participants

Define a parent class for the participants: each party needs to store its model configuration, intermediate computation results, and its connections to the other parties.

class Client:
    def __init__(self, config):
        ##Model configuration (hyperparameters)
        self.config = config
        ##Intermediate computation results
        self.data = {}
        ##Connections to the other parties
        self.other_client = {}
    
    ##Connect to another party
    def connect(self, client_name, target_client):
        self.other_client[client_name] = target_client
    
    ##Send data to a specific party
    def send_data(self, data, target_client):
        target_client.data.update(data)
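
A quick, purely hypothetical sanity check of this simplified message-passing mechanism:

##Toy usage of the Client base class (hypothetical)
c1 = Client(config={})
c2 = Client(config={})
c1.connect("c2", c2)
c1.send_data({"greeting": 42}, c1.other_client["c2"])
print(c2.data)    #{'greeting': 42}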

Participant A provides only feature data during training.

class ClientA(Client):
    def __init__(self, X, config):
        super().__init__(config)
        self.X = X
        self.weights = np.zeros(X.shape[1])
        
    def compute_z_a(self):
        z_a = np.dot(self.X, self.weights)
        return z_a
    
    ##Encrypted gradient computation, corresponding to step 3
    def compute_encrypted_dJ_a(self, encrypted_u):
        encrypted_dJ_a = self.X.T.dot(encrypted_u) + self.config['lambda'] * self.weights
        return encrypted_dJ_a
    
    ##Parameter update, corresponding to step 6
    def update_weight(self, dJ_a):
        self.weights = self.weights - self.config["lr"] * dJ_a / len(self.X)
        return

    ## A: step2
    def task_1(self, client_B_name):
        dt = self.data
        assert "public_key" in dt.keys(), "Error: 'public_key' from C in step 1 not successfully received."
        public_key = dt['public_key']
        z_a = self.compute_z_a()
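        ##u_a = 0.25 * z_a is A's share of the Taylor-approximated residual
        ##0.25 * z + 0.5 - y; z_a ** 2 is also encrypted and sent so that B
        ##can later assemble the approximate loss.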
        u_a = 0.25 * z_a
        z_a_square = z_a ** 2
        encrypted_u_a = np.asarray([public_key.encrypt(x) for x in u_a])
        encrypted_z_a_square = np.asarray([public_key.encrypt(x) for x in z_a_square])
        dt.update({"encrypted_u_a": encrypted_u_a})
        data_to_B = {"encrypted_u_a": encrypted_u_a, "encrypted_z_a_square": encrypted_z_a_square}
        self.send_data(data_to_B, self.other_client[client_B_name])
    
    ## A: steps 3 and 4
    def task_2(self, client_C_name):
        dt = self.data
        assert "encrypted_u_b" in dt.keys(), "Error: 'encrypted_u_b' from B in step 1 not successfully received."
        encrypted_u_b = dt['encrypted_u_b']
        encrypted_u = encrypted_u_b + dt['encrypted_u_a']
        encrypted_dJ_a = self.compute_encrypted_dJ_a(encrypted_u)
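        ##A random mask keeps C, who holds the private key, from seeing A's true gradient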
        mask = np.random.rand(len(encrypted_dJ_a))
        encrypted_masked_dJ_a = encrypted_dJ_a + mask
        dt.update({"mask": mask})
        data_to_C = {'encrypted_masked_dJ_a': encrypted_masked_dJ_a}
        self.send_data(data_to_C, self.other_client[client_C_name])
       
    ## A: step6
    def task_3(self):
        dt = self.data
        assert "masked_dJ_a" in dt.keys(), "Error: 'masked_dJ_a' from C in step 2 not successfully received."
        masked_dJ_a = dt['masked_dJ_a']
        dJ_a = masked_dJ_a - dt['mask']
        self.update_weight(dJ_a)
        print(f"A weight: {self.weights}")
        return

Participant B provides both feature data and label data during training.

class ClientB(Client):
    def __init__(self, X, y, config):
        super().__init__(config)
        self.X = X
        self.y = y
        self.weights = np.zeros(X.shape[1])
        self.data = {}
        
    def compute_u_b(self):
        z_b = np.dot(self.X, self.weights)
        u_b = 0.25 * z_b - self.y + 0.5
        return z_b, u_b

    def compute_encrypted_dJ_b(self, encrypted_u):
        encrypted_dJ_b = self.X.T.dot(encrypted_u) + self.config['lambda'] * self.weights
        return encrypted_dJ_b

    def update_weight(self, dJ_b):
        self.weights = self.weights - self.config["lr"] * dJ_b / len(self.X)
        
    ## B: step2
    def task_1(self, client_A_name):
        try:
            dt = self.data
            assert "public_key" in dt.keys(), "Error: 'public_key' from C in step 1 not successfully received."
            public_key = dt['public_key']
        except Exception as e:
            print("B step 1 exception: %s" % e)
        try:
            z_b, u_b = self.compute_u_b()
            encrypted_u_b = np.asarray([public_key.encrypt(x) for x in u_b])
            dt.update({"encrypted_u_b": encrypted_u_b})
            dt.update({"z_b": z_b})
        except Exception as e:
            print("Wrong 1 in B: %s" % e)

        data_to_A= {"encrypted_u_b": encrypted_u_b}
        self.send_data(data_to_A, self.other_client[client_A_name])
    
    ## B: steps 3 and 4
    def task_2(self,client_C_name):
        try:
            dt = self.data
            assert "encrypted_u_a" in dt.keys(), "Error: 'encrypt_u_a' from A in step 1 not successfully received."
            encrypted_u_a = dt['encrypted_u_a']
            encrypted_u = encrypted_u_a + dt['encrypted_u_b']
            encrypted_dJ_b = self.compute_encrypted_dJ_b(encrypted_u)
            mask = np.random.rand(len(encrypted_dJ_b))
            encrypted_masked_dJ_b = encrypted_dJ_b + mask
            dt.update({"mask": mask})
        except Exception as e:
            print("B step 2 exception: %s" % e)
        try:
            assert "encrypted_z_a_square" in dt.keys(), "Error: 'encrypted_z_a_square' from A in step 1 not successfully received."
            encrypted_z = 4*encrypted_u_a + dt['z_b']
            encrypted_loss = np.sum((0.5-self.y)*encrypted_z + 0.125*dt["encrypted_z_a_square"] + 0.125*dt["z_b"] * (encrypted_z+4*encrypted_u_a))
        except Exception as e:
            print("B step 2 exception: %s" % e)
        data_to_C = {"encrypted_masked_dJ_b": encrypted_masked_dJ_b, "encrypted_loss": encrypted_loss}
        self.send_data(data_to_C, self.other_client[client_C_name])
    
    ## B: step6
    def task_3(self):
        try:
            dt = self.data
            assert "masked_dJ_b" in dt.keys(), "Error: 'masked_dJ_b' from C in step 2 not successfully received."
            masked_dJ_b = dt['masked_dJ_b']
            dJ_b = masked_dJ_b - dt['mask']
            self.update_weight(dJ_b)
        except Exception as e:
            print("A step 3 exception: %s" % e)
        print(f"B weight: {self.weights}")
        return

Participant C's main role in the whole training process is to generate and distribute the keys, and at the end of each round to decrypt the masked gradients from A and B.

class ClientC(Client):
    """
    Client C as trusted dealer.
    """
    def __init__(self, A_d_shape, B_d_shape, config):
        super().__init__(config)
        self.A_data_shape = A_d_shape
        self.B_data_shape = B_d_shape
        self.public_key = None
        self.private_key = None
        ##Save the loss value in training (approximate)
        self.loss = []
    
    ## C: step1
    def task_1(self, client_A_name, client_B_name):
        try:
            public_key, private_key = paillier.generate_paillier_keypair()
            self.public_key = public_key
            self.private_key = private_key
        except Exception as e:
            print("C step 1 error 1: %s" % e)

        data_to_AB = {"public_key": public_key}
        self.send_data(data_to_AB, self.other_client[client_A_name])
        self.send_data(data_to_AB, self.other_client[client_B_name])
        return
    
    ## C: step5
    def task_2(self, client_A_name, client_B_name):
        try:
            dt = self.data
            assert "encrypted_masked_dJ_a" in dt.keys() and "encrypted_masked_dJ_b" in dt.keys(), "Error: 'masked_dJ_a' from A or 'masked_dJ_b' from B in step 2 not successfully received."
            encrypted_masked_dJ_a = dt['encrypted_masked_dJ_a']
            encrypted_masked_dJ_b = dt['encrypted_masked_dJ_b']
            masked_dJ_a = np.asarray([self.private_key.decrypt(x) for x in encrypted_masked_dJ_a])
            masked_dJ_b = np.asarray([self.private_key.decrypt(x) for x in encrypted_masked_dJ_b])
        except Exception as e:
            print("C step 2 exception: %s" % e)

        try:
            assert "encrypted_loss" in dt.keys(), "Error: 'encrypted_loss' from B in step 2 not successfully received."
            encrypted_loss = dt['encrypted_loss']
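            ##encrypted_loss holds sum((0.5 - y) * z + z^2 / 8); dividing by the sample
            ##count and adding the constant log(2) gives the average approximate loss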
            loss = self.private_key.decrypt(encrypted_loss) / self.A_data_shape[0] + math.log(2)
            print("******loss: ", loss, "******")
            self.loss.append(loss)
        except Exception as e:
            print("C step 2 exception: %s" % e)

        data_to_A = {"masked_dJ_a": masked_dJ_a}
        data_to_B = {"masked_dJ_b": masked_dJ_b}
        self.send_data(data_to_A, self.other_client[client_A_name])
        self.send_data(data_to_B, self.other_client[client_B_name])
        return

Generation of simulation data

Here we generate simulated data from the breast cancer dataset in sklearn: Party A receives part of the features, while Party B receives the remaining features together with the label data.

def load_data():
    #Loading data
    breast = load_breast_cancer()
    #Data splitting
    X_train, X_test, y_train, y_test = train_test_split(breast.data, breast.target, random_state=1)
    #Data standardization
    std = StandardScaler()
    X_train = std.fit_transform(X_train)
    X_test = std.transform(X_test)
    return X_train, y_train, X_test, y_test


##Assign features to A and B
def vertically_partition_data(X, X_test, A_idx, B_idx):
    """
    Vertically partition feature for party A and B
    :param X: train feature
    :param X_test: test feature
    :param A_idx: feature index of party A
    :param B_idx: feature index of party B
    :return: train data for A, B; test data for A, B
    """
    XA = X[:, A_idx]  
    XB = X[:, B_idx]  
    XB = np.c_[np.ones(X.shape[0]), XB]    #add an intercept (bias) column to B's features
    XA_test = X_test[:, A_idx]
    XB_test = X_test[:, B_idx]
    XB_test = np.c_[np.ones(XB_test.shape[0]), XB_test]
    return XA, XB, XA_test, XB_test

Implementation of the training process

def vertical_logistic_regression(X, y, X_test, y_test, config):
    """
    Start the processes of the three clients: A, B and C.
    :param X: features of the training dataset
    :param y: labels of the training dataset
    :param X_test: features of the test dataset
    :param y_test: labels of the test dataset
    :param config: the config dict
    :return: True
    """
    
    ##Get data
    XA, XB, XA_test, XB_test = vertically_partition_data(X, X_test, config['A_idx'], config['B_idx'])
    print('XA:',XA.shape, '   XB:',XB.shape)
    
    ##Initialization of each participant
    client_A = ClientA(XA, config)
    print("Client_A successfully initialized.")
    client_B = ClientB(XB, y, config)
    print("Client_B successfully initialized.")
    client_C = ClientC(XA.shape, XB.shape, config)
    print("Client_C successfully initialized.")
    
    ##Establish connections between the participants
    client_A.connect("B", client_B)
    client_A.connect("C", client_C)
    client_B.connect("A", client_A)
    client_B.connect("C", client_C)
    client_C.connect("A", client_A)
    client_C.connect("B", client_B)
    
    ##Training
    for i in range(config['n_iter']):
        client_C.task_1("A", "B")    #step 1: C generates and distributes the public key
        client_A.task_1("B")         #step 2: A sends [[u_a]] and [[z_a^2]] to B
        client_B.task_1("A")         #step 2: B sends [[u_b]] to A
        client_A.task_2("C")         #steps 3-4: A sends its masked encrypted gradient to C
        client_B.task_2("C")         #steps 3-4: B sends its masked encrypted gradient and the loss to C
        client_C.task_2("A", "B")    #step 5: C decrypts and returns the masked gradients
        client_A.task_3()            #step 6: A unmasks its gradient and updates
        client_B.task_3()            #step 6: B unmasks its gradient and updates
    print("All process done.")
    return True



config = {
    'n_iter': 100,    #number of training iterations
    'lambda': 10,     #L2 regularization strength
    'lr': 0.05,       #learning rate
    'A_idx': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],    #feature indices held by A
    'B_idx': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],    #feature indices held by B
}

X, y, X_test, y_test = load_data()
vertical_logistic_regression(X, y, X_test, y_test, config)


Training effect

To test the training effect of the vertical federated learning algorithm, ordinary centralized logistic regression serves as the control group. Both are trained on the same training data from the breast cancer dataset with the same logistic regression model, and we observe the decline of the loss value during training and the prediction accuracy on the same test set.
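The implementation above stops at training and does not include the evaluation code; the following is a minimal sketch of how the trained federated model could be scored on a test set. It assumes direct access to both parties' trained weights (client_A.weights, client_B.weights); in a real deployment, inference would itself require a joint protocol between A and B.

from sklearn.metrics import roc_auc_score

def evaluate(client_A, client_B, XA_test, XB_test, y_test):
    ##Combine both parties' linear scores: z = XA . w_A + XB . w_B
    z = XA_test.dot(client_A.weights) + XB_test.dot(client_B.weights)
    ##Use the exact sigmoid (not the Taylor approximation) for prediction
    prob = 1.0 / (1.0 + np.exp(-z))
    pred = (prob > 0.5).astype(int)
    print("accuracy:", (pred == y_test).mean(), " AUC:", roc_auc_score(y_test, prob))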
The following shows how the training loss declines in each case:
[Figure: training-loss curves for the three cases listed below]
The cases represented by each curve are as follows:
Logistic: ordinary (centralized) logistic regression trained with the standard loss function
Taylor_Logistic: ordinary (centralized) logistic regression trained with the Taylor-approximated loss function
Taylor_Taylor: vertical (federated) logistic regression trained with the Taylor-approximated loss function

The following shows the difference in accuracy and AUC between ordinary logistic regression and vertical logistic regression on several sklearn datasets, where rows is the number of samples, feat the number of features, logistic the result of centralized logistic regression, and vertical the result of the vertical federated learning algorithm.
[Table: accuracy and AUC of centralized vs. vertical logistic regression on several sklearn datasets]
The comparison shows that, relative to ordinary logistic regression, the vertical logistic regression algorithm achieves good training results on the experimental datasets while preserving the data privacy of all parties.

References

[1] Yang Q , Liu Y , Chen T , et al. Federated Machine Learning: Concept and Applications[J]. ACM Transactions on Intelligent Systems and Technology, 2019, 10(2):1-19.
[2] Hardy S, Henecka W, Ivey-Law H, et al. Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption[J]. arXiv preprint arXiv:1711.10677, 2017.
[3] https://zhuanlan.zhihu.com/p/94105330