“xDeepFM” – abstract

Time:2022-5-22

1. Foreword

DNN-based factorization machine models can learn both low-order and high-order feature interactions, but they do so implicitly, at the bit-wise level. This paper proposes the Compressed Interaction Network (CIN), which learns feature interactions explicitly at the vector-wise level and shares some functionality with CNNs and RNNs. CIN is then combined with a DNN, and the resulting network is called xDeepFM.

2. Overview of the development of CTR prediction

The input to CTR prediction is typically high-dimensional and highly sparse. The traditional approach is logistic regression (LR), which works well but is hard to improve once it hits a bottleneck: LR relies on manual feature engineering, which is inefficient, and past that point adding or removing features yields diminishing returns. The FM model, proposed in 2010, learns second-order feature interactions automatically. However, the cost of modeling higher-order interactions is too high, so in practice FM is used only for second-order interactions. In 2014, Facebook proposed combining GBDT with LR. In short, GBDT can itself discover discriminative features and feature combinations, so each path through a decision tree can be used directly as an LR input feature, which removes the need to search for features and feature combinations by hand. Concretely, the leaf node that a sample reaches in each GBDT tree is used as the input to LR. To discover feature interactions automatically and reduce the burden of manual feature engineering, network models such as Wide & Deep, Deep & Cross, FNN, AFM, PNN, DIN and DeepFM were proposed one after another.

3. xDeepFM

The cross network in Deep & Cross Network (DCN) cannot effectively capture high-order feature interactions, so this paper designs a new network, CIN, to replace the cross network part of DCN. xDeepFM is therefore essentially an improvement built on DCN. CIN lets you specify the maximum interaction order, and its features interact at the vector-wise level rather than the bit-wise level.

xDeepFM structure diagram:

3.1. Embedding Layer

The embedding layer reduces the dimensionality of the raw high-dimensional, highly sparse features by transforming them into dense vectors. For a single-valued field, for example gender = [1,0], the embedding of the active feature is used as the field embedding. For a multi-valued field, for example interests = comedy & rock, encoded as interests = [0,1,0,0,1,0…], the sum of the active features' embeddings is used as the field embedding. Finally, the field embeddings are concatenated:

e = [e_1, e_2, …, e_m]
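
The field embedding rule above can be illustrated with a minimal sketch. The embedding dimension `D`, the randomly initialized tables, and the helper names `make_table` and `field_embedding` are assumptions made for illustration; they do not come from the paper.

```python
import random

random.seed(0)
D = 4  # embedding dimension (hypothetical value)

def make_table(num_features, dim):
    """Random embedding table: one dim-sized vector per feature index."""
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_features)]

def field_embedding(encoded_field, table):
    """Sum the embeddings of all active features in one field.

    For a one-hot field this is the single active feature's embedding;
    for a multi-hot field it is the sum of the active embeddings.
    """
    out = [0.0] * len(table[0])
    for idx, active in enumerate(encoded_field):
        if active:
            for d in range(len(out)):
                out[d] += table[idx][d]
    return out

gender = [1, 0]                  # single-valued field
interests = [0, 1, 0, 0, 1, 0]   # multi-valued field with two active features

gender_table = make_table(len(gender), D)
interests_table = make_table(len(interests), D)

e_1 = field_embedding(gender, gender_table)
e_2 = field_embedding(interests, interests_table)

# e = [e_1, e_2]: m = 2 fields, each a D-dimensional dense vector
e = [e_1, e_2]
```

Whatever the number of active features in a field, every field ends up with one vector of the same length D, which is what lets the later layers treat the input as an m x D matrix.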

3.2. High-order Interactions

3.2.1. DeepFM + PNN combined network structure:

As the figure shows, both models share the embedding layer, and the structure contains two ways of combining features, so the learned interactions include both the vector level and the bit level. The product layer (PNN) and the FM layer (DeepFM) multiply the field embeddings pairwise at the vector level, while the DNN applied to the embeddings models interactions at the bit level. The main difference between PNN and DeepFM is that PNN connects the output of the product layer to the DNN, while DeepFM connects the FM layer directly to the output unit.

3.2.2. CrossNet:

Expression:

x_k = x_0 x^T_{k-1} w_k +b_k +x_{k-1}

$x_0$ takes part in the cross operation of every layer. CrossNet is designed to learn high-order feature interactions, but this paper finds that it cannot do so effectively. The proof is as follows:

Layer i + 1 is defined as follows (the bias term is omitted for simplicity):

x_{i+1} = x_0x^T_i w_{i+1} +x_i

and the output of a k-layer cross network is $x_k$.

Assuming k = 1,

\begin{matrix}
x_1 &= x_0 (x_0^T w_1) + x_0 \\
&= x_0 (x_0^T w_1 + 1) \\
&= \alpha^1 x_0
\end{matrix}

where

\alpha^1 = x^T_0 w_1 + 1

is a scalar, so $x_1$ is a scalar multiple of $x_0$.

Now suppose that $x_i = \alpha^i x_0$ holds for k = i. Then for k = i + 1:

\begin{matrix}
x_{i+1} &= x_0 x_i^T w_{i+1} + x_i \\
&= x_0 ((\alpha^i x_0)^T w_{i+1}) + \alpha^i x_0 \\
&= \alpha^{i+1} x_0
\end{matrix}

where

\alpha^{i+1} = \alpha^i (x^T_0 w_{i+1} + 1)

is again a scalar, so $x_{i+1}$ is still a scalar multiple of $x_0$. The output of CrossNet is limited to this special form, so the network cannot effectively learn high-order feature interactions.
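
The induction above can be checked numerically. The following is a minimal sketch, assuming bias-free cross layers as in the proof; the dimension, depth, and random weights are arbitrary values chosen for illustration.

```python
import random

random.seed(1)

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cross_layer(x0, xi, w):
    """One bias-free cross layer: x_{i+1} = x0 * (xi^T w) + xi."""
    s = dot(xi, w)  # the scalar xi^T w
    return [a * s + b for a, b in zip(x0, xi)]

dim, depth = 5, 3
x0 = [random.uniform(0.5, 1.5) for _ in range(dim)]  # non-zero entries
weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(depth)]

x = x0
alpha = 1.0
for w in weights:
    x = cross_layer(x0, x, w)
    # alpha^{i+1} = alpha^i * (x0^T w_{i+1} + 1)
    alpha = alpha * (dot(x0, w) + 1.0)

# If x_k = alpha * x0, every component ratio x_k[d] / x0[d] equals alpha
ratios = [a / b for a, b in zip(x, x0)]
```

Up to floating-point rounding, all entries of `ratios` match `alpha`. Note that `alpha` itself depends on `x0`, so the output is a scalar multiple of the input but not a linear function of it.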

3.3. Compressed Interaction Network (CIN)

Structure diagram:

CIN is designed with the following goals:

Features interact (are multiplied) at the vector-wise level, not at the bit-wise level

The highest interaction order can be specified

High-order feature interactions are modeled explicitly, so the output is not restricted to a scalar multiple of $x_0$ as in CrossNet

$X^0$ is the output of the embedding layer, represented as a matrix of shape m x D, where m is the number of fields and D is the dimension of each field embedding. Suppose CIN has K layers, and the output of layer k is $X^k$. $X^k$ depends on $X^0$ and $X^{k-1}$, and is computed as:

X^k_{h,*} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{m} W^{k,h}_{ij} (X^{k-1}_{i,*} \circ X^0_{j,*})

Here \circ denotes the Hadamard product, i.e. ⟨a1, a2, a3⟩ ∘ ⟨b1, b2, b3⟩ = ⟨a1b1, a2b2, a3b3⟩. The formula means: take the Hadamard product of every row i of $X^{k-1}$ with every row j of $X^0$, multiply each product by the corresponding weight $W^{k,h}_{ij}$, and sum the results to obtain row h of $X^k$. $H_k$ is the number of embedding vectors (feature maps) in layer k, with $H_0 = m$. The formula shows that this operation resembles an RNN, because the input of each layer depends on the output of the previous layer. The paper also uses a CNN analogy to explain the formula: the pairwise products of the rows of $X^{k-1}$ and $X^0$ are regarded as an image and $W^{k,h}$ as a convolution kernel, and the convolution produces the next feature map, row h of $X^k$.
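
The layer computation described above (Hadamard products of row pairs, weighted by $W^{k,h}_{ij}$ and summed) can be written as a direct, unoptimized sketch. The function name `cin_layer` and the nested-list layout of the weights are assumptions for illustration; a practical implementation would use tensor operations instead of Python loops.

```python
def cin_layer(x_prev, x0, weights):
    """Compute X^k from X^{k-1} (H_{k-1} x D) and X^0 (m x D).

    weights[h][i][j] plays the role of W^{k,h}_{ij}; the result has
    H_k = len(weights) rows, each of length D.
    """
    D = len(x0[0])
    x_next = []
    for w_h in weights:                    # one output row per h
        row = [0.0] * D
        for i, xi in enumerate(x_prev):    # rows of X^{k-1}
            for j, xj in enumerate(x0):    # rows of X^0
                for d in range(D):         # Hadamard product, then weighted sum
                    row[d] += w_h[i][j] * xi[d] * xj[d]
        x_next.append(row)
    return x_next

# Tiny example: m = 2 fields, D = 3, one output row with all weights set to 1
x0 = [[1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0]]
w1 = [[[1.0, 1.0],
       [1.0, 1.0]]]
x1 = cin_layer(x0, x0, w1)  # first layer: X^{k-1} is X^0 itself
# x1 == [[25.0, 49.0, 81.0]]
```

With all weights equal to 1, each output entry is the square of the column sum of `x0` (for example (1 + 4)^2 = 25), which makes the second-order nature of the first CIN layer easy to see. The embedding dimension D is preserved from layer to layer; only the number of rows changes.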

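According to figure (c), each layer of CIN also needs sum pooling. The formula is:

p^k_i = \sum_{j=1}^{D} X^k_{i,j}

which means that the i-th embedding vector in layer k is summed over its D entries to get $p^k_i$, for $i \in [1, H_k]$, giving the pooled vector $p^k = [p^k_1, p^k_2, …, p^k_{H_k}]$. The pooling results of layers 1 ~ K are then concatenated, i.e.

p^+ = [p^1, p^2, …, p^K]

If CIN is used directly for binary classification, a sigmoid layer is added, and the output is computed as:

y = \frac{1}{1 + \exp(-(p^+)^T w^o)}

where $w^o$ denotes the weights of the output unit.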
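
The pooling and output steps described above can be sketched as follows. The layer outputs and the output weights `w_out` are made-up values for illustration, and `cin_predict` is a hypothetical helper name.

```python
import math

def sum_pool(x_k):
    """Sum each row (embedding vector) of X^k over the embedding dimension."""
    return [sum(row) for row in x_k]

def cin_predict(layers, w_out):
    """Concatenate the pooled vectors of all layers, then apply a sigmoid."""
    p_plus = [p for x_k in layers for p in sum_pool(x_k)]
    logit = sum(p * w for p, w in zip(p_plus, w_out))
    return p_plus, 1.0 / (1.0 + math.exp(-logit))

# Two hypothetical CIN layer outputs with H_1 = 2 and H_2 = 1, D = 3
x1 = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.1]]
x2 = [[0.5, 0.5, 0.0]]
w_out = [1.0, 1.0, -0.6]  # one hypothetical weight per pooled value
p_plus, y = cin_predict([x1, x2], w_out)
```

The length of `p_plus` is the sum of the layer widths (here 2 + 1 = 3) and does not depend on D, because pooling collapses the embedding dimension.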

3.4. Combination with Implicit Networks

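According to the xDeepFM structure diagram, the output unit is:

\hat{y} = \sigma(w^T_{linear} a + w^T_{dnn} x^k_{dnn} + w^T_{cin} p^+ + b)

where \sigma is the sigmoid function, a is the raw feature vector, $x^k_{dnn}$ and $p^+$ are the outputs of the DNN and of CIN respectively, and the w terms and b are learnable parameters.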

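The loss function is the log loss:

L = -\frac{1}{N} \sum_{i=1}^{N} (y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i))

where N is the number of training samples.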

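A regularization term is added to prevent overfitting. The final objective function is:

J = L + \lambda_* \|\Theta\|

where L is the log loss, $\lambda_*$ is the regularization coefficient, and $\Theta$ is the set of all model parameters.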
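
The log loss and the regularized objective described above can be sketched as below. The text does not say which norm the regularization term uses, so the squared L2 penalty here is an assumption, as are the example labels, predictions, and parameter values.

```python
import math

def log_loss(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy; eps clips predictions away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

def objective(y_true, y_pred, params, lam):
    """Log loss plus an assumed squared L2 penalty lam * sum(theta^2)."""
    return log_loss(y_true, y_pred) + lam * sum(t * t for t in params)

y_true = [1, 0, 1]
y_pred = [0.9, 0.2, 0.6]
params = [0.5, -0.5]  # hypothetical flattened model parameters
loss = log_loss(y_true, y_pred)
total = objective(y_true, y_pred, params, lam=0.01)
```

Clipping with `eps` is a common safeguard against taking the logarithm of zero when a prediction saturates at exactly 0 or 1.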

This work is released under a CC license. Reprints must credit the author and link to this article.

This article was first published on my blog, Stray_Camel.