Abstract: This paper proposes a graph convolutional network architecture based on local feature preservation. Compared with state-of-the-art baseline algorithms, the method achieves substantially better graph-classification performance on multiple datasets, along with improved generalization.

This article is shared from the Huawei Cloud community post "Paper interpretation: a graph convolutional neural network architecture based on local feature preservation (LPD-GCN)", original author: PG13.

In recent years, researchers have developed many graph convolutional network (GCN) methods for graph-level representation learning and classification. However, current GCN methods cannot effectively preserve the local information of a graph, which is particularly problematic for graph classification, since that task must distinguish different graph structures by their learned graph-level representations. To address this problem, this paper proposes a graph convolutional network architecture based on local feature preservation [1]. Compared with state-of-the-art baseline algorithms, the proposed method achieves substantially better classification performance on multiple datasets, along with improved generalization.

## 1. Introduction

Graph (network) structured data captures rich information between entities by modeling nodes and the edges connecting them. It is widely used in many research fields, including biology (protein-protein interaction networks), chemistry (molecular and compound structures), and social science (social networks and literature citation networks). Graph-structured data not only stores structural information efficiently but also plays an important role in modern machine learning tasks. Among these, graph classification has become an important task in recent years. Its purpose is to assign a given graph to a specific category. For example, to distinguish the various graph structures of organic molecules in chemistry, one must infer and aggregate the topology of the whole graph (in a molecular network, the topology consists of individual atoms and their direct bonds) together with the node features (such as atomic attributes), and use the aggregated information to predict the category of the graph.

In recent years, many techniques have been published to address graph classification. A traditional and popular approach is to design a graph kernel function that computes the similarity between graphs and feed it to a kernel-based classifier (such as an SVM). Although graph-kernel methods are effective, they suffer from a computational bottleneck, and the feature-selection process is decoupled from the subsequent classification process. To address these challenges, end-to-end graph neural network methods have attracted increasing attention. Among them, graph convolutional neural networks (GCNs) are the most popular graph neural network approach to graph classification.

Current graph convolutional networks generally follow the message-passing neural network (MPNN) framework [2], which consists of two stages: message passing and readout. In the message-passing stage, each node's feature vector is updated by aggregating its neighborhood features; in the readout stage, a global pooling module generates features for the whole graph. By running the graph convolution operation iteratively through the message-passing function, feature information can propagate over long distances, allowing the network to learn neighborhood features at different ranges. After K graph convolutions, useful node or edge features can be extracted for many node- and edge-level tasks (for example, node classification and link prediction). For graph-level tasks such as graph classification, the readout module must aggregate the information of all nodes or local structures to produce a graph-level representation. The figure below shows the general GCN framework for graph classification. Based on this message-passing framework, researchers have developed a variety of graph convolutional networks with different message-passing functions, node-update functions, and readout modules.
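The two stages of the framework can be sketched with a toy example. This is hypothetical illustrative code, not the paper's implementation: the function names and the choice of sum aggregation, averaging with the self feature, and sum readout are all assumptions for clarity.

```python
# Minimal sketch of the message-passing framework (illustrative only):
# K rounds of neighborhood aggregation followed by a sum readout that
# produces a graph-level feature.

def message_passing(features, adj, num_rounds=2):
    """One scalar feature per node; adj maps each node to its neighbor list."""
    h = dict(features)
    for _ in range(num_rounds):
        h = {
            # AGGREGATE: sum neighbor features; COMBINE: average with self.
            v: 0.5 * (h[v] + sum(h[u] for u in adj[v]))
            for v in h
        }
    return h

def readout(h):
    """Permutation-invariant sum readout -> graph-level feature."""
    return sum(h.values())

# A triangle graph with unit node features.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
features = {0: 1.0, 1: 1.0, 2: 1.0}
h = message_passing(features, adj)
print(readout(h))  # 6.75
```

Note that the readout is a sum, so it is invariant to any reordering of the nodes, which is the key requirement for a graph-level output.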

However, the main limitation of existing GCN methods for graph-level representation learning is that they do not make effective use of local feature information. In other words, they over-emphasize the ability to distinguish different graph structures while neglecting the local expressiveness of individual nodes, which easily leads to over-smoothing (the feature representations of all nodes becoming indistinguishable). The deeper the network, the more severe the over-smoothing. This is because during local neighborhood aggregation, neighborhood feature information is not effectively differentiated, so the learned node features have weak local expressiveness. Moreover, over-smoothing greatly limits the representational power of the global graph-level features.

As is well known, graph-level representations are obtained by aggregating the local features of nodes, so maintaining local expressiveness during optimization is a key prerequisite for improving graph representation. For graph-level representation learning, existing approaches to preserving local feature expressiveness fall roughly into three camps: (1) designing different graph convolution and readout operations; (2) designing hierarchical clustering methods; (3) exploring new model architectures. In the first camp, Xu et al. found that graph-level representations based on the existing message-passing framework cannot effectively distinguish different graph structures, and proposed the Graph Isomorphism Network (GIN) [3]. GIN uses an injective aggregation-update scheme to map different neighborhoods to different feature vectors. In this way, both the local structure and the node features of a graph are preserved, and the resulting graph neural network is as powerful as the Weisfeiler-Lehman test. Fan et al. proposed a structured self-attention architecture, similar to Graph Attention Networks (GATs) [4], for graph-level representation learning: a node-centered attention mechanism aggregates the features of different neighbors with learnable weights, while layer-level and graph-level attention mechanisms serve as the readout module, aggregating important features from different nodes and depths into the model's output. In the second camp, hierarchical clustering, many studies have shown that graphs exhibit rich hierarchical structure beyond the dichotomy of node-level versus graph-level structure. For example, a recent work proposed DiffPool [5], a differentiable hierarchical pooling method that can be trained jointly with graph convolutions and used to extract local feature information.

In short, the above two families of methods fit most training sets well, but their generalization ability is limited and their performance on test sets is mediocre, making it difficult to break through the bottleneck of existing methods. In the third camp, new model architectures, some researchers have tried to address the practical difficulties of training graph convolutional networks, including over-smoothing. For example, Xu et al. [6] proposed the Jumping Knowledge Network (JK-Net) architecture, which connects the last graph convolution layer of the network with all preceding hidden layers, similar to a residual network. With this design, the last layer of the model can selectively use neighborhood information from earlier layers, so that node-level representations are well captured within a fixed number of graph convolutions. As network depth increases, the effect of these skip connections becomes more prominent. Such skip structures have been shown to significantly improve performance on node-level tasks, but few researchers have explored their effectiveness on graph-level tasks such as graph classification. In GIN, Xu et al. further proposed a JK-Net-like architecture for learning graph-level representations, in which a readout layer is attached after each convolution layer to learn graph-level representations at different depths, and these are then concatenated to form the final representation. This readout architecture considers global information at all depths and can effectively improve the model's generalization ability.

## 2. Graph convolution neural network (GCN)

### (1) Problem definition

Given an undirected graph G = (V, E), V is the set of nodes and E the set of edges; X_v denotes the initial features of node v. The goal of a graph convolutional network is to learn continuous representations of arbitrary graph instances that encode both node features and topology. Given a set of M labeled graphs 𝒢 = {G_1, G_2, …, G_M} with corresponding labels Y = {y_1, y_2, …, y_M}, the goal of graph classification is to use them as training data to build a classifier g_θ that assigns any new input graph G to a specific category y_G, that is, y_G = g_θ(h_G).

### (2) Graph convolution neural network

GCNs consider both the structure of the graph and the features of each node to learn node-level and/or graph-level representations that best serve the final task. In general, existing GCN variants first aggregate the neighborhood representations, and then combine the aggregated result with the central node's representation from the previous iteration. Formally, a GCN iteratively updates node representations as follows:

$$h_v^{(k)} = \text{COMBINE}^{(k)}\Big(h_v^{(k-1)},\ \text{AGGREGATE}^{(k)}\big(\{h_u^{(k-1)} : u \in \mathcal{N}(v)\}\big)\Big)$$

where $h_v^{(k)}$ denotes the feature representation of node $v$ at the $k$-th iteration, $\text{AGGREGATE}(\cdot)$ and $\text{COMBINE}(\cdot)$ are the learnable message-passing functions of the $k$-th layer, and $\mathcal{N}(v)$ is the set of neighbors of node $v$. After $K$ iterations, the final node representation $h_v^{(K)}$ can be used for node-label prediction, or passed on to the readout stage for graph classification. In the readout stage, a feature vector $h_G$ for the whole graph is computed by aggregating node features with a specific readout function $\text{READOUT}(\cdot)$:

$$h_G = \text{READOUT}\big(\{h_v^{(K)} : v \in V\}\big)$$

$\text{READOUT}(\cdot)$ can be a simple permutation-invariant function, such as summation, or a graph-level pooling operation such as DiffPool or SortPool.
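A single graph-convolution update and a sum readout can be written compactly in matrix form. The following NumPy sketch is illustrative only: the choice of mean aggregation over neighbors (with self-loops), a ReLU nonlinearity, and sum pooling are assumptions, not the specific AGGREGATE/COMBINE of any particular model.

```python
import numpy as np

# One graph-convolution layer in matrix form (illustrative sketch).
def gcn_layer(A, H, W):
    """A: (n, n) adjacency with self-loops, H: (n, d) features, W: (d, d')."""
    deg = A.sum(axis=1, keepdims=True)   # node degrees (including self-loop)
    H_agg = (A @ H) / deg                # mean over neighbors + self
    return np.maximum(0.0, H_agg @ W)    # COMBINE via linear map + ReLU

def sum_readout(H):
    """READOUT: permutation-invariant sum over all node representations."""
    return H.sum(axis=0)

A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)   # 3-node path graph with self-loops
H = np.eye(3)                            # one-hot initial node features
W = np.ones((3, 2))
hG = sum_readout(gcn_layer(A, H, W))
print(hG)  # [3. 3.]
```

Stacking several such layers before the readout corresponds to running $K$ iterations of the update formula above.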

## 3. Method introduction

To address the insufficient preservation of local information and the limited generalization of existing methods, this paper improves both the loss function and the model architecture, and proposes the LPD-GCN model. GCNs learn the graph-level representation of the whole graph from its topology and node features. From the loss perspective, to fully exploit and learn node feature information, LPD-GCN introduces an additional local node-feature reconstruction task to improve the local expressiveness of hidden node representations and enhance the discriminability of the final graph-level representation. In other words, an auxiliary constraint is added to preserve the local information of the graph. The node-feature reconstruction task is realized through a simple but effective encoder-decoder mechanism, in which the stacked graph convolution layers serve as the encoder and a multi-layer perceptron (MLP) is appended for decoding. In this way, the input node features are embedded into hidden representations by the encoder, and these vectors are then fed to the decoder to reconstruct the initial node features. From the architectural perspective, the paper designs a densely connected graph convolution architecture that establishes connections across layers so as to fully exploit information from different depths. Specifically, each convolution layer and its corresponding readout module are connected to all preceding convolution layers.

### (1) Node feature reconstruction based on encoding decoding mechanism

The graph-level representations of traditional GCNs have limited discriminative power because they over-emphasize global information while neglecting the preservation of local features, which leads to over-smoothing. LPD-GCN therefore includes a simple encoder-decoder mechanism for local feature reconstruction: the encoder consists of stacked graph convolution layers, while the decoder uses a multi-layer perceptron to reconstruct the local node features. An auxiliary local feature-reconstruction loss is added alongside the graph-classification objective. In this way, node features are effectively preserved in the hidden representations across layers.
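The encoder-decoder idea can be sketched as follows. This is a hypothetical, untrained toy version: the layer sizes, tanh/ReLU nonlinearities, and mean aggregation are assumptions made for illustration, and the weights are random rather than learned.

```python
import numpy as np

# Sketch of the encode-decode mechanism: stacked graph convolutions act as
# the encoder, and an MLP decoder tries to reconstruct the original node
# features from the hidden representations.
rng = np.random.default_rng(0)

def encoder(A, X, weights):
    """Stacked graph-conv layers: mean-aggregate neighbors, then project."""
    H = X
    for W in weights:
        deg = A.sum(axis=1, keepdims=True)
        H = np.tanh((A @ H) / deg @ W)
    return H

def mlp_decoder(H, W1, W2):
    """Two-layer perceptron mapping hidden states back to input features."""
    return np.maximum(0.0, H @ W1) @ W2

def reconstruction_loss(X, X_hat):
    """Auxiliary local feature-reconstruction loss (mean squared error)."""
    return float(np.mean((X - X_hat) ** 2))

A = np.array([[1, 1], [1, 1]], dtype=float)  # two connected nodes, self-loops
X = rng.normal(size=(2, 4))                  # initial node features
enc_w = [rng.normal(size=(4, 8)), rng.normal(size=(8, 8))]
H = encoder(A, X, enc_w)
X_hat = mlp_decoder(H, rng.normal(size=(8, 8)), rng.normal(size=(8, 4)))
print(reconstruction_loss(X, X_hat))
```

During training, minimizing this reconstruction loss forces the hidden representations to retain enough local information to recover the input features.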

### (2) Neighborhood aggregation based on DenseNets

In addition, to flexibly exploit information from the neighborhoods at different layers, direct connections are added from each hidden convolution layer to all higher convolution layers and readout modules. This architecture roughly corresponds to DenseNets, which were originally proposed for computer vision. It allows selective aggregation of neighborhood information at different layers and further improves the information flow between layers. Whereas DenseNets apply layer-wise feature concatenation, LPD-GCN adopts layer-wise summation for feature aggregation.
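A minimal sketch of this dense connectivity with layer-wise summation follows. It is illustrative only, assuming equal feature dimensions across layers so that earlier outputs can be summed; the tanh nonlinearity and mean aggregation are also assumptions.

```python
import numpy as np

# Dense connectivity sketch: each convolution layer takes the summed outputs
# of all previous layers as input, so information from every depth can flow
# directly to higher layers.
def dense_gcn(A, X, weights):
    deg = A.sum(axis=1, keepdims=True)
    outputs = [X]                            # depth 0: raw node features
    for W in weights:
        H_in = sum(outputs)                  # layer-wise summation of all
                                             # earlier layer outputs
        H = np.tanh((A @ H_in) / deg @ W)
        outputs.append(H)
    return outputs

A = np.array([[1, 1], [1, 1]], dtype=float)  # two nodes with self-loops
X = np.ones((2, 4))
weights = [np.eye(4) * 0.1 for _ in range(3)]
outs = dense_gcn(A, X, weights)
print(len(outs))  # 4: the input plus three layer outputs
```

Keeping every layer's output around also makes it straightforward to attach a readout module at each depth, as the architecture described above requires.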

### (3) Global context-aware local node representations

After introducing the auxiliary local feature-reconstruction module, each convolution layer receives additional supervision that helps maintain locality. However, the global readout modules cannot be trained by back-propagating this supervision signal. In this model's architecture, each convolution layer is followed by a global readout module that embeds and collapses the nodes of the whole graph into a graph-level representation. How, then, can the supervision from local feature reconstruction be better exploited? To solve this problem, a direct connection is added from each readout module to the next convolution module, and the node-level features are concatenated with the global graph-level features. In other words, each node representation and the graph-level representation are joined into a single tensor by point-wise concatenation. In addition, a learnable parameter ε (> 0) is introduced to adaptively trade off the local node-level representation against the global graph-level representation:

$$\tilde{h}_v^{(k)} = \text{CONCAT}\big(h_v^{(k)},\ \varepsilon \cdot h_G^{(k)}\big)$$

where $h_G^{(k)}$ denotes the graph-level representation produced by the $k$-th readout module.

With this architecture, besides the gradients produced by the main graph-level task loss, additional gradients from the local feature-reconstruction loss can be back-propagated to update the readout parameters, reducing the risk of losing local expressiveness and improving the model's generalization. At the same time, combining each node representation with additional global context yields global context-aware local representations, which further enhance node expressiveness.
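The point-wise concatenation with the ε-scaled global context can be sketched as follows. This is illustrative code: sum pooling as the readout and a fixed ε value are assumptions (in the model, ε is learnable).

```python
import numpy as np

# Context-aware node representations (sketch): each node's feature vector is
# concatenated with the epsilon-scaled graph-level readout of its layer.
def context_aware_nodes(H, epsilon=0.5):
    """H: (n, d) node features; returns (n, 2d) context-aware features."""
    hG = H.sum(axis=0)                            # global readout (sum pooling)
    ctx = np.tile(epsilon * hG, (H.shape[0], 1))  # broadcast graph context
    return np.concatenate([H, ctx], axis=1)       # point-wise concatenation

H = np.arange(6, dtype=float).reshape(3, 2)
H_ctx = context_aware_nodes(H, epsilon=0.5)
print(H_ctx.shape)  # (3, 4)
```

Each row now carries both the node's own features and a shared summary of the whole graph, so subsequent convolution layers see local and global information together.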

### (4) Global hierarchical aggregation based on self attention mechanism

Most existing methods feed the node representations from multiple graph convolutions to a global readout module, which produces graph-level features by pooling or summation. However, as network depth increases, node representations may become over-smoothed, degrading the graph-level output. To effectively extract and exploit global information at all depths, this model further adopts a self-attention mechanism to read out the layer-wise graph-level features, in a way similar to GIN. The intuition behind this layer-centered self-attention is that, when generating task-specific graph-level output, the attention weight assigned to each layer can adapt to the specific task.
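A layer-centered attention readout can be sketched as a softmax-weighted sum over per-depth graph features. The scoring vector `q` here is a hypothetical stand-in for the learnable attention parameters; the specific scoring function is an assumption for illustration.

```python
import numpy as np

# Layer-wise self-attention sketch: graph-level features from every depth
# are combined with softmax weights, so the model can emphasize the depths
# most useful for the task at hand.
def attention_readout(layer_feats, q):
    """layer_feats: (L, d) per-layer graph features; q: (d,) scoring vector."""
    scores = layer_feats @ q                        # one score per layer
    scores = scores - scores.max()                  # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum() # softmax over layers
    return weights @ layer_feats                    # weighted sum -> final rep

layer_feats = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 1.0]])                # graph features at 3 depths
q = np.array([1.0, 0.0])
print(attention_readout(layer_feats, q))
```

Because the weights are produced by a softmax, they always sum to one, and training can shift them toward whichever depths carry the most task-relevant information.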

### (5) Loss function

In the training phase, LPD-GCN receives gradient information from both the main graph-classification task and the auxiliary local feature-reconstruction constraint. Formally, LPD-GCN is trained with the total loss

$$\mathcal{L} = \mathcal{L}_{cls} + \lambda\, \mathcal{L}_{rec}$$

where $\mathcal{L}_{cls}$ denotes the graph-classification loss, $\mathcal{L}_{rec}$ denotes the local feature-reconstruction loss, and $\lambda$ is an adaptively introduced trade-off parameter that balances the two loss terms.
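The combined objective can be sketched numerically as below. This is illustrative only: cross-entropy for classification and mean squared error for reconstruction are assumed loss choices, and the trade-off weight `lam` is fixed here rather than adaptive.

```python
import numpy as np

# Combined training objective sketch: graph-classification loss plus a
# lambda-weighted local feature-reconstruction loss.
def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -float(np.log(probs[label]))

def total_loss(probs, label, X, X_hat, lam=0.5):
    cls_loss = cross_entropy(probs, label)        # main graph-level task
    rec_loss = float(np.mean((X - X_hat) ** 2))   # node-feature reconstruction
    return cls_loss + lam * rec_loss

probs = np.array([0.7, 0.3])   # predicted class distribution for one graph
X = np.ones((3, 2))            # original node features
X_hat = np.zeros((3, 2))       # (poorly) reconstructed node features
print(round(total_loss(probs, label=0, X=X, X_hat=X_hat), 4))  # 0.8567
```

Both terms contribute gradients during back-propagation, which is what lets the reconstruction constraint regularize the layers that also serve the classification task.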

## 4. Experimental results of graph classification

### (1) Test data set

This paper evaluates performance on eight graph datasets commonly used in the graph neural network literature, using 10-fold cross-validation, and reports the mean and standard deviation of the test accuracy.

### (2) Effect on test set

The classification performance on multiple datasets has been significantly improved, and the generalization ability has been improved.

## 5. References

[1] LIU W, GONG M, TANG Z, et al. Locality preserving dense graph convolutional networks with graph context-aware node representations. https://arxiv.org/abs/2010.05404

[2] GILMER J, SCHOENHOLZ S S, RILEY P F, et al. Neural message passing for quantum chemistry[C] // Proceedings of the 34th International Conference on Machine Learning : Vol 70. 2017 : 1263 – 1272.

[3] XU K, HU W, LESKOVEC J, et al. How powerful are graph neural networks?[C] // Proceedings of the 7th International Conference on Learning Representations. 2019.

[4] VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks[C] // Proceedings of the 6th International Conference on Learning Representations. 2018.

[5] YING Z, YOU J, MORRIS C, et al. Hierarchical graph representation learning with differentiable pooling[C] // Advances in Neural Information Processing Systems. 2018 : 4800 – 4810.

[6] XU K, LI C, TIAN Y, et al. Representation learning on graphs with jumping knowledge networks[C] // Proceeding of the 35th International Conference on Machine Learning. 2018 : 5449 – 5458.
