Graph neural networks in practice: the GCN code

Time: 2021-12-4

GCN code practice

The GCN code in Section 5.6 of the book performs classification on the classic Cora dataset. A rough (if imperfect) analogy: Cora is to GNNs what MNIST is to machine learning.

There are plenty of introductions to Cora online, so I won’t repeat them here; I will only describe what the graph corresponding to the Cora dataset looks like.

Cora contains 2708 papers connected by 5429 citation links. Each paper is a node, and each citation relationship is an edge between nodes. Every paper is described by a 1433-dimensional binary feature vector indicating whether each word of the vocabulary appears in its text, i.e. each node has a 1433-dimensional feature. Finally, the papers are divided into seven categories.

The goal of training on Cora is therefore to learn the node features together with their relationships to their neighbors, and to predict the category of unlabeled nodes from the nodes whose categories are known.

Knowing this much is enough. Let’s look at the code.

Data processing

The data-processing code is fully commented in the book and references PyG in the comments; a quick look is enough. The two common GNN libraries (DGL and PyG) already ship the standard benchmark datasets, and nowadays people mostly load them directly rather than processing the raw data themselves, so I will skip this part.
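For reference, here is a minimal sketch of loading Cora through PyG's Planetoid dataset; the root path is an arbitrary assumption, not something from the book.

from torch_geometric.datasets import Planetoid

# Downloads Cora on first use; '/tmp/Cora' is just an assumed storage path.
dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0]                                   # the single Cora graph
print(data.num_nodes, dataset.num_features, dataset.num_classes)   # 2708, 1433, 7
print(data.train_mask.sum().item())                 # number of training nodes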

GCN layer definition

Recall the definition of the GCN layer from Chapter 5:

\[X' = \sigma(\tilde{L}_{sym} X W)\]

So a single GCN layer takes the input \(X\), multiplies it by a parameter matrix \(W\), and then multiplies the result by the normalized “Laplacian matrix” \(\tilde L_{sym}\).
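As a quick dimension check (with \(N\) nodes, input dimension \(d\), and output dimension \(d'\); for the first Cora layer \(N=2708\) and \(d=1433\)):

\[\underbrace{\tilde L_{sym}}_{N\times N}\;\underbrace{X}_{N\times d}\;\underbrace{W}_{d\times d'} \in \mathbb{R}^{N\times d'}\]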

Look at the code:

import torch
import torch.nn as nn
from torch.nn import init


class GraphConvolution(nn.Module):
    def __init__(self, input_dim, output_dim, use_bias=True):
        super(GraphConvolution, self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.use_bias = use_bias
        self.weight = nn.Parameter(torch.Tensor(input_dim, output_dim))
        if self.use_bias:
            self.bias = nn.Parameter(torch.Tensor(output_dim))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight)
        if self.use_bias:
            init.zeros_(self.bias)

    def forward(self, adjacency, input_feature):
        # compute XW with a dense matmul, then L_sym (XW) with a sparse matmul
        support = torch.mm(input_feature, self.weight)
        output = torch.sparse.mm(adjacency, support)
        if self.use_bias:
            output += self.bias
        return output

    def __repr__(self):
        return self.__class__.__name__ + ' (' \
            + str(self.input_dim) + ' -> ' \
            + str(self.output_dim) + ')'

This defines the input dimension, output dimension, and bias of a GCN layer. Each layer has its own \(W\), while \(X\) is the layer’s input and \(\tilde L_{sym}\) is computed from the dataset, so the only parameter that needs defining is the weight matrix; just pay attention to its dimensions.

The forward pass simply follows the formula \(X'=\sigma(\tilde L_{sym}XW)\) with matrix multiplications. Note one trick: \(\tilde L_{sym}\) is a sparse matrix, so it is more efficient to first compute \(XW\) with a dense matrix multiplication and then compute \(\tilde L_{sym}(XW)\) with a sparse matrix multiplication.
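For completeness, here is a minimal sketch (not the book’s exact code) of how \(\tilde L_{sym} = \tilde D^{-1/2}(A+I)\tilde D^{-1/2}\) can be built with scipy and converted to a torch sparse tensor; the function name normalization and the variable adjacency_np are my own placeholders.

import numpy as np
import scipy.sparse as sp
import torch

def normalization(adjacency):
    """Compute L_sym = D^{-1/2} (A + I) D^{-1/2} as a scipy sparse matrix."""
    adjacency = adjacency + sp.eye(adjacency.shape[0])   # add self-loops: A + I
    degree = np.array(adjacency.sum(1)).flatten()        # degree of each node
    d_hat = sp.diags(np.power(degree, -0.5))             # D^{-1/2}
    return d_hat.dot(adjacency).dot(d_hat).tocoo()

# adjacency_np is assumed to be the raw 0-1 adjacency matrix of the graph.
l_sym = normalization(sp.coo_matrix(adjacency_np))
indices = torch.from_numpy(np.asarray([l_sym.row, l_sym.col])).long()
values = torch.from_numpy(l_sym.data.astype(np.float32))
tensor_adjacency = torch.sparse_coo_tensor(indices, values, l_sym.shape)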

GCN model definition

With the GCN layer defined, the GCN model is obtained by stacking such layers, and a two-layer GCN already achieves good results (a GCN that is too deep loses accuracy because of over-smoothing):

import torch.nn.functional as F


class GcnNet(nn.Module):
    def __init__(self, input_dim=1433):
        super(GcnNet, self).__init__()
        self.gcn1 = GraphConvolution(input_dim, 16)
        self.gcn2 = GraphConvolution(16, 7)
    
    def forward(self, adjacency, feature):
        h = F.relu(self.gcn1(adjacency, feature))
        logits = self.gcn2(adjacency, h)
        return logits

Here the hidden-layer dimension is set to 16; it can be changed to 32, 64, and so on, but in my own trials the results showed no big difference. From the hidden layer to the output layer, the output dimension is simply set to the number of classes, so the output directly gives the predicted classification.

The forward pass just chains the layers defined above, only adding an activation function between them; ReLU is chosen here.
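A quick sanity check of the shapes, using a tiny random feature matrix and an identity adjacency (purely illustrative; these are not the real Cora tensors):

# Illustrative shape check only; toy inputs, not the Cora data.
num_nodes = 4
toy_features = torch.rand(num_nodes, 1433)
toy_adjacency = torch.eye(num_nodes).to_sparse()     # identity "graph": only self-loops

toy_model = GcnNet(input_dim=1433)
logits = toy_model(toy_adjacency, toy_features)
print(logits.shape)                                  # torch.Size([4, 7]): one score per class per node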

Training

Define the model, the loss function (cross-entropy), and the optimizer:

model = GcnNet(input_dim).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
optimizer = optim.Adam(model.parameters(), 
                       lr=LEARNING_RATE, 
                       weight_decay=WEIGHT_DECAY)
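The device and hyperparameters used above have to be defined earlier in the script. A minimal sketch with assumed values (the exact numbers in the book may differ):

# Assumed hyperparameter values; adjust as needed, the book's values may differ.
LEARNING_RATE = 0.1
WEIGHT_DECAY = 5e-4
EPOCHS = 200
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
input_dim = 1433   # Cora feature dimension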

The comments in the training function explain it clearly:

def train():
    loss_history = []
    val_acc_history = []
    model.train()
    train_y = tensor_y[tensor_train_mask]
    for epoch in range(EPOCHS):
        logits = model(tensor_adjacency, tensor_x)  # forward propagation
        train_mask_logits = logits[tensor_train_mask]  # only the training nodes are supervised
        loss = criterion(train_mask_logits, train_y)  # compute the loss value
        optimizer.zero_grad()
        loss.backward()  # back propagation computes the parameter gradients
        optimizer.step()  # the optimizer performs the gradient update
        train_acc, _, _ = test(tensor_train_mask)  # accuracy of the current model on the training set
        val_acc, _, _ = test(tensor_val_mask)  # accuracy of the current model on the validation set
        # record the loss and accuracy during training for plotting
        loss_history.append(loss.item())
        val_acc_history.append(val_acc.item())
        print("Epoch {:03d}: Loss {:.4f}, TrainAcc {:.4}, ValAcc {:.4f}".format(
            epoch, loss.item(), train_acc.item(), val_acc.item()))
    
    return loss_history, val_acc_history

The corresponding test function:

def test(mask):
    model.eval()
    with torch.no_grad():
        logits = model(tensor_adjacency, tensor_x)
        test_mask_logits = logits[mask]  # keep only the nodes selected by the mask
        predict_y = test_mask_logits.max(1)[1]  # index of the largest logit = predicted class
        accuracy = torch.eq(predict_y, tensor_y[mask]).float().mean()
    return accuracy, test_mask_logits.cpu().numpy(), tensor_y[mask].cpu().numpy()

Note that the classification output by the model is not one-hot: each node gets a score (logit) for every class, so test_mask_logits.max(1)[1] takes the class with the highest score as the model’s prediction.
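In PyTorch, max(1) returns a (values, indices) pair and [1] selects the indices; an equivalent and arguably clearer form is:

# Equivalent to test_mask_logits.max(1)[1]
predict_y = test_mask_logits.argmax(dim=1)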

Once all of this is written, just run the training function. If needed, you can plot train_loss and validation_accuracy; the book also gives the corresponding code, which is fairly simple and will not be repeated here.
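Putting it together, a minimal sketch of running training and then evaluating on the test split (assuming a test mask named tensor_test_mask is defined by the data-processing code, alongside the train and validation masks):

# tensor_test_mask is assumed to come from the data-processing step.
loss_history, val_acc_history = train()
test_acc, test_logits, test_labels = test(tensor_test_mask)
print("Test accuracy: {:.4f}".format(test_acc.item()))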