Recommendation system based on coupling network

Time:2019-12-2


Author: Chen Dongrui

1. Basic knowledge of complex networks

When we pick up a mobile phone to call family, friends or colleagues, we unconsciously take part in the formation of a social network; when we board a high-speed train or a plane, we benefit from the transportation network; even when we lie in bed doing nothing, the neurons in our brain form a huge complex network and transmit signals to one another that help us think and act. Complex network theory is a tool that abstracts large-scale complex systems in the real world into networks for study. A large number of complex systems in nature can be described by networks of various kinds.

1.1 representation of network

A typical network consists of nodes and the edges that connect them. Nodes represent the individuals in the real system, and edges represent the relationships between them. Usually an edge is drawn between two nodes if a given relationship holds between them and omitted otherwise; two nodes joined by an edge are said to be adjacent. To simplify computation, a network is usually represented by its adjacency matrix. Depending on the type of edge, networks are divided into undirected, directed and weighted networks. The corresponding adjacency matrices are shown below:
[Figure: adjacency matrices of an undirected, a directed and a weighted network]
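
As a minimal sketch of the three representations, the snippet below builds the adjacency matrices of a small four-node example as plain Python lists. The node labels, edge lists and weights are made up for illustration and do not come from the original figure.

```python
n = 4  # hypothetical network with nodes 0..3

def empty_matrix(size):
    """Return a size x size matrix of zeros as a list of lists."""
    return [[0] * size for _ in range(size)]

# Undirected network: an edge (i, j) sets both a[i][j] and a[j][i].
undirected_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
undirected = empty_matrix(n)
for i, j in undirected_edges:
    undirected[i][j] = 1
    undirected[j][i] = 1

# Directed network: an edge i -> j sets only a[i][j].
directed_edges = [(0, 1), (1, 2), (2, 0), (3, 2)]
directed = empty_matrix(n)
for i, j in directed_edges:
    directed[i][j] = 1

# Weighted (undirected) network: the entry stores the edge weight instead of 1.
weighted_edges = [(0, 1, 0.5), (1, 2, 2.0), (2, 3, 1.5)]
weighted = empty_matrix(n)
for i, j, w in weighted_edges:
    weighted[i][j] = w
    weighted[j][i] = w

for name, matrix in [("undirected", undirected), ("directed", directed), ("weighted", weighted)]:
    print(name)
    for row in matrix:
        print(row)
```

The undirected and weighted matrices are symmetric, while the directed one generally is not; the row sums of the directed matrix give the out-degrees and the column sums give the in-degrees.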

1.2 statistical characteristics of network

  • Degree: the number of edges directly connected to a node; in a directed network it is split into out-degree and in-degree.
  • Clustering coefficient: the likelihood that the neighbors of a node are also neighbors of each other, which measures how strongly the network clusters

$$C_i = \frac{\text{number of edges actually present between the neighbors of node } i}{\text{maximum possible number of such edges, } k_i (k_i - 1)/2}$$

where $k_i$ is the degree of node $i$. Taking the arithmetic mean of the clustering coefficient over all nodes gives the average clustering coefficient, which measures the clustering of the whole network.
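
The clustering coefficient can be computed directly from its definition. The sketch below does this for a small undirected network stored as a dictionary of neighbor sets; the graph and the function names are hypothetical, and setting the coefficient to 0 for nodes with fewer than two neighbors is a convention assumed here.

```python
from itertools import combinations

# Hypothetical undirected network stored as node -> set of neighbors.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

def clustering_coefficient(graph, node):
    """Local clustering coefficient of one node."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no neighbor pair exists
    # Count the pairs of neighbors that are themselves connected.
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)

def average_clustering(graph):
    """Arithmetic mean of the local coefficients over all nodes."""
    return sum(clustering_coefficient(graph, v) for v in graph) / len(graph)

print(clustering_coefficient(graph, "a"))
print(average_clustering(graph))
```

Node `a` has three neighbors, so three neighbor pairs are possible, and only the pair `b`, `c` is connected; its coefficient is therefore 1/3.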

  • Shortest path: the shortest path connecting two nodes
  • Betweenness: betweenness includes node betweenness and edge betweenness.

    • Node betweenness is the proportion of the shortest paths in the network that pass through the node
    • Edge betweenness is the proportion of the shortest paths in the network that pass through the edge
    • Betweenness reflects the role and influence of the corresponding node or edge in the whole network
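
To make shortest paths and node betweenness concrete, here is a small sketch for an unweighted network. It enumerates all shortest paths with a breadth-first search and then takes the plain proportion described above; the graph and function names are hypothetical, and library implementations often normalize betweenness per node pair instead, so the values may differ from theirs.

```python
from collections import deque
from itertools import combinations

# Hypothetical path-shaped network a - b - c - d.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}

def all_shortest_paths(graph, source, target):
    """Return every shortest path from source to target as a list of nodes."""
    # Breadth-first search recording distances and shortest-path predecessors.
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                preds[nxt].append(node)
    if target not in dist:
        return []
    # Follow the predecessor lists backwards to rebuild the paths.
    paths = [[target]]
    while paths[0][0] != source:
        paths = [[p] + path for path in paths for p in preds[path[0]]]
    return paths

def node_betweenness(graph, node):
    """Share of all shortest paths that pass through `node` as an interior node."""
    total = through = 0
    for s, t in combinations(graph, 2):
        for path in all_shortest_paths(graph, s, t):
            total += 1
            if node in path[1:-1]:
                through += 1
    return through / total if total else 0.0

print(all_shortest_paths(graph, "a", "d"))
print(node_betweenness(graph, "b"))
```

In this example there are six node pairs with one shortest path each, and two of those paths pass through `b`, so its betweenness is 1/3, while the end node `a` gets 0.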

1.3 common complex network models

Commonly used network models include regular networks, random networks, small-world networks and scale-free networks.

  • Regular network

The regular network is the simplest network model. In this type of network, the connections between nodes follow a fixed rule, and usually every node has the same number of neighbors.

  • Random network

Whether an edge exists between two nodes is decided completely at random.

  • Small world network

The small-world network model was proposed by Watts and Strogatz in their 1998 paper "Collective dynamics of 'small-world' networks", published in Nature. They observed that regular networks have high clustering but also a large average distance, whereas random networks have a short average distance but low clustering. Real-world networks are neither completely regular nor completely random but lie between the two, which motivated the small-world network model.

  • Scale-free network

A scale-free network is one in which most nodes (small nodes) are connected to only a few others, while a few nodes (large nodes) are connected to very many.
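
The two simplest models above, the regular network and the random network, can be generated in a few lines. This is only a sketch: the ring-lattice rule used for the regular network and the parameter names `n`, `k`, `p` and `seed` are choices made for this example.

```python
import random

def ring_lattice(n, k):
    """Regular network: each node links to its k nearest neighbors on a ring (k even)."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            graph[i].add(j)
            graph[j].add(i)
    return graph

def random_network(n, p, seed=None):
    """Random network: every node pair is linked independently with probability p."""
    rng = random.Random(seed)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                graph[i].add(j)
                graph[j].add(i)
    return graph

regular = ring_lattice(n=10, k=4)
rand = random_network(n=10, p=0.3, seed=42)
print(sorted(len(nb) for nb in regular.values()))  # every degree equals k
print(sorted(len(nb) for nb in rand.values()))     # degrees generally differ
```

In the ring lattice every node has exactly `k` neighbors, which matches the description of a regular network; in the random network the degree of a node depends on chance.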

2. Background of recommendation systems based on complex networks

Link prediction is the task of estimating, from the known network structure and other information, how likely it is that two nodes not yet joined by an edge are in fact connected. Predicting links that exist but have not yet been observed is a data mining problem, while predicting links that may appear in the future is related to the evolution of the network. Link prediction can be applied to e-commerce websites. Treat the goods on the website as one kind of node and the users as another; if user A purchases item B, an edge joins A and B. Because edges exist only between nodes of different types, the result is a bipartite network, and link prediction on this bipartite network is in effect a recommendation system.

We now briefly introduce a paper on recommendation systems based on complex networks, "Information filtering via biased random walk on coupled social network". The paper couples the users' social network with the user-object bipartite network, combining users' social information with their object preferences to recommend objects to users.
A coupled social network (CSN) contains coupling nodes (users), which form leader-follower relationships in the social network layer and collection relationships in the information network layer. The figure below is a simple schematic of a coupled social network, with circles representing users and squares representing objects. The first part is the social network of five users. $U_4$ points to $U_5$, meaning that $U_4$ is a follower of $U_5$, so the two share a certain degree of similarity. The second part is the bipartite network. The edge between object $O_5$ and user $U_5$ indicates that $U_5$ has collected $O_5$. With the second part alone, we cannot recommend $O_5$ to $U_4$. Once the similarity between $U_4$ and $U_5$ in the social network is taken into account, we can recommend $O_5$ to $U_4$. Next, we explain how this method works in a recommendation system.
[Figure: schematic diagram of a coupled social network]

3. Model introduction

A recommendation system consists of two parts: the user set $U = \{u_1, u_2, \dots, u_m\}$ and the object set $O = \{o_1, o_2, \dots, o_n\}$, that is, $m$ users and $n$ objects. The adjacency matrix $A_{m \times n}$ represents the user-object bipartite network,
$$a_{i\alpha} = \begin{cases}
1 & \text{if user } u_i \text{ has collected object } o_\alpha \\
0 & \text{otherwise}
\end{cases}$$
The adjacency matrix $B_{m \times m}$ represents the social network among users (its entries $b_{ij}$ are the ones used for the user-to-user transitions in Section 3.1),
$$b_{ij} = \begin{cases}
1 & \text{if users } u_i \text{ and } u_j \text{ are joined by a leader-follower link} \\
0 & \text{otherwise}
\end{cases}$$
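
As a small illustration of these definitions, the sketch below builds a user-object matrix `A` and a user-user social matrix `B` from toy records. The user and object names, the record lists, and the orientation chosen for `B` (row = follower, column = leader) are assumptions made for this example.

```python
# Hypothetical toy data: which objects each user collected, and who follows whom.
users = ["u1", "u2", "u3"]
objects = ["o1", "o2", "o3", "o4"]
collections = [("u1", "o1"), ("u1", "o2"), ("u2", "o2"), ("u3", "o3"), ("u3", "o4")]
follows = [("u1", "u3"), ("u2", "u1")]  # (follower, leader) pairs

user_index = {u: i for i, u in enumerate(users)}
object_index = {o: a for a, o in enumerate(objects)}

# A is m x n: A[i][alpha] = 1 if user i has collected object alpha.
A = [[0] * len(objects) for _ in users]
for u, o in collections:
    A[user_index[u]][object_index[o]] = 1

# B is m x m: B[i][j] = 1 if user i follows user j (assumed orientation).
B = [[0] * len(users) for _ in users]
for follower, leader in follows:
    B[user_index[follower]][user_index[leader]] = 1

user_degree = [sum(row) for row in A]          # k_i: objects collected by user i
object_degree = [sum(col) for col in zip(*A)]  # k_alpha: users who collected object alpha

print(A)
print(B)
print(user_degree, object_degree)
```

The row sums of `A` are the user degrees and the column sums are the object degrees; both appear as denominators in the transition probabilities of the random walks.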

3.1 random walk on social network

  • $P_{ij}'$ is the transition probability on the social network between users $u_i$ and $u_j$, where $k_j^{out}$ is the out-degree of $u_j$:

$$P_{ij}' = \begin{cases}
\frac{b_{ij}}{k_j^{out}} & \text{if } k_j^{out} \neq 0 \\
0 & \text{otherwise}
\end{cases} \tag{1}$$

  • $S_i'(t)$ is the probability that the walk has reached user $u_i$ at time $t$,

$$S_i'(t+1) = \begin{cases}
\sum_{j=1}^{m} \frac{b_{ij}}{k_j^{out}} S_j'(t) & \text{if } k_j^{out} \neq 0 \\
0 & \text{otherwise}
\end{cases} \tag{2}$$

  • Initial probability

    • For the target user $u_i$, $S_i'(0) = 1$
    • For every other user $u_j$, $S_j'(0) = 0$

```python
# Random walk on the social network.
# Inputs: the target user ID, the resource `prob` (lambda in the paper) and the number of steps.
# Assumes trust_df is a pandas DataFrame of trust links (column 0 = user, column 1 = trusted user).
from collections import defaultdict

def social_network_walk(user_id, prob, step):
    user_group = trust_df.groupby(trust_df[0])

    # First step: split the resource equally among the target user's neighbors.
    id_neighbors = list(user_group.get_group(user_id)[1].values)
    user_dict = {uid: prob / len(id_neighbors) for uid in id_neighbors}

    # Remaining steps: every user passes its resource on to its own neighbors.
    for _ in range(step - 2):
        next_dict = defaultdict(float)
        for uid, p in user_dict.items():
            if uid not in user_group.groups:
                continue  # no outgoing links (k_out = 0), nothing is passed on
            id_neighbors = list(user_group.get_group(uid)[1].values)
            for neighbor_id in id_neighbors:
                next_dict[neighbor_id] += p / len(id_neighbors)
        user_dict = dict(next_dict)
    return user_dict

user_dict = social_network_walk(user_id='23298', prob=1, step=3)
user_dict
```
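
To see the social-network walk on data small enough to check by hand, here is a self-contained sketch that does not need `trust_df`. The `out_links` dictionary and the function name `social_step` are hypothetical; each step moves every user's resource equally along its out-links, and users without out-links pass nothing on.

```python
# Hypothetical directed social network: user -> users reached by that user's out-links.
out_links = {
    "u1": ["u2", "u3"],
    "u2": ["u3"],
    "u3": ["u1"],
    "u4": [],  # no out-links, so k_out = 0
}

def social_step(state, out_links):
    """One step of the walk: each user splits its resource equally over its out-links."""
    new_state = {user: 0.0 for user in out_links}
    for j, resource in state.items():
        k_out = len(out_links[j])
        if k_out == 0:
            continue  # zero out-degree: nothing is passed on
        for i in out_links[j]:
            new_state[i] += resource / k_out
    return new_state

# Initial condition: 1 on the target user, 0 on everyone else.
state = {user: 0.0 for user in out_links}
state["u1"] = 1.0
for t in range(1, 4):
    state = social_step(state, out_links)
    print(t, state)
```

Because every user that holds resource in this example has at least one out-link, the total stays equal to 1 at every step; after three steps the resource is 0.5 on `u1` and 0.25 each on `u2` and `u3`.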

3.2 random walk on bipartite graph

  • Transition probability from users to objects, where $k_i$ is the number of objects collected by user $u_i$:
    $$P_{i\alpha}' = \begin{cases}
    \frac{a_{i\alpha}}{k_i} & \text{if } k_i \neq 0 \\
    0 & \text{otherwise}
    \end{cases} \tag{3}$$
  • Transition probability from objects to users, where $k_\alpha$ is the number of users who have collected object $o_\alpha$:
    $$P_{\alpha j}'' = \begin{cases}
    \frac{a_{j\alpha}}{k_\alpha} & \text{if } k_\alpha \neq 0 \\
    0 & \text{otherwise}
    \end{cases} \tag{4}$$
  • Define $S_\alpha''(t)$ and $S_j''(t)$ as the probabilities that the walk is at object $o_\alpha$ and at user $u_j$, respectively, at time $t$ in the bipartite network:

$$S_i''(t+1) = \begin{cases}
\sum_{\alpha=1}^{n} \frac{a_{i\alpha}}{k_\alpha} S_\alpha''(t) & \text{if } k_\alpha \neq 0 \\
0 & \text{otherwise}
\end{cases} \tag{5}$$

$$S_\alpha''(t+1) = \begin{cases}
\sum_{j=1}^{m} \frac{a_{j\alpha}}{k_j} S_j''(t) & \text{if } k_j \neq 0 \\
0 & \text{otherwise}
\end{cases} \tag{6}$$
When $t$ is odd and $t \geq 3$, $S_\alpha''(t)$ is the probability that the target user (the user whose initial value is set to 1) will select the uncollected object $o_\alpha$.

```python
# Random walk on the user-object bipartite network (three steps).
# Assumes ratings_df is a pandas DataFrame (column 0 = user ID, column 1 = object ID).
from collections import defaultdict

user_id = '23298'
prob = 1
# Group the ratings by user: the objects each user has collected.
objection_group = ratings_df.groupby(ratings_df[0])
# Group the ratings by object: the users who have collected each object.
user_group = ratings_df.groupby(ratings_df[1])

# Step 1: user to object. The target user's neighbors are the objects it collected.
user_neighbors = objection_group.get_group(user_id)[1].values
objection_dict = {obj_id: prob / len(user_neighbors) for obj_id in user_neighbors}

# Step 2: object to user.
user_dict = defaultdict(float)
for objection_id, p in objection_dict.items():
    objection_neighbors = user_group.get_group(objection_id)[0].values
    for uid in objection_neighbors:
        user_dict[uid] += p / len(objection_neighbors)

# Step 3: user to object. Start from an empty dict so that the
# step-1 values are not counted a second time.
objection_dict = defaultdict(float)
for uid, p in user_dict.items():
    user_neighbor = list(objection_group.get_group(uid)[1].values)
    for obj_id in user_neighbor:
        objection_dict[obj_id] += p / len(user_neighbor)
dict(objection_dict)
```
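
The same three-step walk can be traced by hand on a toy bipartite network. The sketch below is self-contained and does not use `ratings_df`; the `collected` data and the helper `spread` are hypothetical names introduced for this example.

```python
# Hypothetical collections: user -> set of collected objects.
collected = {
    "u1": {"o1", "o2"},
    "u2": {"o2", "o3"},
    "u3": {"o3", "o4"},
}
# Invert the mapping: object -> set of users who collected it.
collectors = {}
for user, objs in collected.items():
    for obj in objs:
        collectors.setdefault(obj, set()).add(user)

def spread(state, neighbors):
    """Move the resource one step: each node splits it equally among its neighbors."""
    new_state = {}
    for node, resource in state.items():
        for nxt in neighbors[node]:
            new_state[nxt] = new_state.get(nxt, 0.0) + resource / len(neighbors[node])
    return new_state

target = "u1"
objects_t1 = spread({target: 1.0}, collected)  # t = 1: user -> objects
users_t2 = spread(objects_t1, collectors)      # t = 2: objects -> users
objects_t3 = spread(users_t2, collected)       # t = 3: users -> objects

# Recommend only objects the target user has not collected yet, best first.
ranking = sorted(
    (obj for obj in objects_t3 if obj not in collected[target]),
    key=lambda obj: objects_t3[obj],
    reverse=True,
)
print(objects_t3)
print(ranking)
```

After three steps `u1` holds scores 0.375, 0.5 and 0.125 for `o1`, `o2` and `o3`. Only `o3` is uncollected, so it is the single recommendation, and `o4` is never reached because no user shares an object with both `u1` and `u3`.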

3.3 biased random walks on coupled social networks

In the coupled network, a user receives resource from two sources: users in the social network and objects in the bipartite network. The parameter $\lambda$ sets the share of resource allocated to the social network, and the remaining $1 - \lambda$ goes to the bipartite network. Objects receive resource from users in the same way as in the bipartite walk:

$$S_\alpha(t+1) = \begin{cases}
\sum_{j=1}^{m} \frac{a_{j\alpha}}{k_j} S_j(t) & \text{if } k_j \neq 0 \\
0 & \text{otherwise}
\end{cases} \tag{8}$$

Initial probability:

  • For the target user $u_i$, $S_i(0) = 1$
  • For every other user $u_j$ and every object $o_\alpha$, $S_j(0) = 0$ and $S_\alpha(0) = 0$

When $t$ is odd and $t \geq 3$, $S_\alpha(t)$ is the probability that user $u_i$ selects the uncollected object $o_\alpha$. (The target user starts with one unit of resource; at $t = 1$ the resource flows from the target user to the adjacent objects, and at $t = 2$ it flows from the objects back to users.)

```python
# Random walk on the coupled network for user 23298: three steps, with a share
# prob_social = 0.7 of the resource sent through the social network.
# Assumes trust_df (column 0 = user, column 1 = trusted user) and
# ratings_df (column 0 = user, column 1 = object) are pandas DataFrames.
from collections import defaultdict

user_id = '23298'
prob_social = 0.7
step = 3
prob_bi = 1 - prob_social
social_user_group = trust_df.groupby(trust_df[0])
# Group the ratings by user: the objects each user has collected.
bi_objection_group = ratings_df.groupby(ratings_df[0])
# Group the ratings by object: the users who have collected each object.
bi_user_group = ratings_df.groupby(ratings_df[1])

# Step 1: user to object, carrying the bipartite share of the resource.
user_neighbors = bi_objection_group.get_group(user_id)[1].values
objection_dict = {obj_id: prob_bi / len(user_neighbors) for obj_id in user_neighbors}

for _ in range(int((step - 1) / 2)):
    # Step 2: object to user.
    user_dict = defaultdict(float)
    for objection_id, prob in objection_dict.items():
        objection_neighbors = bi_user_group.get_group(objection_id)[0].values
        for uid in objection_neighbors:
            user_dict[uid] += prob / len(objection_neighbors)

    # Social branch: the target user passes the social share to the users it trusts.
    if user_id in social_user_group.groups:
        social_neighbors = list(social_user_group.get_group(user_id)[1].values)
        for uid in social_neighbors:
            user_dict[uid] += prob_social / len(social_neighbors)

    # Step 3: user to object.
    objection_dict = defaultdict(float)
    for uid, prob in user_dict.items():
        if uid not in bi_objection_group.groups:
            continue  # this user has collected nothing, so nothing is passed on
        user_neighbor = list(bi_objection_group.get_group(uid)[1].values)
        for obj_id in user_neighbor:
            objection_dict[obj_id] += prob / len(user_neighbor)
dict(objection_dict)
```
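
The effect of the social layer is easiest to see on a toy example modeled on the follower example from the schematic: a user who follows someone can be recommended an object that the bipartite network alone cannot reach. The sketch below follows the three-step scheme of the code above under one assumed reading of the coupling, in which a share `lam` of the target user's resource goes to the users it follows and the rest goes through the bipartite network. All data and names are hypothetical, and this is not presented as the paper's exact update rule.

```python
# Hypothetical toy data: u1 follows u3, and only u3 has collected o4.
collected = {"u1": {"o1", "o2"}, "u2": {"o2", "o3"}, "u3": {"o4"}}
follows = {"u1": ["u3"], "u2": [], "u3": []}

# Invert the collections: object -> set of users who collected it.
collectors = {}
for user, objs in collected.items():
    for obj in objs:
        collectors.setdefault(obj, set()).add(user)

def spread(state, neighbors):
    """Split each node's resource equally among its neighbors."""
    new_state = {}
    for node, resource in state.items():
        for nxt in neighbors[node]:
            new_state[nxt] = new_state.get(nxt, 0.0) + resource / len(neighbors[node])
    return new_state

def coupled_walk(target, lam):
    """Three-step walk; lam is the assumed share of resource sent through the social layer."""
    # Bipartite branch: (1 - lam) of the resource goes user -> objects -> users.
    objects_t1 = spread({target: 1.0 - lam}, collected)
    users_t2 = spread(objects_t1, collectors)
    # Social branch: lam of the resource goes to the users the target follows.
    for user, resource in spread({target: lam}, follows).items():
        users_t2[user] = users_t2.get(user, 0.0) + resource
    # Final step: every user passes its resource to the objects it collected.
    return spread(users_t2, collected)

print(coupled_walk("u1", 0.0))
print(coupled_walk("u1", 0.5))
```

With `lam = 0` the object `o4` receives no resource, so it cannot be recommended to `u1`; with `lam = 0.5` it receives 0.5 and becomes the top-scoring uncollected object.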

Evaluation metrics:

  • $Precision$: the proportion of items in the recommendation list that the user actually selected. $Precision^i = N_{rs}^i / L$

$N_{rs}^i$: the number of items recommended to user $u_i$ that appear in the test set; $L$: the length of the recommendation list.

  • $Recall$: the proportion of the user's collected items that appear in the recommendation list. $Recall^i = N_{rs}^i / N_p^i$,
    $N_p^i$: the number of items in the test set that user $u_i$ has collected.
  • $F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}$
  • $HD$: Hamming distance, a measure of the diversity of users' recommendation lists. $HD_{ij} = 1 - Q_{ij}(L)/L$, where $Q_{ij}$ is the number of items common to the recommendation lists of users $u_i$ and $u_j$.
  • $Ranking\ score\ (R)$: measures users' satisfaction with the recommendation list, $R_{i\alpha} = l_{i\alpha}/N_i$, where $l_{i\alpha}$ is the position of item $o_\alpha$ in the recommendation list of user $u_i$ and $N_i$ is the length of that list.
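
The metrics above translate into a few short functions. The sketch below is illustrative only: the function names and the sample lists are made up, and counting positions from 1 in the ranking score is an assumption.

```python
def precision(recommended, test_items):
    """Share of the recommendation list that the user actually selected."""
    hits = len(set(recommended) & set(test_items))
    return hits / len(recommended)

def recall(recommended, test_items):
    """Share of the user's test-set items that appear in the recommendation list."""
    hits = len(set(recommended) & set(test_items))
    return hits / len(test_items)

def f_measure(p, r):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * p * r / (p + r) if p + r else 0.0

def hamming_distance(list_i, list_j):
    """1 - Q / L for two recommendation lists of equal length L."""
    common = len(set(list_i) & set(list_j))
    return 1 - common / len(list_i)

def ranking_score(ranked_items, item):
    """Position of `item` in the list (counted from 1) divided by the list length."""
    return (ranked_items.index(item) + 1) / len(ranked_items)

# Hypothetical recommendation lists (L = 4) and test-set items.
rec_u1 = ["o3", "o5", "o7", "o9"]
rec_u2 = ["o3", "o4", "o5", "o6"]
test_u1 = {"o5", "o9", "o11"}

p = precision(rec_u1, test_u1)
r = recall(rec_u1, test_u1)
print(p, r, f_measure(p, r))
print(hamming_distance(rec_u1, rec_u2))
print(ranking_score(rec_u1, "o5"))
```

Two of the four recommended items are in the test set, so precision is 0.5 and recall is 2/3. A smaller ranking score means the item sits nearer the top of the list, and a larger Hamming distance means the two users' lists differ more.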

4. Result analysis

4.1 data description

The two public data sets, Epinions and FriendFeed, each contain a social relationship data set and a rating data set. The following table describes the networks corresponding to the data sets. Taking Epinions as an example, a subset of the data is randomly sampled for analysis. The sample contains 4066 users, 7649 objects, 154122 user-object links (collections) and 217017 social network links. The network density is $5 \times 10^{-3}$.
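
The quoted density can be checked from the sample sizes, assuming it refers to the user-object bipartite network, that is, the number of existing links divided by the number of possible user-object pairs. The variable names are chosen for this example.

```python
users = 4066
objects = 7649
collections = 154122

# Density of a bipartite network: existing links / possible user-object pairs.
density = collections / (users * objects)
print(f"{density:.2e}")
```

The result is just under 0.005, which agrees with the stated value after rounding.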

Table 1: Properties of the tested data sets

4.2 effect of parameters $\lambda$ and $t$ on experimental results

$\lambda$ is the proportion in which resource is allocated between the social network and the bipartite network, and $t$ is the number of steps in the walk. The following figure reports the ranking score; the colder the color, the better the result. The figure shows that the prediction is best when $t = 3$. When $\lambda = 0$, the prediction depends only on the bipartite network. As $\lambda$ increases, the information in the social network is gradually taken into account.

Figure 1: Ranking score values on Epinions and Friendfeed data sets (color online).
(a) Epinions    (b) Friendfeed

4.3 prediction effect

The following tables compare the model with the MD (mass diffusion) model and the UCF (user-based CF) model for a recommendation list of length $L = 20$.

Table 2: Algorithmic performance for Epinions data set with recommendation list L = 20

Table 3: Algorithmic performance for Friendfeed data set with recommendation list L = 20

This project has been implemented on the Mo platform. Remember to fork: https://momodel.cn/explore/5d…

