Common CF-based i2i algorithms


Item-to-item (i2i) recall based on collaborative filtering (CF) is one of the most widely used recall methods in recommender systems. It usually serves as the baseline model in the early stage of a system and as the benchmark for later algorithm iterations. It is easy to develop and fast to train, and its results are generally reasonable.

Unless otherwise specified, i2i usually refers to the CF family of algorithms. When CF is used for i2i recall, it typically does not rely on matrix factorization; instead, similarity is computed directly from co-occurrence statistics. Three common calculation methods are introduced one by one below.

item based CF

Item-based collaborative filtering is item-centric: it computes the similarity between two items from the number of users who interacted with both of them (interactions are generally clicks, purchases, etc.) and from the behavior statistics of those users. It can be viewed as an i2u2i (item-to-user-to-item) method. Once item-item similarities are available, the items in a user's visit history act as triggers, and new items similar to them are recalled for that user through i2i.
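
A minimal Python sketch of the trigger-based recall step, assuming the item-item similarities have already been computed and stored as a nested dict. The function name `recall_from_history`, the `top_k` cutoff, and the choice to sum a candidate's similarity over all triggers are illustrative assumptions rather than a prescribed method.

```python
from collections import defaultdict


def recall_from_history(history, item_sim, top_k=10):
    """Recall unseen items for one user, using the visited items as triggers.

    history: item ids the user has visited (the triggers)
    item_sim: precomputed i2i similarities, item -> {neighbor item: similarity}
    top_k: number of items to return (hypothetical cutoff)
    """
    seen = set(history)
    scores = defaultdict(float)
    for trigger in seen:
        for neighbor, sim in item_sim.get(trigger, {}).items():
            if neighbor not in seen:  # recall only items the user has not visited
                scores[neighbor] += sim  # sum similarity across all triggers
    # Highest score first; ties broken by item id so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]


item_sim = {"a": {"b": 0.8, "c": 0.3}, "b": {"a": 0.8, "c": 0.5, "d": 0.2}}
print(recall_from_history(["a", "b"], item_sim))  # ['c', 'd']
```

Here `c` is similar to both triggers, so it accumulates more score than `d`, which is reachable only from `b`.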

The calculation formula is as follows:

$$Sim(I_1,I_2)= \frac{\sum_{u \in I_1^u \cap I_2^u} \frac{1}{\log(1+N_u)}}{\sqrt{N_{I_1}N_{I_2}}}$$

In the formula above, $I_1$ and $I_2$ are any two items whose similarity is computed from their interaction statistics. $I_1^u$ and $I_2^u$ are the sets of users who clicked $I_1$ and $I_2$, $N_{I_i}$ is the total number of clicks on item $I_i$, and $N_u$ is the number of clicks made by user $u$. The square root of the product in the denominator normalizes the score and weakens the influence of popular items. The numerator sums, over the users who interacted with both items, the reciprocal of $\log(1+N_u)$.

The formula shows that the more users visit both items, the higher their similarity; and the fewer clicks a co-visiting user has made in total (that is, the smaller $N_u$), the more that user contributes to the similarity.
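
The item-based formula can be sketched in Python as follows. It assumes a hypothetical `user_items` input mapping each user to the set of items they clicked, so every user-item interaction is counted once: $N_{I}$ becomes the number of users who clicked item $I$, and $N_u$ the number of distinct items user $u$ clicked. The natural logarithm is used; another base only rescales all numerator weights by the same constant.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_cf_similarity(user_items):
    """Item-based CF: Sim(I1, I2) = sum_u 1/log(1+N_u) / sqrt(N_I1 * N_I2).

    user_items: hypothetical input, user id -> set of item ids the user clicked
    Returns item -> {other item: similarity}; only co-clicked pairs appear.
    """
    item_count = defaultdict(int)  # N_I: number of users who clicked item I
    numer = defaultdict(lambda: defaultdict(float))  # numerator per item pair
    for items in user_items.values():
        if not items:
            continue  # log(1 + 0) = 0 would cause a division by zero
        weight = 1.0 / math.log(1 + len(items))  # active users (large N_u) count less
        for item in items:
            item_count[item] += 1
        for i, j in combinations(sorted(items), 2):
            numer[i][j] += weight
            numer[j][i] += weight
    # The sqrt(N_I1 * N_I2) denominator damps the similarity of popular items.
    return {
        i: {j: s / math.sqrt(item_count[i] * item_count[j]) for j, s in nbrs.items()}
        for i, nbrs in numer.items()
    }


user_items = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}}
sim = item_cf_similarity(user_items)
print(round(sim["a"]["b"], 4), round(sim["a"]["c"], 4))
```

In this toy data, `a` and `b` are co-clicked by both users and end up more similar than `a` and `c`, which share only the more active user `u2`.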

user based CF

User-based CF is similar to item-based CF, except that it is user-centered: it computes the similarity between two users from statistics of the items both of them accessed. Because it yields user-user similarity, it cannot be used to recommend items directly. However, u2i recommendation can be achieved by recalling similar users and recommending the items they visited to the source user.

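The formula for user-based CF is essentially the same as for item-based CF with the roles of users and items swapped: the numerator sums over the items visited by both users, each item being weighted by the number of users who visited it instead of by the user's click count, and the denominator uses the two users' visit counts:

$$Sim(u_1,u_2)= \frac{\sum_{i \in u_1^I \cap u_2^I} \frac{1}{\log(1+N_i)}}{\sqrt{N_{u_1}N_{u_2}}}$$

where $u_k^I$ is the set of items visited by user $u_k$, $N_i$ is the number of users who visited item $i$, and $N_{u_k}$ is the number of visits by user $u_k$.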
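
A minimal Python sketch of user-based CF, assuming a hypothetical `user_items` dict mapping each user to the set of items they visited (each interaction counted once). `user_cf_similarity` applies the item-based formula with users and items swapped, and `recommend_from_similar_users` performs u2i recall by scoring the items of the `top_n_users` most similar users. The function names, both cutoffs, and summing neighbor similarities are illustrative choices, not a fixed recipe.

```python
import math
from collections import defaultdict
from itertools import combinations


def user_cf_similarity(user_items):
    """User-based CF: Sim(u1, u2) = sum_i 1/log(1+N_i) / sqrt(N_u1 * N_u2)."""
    item_users = defaultdict(set)  # inverted index: item -> users who visited it
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    numer = defaultdict(lambda: defaultdict(float))
    for users in item_users.values():
        weight = 1.0 / math.log(1 + len(users))  # popular items (large N_i) count less
        for u, v in combinations(sorted(users), 2):
            numer[u][v] += weight
            numer[v][u] += weight
    return {
        u: {v: s / math.sqrt(len(user_items[u]) * len(user_items[v])) for v, s in nbrs.items()}
        for u, nbrs in numer.items()
    }


def recommend_from_similar_users(user, user_items, user_sim, top_n_users=10, top_k=10):
    """u2i recall: score the items visited by the user's most similar users."""
    neighbors = sorted(user_sim.get(user, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    seen = user_items.get(user, set())
    scores = defaultdict(float)
    for other, sim in neighbors[:top_n_users]:
        for item in user_items[other]:
            if item not in seen:  # skip items the source user already visited
                scores[item] += sim  # weight each item by the neighbor's similarity
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]


user_items = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"d"}}
user_sim = user_cf_similarity(user_items)
print(recommend_from_similar_users("u1", user_items, user_sim))  # ['c']
```

User `u3` shares no items with anyone, so it has no similar users and receives no recommendations from this recall path.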


Swing

The Swing algorithm also computes i2i similarity. In theory it could compute u2u similarity as well, but in practice it generally does not.
Swing is essentially a graph-based form of CF. The user-item bipartite graph contains many (user, item, user) structures, in which users $u$ and $v$ have both purchased the same item; this is actually a third-order interaction. Traditional heuristic nearest-neighbor methods consider only the second-order interactions between users and items, whereas Swing exploits this third-order relationship. One intuition behind the method is that if many users who clicked item $i$ share only one other item $j$ in common, then $i$ and $j$ must be strongly related, and this otherwise hidden association is transmitted through the users. On the other hand, the more swing structures a pair of users forms, the weaker each individual structure is, and the less weight each one contributes for that pair. The formula is as follows:
$$Sim(i,j) = \sum\limits_{u \in (U_i \cap U_j)}\sum\limits_{v \in (U_i \cap U_j)} \frac{1}{\alpha+|I_u\cap I_v|}$$
Here $U_i$ is the set of users who purchased item $i$, $I_u$ is the set of items purchased by user $u$, and $\alpha$ is a smoothing coefficient. To measure the similarity of items $i$ and $j$, we examine every pair of users $u$ and $v$ who have both purchased $i$ and $j$. The fewer items $u$ and $v$ have purchased in common, the more that pair contributes to the similarity of $i$ and $j$. In the extreme case where $i$ and $j$ are the only items the two users have bought in common, their interests are otherwise very different; yet since both of them bought these two items, the similarity between the two items is very large!
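
A direct Python sketch of the Swing formula, assuming a hypothetical `user_items` dict mapping each user $u$ to the set $I_u$ of items they purchased. It reads the double sum as running over distinct users $u \neq v$ and counts each unordered pair once; counting ordered pairs would simply double every score without changing the ranking. Instead of enumerating $U_i \cap U_j$ for every item pair, it loops over user pairs and credits every item pair in $I_u \cap I_v$, which yields the same sums. The quadratic loop over users is only meant for small examples.

```python
from collections import defaultdict
from itertools import combinations


def swing_similarity(user_items, alpha=1.0):
    """Swing: Sim(i, j) = sum over user pairs (u, v) of 1 / (alpha + |I_u ∩ I_v|).

    user_items: hypothetical input, user u -> set I_u of purchased item ids
    alpha: smoothing coefficient
    Each unordered pair of distinct users is counted once (an assumption).
    """
    sim = defaultdict(lambda: defaultdict(float))
    for u, v in combinations(sorted(user_items), 2):
        common = user_items[u] & user_items[v]  # I_u ∩ I_v
        if len(common) < 2:
            continue  # u and v must share at least two items to form a swing
        weight = 1.0 / (alpha + len(common))  # larger overlap -> weaker evidence
        # u and v are in U_i ∩ U_j exactly when i and j are both in I_u ∩ I_v.
        for i, j in combinations(sorted(common), 2):
            sim[i][j] += weight
            sim[j][i] += weight
    return {i: dict(nbrs) for i, nbrs in sim.items()}


user_items = {"u1": {"a", "b", "c"}, "u2": {"a", "b"}, "u3": {"a", "b", "c", "d"}}
sim = swing_similarity(user_items)
```

With `alpha=1`, the pair `(a, b)` receives 1/3 + 1/4 + 1/3 = 11/12 from the user pairs `(u1, u2)`, `(u1, u3)` and `(u2, u3)`, while `(a, c)` receives only 1/4 from `(u1, u3)`, whose large overlap makes it weak evidence.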