I haven’t seen you for a week. I miss you very much. Today, little Mi takes you to learn about recommender systems! Recommendation system is a very important application in machine learning, such as a song that is easy to recommend, a shopping list recommended by a treasure, etc. since it should be so extensive, there is no more nonsense. Let’s start~

1 Definition

Recommendation system is a very interesting problem. It is usually not paid much attention to in academic conferences of machine learning, but it can be seen everywhere in our life.

At the same time, for machine learning, features are very important, and the selected features will have a great impact on the performance of the learning algorithm. Therefore, there is a big idea in machine learning. For some problems, we can automatically learn a set of better features through algorithms, so as to replace manual design. Among them, the recommendation system is an example of type setting.

So what is the recommendation system problem? Let’s start with an example to define the problem of recommendation system.

If a film supplier has five films and four users, we require users to score the films.

The first three films are love films and the last two are action films. It can be found that Alice and Bob seem to be more inclined to love films, while Carol and Dave seem to be more inclined to action films. At the same time, each user does not give too much to all films. At this time, we can build an algorithm to predict how many points each person may give to the movies they have not seen, and use it as the basis for recommendation.

Some related parameters are introduced below:

: number of users;

: number of films;

: if the user reviews the movie too much, then;

: user’s rating of the movie;

: the total number of movies that users over rated.

Algorithm and feature 2

In a content-based recommendation system algorithm, there are some data for the movies that users want to recommend, and these data are more accurately related features.

Suppose that each film has two characteristics, such as the romantic degree of the film and the action degree of the film.

Then each film has an eigenvector, such as the eigenvector of the first film: [0.9 0].

Based on these characteristics to build a recommendation system algorithm, using linear regression model, we can train a linear regression model for each user, such as the model parameters of the first user. So there are:

: user’s parameter vector;

: feature vector of movie;

For users and movies, our predicted score is:;

Cost function

For users, the cost of the linear regression model is the sum of squares of prediction errors plus the regularization term:

Which means that only those movies that users have commented too much are counted. In the general linear regression model, both the error term and the regular term should be multiplied. Here, we choose to remove them without regularizing the difference term.

The above cost function is only for one user. In order to learn from all users, sum the cost functions of all users:

If the gradient descent method is used to solve the optimal solution, the updated formula of gradient descent is obtained after calculating the partial derivative of the cost function:

3 collaborative filtering

In the content-based recommendation system, for each film, we have mastered the available features, and use these features to train the parameters of each user. Draw inferences from one instance. If we have user parameters, we can actually learn the characteristics of the film. Is there any?!

But if there are neither user parameters nor movie features, how should we solve it? Don’t worry, collaborative filtering algorithm is on the stage~

The optimization objectives will be carried out for and at the same time. The results of calculating the partial derivative of the cost function are as follows:

Note: in collaborative filtering algorithm, variance term is usually not used. If necessary, the algorithm will learn automatically. The steps of collaborative filtering algorithm are as follows:

1. The initial is some small random values

2. Use gradient descent algorithm to minimize the cost function

3. After training the algorithm, predict the score given to the movie by the user

The feature matrix obtained through this learning process contains important data about movies, which can be used as the basis for recommending movies to users.

For example, if a user is watching a movie and looking for another movie, according to the distance between the feature vectors of the two movies, the smaller the distance, the more in line with the user’s taste.

4 collaborative filtering algorithm

Collaborative filtering optimization objectives:

Given, estimated:

Given, estimated, simultaneously minimized and:

Vectorization implementation of 5 algorithm

According to the data set of five movies, the movie scores of these users are grouped and coexisted in a matrix.

Since the dataset has five movies and four users, this matrix Y is a matrix with five rows and four columns, and contains the user rating data of these movies:

Launch rating:

Find related movies:

In the early stage, Xiao MI has taken you to learn the feature parameter vector, so how to measure the similarity between the two films is a good solution. For example: a movie has a feature vector, while another movie is different. As long as the distance between the feature vectors of the two movies is small, it can strongly show that the movie and the movie are similar to each other to some extent. At least in a sense, some people like video and may be more likely to be interested in movies. In other words, when users are watching a movie, if they need to find five movies that are very similar to the movie, in order to recommend five new movies to users, what we need to do is to find the movie. Among these different movies, the distance from the movie we are looking for is the smallest, so that you can recommend several different movies to your users.

Through this method, I believe you can know how to carry out a vectorized calculation to score all users and all movies. At the same time, you can also master the method of finding relevant movies and products by learning feature parameters.

6 mean normalization

User rating data:

If you add a new user sky and sky doesn’t rate any movies, what basis do we recommend movies for sky?

First, it is necessary to normalize the mean value of matrix Y, and subtract the average value of all users’ ratings of a movie from each user’s rating of a movie:

Then use this new matrix to train the algorithm. If we want to use the newly trained algorithm to predict the score, we need to add back the average value to predict. For user sky, the new model will think that the score she gives to each movie is the average score of the movie.

Well, that’s all for the learning of recommendation system ~ next week, Xiaomi will arrange large-scale machine learning for you! Let’s see you next week (wave for ten minutes!)