Visualizing the results of word2vec using Python


By Mate Pocs
Translated by VK
Source: Towards Data Science

Word2vec is easily the most interesting concept I have encountered in natural language processing research. Imagine an algorithm that can genuinely capture the meaning of words and their function in language, and measure how close words are to each other across topics.

I think it would be interesting to visually represent word2vec vectors: in essence, we can get the vectors of countries or cities, apply principal component analysis to reduce the dimensions, and put them on a two-dimensional chart. Then, we can observe the visualization results.

In this article, we will:

  • Discuss word2vec theory in broad terms;

  • Download the original pre-trained vectors;

  • Look at some fun applications, such as arithmetic on word vectors, including the famous "king − man + woman = queen" equation;

  • See how accurately we can plot the capitals of Europe from their word2vec vectors.

Word2vec's original research paper and pre-trained model date from 2013, which, considering how fast the NLP literature expands, makes it old technology by now. Newer approaches include GloVe (faster to train, usable on a smaller corpus) and fastText (which can handle character-level n-grams).

A quick introduction to word2vec

One of the core problems of natural language processing is how to quantify words and expressions so that they can be used in models. This mapping of language elements to numerical representations is called word embedding.

Word2vec is a word embedding process. The concept is relatively simple: a model is fitted by looping through the corpus sentence by sentence, predicting the current word from the neighboring words in a predefined window.

For this purpose it uses a neural network, but in the end we never actually use the predictions. Once the model is trained, we only keep the weights of the hidden layer. In the original model we will use, the hidden layer has 300 weights per word, so each word is represented by a 300-dimensional vector.

Note that two words do not have to appear close to each other to be considered similar. Even if two words never occur in the same sentence, as long as they are usually surrounded by the same words, it is safe to assume they have similar meanings.

There are two modeling approaches in word2vec: skip-gram and continuous bag-of-words (CBOW). Both have their own advantages and their own sensitivities to certain hyperparameters... but you know what? We are not going to fit our own model, so I won't dwell on them.

Of course, the word vectors you get depend on the corpus the model was trained on. Generally speaking, you need a huge corpus: a full Wikipedia dump, or news articles from different sources. The vectors we are going to use were trained on Google News.

How to download and install

First, you need to download the pre training word2vec vector. You can choose from a variety of models that are trained for different types of documents.

I used the original model, trained on Google News. You can download it from many sources; just search for "Google News vectors negative 300".

Note that the file is 1.66 GB, but in exchange it contains 300-dimensional representations of 3 million words and phrases.
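As a back-of-the-envelope sanity check on those numbers (my own assumptions, not from the article: 4-byte floats and the commonly cited vocabulary of 3 million words and phrases), the uncompressed size works out to a few gigabytes, which is consistent with a 1.66 GB gzipped download:

```python
# Rough size estimate, assuming float32 storage and a 3-million-word vocabulary
vocab_size = 3_000_000
dimensions = 300
bytes_per_float = 4

uncompressed_gb = vocab_size * dimensions * bytes_per_float / 1024**3
print(round(uncompressed_gb, 2))  # ~3.35 GB before gzip compression
```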

When it comes to using word2vec in Python, once again you have many packages to choose from; we will use the gensim library. Assuming the file is saved in the word2vec_pretrained folder, you can load it in Python as follows:

from gensim.models.keyedvectors import KeyedVectors

word_vectors = KeyedVectors.load_word2vec_format(\
    './word2vec_pretrained/GoogleNews-vectors-negative300.bin.gz', \
    binary = True, limit = 1000000)

The limit parameter defines the number of words to import. One million is enough for me.

Explore word2vec

Now that we have the word2vec vector, we can see some interesting uses of it.

First, you can actually inspect the vector representation of any word:

word_vectors['nice']

The result, as expected, is a 300-dimensional vector that is very hard to interpret on its own. We do arithmetic by adding and subtracting these vectors, then find the closest matching words via cosine similarity.
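The cosine-similarity matching can be sketched with plain NumPy; the 3-dimensional toy vectors below are made-up stand-ins for the real 300-dimensional ones:

```python
import numpy as np

def cosine_similarity(a, b):
    # cosine similarity = dot product divided by the product of the norms
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# made-up 3-dimensional "word vectors" for illustration only
nice = np.array([0.9, 0.1, 0.3])
good = np.array([0.8, 0.2, 0.25])
car  = np.array([0.1, 0.9, 0.7])

print(cosine_similarity(nice, good))  # close to 1: similar words
print(cosine_similarity(nice, car))   # much smaller: dissimilar words
```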

You can use the most_similar function to find synonyms; the topn parameter defines the number of words to list:

word_vectors.most_similar(positive = ['nice'], topn = 5)


[('good', 0.6836092472076416),
 ('lovely', 0.6676311492919922),
 ('neat', 0.6616737246513367),
 ('fantastic', 0.6569241285324097),
 ('wonderful', 0.6561347246170044)]

Now, you might think that antonyms can be found in a similar way, simply by passing the word "nice" as the negative input:

word_vectors.most_similar(negative = ['nice'], topn = 5)

But this is the result:

[('J.Gordon_###-###', 0.38660115003585815),
 ('M.Kenseth_###-###', 0.35581791400909424),
 ('D.Earnhardt_Jr._###-###', 0.34227001667022705),
 ('G.Biffle_###-###', 0.3420777916908264),
 ('HuMax_TAC_TM', 0.3141660690307617)]

These are actually the words whose vectors are furthest from the vector for "nice", not its antonyms.

The doesnt_match function can pick out the odd one out:

word_vectors.doesnt_match(
    ['Hitler', 'Churchill', 'Stalin', 'Beethoven'])

This returns Beethoven. Quite handy, I think.

Finally, let's look at the kind of word arithmetic that famously gives the algorithm a false air of intelligence. If we add the vectors of "father" and "woman" and subtract the vector of "man", the code looks like this:

word_vectors.most_similar(
    positive = ['father', 'woman'], negative = ['man'], topn = 1)

We get:

[('mother', 0.8462507128715515)]

Turn it around in your head and imagine that we had only two dimensions: parenthood and gender. The word "woman" could then be expressed by the vector [0, 1], "man" by [0, −1], "father" by [1, −1], and "mother" by [1, 1]. If we performed the same arithmetic, we would get the same result. The difference, of course, is that we have 300 dimensions instead of two, and the meaning of the dimensions can hardly be interpreted.
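The two-dimensional example above can be checked directly in code. This is a toy sketch using the made-up vectors from the paragraph (not real embeddings), with a small helper that mimics what most_similar does: it picks the candidate with the highest cosine similarity, excluding the input words.

```python
import numpy as np

# toy 2-D embedding space: axis 0 = parenthood, axis 1 = gender
vectors = {
    "woman":  np.array([0.0,  1.0]),
    "man":    np.array([0.0, -1.0]),
    "father": np.array([1.0, -1.0]),
    "mother": np.array([1.0,  1.0]),
}

# father + woman - man
result = vectors["father"] + vectors["woman"] - vectors["man"]

def closest(query, vocab, exclude=()):
    """Return the word whose vector has the highest cosine similarity to query."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(
        (w for w in vocab if w not in exclude),
        key=lambda w: cos(vocab[w], query),
    )

print(closest(result, vectors, exclude={"father", "woman", "man"}))  # mother
```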

There is a famous example of gender bias in word2vec arithmetic: the "female version" of the word "doctor" used to be computed as "nurse". I tried to reproduce it, but did not get the same result:

word_vectors.most_similar(
    positive = ['doctor', 'woman'], negative = ['man'], topn = 1)

[('gynecologist', 0.7093892097473145)]

We get "gynecologist", so I suppose that might count as progress?

Well, now that we’ve examined some basic functions, let’s study our visualization!

Map function

First, we need a mapping function. Suppose we have a list of words to visualize and a word embedding; we want to:

  1. Find the word vector representation of each word in the list;

  2. Reduce the dimensions to 2 with principal component analysis;

  3. Create a scatter plot with each word as the label of its data point;

  4. As an extra, allow the result to be "flipped" along either dimension: the principal component axes point in arbitrary directions, and when we plot geographic words we may want to flip them so the chart lines up with real-world directions.
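Step 2 can be sketched in isolation. The snippet below uses random stand-in vectors (45 "capitals" in 300 dimensions, an assumption for illustration) just to show the shape of PCA's output:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(45, 300))  # e.g. 45 capitals x 300 dimensions

# project the 300-dimensional points onto their first two principal components
pca = PCA(n_components=2)
coordinates_2d = pca.fit_transform(embeddings)

print(coordinates_2d.shape)  # (45, 2): one (x, y) pair per word
```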

We need the following libraries:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from sklearn.decomposition import PCA

import adjustText

The one less common library in the list is adjustText, a very handy package that makes it easy to place labels in a scatter plot without overlaps. It was surprisingly hard to find a solution for this, and as far as I know there is no built-in way to do it in matplotlib or seaborn.

Without further ado, here is a function that does everything we need:

def plot_2d_representation_of_words(
    word_list,
    word_vectors,
    flip_x_axis = False,
    flip_y_axis = False,
    label_x_axis = "x",
    label_y_axis = "y",
    label_label = "city"):

    pca = PCA(n_components = 2)

    # collect one row per word: [word, vector components]
    word_plus_coordinates = []
    for word in word_list:
        current_row = []
        current_row.append(word)
        current_row.extend(word_vectors[word])
        word_plus_coordinates.append(current_row)

    word_plus_coordinates = pd.DataFrame(word_plus_coordinates)

    # reduce the 300 dimensions to 2
    coordinates_2d = pca.fit_transform(
        word_plus_coordinates.iloc[:, 1:])
    coordinates_2d = pd.DataFrame(
        coordinates_2d, columns=[label_x_axis, label_y_axis])
    coordinates_2d[label_label] = word_plus_coordinates.iloc[:, 0]

    # the PCA axes have arbitrary orientation; flip on demand
    if flip_x_axis:
        coordinates_2d[label_x_axis] = \
        coordinates_2d[label_x_axis] * (-1)
    if flip_y_axis:
        coordinates_2d[label_y_axis] = \
        coordinates_2d[label_y_axis] * (-1)

    plt.figure(figsize = (15, 10))
    sns.scatterplot(
        data=coordinates_2d, x=label_x_axis, y=label_y_axis)

    # label each point with its word, avoiding overlaps
    x = coordinates_2d[label_x_axis]
    y = coordinates_2d[label_y_axis]
    label = coordinates_2d[label_label]
    texts = [plt.text(x[i], y[i], label[i]) for i in range(len(x))]
    adjustText.adjust_text(texts)

It's time to test the function. I plotted the capitals of European countries, but you can use any list: names of presidents or other historical figures, car brands, cooking ingredients, rock bands, and so on; just pass it in through the word_list parameter. It is fascinating to watch clusters form and to puzzle out the meaning behind the two axes.

If you want to reproduce the results, here are the cities:

capitals = [
    'Amsterdam', 'Athens', 'Belgrade', 'Berlin', 'Bern', 
    'Bratislava', 'Brussels', 'Bucharest', 'Budapest', 
    'Chisinau', 'Copenhagen','Dublin', 'Helsinki', 'Kiev',
    'Lisbon', 'Ljubljana', 'London', 'Luxembourg','Madrid',
    'Minsk', 'Monaco', 'Moscow', 'Nicosia', 'Nuuk', 'Oslo', 
    'Paris','Podgorica', 'Prague', 'Reykjavik', 'Riga', 
    'Rome', 'San_Marino', 'Sarajevo','Skopje', 'Sofia', 
    'Stockholm', 'Tallinn', 'Tirana', 'Vaduz', 'Valletta',
    'Vatican', 'Vienna', 'Vilnius', 'Warsaw', 'Zagreb']

Assuming you still have the word_vectors object we created in the previous section, you can call the function like this:

plot_2d_representation_of_words(
    word_list = capitals, 
    word_vectors = word_vectors, 
    flip_y_axis = True)

(Flipping the y axis produces a representation closer to a real map.)

The result is:

I don't know about you, but when I first saw the map I could hardly believe how well it turned out! Sure, the longer you look, the more "mistakes" you find; one bad result is that Moscow is not as far east as it should be... Nevertheless, east and west are almost completely separated, Scandinavia and the Baltic states cluster together nicely, and so do the capitals around Italy.

It should be emphasized that these are by no means pure geographic locations. Athens, for example, sits far to the west, but there is a reason for that. To fully appreciate the map, let's recap how it was derived:

  • A team of Google researchers trained a huge neural network to predict words from their context;

  • They stored the weights of each word as a 300-dimensional vector representation;

  • We took the vectors of the European capitals;

  • Reduced their dimensions to 2 with principal component analysis;

  • Put the resulting components on a chart.

So the chart encodes semantic information, not true geographic locations. But I find the attempt fascinating all the same.

