How to use Keras to build a neural network (with full code)

Time: 2020-1-14

Abstract: a hands-on machine learning exercise: build your own neural network model with just a few lines of code.

Keras is one of the most popular deep learning libraries and has made a great contribution to the commercialization of artificial intelligence. It is very simple to use and lets you build powerful neural networks in just a few lines of code. In this article, you will learn how to build a neural network with Keras that classifies user reviews as positive or negative, predicting the sentiment behind them. This is the so-called sentiment analysis of social media, and we will do it with the famous IMDB review dataset. The model we build can be applied to other machine learning problems with only a few changes.

Note that we will not go into the details of Keras or deep learning, which makes this a good starting point for programmers who want to enter the field of artificial intelligence but do not yet have a deep mathematical background.

Catalog:

1. What is Keras?

2. What is sentiment analysis?

3. The IMDB dataset.

4. Import dependencies and get the data.

5. Explore the data.

6. Data preparation.

7. Build and train the model.

What is Keras?

Keras is an open-source Python library that lets you easily build neural networks. It can run on top of TensorFlow, the Microsoft Cognitive Toolkit, Theano and MXNet. TensorFlow and Theano are the most commonly used platforms for building deep learning algorithms in Python, but they can be quite complex and difficult to use. In contrast, Keras provides a simple and convenient way to build deep learning models. Its creator, François Chollet, developed it so that people can build neural networks as quickly and simply as possible, with a focus on scalability, modularity, minimalism and Python support. Keras runs on both GPU and CPU and supports Python 2 and Python 3. Keras has made great contributions to deep learning and the commercialization of artificial intelligence, and more and more people are using it.

What is sentiment analysis?

With sentiment analysis, we want to determine the attitude (for example, the emotion) of a speaker or writer towards a document or event. It is therefore a natural language processing problem in which the text has to be understood in order to predict the underlying intent. The sentiment is usually divided into positive, negative and neutral categories. Using sentiment analysis, we can, for example, predict a customer's opinion of and attitude towards a product based on a review they wrote. Because of this, sentiment analysis is widely applied to reviews, surveys, documents and much more.

The IMDB dataset

The IMDB sentiment classification dataset consists of 50,000 movie reviews from IMDB users that are labeled as positive (1) or negative (0). The reviews are preprocessed, and each one is encoded as a sequence of integer word indices. The words within the reviews are indexed by their overall frequency in the dataset; for example, the integer "2" encodes the second most frequent word in the data. The 50,000 reviews are split into 25,000 for training and 25,000 for testing. The dataset was created by researchers at Stanford University and published in a 2011 paper in which they achieved 88.89% accuracy. It was also used in Kaggle's "Bag of Words Meets Bags of Popcorn" competition, where it produced very good results.

Import dependencies and get data

We first import the required dependencies to preprocess the data and build our model.

%matplotlib inline 
import matplotlib 
import matplotlib.pyplot as plt
import numpy as np 
from keras.utils import to_categorical 
from keras import models 
from keras import layers

Next we download the IMDB dataset, which is conveniently built into Keras. Because we don't want to keep its 50/50 train-test split, we merge the training and test data into single data and targets arrays right after downloading, so that we can make an 80/20 split later.

from keras.datasets import imdb
(training_data, training_targets), (testing_data, testing_targets) = imdb.load_data(num_words=10000)
data = np.concatenate((training_data, testing_data), axis=0)
targets = np.concatenate((training_targets, testing_targets), axis=0)
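
As a quick sanity check (an optional one-liner, assuming the download above succeeded), you can confirm that the merged arrays contain all 50,000 reviews:

print(data.shape, targets.shape)  # both should be (50000,)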

Exploring data

Now we can start exploring the dataset:

print("Categories:", np.unique(targets))
print("Number of unique words:", len(np.unique(np.hstack(data))))

Categories: [0 1]
Number of unique words: 9998

length = [len(i) for i in data]
print("Average Review length:", np.mean(length))
print("Standard Deviation:", round(np.std(length)))

Average Review length: 234.75892
Standard Deviation: 173.0

You can see in the output above that the dataset is labeled with two categories, 0 and 1, which represent the sentiment of a review. The whole dataset contains 9,998 unique words, the average review is 234 words long, and the standard deviation of the review length is 173 words.
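
Since matplotlib was already imported above, a quick histogram (an optional sketch using the length list computed earlier) makes this length distribution easy to see:

plt.hist(length, bins=50)          # distribution of review lengths
plt.xlabel("Review length (words)")
plt.ylabel("Number of reviews")
plt.show()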

Now let’s look at a training example:

print("Label:", targets[0])

Label: 1

print(data[0])

[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670
, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50
, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 
515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16,
 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 
117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381
, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 
36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345,
 19, 178, 32]

Above you see the first review of the dataset, which is labeled positive (1). The code below retrieves the dictionary that maps word indices back to the original words so that we can read the review, replacing every unknown word with a "#". It does this with the get_word_index() function. The indices are shifted by 3 because the Keras IMDB loader reserves the indices 0, 1 and 2 for padding, start-of-sequence and unknown tokens.

index = imdb.get_word_index()
reverse_index = dict([(value, key) for (key, value) in index.items()]) 
decoded = " ".join( [reverse_index.get(i - 3, "#") for i in data[0]] )
print(decoded) 

# this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could 
just imagine being there robert # is an amazing actor and now the same being director # father came from the same scottish island as
 myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just 
brilliant so much that i bought the film as soon as it was released for # and would recommend it to everyone to watch and the fly 
fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and 
this definitely was also # to the two little boy's that played the # of norman and paul they were just brilliant children are often 
left out of the # list i think because the stars that play them all grown up are such a big profile for the whole film but these 
children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was 
true and was someone's life after all that was shared with us all

Data preparation

Now it is time to prepare our data. We vectorize every review so that it becomes a vector of exactly 10,000 numbers: the positions that correspond to words occurring in the review are set to 1 and all other positions stay 0. We do this because we limited the vocabulary to the 10,000 most frequent words and because every input to our neural network needs to have the same size.

def vectorize(sequences, dimension = 10000):
    # Create an all-zero matrix of shape (number of reviews, 10000)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Set the positions of the words that occur in the review to 1
        results[i, sequence] = 1
    return results
 
data = vectorize(data)
targets = np.array(targets).astype("float32")

Now we split the data into a training set and a test set. The training set will contain 40,000 reviews and the test set 10,000.

test_x = data[:10000]
test_y = targets[:10000]
train_x = data[10000:]
train_y = targets[10000:]

Building and training the model

We can now build our simple neural network. We start by defining the type of model we want to build. There are two types of models available in Keras: the Sequential model and the Model class used with the functional API. Here we use the Sequential model.

Then we simply add the input layer, the hidden layers and the output layer. Between them we use dropout to prevent overfitting; note that a dropout rate between 20% and 50% is usually a good choice. In every layer we use "dense" layers, which means the units are fully connected. In the hidden layers we use the relu activation function, because it is always a good starting point and produces satisfactory results most of the time; of course, you are free to experiment with other activation functions. In the output layer we use the sigmoid function, which maps the output to a value between 0 and 1. Note that we set the input shape to 10,000 at the input layer because our reviews are encoded as vectors of length 10,000. The input layer therefore takes 10,000 inputs and outputs 50 values.

Finally, we let Keras print a summary of the model we have just built.

model = models.Sequential()
# Input - Layer
model.add(layers.Dense(50, activation = "relu", input_shape=(10000, )))
# Hidden - Layers
model.add(layers.Dropout(0.3, noise_shape=None, seed=None))
model.add(layers.Dense(50, activation = "relu"))
model.add(layers.Dropout(0.2, noise_shape=None, seed=None))
model.add(layers.Dense(50, activation = "relu"))
# Output- Layer
model.add(layers.Dense(1, activation = "sigmoid"))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 50)                500050    
_________________________________________________________________
dropout_1 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 50)                2550      
_________________________________________________________________
dropout_2 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 50)                2550      
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 51        
=================================================================
Total params: 505,201
Trainable params: 505,201
Non-trainable params: 0
_________________________________________________________________
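
The parameter counts in this summary are easy to verify by hand: the input layer has 10,000 × 50 weights plus 50 biases = 500,050 parameters, each of the two hidden dense layers has 50 × 50 + 50 = 2,550, the output layer has 50 × 1 + 1 = 51, and the dropout layers add no parameters at all, giving 500,050 + 2,550 + 2,550 + 51 = 505,201 in total.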

Now we need to compile our model, which is nothing more than configuring the model for training. We use the "adam" optimizer; an optimizer is the algorithm that changes the weights and biases during training. We choose binary cross-entropy as the loss (because we are dealing with binary classification) and accuracy as our evaluation metric.

model.compile(
 optimizer = "adam",
 loss = "binary_crossentropy",
 metrics = ["accuracy"]
)

Now we are able to train our model. We do this with a batch_size of 500 and only two epochs, because I found that the model starts to overfit if we train it longer. The batch size defines the number of samples that are propagated through the network at once, and an epoch is one iteration over the entire training data. A larger batch size generally speeds up training but does not always converge as fast; a smaller batch size trains more slowly but can converge faster. Which is better definitely depends on the problem, so you need to try a few different values. If you are tackling a problem for the first time, I suggest starting with a batch size of 32.

results = model.fit(
 train_x, train_y,
 epochs= 2,
 batch_size = 500,
 validation_data = (test_x, test_y)
)

Train on 40000 samples, validate on 10000 samples
Epoch 1/2
40000/40000 [==============================] - 5s 129us/step - loss: 0.4051 - acc: 0.8212 - val_loss: 0.2635 - val_acc: 0.8945
Epoch 2/2
40000/40000 [==============================] - 4s 90us/step - loss: 0.2122 - acc: 0.9190 - val_loss: 0.2598 - val_acc: 0.8950

Now is the time to evaluate our model:

print(np.mean(results.history["val_acc"]))

0.894750000536

That's great! With this simple model we have already surpassed the 88.89% accuracy of the 2011 paper mentioned at the beginning.
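
If you would rather report the validation accuracy after the final epoch instead of the mean over both epochs, a one-line variation (using the same results object) is:

print(results.history["val_acc"][-1])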

You can see the code for the entire model below:

import numpy as np
from keras.utils import to_categorical
from keras import models
from keras import layers
from keras.datasets import imdb
(training_data, training_targets), (testing_data, testing_targets) = imdb.load_data(num_words=10000)
data = np.concatenate((training_data, testing_data), axis=0)
targets = np.concatenate((training_targets, testing_targets), axis=0)
def vectorize(sequences, dimension = 10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1
    return results

data = vectorize(data)
targets = np.array(targets).astype("float32")
test_x = data[:10000]
test_y = targets[:10000]
train_x = data[10000:]
train_y = targets[10000:]
model = models.Sequential()
# Input - Layer
model.add(layers.Dense(50, activation = "relu", input_shape=(10000, )))
# Hidden - Layers
model.add(layers.Dropout(0.3, noise_shape=None, seed=None))
model.add(layers.Dense(50, activation = "relu"))
model.add(layers.Dropout(0.2, noise_shape=None, seed=None))
model.add(layers.Dense(50, activation = "relu"))
# Output- Layer
model.add(layers.Dense(1, activation = "sigmoid"))
model.summary()
# compiling the model
model.compile(
 optimizer = "adam",
 loss = "binary_crossentropy",
 metrics = ["accuracy"]
)
results = model.fit(
 train_x, train_y,
 epochs= 2,
 batch_size = 500,
 validation_data = (test_x, test_y)
)
print("Test-Accuracy:", np.mean(results.history["val_acc"]))

Summary
In this article, you learned what sentiment analysis is and why Keras is one of the most widely used deep learning libraries. Most importantly, you learned that Keras has made a great contribution to deep learning and the commercialization of artificial intelligence. You built a simple, six-layer neural network that predicts the sentiment of movie reviews with 89% accuracy. You can now use this model to run sentiment analysis on other text sources as well, but you will need to encode them as vectors of length 10,000 or change the input size of the input layer. You can also apply this model to other related machine learning problems with only a few changes.
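
As a rough illustration of that last point, here is one way to score a new piece of raw text with the trained model. This is only a sketch: the encode_review helper below is hypothetical, and it assumes the default behaviour of the Keras IMDB loader (a +3 index offset, 1 as the start token, 2 as the unknown token) together with the 10,000-word vocabulary and the vectorize() function defined earlier.

word_index = imdb.get_word_index()

def encode_review(text, num_words=10000, index_from=3):
    encoded = [1]  # 1 marks the start of a sequence in the Keras IMDB encoding
    for word in text.lower().split():
        idx = word_index.get(word, -1) + index_from
        # Use 2 (the "unknown" index) for words outside the 10,000-word vocabulary
        encoded.append(idx if 0 <= idx < num_words else 2)
    return encoded

new_review = "this movie was brilliant and i loved every minute of it"
x = vectorize([encode_review(new_review)])
print("Positive probability:", model.predict(x)[0][0])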

Article title: how-to-build-a-neural-network-with-keras

By Niklas Donges