Image style transfer in Python with the VGG16 model


This article describes how to perform image style transfer in Python with the VGG16 model. The details are as follows, for your reference:

1. Image style transfer

The activation values of each layer of a convolutional network can be regarded as classifiers. Together, these classifiers form an abstract representation of the image at that layer, and the deeper the layer, the more abstract the representation.

Content features: the concrete elements in an image, i.e. the activation values of a certain layer after the image is fed into the CNN.

Style features: the way the picture elements are drawn, what the contents have in common, i.e. the correlations between the activation values of a certain layer of the CNN.

Style transfer: a new image is generated by adding the style features of one image to the content features of another. In ordinary convolutional model training, fixed pictures are fed in and the network parameters are adjusted, i.e. the pictures train the network. When generating an image in a specific style, the process is reversed: the trained network parameters are fixed and the image itself is adjusted until it takes on the target style. For the content part, the pixel values of the image are adjusted so that its activations in the convolutional network approach the content features of the target image. For the style part, the Gram matrix is computed from the inner products of the outputs of multiple neurons, and the style loss function is then obtained from the difference between the Gram matrices.


The total loss function is a weighted sum of the content loss function and the style loss function. The final generated image then carries both content and style features.
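These loss computations can be sketched in plain NumPy before moving to the TensorFlow implementation. This is a toy illustration with made-up feature-map shapes and random data, not the actual network output:

```python
import numpy as np

# Toy feature maps standing in for one CNN layer: [batch, width, height, channels]
content_feat = np.random.rand(1, 4, 4, 3).astype(np.float32)
result_feat = np.random.rand(1, 4, 4, 3).astype(np.float32)
style_feat = np.random.rand(1, 4, 4, 3).astype(np.float32)

# Content loss: mean squared difference of the activations
content_loss = np.mean((content_feat - result_feat) ** 2)

def gram(x):
    b, w, h, ch = x.shape
    f = x.reshape(b, w * h, ch)  # flatten the spatial dimensions
    # Inner products between channel activations, normalized by the element count
    return np.matmul(f.transpose(0, 2, 1), f) / (w * h * ch)

# Style loss: mean squared difference of the Gram matrices
style_loss = np.mean((gram(style_feat) - gram(result_feat)) ** 2)

# Total loss: weighted sum of the two
k_content, k_style = 0.1, 500
total_loss = k_content * content_loss + k_style * style_loss
print(gram(style_feat).shape)  # (1, 3, 3)
```

Note that the Gram matrix is square in the channel dimension and symmetric, since entry (i, j) is the inner product of channel i's and channel j's activations.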

2. Implementation with VGG16

2.1. Reading the pre-trained model

First of all, we need to prepare the parameters of the pre-trained VGG16 model. Link: Extraction code: ejn8

Load and inspect the contents of the parameters with numpy.load():

import numpy as np

data = np.load('./data/vgg16_model.npy', allow_pickle=True, encoding='bytes')
data_dic = data.item()
# print(type(data))
# View the keys of the network layer parameters
print(data_dic.keys())

The printed keys are as follows. You can see the different convolution layers and fully connected layers:

dict_keys([b'conv5_1', b'fc6', b'conv5_3', b'conv5_2', b'fc8', b'fc7', b'conv4_1',
 b'conv4_2', b'conv4_3', b'conv3_3', b'conv3_2', b'conv3_1', b'conv1_1', b'conv1_2', 
b'conv2_2', b'conv2_1'])

Next, check the parameters of each layer. data_dic[key] returns the parameters of the layer named key. For example, the weight w of convolution layer conv1_1 consists of 3 × 3 convolution kernels over 3 input channels, for 64 output channels.

# View the parameters w, b of convolution layer conv1_1
w, b = data_dic[b'conv1_1']
print(w.shape, b.shape)   # (3, 3, 3, 64) (64,)
# View the parameters of fully connected layer fc8
w, b = data_dic[b'fc8']
print(w.shape, b.shape)   # (4096, 1000) (1000,)

2.2 Building the VGG network

The VGG network is built by filling the trained parameters into the network layers.

In the class's initialization function, the parameters in the pre-trained model file are read into self.data_dic.

First, the convolution layers are constructed. The name of a convolution layer is passed in, and the corresponding parameters are read from the model and filled into the network. For example, to read the weight and bias of the first convolution layer, pass in name = b'conv1_1'; then data_dic[name][0] yields the weight and data_dic[name][1] yields the bias. tf.constant turns them into constants, after which the convolution is performed, the bias is added, and the result goes through the activation function.

Next, the pooling operation is implemented. Since pooling requires no parameters, the input is simply max-pooled and returned.

After convolution and pooling, the data is a four-dimensional tensor [batch_size, image_width, image_height, channel]. We need to flatten the last three dimensions: multiply them together and reshape via tf.reshape().

Finally, the result is passed through the fully connected layers. Their implementation is similar to that of the convolution layer: after reading the weight and bias parameters, the fully connected operation is applied and the result is returned.

import numpy as np
import tensorflow as tf

class VGGNet:
    def __init__(self, data_dir):
        data = np.load(data_dir, allow_pickle=True, encoding='bytes')
        self.data_dic = data.item()

    def conv_layer(self, x, name):
        # Implement the convolution operation
        with tf.name_scope(name):
            # Read the parameter values of this convolution layer from the model file
            weight = tf.constant(self.data_dic[name][0], name='conv')
            bias = tf.constant(self.data_dic[name][1], name='bias')
            # Convolution operation, then add the bias and apply the activation
            y = tf.nn.conv2d(x, weight, [1, 1, 1, 1], padding='SAME')
            y = tf.nn.bias_add(y, bias)
            return tf.nn.relu(y)

    def pooling_layer(self, x, name):
        # Implement the pooling operation (no parameters needed)
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                              padding='SAME', name=name)

    def flatten_layer(self, x, name):
        # Implement the flatten layer
        with tf.name_scope(name):
            # x_shape -> [batch_size, image_width, image_height, channel]
            x_shape = x.get_shape().as_list()
            dimension = 1
            # Compute the product of the last three dimensions of x
            for d in x_shape[1:]:
                dimension *= d
            output = tf.reshape(x, [-1, dimension])
            return output

    def fc_layer(self, x, name, activation=tf.nn.relu):
        # Implement the fully connected layer
        with tf.name_scope(name):
            # Read the parameter values of this fully connected layer from the model file
            weight = tf.constant(self.data_dic[name][0], name='fc')
            bias = tf.constant(self.data_dic[name][1], name='bias')
            # Fully connected operation
            y = tf.matmul(x, weight)
            y = tf.nn.bias_add(y, bias)
            if activation is None:
                return y
            return activation(y)

In the build() function, after the data is input, the RGB data is split into the three channels R, G and B, a fixed mean value is subtracted from each channel, and the three channels are then reassembled into new data in the order B, G, R.
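This preprocessing step can be sketched with NumPy on a tiny fake image (the actual build() code does the same thing with tf.split and tf.concat; the 2 × 2 size here is just for illustration):

```python
import numpy as np

# A tiny fake RGB "image" batch: [batch, width, height, 3]
x_rgb = np.arange(12, dtype=np.float32).reshape(1, 2, 2, 3)
VGG_MEAN = [103.939, 116.779, 123.68]  # fixed per-channel values, in B, G, R order

# Split into R, G, B channels along the last axis
r, g, b = np.split(x_rgb, 3, axis=3)
# Subtract the fixed mean from each channel and reassemble in B, G, R order
x_bgr = np.concatenate([b - VGG_MEAN[0], g - VGG_MEAN[1], r - VGG_MEAN[2]], axis=3)
print(x_bgr.shape)  # (1, 2, 2, 3)
```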

The next step is to build the VGG network from the construction functions above. The parameters of the five convolution-and-pooling stages, the flatten layer and the three fully connected layers are read into each layer, and the network is assembled. Finally, the output goes through softmax.

import time

# build() is also a method of the VGGNet class above
def build(self, x_rgb):
    s_time = time.time()
    # Normalize the input image: split it into three channels along the fourth dimension
    r, g, b = tf.split(x_rgb, 3, axis=3)
    # Subtract the fixed per-channel values, then reassemble in BGR order
    VGG_MEAN = [103.939, 116.779, 123.68]
    x_bgr = tf.concat([b - VGG_MEAN[0], g - VGG_MEAN[1], r - VGG_MEAN[2]], axis=3)
    # Check that the assembled data matches the expected shape before continuing
    assert x_bgr.get_shape()[1:] == [668, 668, 3]
    # Build the convolution, pooling, fully connected and other layers
    self.conv1_1 = self.conv_layer(x_bgr, b'conv1_1')
    self.conv1_2 = self.conv_layer(self.conv1_1, b'conv1_2')
    self.pool1 = self.pooling_layer(self.conv1_2, b'pool1')
    self.conv2_1 = self.conv_layer(self.pool1, b'conv2_1')
    self.conv2_2 = self.conv_layer(self.conv2_1, b'conv2_2')
    self.pool2 = self.pooling_layer(self.conv2_2, b'pool2')
    self.conv3_1 = self.conv_layer(self.pool2, b'conv3_1')
    self.conv3_2 = self.conv_layer(self.conv3_1, b'conv3_2')
    self.conv3_3 = self.conv_layer(self.conv3_2, b'conv3_3')
    self.pool3 = self.pooling_layer(self.conv3_3, b'pool3')
    self.conv4_1 = self.conv_layer(self.pool3, b'conv4_1')
    self.conv4_2 = self.conv_layer(self.conv4_1, b'conv4_2')
    self.conv4_3 = self.conv_layer(self.conv4_2, b'conv4_3')
    self.pool4 = self.pooling_layer(self.conv4_3, b'pool4')
    self.conv5_1 = self.conv_layer(self.pool4, b'conv5_1')
    self.conv5_2 = self.conv_layer(self.conv5_1, b'conv5_2')
    self.conv5_3 = self.conv_layer(self.conv5_2, b'conv5_3')
    self.pool5 = self.pooling_layer(self.conv5_3, b'pool5')
    self.flatten = self.flatten_layer(self.pool5, b'flatten')
    self.fc6 = self.fc_layer(self.flatten, b'fc6')
    self.fc7 = self.fc_layer(self.fc6, b'fc7')
    self.fc8 = self.fc_layer(self.fc7, b'fc8', activation=None)
    self.prob = tf.nn.softmax(self.fc8, name='prob')
    print('Model built in %d seconds' % (time.time() - s_time))

2.3. Image style transfer

First, we need to define the input and output of the network. The inputs are the style image and the content image, both 668 × 668 three-channel pictures. The style image style_img and the content image content_img are read through the Image object of the PIL library and converted to arrays, and the corresponding placeholders style_in and content_in are defined, to be filled with the pictures during training.

The output of the network is a 668 × 668 three-channel image; an array res_out for the result image is initialized with a random function.

Using the VGGNet class defined above, create an object for each of the three images and complete the build operation.

import os
import numpy as np
import tensorflow as tf
from PIL import Image

vgg16_dir = './data/vgg16_model.npy'
style_img = './data/starry_night.jpg'
content_img = './data/city_night.jpg'
output_dir = './data'

def read_image(img):
    img = Image.open(img)
    img_np = np.array(img)         # convert the picture to a [668, 668, 3] array
    img_np = np.asarray([img_np])  # convert to a [1, 668, 668, 3] array
    return img_np

# Input the style and content image arrays
style_img = read_image(style_img)
content_img = read_image(content_img)
# Define placeholders for the corresponding input images
content_in = tf.placeholder(tf.float32, shape=[1, 668, 668, 3])
style_in = tf.placeholder(tf.float32, shape=[1, 668, 668, 3])
# Initialize the output image
initial_img = tf.truncated_normal((1, 668, 668, 3), mean=127.5, stddev=20)
res_out = tf.Variable(initial_img)
# Build a VGG network object for each image
res_net = VGGNet(vgg16_dir)
style_net = VGGNet(vgg16_dir)
content_net = VGGNet(vgg16_dir)
res_net.build(res_out)
style_net.build(style_in)
content_net.build(content_in)

Then we need to define the loss function.

For the content loss, the same convolution layers of the content image and the result image are compared; for example, conv1_1 and conv2_1 are selected here. The squared difference over the last three dimensions of the two feature layers is computed and then averaged, which gives the content loss.

For the style loss, we compute the Gram matrices of the feature layers of the style image and the result image, and then take the mean squared difference of the Gram matrices.

Finally, the total loss is obtained by adding the two loss functions weighted by their coefficients.

# To compute the loss, calculate the content loss and style loss separately
# Extract the content features of the content image
content_features = [
    content_net.conv1_1,
    content_net.conv2_1,
    # content_net.conv2_2
]
# Extract the content features of the same layers from the result image
res_content = [
    res_net.conv1_1,
    res_net.conv2_1,
    # res_net.conv2_2
]
# Compute the content loss
content_loss = tf.zeros(1, tf.float32)
for c, r in zip(content_features, res_content):
    content_loss += tf.reduce_mean((c - r) ** 2, [1, 2, 3])
# Gram matrix used for the style loss
def gram_matrix(x):
    b, w, h, ch = x.get_shape().as_list()
    features = tf.reshape(x, [b, w * h, ch])
    # Take the inner products of the features matrix and divide by a constant
    gram = tf.matmul(features, features, adjoint_a=True) / tf.constant(w * h * ch, tf.float32)
    return gram
# Extract the style features of the style image
style_features = [
    style_net.conv1_2
]
style_gram = [gram_matrix(feature) for feature in style_features]
# Extract the style features of the same layers from the result image
res_features = [
    res_net.conv1_2
]
res_gram = [gram_matrix(feature) for feature in res_features]
# Compute the style loss
style_loss = tf.zeros(1, tf.float32)
for s, r in zip(style_gram, res_gram):
    style_loss += tf.reduce_mean((s - r) ** 2, [1, 2])
# Coefficients for the content and style features
k_content = 0.1
k_style = 500
# Add the two loss values weighted by their coefficients
loss = k_content * content_loss + k_style * style_loss

Next, run 100 rounds of training, printing the total loss, content loss and style loss along the way, and output the image of each round to the specified directory.

learning_steps = 100
learning_rate = 10
train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    for i in range(learning_steps):
        t_loss, c_loss, s_loss, _ = sess.run(
            [loss, content_loss, style_loss, train_op],
            feed_dict={content_in: content_img, style_in: style_img})
        print('Round %d, total loss: %.4f, content loss: %.4f, style loss: %.4f'
              % (i + 1, t_loss[0], c_loss[0], s_loss[0]))
        # Get the result image array and save it
        res_arr = res_out.eval(sess)[0]
        res_arr = np.clip(res_arr, 0, 255)       # clip the values in the result array to 0~255
        res_arr = np.asarray(res_arr, np.uint8)  # convert the image array to uint8
        img_path = os.path.join(output_dir, 'res_%d.jpg' % (i + 1))
        # Convert the array back to a picture and save it
        res_img = Image.fromarray(res_arr)
        res_img.save(img_path)

The running results are as follows: the content picture, the style picture, and the result pictures after rounds 12, 46 and 100.




I hope this article is helpful to your Python programming.
