The DCGAN network introduced in this paper builds on the GAN framework and adds convolution operations: it combines CNNs with GANs so that the generator can produce a fake image from random noise. The paper's authors are Alec Radford, Luke Metz, and Soumith Chintala. The paper does not elaborate on the underlying theory of DCGAN; instead it directly presents the network architecture the authors designed and the details of their parameter adjustments. For the abstract, introduction, and related work, see the original paper.
Method and model architecture
Specifically, the generator G and discriminator D from the original GAN paper are replaced by two CNNs, each with some adjustments.
First, an all-convolutional network is used: pooling layers are replaced with strided convolutions (convolutions with a stride greater than 1). The goal is to let the network learn its own spatial downsampling, which is more flexible than a fixed pooling layer. This approach is used in both the generator and the discriminator (in the generator, transposed convolutions are used for upsampling).
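To make the effect of the stride concrete, here is a minimal sketch of the output-size arithmetic, assuming 'same' padding as used in Keras. The helper names are hypothetical and exist only for this illustration.

```python
import math


def conv_same_out(size, stride):
    """Spatial output size of a strided convolution with 'same' padding."""
    return math.ceil(size / stride)


def conv_transpose_same_out(size, stride):
    """Spatial output size of a transposed convolution with 'same' padding."""
    return size * stride


# Downsampling in a discriminator: two stride-2 convolutions take 28 -> 14 -> 7.
down = [28]
for _ in range(2):
    down.append(conv_same_out(down[-1], stride=2))

# Upsampling in a generator: four stride-2 transposed convolutions take 4 -> 64.
up = [4]
for _ in range(4):
    up.append(conv_transpose_same_out(up[-1], stride=2))

print("downsampling sizes:", down)
print("upsampling sizes:", up)
```

Each stride-2 layer halves or doubles the spatial size, which is the job a pooling or fixed upsampling layer would otherwise do, but here the filter weights are learned.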
Second, following the trend at the time, fully connected layers on top of the convolutional features are removed. A common approach then was to use global average pooling in place of the fully connected layer, which improves stability but slows convergence. In the generator, the input is a noise vector drawn from a uniform distribution; it passes through a single fully connected layer (a matrix multiplication), and the result is reshaped into a 4D tensor on which the convolution stack then operates. In the discriminator, the output of the last convolutional layer is flattened and fed into a single sigmoid output.
Third, batch normalization (BN) is added, which stabilizes training, helps gradients flow in deeper models, and mitigates problems caused by poor initialization. However, experiments showed that applying BN to every layer causes sample oscillation and model instability, so BN is applied to all layers except the output layer of the generator and the input layer of the discriminator.
In the generator, all layers use ReLU except the output layer, which uses tanh. In the discriminator, LeakyReLU works better.
To sum up, the improvements to the network are as follows:
- All pooling layers are replaced by strided convolutions (discriminator) and transposed strided convolutions (generator).
- Batch normalization is used in both the generator and the discriminator.
- Fully connected hidden layers are removed for deeper architectures.
- In the generator, ReLU is used for all layers except the output layer, which uses tanh.
- In the discriminator, LeakyReLU is used for all layers.
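As a rough illustration of how these guidelines might translate into Keras, the sketch below builds a small generator and discriminator for 28x28 grayscale images. The function names, layer widths, and 4x4 kernels are assumptions chosen for illustration, not the paper's architecture. Batch normalization is left off the generator output layer and the discriminator input layer, as the DCGAN paper recommends.

```python
import tensorflow as tf
from tensorflow.keras import layers


def make_guideline_generator(noise_dim=100):
    """Hypothetical generator: ReLU hidden layers, tanh output, no pooling."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(noise_dim,)),
        # Project the noise vector, then reshape it into a 7x7 feature map.
        layers.Dense(7 * 7 * 128, use_bias=False),
        layers.Reshape((7, 7, 128)),
        layers.BatchNormalization(),
        layers.ReLU(),
        # Transposed strided convolution upsamples 7x7 -> 14x14.
        layers.Conv2DTranspose(64, 4, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
        # Output layer: no batch normalization, tanh activation, 14x14 -> 28x28.
        layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="tanh"),
    ])


def make_guideline_discriminator():
    """Hypothetical discriminator: strided convolutions with LeakyReLU(0.2)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        # Input layer: strided convolution without batch normalization.
        layers.Conv2D(64, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        # Hidden layer: strided convolution with batch normalization.
        layers.Conv2D(128, 4, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        # Flatten and emit one logit; a sigmoid turns it into a probability.
        layers.Flatten(),
        layers.Dense(1),
    ])


generator = make_guideline_generator()
discriminator = make_guideline_discriminator()
print(generator.output_shape, discriminator.output_shape)
```

The discriminator returns a raw logit rather than a sigmoid probability, which suits a cross-entropy loss that is computed from logits.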
Next, the authors list some of the training settings they used.
- Training images are not preprocessed, apart from being scaled to the tanh output range [-1, 1].
- Training uses mini-batch SGD with a batch size of 128.
- All weights are initialized from a zero-centered normal distribution with a standard deviation of 0.02.
- The LeakyReLU slope is set to 0.2.
- The Adam optimizer is used. The suggested learning rate of 0.001 was too high, so the authors used 0.0002. In addition, the momentum term β1 was reduced from the default 0.9 to 0.5 to stabilize training.
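The settings above could be expressed in TensorFlow roughly as follows. This is a sketch under assumptions: the constant and helper names are hypothetical, and the example layer only shows where the initializer would be plugged in.

```python
import numpy as np
import tensorflow as tf

BATCH_SIZE = 128   # mini-batch size
LEAKY_SLOPE = 0.2  # slope of the LeakyReLU negative part


def scale_to_tanh_range(images_uint8):
    """Map pixel values from [0, 255] to [-1, 1], the range of tanh."""
    return (images_uint8.astype("float32") - 127.5) / 127.5


# Zero-centered normal initializer with a standard deviation of 0.02.
weight_init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.02)


def make_optimizer():
    """Adam with a learning rate of 0.0002 and beta_1 lowered to 0.5."""
    return tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)


# Example layer using the initializer (filter count and kernel size are arbitrary).
example_conv = tf.keras.layers.Conv2D(
    64, 4, strides=2, padding="same", kernel_initializer=weight_init
)
```

The generator and the discriminator would normally each get their own optimizer instance, since the two networks are updated separately.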
The following figure shows the architecture of the generator. The input is a noise vector drawn from a uniform distribution, which is first projected and reshaped into feature maps. Upsampling is done with transposed convolutions, and the final output has the size of the images the model generates; it is then passed to the discriminator.
There are many ready-made implementations of DCGAN on GitHub. Here we use the DCGAN implementation from the tutorial on the official TensorFlow website. In practice, many parameters and network structure settings can be adjusted to suit the experiment. For example, the TensorFlow tutorial changes the activation functions in the generator and discriminator, and uses a fully connected layer and dropout layers.
Here is the code for the generator:

```python
import tensorflow as tf
from tensorflow.keras import layers


def make_generator_model():
    model = tf.keras.Sequential()
    # Project the 100-dimensional noise vector with a fully connected layer.
    model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    # Reshape into a 7x7 feature map with 256 channels.
    model.add(layers.Reshape((7, 7, 256)))
    assert model.output_shape == (None, 7, 7, 256)  # None is the batch size

    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 7, 7, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    # Upsample 7x7 -> 14x14.
    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, 14, 14, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    # Upsample 14x14 -> 28x28 and squash the output to [-1, 1] with tanh.
    model.add(layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, 28, 28, 1)

    return model
```
Here is the code for the discriminator:

```python
def make_discriminator_model():
    model = tf.keras.Sequential()
    # Strided convolution downsamples 28x28 -> 14x14.
    model.add(layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same',
                            input_shape=[28, 28, 1]))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    # Strided convolution downsamples 14x14 -> 7x7.
    model.add(layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    # Flatten and output a single logit (no sigmoid here).
    model.add(layers.Flatten())
    model.add(layers.Dense(1))

    return model
```
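The loss functions are defined as follows:

```python
# The discriminator outputs raw logits, hence from_logits=True.
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)


def discriminator_loss(real_output, fake_output):
    # Real images should be classified as 1, generated images as 0.
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss


def generator_loss(fake_output):
    # The generator succeeds when the discriminator labels its images as 1.
    return cross_entropy(tf.ones_like(fake_output), fake_output)
```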
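To see what these losses compute, the following pure-Python sketch reproduces binary cross-entropy from logits for small lists of values, averaging over the batch. The `_py` function names are hypothetical stand-ins for this illustration and are not part of the tutorial.

```python
import math


def bce_from_logit(label, logit):
    """Numerically stable binary cross-entropy for one example and a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean(values):
    return sum(values) / len(values)


def discriminator_loss_py(real_logits, fake_logits):
    # Real samples are labeled 1, generated samples are labeled 0.
    real_loss = mean([bce_from_logit(1.0, x) for x in real_logits])
    fake_loss = mean([bce_from_logit(0.0, x) for x in fake_logits])
    return real_loss + fake_loss


def generator_loss_py(fake_logits):
    # The generator wants its samples to be labeled 1 by the discriminator.
    return mean([bce_from_logit(1.0, x) for x in fake_logits])


# A confident, correct discriminator has a low loss; the generator's loss is then high.
print(discriminator_loss_py([4.0, 5.0], [-4.0, -5.0]))
print(generator_loss_py([-4.0, -5.0]))
```

When the discriminator is undecided (all logits equal to 0), each cross-entropy term equals log 2, so the discriminator loss is 2·log 2 and the generator loss is log 2.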
The paper presents and refines the combination of CNNs and GANs. In practice, the architecture can be further adjusted to suit the task at hand.
In actual experiments with transposed convolution (often called deconvolution), checkerboard artifacts tend to appear in the upsampled image when the kernel size is not divisible by the stride. An intuitive explanation of the checkerboard effect can be found here. Transposed convolution is prone to this effect: even when the kernel size is divisible by the stride, unevenly learned weights can still produce it. The recommended alternative is therefore to upsample by interpolation and then apply a convolution with 'same' padding, which achieves the same upsampling without checkerboard artifacts.
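The role of the kernel size and stride can be illustrated by counting, in one dimension, how many kernel taps land on each output position of a transposed convolution without padding. This is a simplified sketch with hypothetical helper names; it models only the overlap pattern, not the learned weights.

```python
def overlap_counts(n_in, kernel, stride):
    """Count how many kernel taps contribute to each output position (1D)."""
    n_out = (n_in - 1) * stride + kernel
    counts = [0] * n_out
    for i in range(n_in):        # each input element...
        for t in range(kernel):  # ...spreads over `kernel` consecutive outputs
            counts[i * stride + t] += 1
    return counts


def interior(counts, kernel):
    """Drop the border positions, where fewer taps overlap in any case."""
    return counts[kernel - 1 : len(counts) - (kernel - 1)]


# Kernel size 4 is divisible by stride 2: every interior position gets 2 taps.
even = interior(overlap_counts(6, kernel=4, stride=2), kernel=4)

# Kernel size 5 is not divisible by stride 2: the interior alternates 3, 2, 3, ...
uneven = interior(overlap_counts(6, kernel=5, stride=2), kernel=5)

print("kernel 4, stride 2:", even)
print("kernel 5, stride 2:", uneven)
```

The alternating pattern in the second case is the one-dimensional analogue of the checkerboard. The 5x5 kernels with stride 2 in the tutorial generator fall into this non-divisible case.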
The avatar dataset can be obtained from the blog post linked here.
For the full tutorial, see the official TensorFlow website. It implements a network that generates handwritten digit images. Based on this framework, the network can also be changed to generate images of other sizes. After I modified the network structure and trained it on the anime character avatar dataset, the network could generate anime character portraits. After about 200 steps of training, the results still show many defects.
A few notes on the code above. In the generator, the first layer is fully connected, and the hidden layers use LeakyReLU rather than the ReLU recommended in the original paper. In the discriminator, dropout layers are added. The loss functions follow the definition in the original GAN paper.