The NÜWA algorithm is on a rampage!


Today we're sharing a multimodal algorithm: NÜWA (Nuwa).

The paper shows its results right at the start: NÜWA reaches state-of-the-art (SOTA) results on 8 classic visual generation tasks.

According to the paper, NÜWA also "completely crushes" OpenAI's DALL-E on text-to-image generation.

Across all the comparisons, the results are remarkable!

NÜWA results

Let's first look at how NÜWA performs on the eight classic visual generation tasks.

Text-To-Image (T2I)

The text-to-image task generates a picture that matches a text description.

For example:

A dog with goggles staring at the camera.

More examples:

NÜWA's generated images don't look incoherent at all. Judging by the examples in the paper, they look very realistic!

The results are truly amazing.

Sketch-To-Image (S2I)

The sketch-to-image task generates a picture that follows the layout of a sketch.

For example:

Draw a rough outline, and the algorithm automatically "imagines" and fills in the rest of the picture.

This is truly eye-opening. If the real-world results match the paper, it is very strong.

This algorithm could be used in many interesting scenarios.

Image Completion (I2I)

Image completion: if a picture is incomplete, the algorithm automatically fills in the missing part.

Wow, does this give anyone some bold ideas?

Occlusions like this are handled well, and there are even more detailed examples.

Even when the picture is broken up like this, it can fill in the missing content. I'm looking forward to the code.

Image Manipulation (TI2I)

Image manipulation: editing a picture according to a text description.

For example:

Take a picture of a grassland and add the description:

a horse is running on the grassland

The model then generates the corresponding picture, with a horse running on the grassland.

Its understanding of the text is amazing.

This reminds me of the spoof images that Photoshop experts make.

With this algorithm, we can try it ourselves, ha ha.


And that's not all: besides the four image generation tasks above, NÜWA can also generate video!

The four corresponding video generation tasks are:

  • Text-To-Video (T2V)
  • Sketch-To-Video (S2V)
  • Video Prediction (V2V)
  • Video Manipulation (TV2V)

It can handle both pictures and videos.

How NÜWA works

The NÜWA model has two main parts: an adaptive encoder that supports multiple kinds of conditions, and a pre-trained decoder that handles image and video information at the same time.

For image completion, video prediction, image manipulation and video manipulation, the partial input image or video can be fed directly to the decoder.
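
To make "feed the partial input directly to the decoder" concrete, here is a minimal Python sketch. It assumes a causal decoder that emits discrete visual tokens one at a time in raster order. The names `decode_with_partial_input` and `predict_next` are hypothetical. `predict_next` stands in for NÜWA's real decoder, which would also attend to the encoder's output for the text or sketch condition. The toy predictors are placeholders, not the paper's model.

```python
from typing import Callable, Dict, List, Optional

Token = int
# Hypothetical interface: predict_next(condition, prefix) -> next token.
Predictor = Callable[[Optional[List[Token]], List[Token]], Token]


def decode_with_partial_input(
    num_tokens: int,
    known: Dict[int, Token],
    condition: Optional[List[Token]],
    predict_next: Predictor,
) -> List[Token]:
    """Generate num_tokens tokens left to right.

    Positions present in `known` (the partial image or video) are copied as-is;
    every other position is produced by the predictor from the condition and
    the tokens generated so far.
    """
    generated: List[Token] = []
    for pos in range(num_tokens):
        if pos in known:
            generated.append(known[pos])  # supplied by the partial input
        else:
            generated.append(predict_next(condition, generated))
    return generated


# Toy "video prediction": 3 frames of 2x2 tokens, only the first frame is given.
FRAME = 4


def copy_previous_frame(condition, prefix):
    # Placeholder predictor: assumes a static scene and copies the token
    # at the same position one frame earlier.
    return prefix[len(prefix) - FRAME]


video = decode_with_partial_input(3 * FRAME, {0: 7, 1: 3, 2: 3, 3: 9}, None, copy_previous_frame)
print(video)  # [7, 3, 3, 9, 7, 3, 3, 9, 7, 3, 3, 9]
```

Image completion fits the same loop. The known positions are scattered instead of forming a prefix. For example, `decode_with_partial_input(6, {0: 1, 1: 2, 4: 5}, None, lambda c, p: p[-1] + 1)` returns `[1, 2, 3, 4, 5, 6]`.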

Both the encoder and the decoder are built on a 3D Nearby Attention mechanism (3DNA), which accounts for locality along the spatial axes and the time axis at the same time. The paper defines it as an attention operation, 3DNA(X, C; W).

In this definition, W denotes the learnable weights, and X and C are 3D representations that can come from text, image, or video data.
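The formula appeared as an image in the original post. The idea can be written roughly as follows. This is a paraphrase under our own assumptions, not a verbatim copy of the paper's equation. We assume a query position (i, j, k) in X, its corresponding position (i', j', k') in C, neighborhood extents e^h, e^w, e^s along height, width and time, and key dimension d:

```latex
\mathcal{N}^{(i,j,k)} = \left\{ C_{abc} \;:\; |a - i'| \le e^{h},\ |b - j'| \le e^{w},\ |c - k'| \le e^{s} \right\}

Q_{ijk} = X_{ijk} W^{Q}, \qquad
K^{(i,j,k)} = \mathcal{N}^{(i,j,k)} W^{K}, \qquad
V^{(i,j,k)} = \mathcal{N}^{(i,j,k)} W^{V}

\mathrm{3DNA}(X, C; W)_{ijk} = \operatorname{softmax}\!\left( \frac{Q_{ijk} \left(K^{(i,j,k)}\right)^{\top}}{\sqrt{d}} \right) V^{(i,j,k)}
```

Each token therefore attends only to a local 3D window of C rather than to every position.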

3DNA takes each token's full neighborhood into account and dynamically builds a 3D nearby attention block for every token. The attention-matrix visualization also shows that 3DNA's attended region (blue) is smoother than that of 3D block-sparse attention and 3D axial-sparse attention.
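The "nearby" idea can be illustrated with a pure-Python sketch of 3D nearby self-attention on an h × w × s grid. The sketch makes several simplifying assumptions of our own:

- self-attention (C = X);
- identity projections in place of the learnable W;
- no causal mask (NÜWA's decoder would additionally limit each token to already-generated positions);
- hypothetical function names.

It is not the paper's implementation, which operates on batched tensors.

```python
import math
from itertools import product
from typing import List, Tuple

Vec = List[float]


def nearby_positions(i, j, k, h, w, s, eh, ew, es) -> List[Tuple[int, int, int]]:
    """Grid positions within +-eh rows, +-ew columns and +-es frames of (i, j, k)."""
    return [
        (a, b, c)
        for a in range(max(0, i - eh), min(h, i + eh + 1))
        for b in range(max(0, j - ew), min(w, j + ew + 1))
        for c in range(max(0, k - es), min(s, k + es + 1))
    ]


def attend(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(sc - top) for sc in scores]
    total = sum(weights)
    return [sum(wt * v[t] for wt, v in zip(weights, values)) / total for t in range(len(values[0]))]


def nearby_attention(x, eh, ew, es):
    """x[i][j][k] is a feature vector; each token attends only to its 3D neighborhood."""
    h, w, s = len(x), len(x[0]), len(x[0][0])
    out = [[[None] * s for _ in range(w)] for _ in range(h)]
    for i, j, k in product(range(h), range(w), range(s)):
        nbrs = [x[a][b][c] for a, b, c in nearby_positions(i, j, k, h, w, s, eh, ew, es)]
        out[i][j][k] = attend(x[i][j][k], nbrs, nbrs)  # Q = K = V = X (no learned W)
    return out


# A 4x4 spatial grid over 4 frames with extents (1, 1, 1): an interior token sees
# 3*3*3 = 27 positions instead of all 4*4*4 = 64 under full 3D attention.
print(len(nearby_positions(2, 2, 2, 4, 4, 4, 1, 1, 1)))  # 27
print(len(nearby_positions(0, 0, 0, 4, 4, 4, 1, 1, 1)))  # 8 (corner token)

x = [[[[float(i), float(j - k)] for k in range(4)] for j in range(4)] for i in range(4)]
y = nearby_attention(x, 1, 1, 1)
print(len(y), len(y[0]), len(y[0][0]), len(y[0][0][0]))  # 4 4 4 2
```

Each query looks at a window of at most (2e^h+1)(2e^w+1)(2e^s+1) positions instead of all h·w·s. As a result, the per-token cost does not grow with video length. This is one practical reason to restrict attention to nearby tokens.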

For more details, see the paper itself.

Paper address:

NÜWA code

The NÜWA code is not open source yet, but the GitHub repository has already been created.


The authors say it will be open-sourced soon:

"The company has an open-source approval process, and the code still needs to be cleaned up, so you can star the repo first and wait patiently."

NÜWA, a multimodal pre-trained model jointly developed by Microsoft Research Asia and Peking University, was unveiled at the first Microsoft summit.

Surely a project like this won't break its promise to release the code~


This year has been a boom year for multimodal Transformers, as the many multimodal Transformer papers at the top conferences show.

Source: Jack Cui