Neural network learns to “animate” portraits from just one static image

Russian specialists from the Samsung AI Center in Moscow, working with engineers from the Skolkovo Institute of Science and Technology, have developed a system that can create realistic animated images of people's faces from just a few static shots of a person. A task like this usually requires large databases of images, but in the developers' example the system was trained to produce an animated image of a human face from only eight still frames, and in some cases a single frame was enough. More details about the work are given in a paper published in the online repository arXiv.org.

As a rule, creating a photorealistic personalized model of a human head is quite difficult because of the head's high photometric, geometric, and kinematic complexity. The difficulty lies not only in modeling the face as a whole (for which many modeling approaches exist) but also in modeling specific parts such as the mouth and hair. Another complicating factor is our tendency to notice even minor flaws in a finished model of a human head. This low tolerance for modeling errors explains why non-photorealistic avatars currently prevail in teleconferencing.

According to the authors, the system, which is built on few-shot learning, can create very realistic talking-head models of people and even of portrait paintings. The algorithms synthesize images of the head of one and the same person, driven by facial landmarks taken either from another part of the video or from another person's face. As training material, the developers used an extensive database of celebrity videos. To get the most accurate “talking head”, the system needs to use more than 32 images.

To make the animated faces more realistic, the developers drew on earlier work in generative adversarial networks (GANs, in which a neural network fills in the details of an image, in effect becoming an artist), and also on a meta-learning approach, in which each element of the system is trained and designed to solve a specific task.

Diagram of the meta-learning scheme

Three neural networks are used to process static images of people's heads and turn them into animated ones: the Embedder (embedding network), the Generator (generating network), and the Discriminator (discriminating network). The first maps an image of a head (with rough facial landmarks) to embedding vectors that contain pose-independent information. The second takes facial landmarks together with the output of the Embedder and generates new images from them through a set of convolutional layers, which provide robustness to changes of scale, shifts, rotations, changes of viewpoint, and other distortions of the original face image. The Discriminator is used to assess the quality and authenticity of what the other two networks produce. As a result, the system transforms the landmarks of a human face into realistic-looking personalized photos.
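The data flow between the three networks can be sketched in a few lines of Python. This is only a toy illustration, not the authors' implementation: the functions `embed_frame`, `embed_person`, `generate`, and `discriminate` are hypothetical stand-ins that use simple arithmetic instead of convolutional layers, "images" and "landmarks" are short lists of numbers, and averaging the per-frame embeddings is an assumption made here for simplicity.

```python
from statistics import fmean
from typing import List, Sequence

Vector = List[float]


def embed_frame(frame: Sequence[float], landmarks: Sequence[float]) -> Vector:
    """Toy Embedder: summarize one frame and its landmarks as a small vector."""
    return [fmean(frame), max(frame) - min(frame), fmean(landmarks)]


def embed_person(frames: Sequence[Sequence[float]],
                 landmark_sets: Sequence[Sequence[float]]) -> Vector:
    """Average the per-frame embeddings over the K available frames (assumption)."""
    per_frame = [embed_frame(f, l) for f, l in zip(frames, landmark_sets)]
    return [fmean(column) for column in zip(*per_frame)]


def generate(target_landmarks: Sequence[float], embedding: Vector) -> Vector:
    """Toy Generator: one output value per landmark, shifted by the identity embedding."""
    identity_bias = fmean(embedding)
    return [lm + identity_bias for lm in target_landmarks]


def discriminate(image: Sequence[float], landmarks: Sequence[float]) -> float:
    """Toy Discriminator: score in (0, 1], higher when the image follows the landmarks."""
    error = fmean(abs(p - l) for p, l in zip(image, landmarks))
    return 1.0 / (1.0 + error)


# K = 2 source frames of one person, each with its own rough landmarks.
frames = [[0.0, 1.0, 0.5, 0.5], [0.25, 0.75, 0.5, 0.5]]
landmark_sets = [[0.25, 0.75], [0.5, 0.5]]

embedding = embed_person(frames, landmark_sets)   # identity information
target_landmarks = [0.0, 1.0]                     # pose to be rendered
new_frame = generate(target_landmarks, embedding)
score = discriminate(new_frame, target_landmarks)
print(embedding, new_frame, score)
```

The point of the sketch is the division of labor: identity information comes only from the embedding, the pose comes only from the target landmarks, and the Discriminator judges the result against those same landmarks.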

The developers emphasize that their system can initialize the parameters of both the Generator and the Discriminator individually for each person in the picture, so training can rely on only a few images, which speeds it up even though tens of millions of parameters have to be tuned.
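Why a good starting point makes a few images sufficient can be illustrated with a deliberately tiny example. The sketch below is an assumption-laden toy, not the authors' training procedure: a two-parameter linear model stands in for the networks, the three `(input, target)` pairs stand in for a few frames of a new person, and the values of `meta_init` are hypothetical parameters supposedly learned beforehand on many other people.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input, target) pair standing in for one frame


def mse(w: float, b: float, data: List[Sample]) -> float:
    """Mean squared error of the linear model y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fine_tune(w: float, b: float, data: List[Sample],
              steps: int = 5, lr: float = 0.1) -> Tuple[float, float]:
    """Run a few plain gradient-descent steps starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Three "frames" of a new person whose true parameters are w = 2.1, b = 0.9.
few_shots = [(0.0, 0.9), (0.5, 1.95), (1.0, 3.0)]

meta_init = (2.0, 1.0)      # hypothetical: learned in advance on many people
scratch_init = (0.0, 0.0)   # no prior knowledge

loss_meta = mse(*fine_tune(*meta_init, few_shots), few_shots)
loss_scratch = mse(*fine_tune(*scratch_init, few_shots), few_shots)
print(f"from meta-learned init: {loss_meta:.5f}")
print(f"from scratch:           {loss_scratch:.5f}")
```

With the same three samples and the same five update steps, the run that starts near the right answer ends with a much smaller error than the run that starts from zero, which is the intuition behind person-specific initialization.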
