Deepfake It Till You Make It—Generative Adversarial Networks
In December 2017, a phenomenon took the internet by storm. Uploaded to Reddit was the face of a renowned celebrity almost seamlessly morphed onto that of a performer in a pornographic film. Keen eyes were initially quick to point out flaws; with time, however, the faux clips grew progressively harder to tell apart from reality and frighteningly more genuine in appearance. Concerns brewed over these ‘deepfakes’ and the implications of the learning mechanism that made them possible: the generative adversarial network (GAN).
Two Are Better Than One
Learning algorithms are trained to carry out a task by comparing given stimuli to values in a data set. For instance, an algorithm could learn to identify a shade by having previously analysed a table of colours. Training gets increasingly complex depending on the task at hand: it requires large sets of data and the careful design of a cost function, which evaluates how far the algorithm’s output deviates from the expected one. By making two algorithms compete with each other, as in a GAN, the need to hand-design an intricate cost function can be circumvented.
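As a minimal sketch of that idea, consider colour identification: the ‘cost’ below is simply the squared deviation of a stimulus from each entry in a small table of labelled colours, and the algorithm picks the label with the lowest cost. The table, names and values here are all hypothetical, chosen only for illustration.

```python
# Toy "colour identification": classify an RGB stimulus by comparing it
# against a small table of labelled colours (an illustrative stand-in
# for a training data set).
COLOUR_TABLE = {
    "red":   (255, 0, 0),
    "green": (0, 255, 0),
    "blue":  (0, 0, 255),
    "white": (255, 255, 255),
}

def cost(stimulus, reference):
    # Squared deviation of the stimulus from a reference colour.
    return sum((s - r) ** 2 for s, r in zip(stimulus, reference))

def identify(stimulus):
    # Pick the label whose reference colour minimises the cost.
    return min(COLOUR_TABLE, key=lambda name: cost(stimulus, COLOUR_TABLE[name]))

print(identify((220, 30, 40)))  # a dark-ish red
```

Real systems replace the fixed table with learned parameters and the squared deviation with a task-specific cost, but the principle, scoring outputs against known data, is the same.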
Think of a detective and a criminal who keep learning from one another. The criminal learns the ways his counterpart trails him and so makes a greater effort to escape; the detective, in turn, gains insight into his adversary’s plans and works harder to catch him. At the end of the day, both of their abilities are amplified. In a GAN, a generator algorithm creates new data, which a discriminator must tell apart from samples in a data set of real images. Feedback flows back and forth between the two networks, cyclically improving both. Thus we get one system that is extremely good at creating realistic data and another that is great at recognising features and pointing out flaws, or, as in the case of deepfakes, patching up any blemishes in a morphed video.
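The adversarial loop can be sketched on a toy problem. In this hypothetical example, the ‘real’ data are numbers drawn around 4.0, the generator is an affine map G(z) = a·z + b on random noise, and the discriminator is a logistic classifier D(x) = σ(w·x + c); the two are updated in alternation with hand-derived gradients. All names, values and learning rates are illustrative, not from the article, and real GANs use deep networks in place of these two-parameter models.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

# Discriminator D(x) = sigmoid(w*x + c): "probability that x is real".
w, c = 0.1, 0.0
# Generator G(z) = a*z + b: maps noise z ~ N(0, 1) to a fake sample.
a, b = 1.0, 0.0

lr = 0.01
REAL_MEAN, REAL_STD = 4.0, 0.5  # the "real" data distribution

for step in range(5000):
    # Discriminator step: push D(real) toward 1 and D(fake) toward 0,
    # by gradient ascent on log D(x_real) + log(1 - D(x_fake)).
    x_real = random.gauss(REAL_MEAN, REAL_STD)
    x_fake = a * random.gauss(0.0, 1.0) + b
    s_real = sigmoid(w * x_real + c)
    s_fake = sigmoid(w * x_fake + c)
    w += lr * ((1 - s_real) * x_real - s_fake * x_fake)
    c += lr * ((1 - s_real) - s_fake)

    # Generator step: push D(fake) toward 1 (fool the discriminator),
    # by gradient ascent on log D(G(z)); chain rule gives (1 - s) * w * dG.
    z = random.gauss(0.0, 1.0)
    s_fake = sigmoid(w * (a * z + b) + c)
    a += lr * (1 - s_fake) * w * z
    b += lr * (1 - s_fake) * w

# The generator's outputs should now sit near the real mean of 4.0.
fake_mean = sum(a * random.gauss(0, 1) + b for _ in range(1000)) / 1000
print(f"generator output mean: {fake_mean:.2f} (real mean: {REAL_MEAN})")
```

The feedback described above is exactly the two gradient steps in the loop: the discriminator’s parameters move to separate real from fake, and the generator’s parameters move along the discriminator’s own gradient to close that gap.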
The power of a GAN lies in its ability to augment the performance of its constituent functions. In theory, any kind of learning algorithm could be trained in this adversarial form, making it able to replicate patterns and fill in gaps of discontinuity. Learning becomes far more effective when multiple functions are made to share information competitively. As humans, we learn to play an instrument with practice: we improve our play and pinpoint our flaws by judging the sound we create against what we hear, see and feel. We learn not on one basis but through many that work together. Humans have always been enamoured by the idea of artificial intelligence that could rival us, and GANs give AI the ability to learn through various forms of stimuli and improve through self-judgment. Conceptually speaking, this is not far from the way we learn, and could be the start of turning the science fiction we have dreamt of into reality.
Seeing is Believing
While that dwells in the future and the realms of fantasy, at the present moment the world of animation and CGI could benefit greatly from the technology. Scenes that are difficult or dangerous to perform could be entirely computer-generated, rid of almost all flaws and made to appear real, safeguarding the lives of actors and other personnel while also lowering costs. Advertisements could be tailor-made to an individual’s preferences, while suggestions could be delivered to us through better feature-finding algorithms. However, with actors being brought back from the dead in films and the replacement of artists and other personnel in the field a real possibility, GANs’ potent graphical abilities have their ethical disadvantages.
Since seeing is one of the simplest and strongest forms of believing, a technology that can imitate appearance, sound and body language with ease could be used extensively when it comes to defamation and propaganda. Scores of women have already had their identity and individuality marred simply through visuals. Issues like revenge porn, that is, the non-consensual distribution of sexually explicit content featuring an individual, could be exacerbated with the advent of near-perfect face swapping technology. In a world where false rumours travel quickly online and wreak havoc, forged evidence to back them up could do a whole lot worse. Artificial renditions of influential figures or rumoured events could cause mass hysteria and lead to all kinds of malpractice.
A survey with 267 respondents was held by the Transatlantic Commission on Election Integrity, an organisation committed to combating foreign interference in western elections. Participants listened to samples from four of the best-regarded human impersonators of Donald Trump and an AI mimic of his voice, each made to repeat the same lines. Over eighty per cent of respondents felt that the algorithm-generated voice was closer to Trump’s than any of the four impersonators. Such studies show how unprepared we are to counter propaganda spread through these means.

A deepfake of Barack Obama’s face morphed onto comedian Jordan Peele’s
Not Just A Pretty Face
Facebook’s DeepFace facial recognition algorithm has reached 97% accuracy and is able to identify faces at various angles. Over four million images of about four thousand individuals were used to train it in what was a difficult process, which nonetheless resulted in a highly accurate system. With the advent of GANs, and papers already discussing their use for recognition, such algorithms could be trained with relative ease and attain accuracy even on video and low-visibility images. The benefits of this technology lie in detecting infringement and duplication online, enabling more honest and responsible distribution of content as well as privacy protection among users. However, it could also be used to collect data on the people one is with and the places one frequents, putting one’s privacy at stake with large organisations.
In September earlier this year, a Romanian study on using GANs to recognise facial emotions was published. It used image generation to improve expression classification and to provide a spectrum of emotional evaluations. The report discussed the potential of such recognition in fields like marketing, where consumer feedback could be inferred from expression. It could also be used in high-security settings to identify individuals displaying signs of fear, or to sense psychological symptoms like insecurity or anxiety when analysing mental states. While seemingly benevolent, having our expressions probed, and possibly held to account, could come to encroach upon our freedom and right to intimacy. With countries like China actively investing in surveillance that uses facial recognition to identify individuals, and possibly read behaviour to predict their movements, the problem is already in the neighbourhood.
Yann LeCun, Chief AI Scientist at Facebook, named GANs the most important breakthrough in deep learning. Systems such as Google’s AlphaGo have already shown they can compete with and defeat professionals at their own board games. With GANs allowing various kinds of algorithms to be augmented and to intercommunicate, boundaries will be pushed even further, challenging us even in fields thought to be reserved for human expression. Such innovations will contribute immensely to the field of artificial intelligence. However, as with any new scientific advancement, it may be wise to exercise prudence in our approach.
Featured image credits: Official implementation of StarGAN by Yunjey Choi