Deepfakes have been very much in the news for the past two years. It’s time to think about what deepfakes are and what they mean. Where do they come from? Why now? Is this just a natural evolution in the history of technology?
Deepfakes are media created by AI. They appear to be genuine (e.g., a video of President Obama) but have little connection to reality. An audio track can be synthesized that is indistinguishable from the victim’s voice, saying something the victim would never have said. Video can be generated from existing footage or photos and matched to that soundtrack, so that the mouth moves correctly and the facial expressions look natural. It’s not surprising that humans have trouble detecting fakes; with the current technology, even shallow fakes are good enough to fool us.
Deepfakes are the logical extension of older AI research. It wasn’t long ago that we read about AI generating new paintings in the style of Rembrandt and other Dutch Masters, restyling photographs to look like Van Goghs and Picassos, and so on. At the time, there was more concern about the future of human creativity: would we still need artists? Would we live in a world full of fake Van Goghs? We shrugged those “fakes” off because we were asking the wrong questions. We don’t need more Van Goghs any more than we need more Elvises on velvet. We may end up with a few fake Rembrandts where they shouldn’t be, but the art world will survive.
If that’s the wrong question, what’s the right one? The problem with deepfakes is that the technology for simulating an artist’s style has collided with the rise of fake news. Fake news isn’t new by any means; there have always been conspiracy theorists who are marvelously skeptical of “traditional” media but completely unskeptical of their own sources, whether those sources claim that Tibetans are spying on us through a system of underground tunnels or that vaccinations cause autism.
To this collision, add three more factors: the democratization of AI, the falling cost of computing power, and the phenomenon of virality. Deepfakes have jumped out of the lab and into the streets. You don’t need a Ph.D. to generate fake media, nor do you need the resources of a nation-state to acquire enough computing power; some easily available tools and a credit card to buy time on AWS are all you need. Sometimes all it takes is an app: in China, a popular iPhone app lets you put your face into movie clips. (Ironically, the backlash against this app wasn’t over the fakes but over the app’s privacy policy.) Once you’ve created a fake, you can use social media to propagate it. YouTube’s and Facebook’s algorithms for optimizing “engagement” can make any content viral in seconds.
That all adds up to a scary picture. We will certainly see deepfakes in politics, though as security expert @thegrugq points out, cheap fakes are better than deepfakes for shaping public opinion. Deepfakes might be more dangerous in computer security, where they can be used to circumvent authentication or perform high-quality phishing attacks. Symantec has reported that it has seen such attacks in the field, and recently an AI-generated voice that mimicked a CEO was used in a major fraud.
Deepfakes for good
The scary story has been covered in many places, and it’s not necessary to repeat it here. What’s more interesting is to realize that the technology behind deepfakes is simply high-quality media generation. “Fakes” are a matter of context: they are one specific application of technologies for synthesizing video and other media. There are many contexts in which synthetic video can be used for good.
Here are a few of those applications. Synthesia creates translated videos in which the footage is altered so that the speaker’s movements match the translated speech. It provides an easy way to create multilingual public service announcements that feel natural: you don’t have to find and film actors capable of getting your message across in many languages.
One of the biggest expenses in video games is creating compelling video. Landscapes are important, but so are dialog and facial expressions. Synthetic video is useful for creating and animating anime characters, and NVIDIA has used generative adversarial networks (GANs) to create visuals that can be used in video games.
There are many fields, such as medicine, in which collecting labeled training data is difficult. In one experiment, synthetic MRI images showing brain cancers were created to train neural networks to analyze MRIs. This technique has two advantages. First, cancer diagnoses are relatively rare, so it’s difficult to find enough images. Second, synthetic images raise few privacy issues, if any: a large set of synthetic cancerous MRIs can be created from a small set of actual MRIs without compromising patient data, because the synthetic MRIs don’t match any real person.
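To make the underlying mechanism a bit more concrete, here is a minimal sketch of a generative adversarial network, the kind of model such experiments build on. This is not the architecture from the MRI work described above; the image size, network shapes, and the random tensors standing in for real scans are illustrative assumptions.

```python
# A minimal GAN sketch in PyTorch. Everything here is illustrative: the "scans"
# are random tensors, and the tiny fully connected networks stand in for the
# much larger convolutional models used in practice.
import torch
import torch.nn as nn

IMG = 64   # 64x64 grayscale images, flattened
Z = 100    # dimension of the random noise the generator starts from

generator = nn.Sequential(
    nn.Linear(Z, 256), nn.ReLU(),
    nn.Linear(256, IMG * IMG), nn.Tanh(),   # pixel values in [-1, 1]
)
discriminator = nn.Sequential(
    nn.Linear(IMG * IMG, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),                      # real-vs-fake logit
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

def training_step(real_images):
    """One adversarial round: teach D to separate real from fake, then teach G to fool D."""
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Discriminator step: real images should score 1, generated images 0.
    fake_images = generator(torch.randn(batch, Z)).detach()
    d_loss = (loss_fn(discriminator(real_images), real_labels)
              + loss_fn(discriminator(fake_images), fake_labels))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator label fresh fakes as real.
    fake_images = generator(torch.randn(batch, Z))
    g_loss = loss_fn(discriminator(fake_images), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# Stand-in "real" images; an actual experiment would load de-identified scans here.
real_batch = torch.rand(32, IMG * IMG) * 2 - 1
print(training_step(real_batch))
```

The adversarial loop is the whole story: the generator learns to produce images the discriminator can’t tell from real ones, which is exactly why the same technique yields both useful synthetic training data and convincing fakes.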
Another medical application is creating synthetic voices for people who have lost the ability to speak. Project Revoice can build a synthetic voice for an ALS patient from recordings of the patient’s own voice, rather than relying on mechanical-sounding speech synthesis. Remember hearing Stephen Hawking “speak” with his robotic computer-generated voice? That was state-of-the-art technology a few years ago. Revoice could give a patient their own voice back.
Many online shopping sites are designed to make it easier to find clothes that you like and that fit. Deepfake technologies can take images of customers and edit in the clothing they are looking at. The images could even be animated, so customers can see how an outfit moves as they walk.
Policies and protections
We will see a lot of fakes: some deep, some shallow, some innocuous, some serious. The more important question is what should be done about it. So far, social media companies have done little to detect and alert us to fakes, whether they are deep or shallow. Facebook has admitted that they were slow to detect a fake video of Nancy Pelosi—and that video was an unsophisticated shallow fake. You could argue that any photoshopped picture is a “shallow fake,” and it isn’t hard to find social media “influencers” whose influence depends, in part, on Photoshop. Deepfakes will be even harder to detect. What role should social media companies such as Facebook and YouTube have in detecting and policing fakes?
Social media companies, not users, have the computing resources and the technical expertise needed to detect fakes. For the time being, the best detectors are very hard to fool. And Facebook has just announced the Deepfake Detection Challenge, in partnership with Microsoft and a number of universities and research groups, to “catalyze more research and development” in detecting fakes.
Hany Farid estimates that people working on video synthesis outnumber people working on detection 100:1, but the ratio isn’t the real problem. The future of deepfake fraud will be similar to what we’ve already seen in cybersecurity, which is dominated by “script kiddies” who use tools developed by others but can’t create their own exploits. Regardless of how sophisticated the tools are, fakes coming from “fake kiddies” will be easily detectable, simply because those tools are used so frequently: any signatures they leave in the fakes will show up everywhere and be easy to catch. That’s how we deal with email spam now; spam is caught largely because there is so much of it. If spam were uncommon, it would be much harder to detect. It also wouldn’t be much of a problem.
In addition to the “fake kiddies,” there will be a small number of serious researchers who build the tools. They are a bigger concern. However, it’s not clear that they have an economic advantage. Media giants like Facebook and Google have the deep pockets needed to build state-of-the-art detection tools. They have practically unlimited computing resources, an army of researchers, and the ability to pay much more than a crooked advertising agency. The real problem is that media sites make more money from serving fake media than from blocking it; they emphasize convenience and speed over rigorous screening. And, given the number of posts that they screen, even a 0.1% false positive rate is going to create a lot of alerts.
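To see why, a little arithmetic helps. The daily volume below is an assumed round number, not a figure reported by any platform, but the shape of the problem is clear:

```python
# Illustrative only: the daily volume is an assumption, not a reported figure.
posts_screened_per_day = 100_000_000   # assumed number of posts screened each day
false_positive_rate = 0.001            # the 0.1% rate mentioned above

false_alarms_per_day = posts_screened_per_day * false_positive_rate
print(f"{false_alarms_per_day:,.0f} legitimate posts flagged per day")   # 100,000
```

A detector that is wrong only one time in a thousand would still flag on the order of a hundred thousand legitimate posts a day at that scale, and each of those is a frustrated user or an accusation of censorship.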
When fake detection tools are deployed, the time needed to detect a fake is important. Fake media does its damage almost instantly. Once a fake video has entered a social network, it will circulate indefinitely. Announcing after the fact that it is a fake does little good, and may even help the fake to propagate. Given the nature of virality, fakes have to be stopped before they’re allowed to circulate. And given the number of videos posted on social media, even with Facebook- or Google-like resources, responding quickly enough to stop a fake from propagating will be very difficult. We haven’t seen any data on the CPU resources required to detect fakes with the current technology, but researchers working on detection tools will need to take speed into account.
In addition to direct fake detection, it should be possible to use metadata to help detect and limit the spread of fakes. Renée DiResta has argued that spam detection techniques could work, and older research into USENET posting patterns has shown that it’s possible to identify the roles users play from the metadata of their posts alone, without looking at the content. While techniques like these won’t be the whole solution, they represent an important possibility: can we identify bad actors by the way they act rather than by the content they post? If we can, that would be a powerful tool.
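As a toy illustration of what “acting, not content” might look like, here is a sketch of a classifier trained purely on per-account metadata. The features (posting rate, reshare fraction, account age, time between posts) and the synthetic labels are hypothetical stand-ins for the posting-pattern signals the USENET research relied on; nothing here reflects any platform’s actual detection pipeline.

```python
# A toy classifier built on behavioral metadata alone; features and labels are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Per-account features, none of which look at content:
# posts per day, fraction of posts that are reshares, account age (days), seconds between posts
ordinary = np.column_stack([
    rng.gamma(2.0, 2.0, n),        # a few posts per day
    rng.beta(2, 5, n),             # mostly original posts
    rng.uniform(200, 3000, n),     # older accounts
    rng.gamma(5, 3600, n),         # hours between posts
])
amplifier = np.column_stack([
    rng.gamma(20.0, 5.0, n),       # very high posting volume
    rng.beta(8, 2, n),             # mostly reshares
    rng.uniform(1, 200, n),        # young accounts
    rng.gamma(2, 60, n),           # minutes between posts
])

X = np.vstack([ordinary, amplifier])
y = np.concatenate([np.zeros(n), np.ones(n)])   # 1 = suspected amplifier account

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```

On data this cleanly separated, the classifier is trivially accurate; the point is only that the feature vector never touches what an account actually says.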
Since many fakes take the form of political advertisements, the organizations that run these advertisements must bear some accountability. Facebook is tightening up its requirements for political ads, requiring tax ID numbers and other documentation, along with “paid for” disclaimers. These stricter requirements could still be spoofed, but they are an improvement. Facebook’s new rules go at least part way toward Edward Docx’s three suggestions for regulation:
Nobody should be allowed to advertise on social media during election campaigns unless strongly authenticated–with passports, certificates of company registration, declarations of ultimate beneficial ownership. The source and application of funds needs to be clear and easily visible. All ads should be recorded–as should the search terms used to target people.
The danger is that online advertising is built around engagement and virality, and it’s much easier to maximize engagement metrics with faked, extreme content. Media companies and their customers—the advertisers—must wean themselves off their addiction to engagement. Docx’s suggestions would at least leave an audit trail, making it possible to reconstruct who showed which advertisement to whom. They don’t, however, address the bigger technical problem of detecting fakes in real time. We’d add a fourth suggestion: social media companies should not pass any video on to their consumers until it has been tested, even if that delays posting. While Facebook is obviously interested in tightening up authentication requirements, we doubt they will be interested in adding delays in the path between those who post video and their audiences.
Is regulation a solution? Regulation brings its own problems. Regulators may not adequately understand what they’re regulating, leading to ineffective (or even harmful) rules with easy technical workarounds. Regulators are also likely to be unduly influenced by the companies they regulate, which may suggest rules that sound good but don’t require them to change their practices. And compliance places a bigger burden on upstarts that want to compete with established media companies such as Facebook and Google.
Defending against disinformation
What can individuals do against a technology that’s designed to confuse them? It’s an important question, regardless of whether some sort of regulation “saves the day.” It’s entirely too easy to imagine a dystopia where we’re surrounded by so many fakes that it’s impossible to tell what’s real. However, there are some basic steps you can take to become more aware of fakes and to prevent propagating them.
Perhaps most important, never share or “like” content that you haven’t actually read or watched. Too many people pass along links to content they haven’t seen themselves. They’re going entirely by a clickbait title, and those titles are designed to be misleading. It’s also better to watch entire videos rather than short clips; watching the entire video gives context that you’d otherwise miss. It’s very easy to extract misleading video clips from larger pieces without creating a single frame of fake video!
When something goes viral, avoid piling on; virality is almost always harmful. Virality depends on getting thousands of people in a feedback loop of narcissistic self-validation that has almost nothing to do with the content itself.
It’s important to think critically about all your media, especially media that supports your point of view. Confirmation bias is one of the most subtle and powerful ways of deceiving yourself. Skepticism is necessary, but it has to be applied evenly. It’s useful to compare sources and to rely on well-known facts. For example, if someone shares a video of “Boris Johnson in Thailand in June 2014” with you, you can dismiss it without watching because you know Boris was not in Thailand at that time. Strong claims require strong evidence, and rejecting evidence because you don’t like what it implies is a great way to be taken in by fake media.
While most discussions of deepfakes have focused on social media consumption, they’re perhaps more dangerous in other forms of fraud, such as phishing. Defending yourself against this kind of fraud is not fundamentally difficult: use two-factor authentication (2FA), and make sure there are other channels for verifying any communication. If you receive a voicemail asking you to do something, there should be an independent way to confirm that the message is genuine, perhaps by calling back a prearranged number. Don’t do anything simply because a voice tells you to. That voice may not be what you think it is.
If you’re very observant, you can detect fakery in a video itself. Real people blink frequently, every 2 to 10 seconds. Blinks are hard to simulate because synthetic video is usually derived from still photographs, and there are few photographs of people blinking. Therefore, people in fake video may not blink, or they may blink infrequently. There may be slight errors in synchronization between the sound and the video; do the lips match the words? Lighting and shadows may be off in subtle but noticeable ways. There may be other minor but detectable errors: noses that don’t point in quite the right direction, distortions or blurred areas on an image that’s otherwise in focus, and the like. However, blinking, synchronization, and other cues show how quickly deepfakes are evolving. After the problem with blinking was publicized, the next generation of software incorporated the ability to synthesize blinking. That doesn’t mean these cues are useless; we can expect that many garden-variety fakes won’t be using the latest software. But the organizations building detection tools are in an escalating arms race with bad actors on technology’s leading edge.
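For the curious, here is a rough sketch of how the blink cue could be checked automatically, using the “eye aspect ratio” heuristic from the detection literature. Extracting the six landmarks around each eye (for example, with a face-landmark detector) is assumed and not shown, and the thresholds are illustrative rather than tuned values.

```python
# A rough blink-rate check. Landmark extraction (six points around each eye,
# from any face-landmark detector) is assumed and not shown.
import numpy as np

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks ordered corner, top, top, corner, bottom, bottom."""
    p1, p2, p3, p4, p5, p6 = [np.asarray(p, dtype=float) for p in eye]
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

def count_blinks(ear_per_frame, closed_threshold=0.2, min_closed_frames=2):
    """Count contiguous runs of frames in which the eye appears closed."""
    blinks, run = 0, 0
    for ear in ear_per_frame:
        if ear < closed_threshold:
            run += 1
        else:
            if run >= min_closed_frames:
                blinks += 1
            run = 0
    return blinks + (1 if run >= min_closed_frames else 0)

# Toy landmark geometry for an open eye: width 6, height 2 -> EAR around 0.33.
open_eye = [(0, 3), (2, 4), (4, 4), (6, 3), (4, 2), (2, 2)]
print(f"EAR for an open eye: {eye_aspect_ratio(open_eye):.2f}")

# Synthetic trace: 30 seconds at 30 fps, eyes open (EAR ~0.3) with three brief dips.
ear_trace = np.full(900, 0.3)
for start in (100, 400, 700):
    ear_trace[start:start + 4] = 0.1

blinks = count_blinks(ear_trace)
print(f"{blinks} blinks in 30s, roughly one every {30 // blinks}s")
# A subject who never blinks over a long clip deserves a closer look.
```

As the paragraph above notes, newer synthesis tools blink convincingly, so a check like this is one weak signal among many, not a verdict.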
We don’t expect many people to inspect every video or audio clip they see in such detail. We do expect fakes to get better, we expect both deep and shallow fakes to proliferate, and we expect people to claim that genuine video has been faked. After all, with fake news, the real goal isn’t to spread disinformation; it’s to nurture an attitude of suspicion and distrust. If everything is under a cloud of suspicion, the bad actors win.
Therefore, we need to be wary and careful. Skepticism is useful–after all, it’s the basis for science–but denial isn’t skepticism. Some kind of regulation may help social media companies come to terms with fakes, but it’s naive to pretend that regulating media will solve the problem. Better tools for detecting fakes will help, but exposing a fake frequently does little to change people’s minds, and we expect the ability to generate fakes to at least keep pace with the technology for detecting them. Detection alone may not be sufficient: the gap between the time a fake is posted and the time it’s detected may well be enough for disinformation to take hold and go viral.
Above all, though, we need to remember that creating fakes is an application, not a tool. The ability to synthesize video, audio, text, and other information sources can be used for good or ill. The creators of OpenAI’s powerful tool for creating fake texts concluded that “after careful monitoring, they had not yet found any attempts of malicious use but had seen multiple beneficial applications, including in code autocompletion, grammar help, and developing question-answering systems for medical assistance.” Malicious applications are not the whole story. The question is whether we will change our own attitudes toward our information sources and become more informed, rather than less. Will we evolve into users of information who are more careful and aware? The fear is that fakes will evolve faster than we can; the hope is that we’ll grow beyond media that exists only to feed our fears and superstitions.