
Deepfake videos first appeared on Reddit in 2017, when celebrities’ faces were superimposed onto pornographic videos. In 2018, a Deepfake of Barack Obama insulting Donald Trump was made, and in 2021 a Deepfake video of Tom Cruise circulated on TikTok. In India, however, they made news only recently, after a somewhat indecent Deepfake video of a small-time woman actor appeared on social media in the first week of November. In mid-November, PM Modi called Deepfakes one of the biggest threats facing the nation, revealing, “I saw a video in which I was doing Garba. It seemed very real.” Hamas has created Deepfake images of the war to garner undue sympathy. Software for creating Deepfake images, video, and audio is already freely available online.
Mischief apart, Deepfakes have extremely serious implications for the dispensation of criminal justice, because it will be almost impossible for courts to determine whether a video presented before them as genuine is in fact fake or, conversely, whether a video alleged to be fake is genuine.
Basics of Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, as we know, is the ability of a computer to perform tasks commonly associated with intelligent beings. ‘Intelligent behaviour’ is what distinguishes higher life forms from lower ones, whose behaviour is largely ‘instinctual’. Intelligent behaviour has many diverse components but, for computers, AI focuses mainly on the following components of intelligence: learning, reasoning, problem-solving, perception, and using language.
The evolution of AI owes much to advancements in Machine Learning (ML). In Machine Learning, computers learn to perform tasks the way humans do, but without being explicitly programmed for each one. Machine Learning enables the computer to improve its own outputs by learning from previous errors and discrepancies. For example, a simple computer program for solving mate-in-one chess problems will try moves at random until a mate is found. The program will then store the solution along with the position, so that the next time the computer encounters the same position it will recall the solution. This simple memorizing of individual items and procedures comes naturally to a computer.
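To make the idea concrete, here is a minimal Python sketch of that kind of rote memorization; the position keys and the find_mate_in_one solver are hypothetical placeholders, not part of any real chess engine:

```python
# Toy illustration of rote "learning": the program stores each solved
# position so that it never has to search for the same mate-in-one twice.
# find_mate_in_one is a hypothetical solver; positions are plain strings.

solved_positions = {}  # position -> winning move, built up over time

def solve(position, find_mate_in_one):
    if position in solved_positions:        # recall a previously found solution
        return solved_positions[position]
    move = find_mate_in_one(position)       # otherwise search, e.g. try moves at random
    solved_positions[position] = move       # memorize it for next time
    return move
```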

Much of AI operates on the logical structure of ML algorithms, specifically Neural Networks (also called Artificial Neural Networks, or ANNs), which emulate human intelligence. A neural network is a computer program whose operation is inspired by the natural network of neurons in the brain. ANNs train machines in a manner akin to the brain’s own neural networks, enabling independent, intelligent decision-making.
Training neural networks typically involves supervised learning, where each training example contains the values of both the input data and the desired output. Once the network performs sufficiently well on additional test cases, it can be applied to new cases. Unsupervised learning, in which the network finds patterns without labelled outputs, is also possible.
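As a rough illustration only, a supervised training loop of this sort can be sketched in Python with the PyTorch library; the data here is random, standing in for real labelled examples:

```python
import torch
from torch import nn

# Toy supervised learning: inputs X paired with desired outputs y
# (random here, purely to keep the example self-contained).
X, y = torch.randn(100, 4), torch.randn(100, 1)

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(200):                 # repeatedly show the labelled examples
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)          # how far predictions are from desired outputs
    loss.backward()                      # learn from the error
    optimizer.step()
```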
‘Deep learning’ is a sub-domain of Machine Learning (ML). Machine Learning usually involves five steps: collecting the historical data; preparing the data; building the ML model; training and testing the model; and deploying the model for practical applications.
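A minimal, illustrative Python sketch of those five steps, using the scikit-learn library and one of its bundled toy datasets as a stand-in for real historical data, could look like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1. Collect historical data (a bundled toy dataset stands in for real data)
X, y = load_iris(return_X_y=True)

# 2. Prepare the data (split into training and test sets)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 3. Build the ML model (feature scaling plus a simple classifier)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))

# 4. Train and test the model
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

# 5. Deploy: the fitted model can now be saved and applied to new data,
#    e.g. joblib.dump(model, "model.joblib")
```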
Though Deep Learning forms part of ML, its core workflow differs from the standard ML one. It involves the following five steps: defining a suitable network architecture; configuring the model for training; fitting the model on the given training dataset; estimating the model’s precision; and setting up the model in a real-time environment. Deep Learning systems are based on multilayer neural networks that extract features, learn, and make their own human-like decisions. For example, the speech-recognition capability of Apple’s mobile assistant Siri is powered by Deep Learning systems.
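Again purely as an illustrative sketch, the five Deep Learning steps map onto a small Keras (TensorFlow) script roughly as follows; the data, layer sizes and file name are invented for the example:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy data standing in for a real training dataset
X, y = np.random.rand(500, 20), np.random.randint(0, 2, size=(500, 1))

# 1. Define a suitable network architecture (a small multilayer network)
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# 2. Configure the model for training
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# 3. Fit the model on the given training dataset
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# 4. Estimate the model's precision
loss, accuracy = model.evaluate(X, y, verbose=0)

# 5. Set up the model in a real-time environment (save it for serving)
model.save("demo_model.keras")
```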

How Are Deepfakes Created?
The word Deepfake is a portmanteau of ‘Deep Learning’ and ‘fake’. Deepfake is a type of Artificial Intelligence (AI) used to create images, videos, and audio that do not require photographing or video-recording the ‘morphed’ persons, yet are made so convincingly lifelike that they appear to be those very persons.
Deepfakes use a combination of two algorithms, a Generator and a Discriminator, together called a Generative Adversarial Network (GAN). In a GAN, two neural networks contest with each other in the form of a zero-sum game, where one agent’s gain is the other’s loss. This is used to create and then refine fake content. It works as follows.
The Generator builds a training dataset based on the desired output, creating an initial piece of fake digital content. The Discriminator then analyses how realistic or, conversely, how fake or poor that initial version is. This process is repeated many times. With each repetition, the Generator gets better at creating realistic content, while the Discriminator becomes more skilled at spotting the flaws for the Generator to correct.
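A bare-bones sketch of that adversarial loop, written in Python with PyTorch and using random vectors in place of real images, might look like this; it is meant only to show the zero-sum back-and-forth between the two networks, not any production Deepfake pipeline:

```python
import torch
from torch import nn

latent_dim, data_dim = 16, 64

generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step in range(1000):
    real = torch.randn(32, data_dim)                 # stand-in for genuine samples
    fake = generator(torch.randn(32, latent_dim))    # the Generator's current forgeries

    # Discriminator: learn to tell real (label 1) from fake (label 0)
    d_loss = loss_fn(discriminator(real), torch.ones(32, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: try to make the Discriminator call its fakes real
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```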

When creating a Deepfake photograph, a GAN system views photographs of the target from an array of angles to capture all the details and perspectives. When creating a Deepfake video, the GAN views the video from various angles and also analyses behaviour, movement and speech patterns. This information is then run through the Discriminator multiple times to fine-tune the realism of the final image or video.

Deepfake videos can be created in two ways. They can use an original video source of the target, where the person is made to say and do things they never did; or they can swap the person’s face onto a video of another individual, a technique known as a face swap.
When working from a source video, a neural-network-based Deepfake auto-encoder analyses the content to learn the relevant attributes of the target, such as facial expressions and body language, and then imposes these characteristics onto the target video. The auto-encoder comprises an encoder, which compresses the relevant attributes into a compact representation, and a decoder, which imposes those attributes onto the target video.
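The classic face-swap layout uses one shared encoder and a separate decoder per identity. The following Python/PyTorch fragment is a simplified, hypothetical sketch of that arrangement (flattened grayscale faces and invented layer sizes), not an actual Deepfake tool:

```python
import torch
from torch import nn

# One shared encoder, one decoder per identity. Faces are flattened 64x64
# grayscale images here purely to keep the example small; real systems
# work on colour video frames.
encoder   = nn.Sequential(nn.Linear(64 * 64, 512), nn.ReLU(), nn.Linear(512, 128))
decoder_a = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 64 * 64))
decoder_b = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 64 * 64))

def reconstruct(face, decoder):
    """Compress a face into its essential attributes, then rebuild it."""
    return decoder(encoder(face))

# Training (not shown) teaches encoder+decoder_a to reconstruct person A's
# faces and encoder+decoder_b to reconstruct person B's. The swap happens
# at inference time:
face_of_a = torch.rand(1, 64 * 64)            # a frame showing person A
swapped = reconstruct(face_of_a, decoder_b)   # the same frame rendered with B's appearance
```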
Existing and Possible Deepfake Detection Technologies

Popular media would have you believe that Deepfakes can be detected by pointing out certain inconsistencies in them: inconsistencies in blinking and facial expressions; lip-sync issues; unnatural skin or lighting; inconsistent or warped backgrounds; audio-visual mismatch; abnormal body proportions or movements, and so on. Real life, however, is not so simple.
First, such ‘defects’ are usually found in low-level Deepfakes, not in high-level ones. Moreover, such ad hoc observations lack the scientific rigour expected of a ‘proven science’. Determination of uniqueness requires measurements of object attributes, data on the population frequency of variation in these attributes, testing of attribute independence, and calculation of the probability that different objects share a common set of observable attributes.
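As a toy illustration of that last point: if the observable attributes were independent and their population frequencies known, the chance of a coincidental match would simply be the product of those frequencies. The attribute names and numbers below are invented for the example:

```python
# Hypothetical population frequencies of three observable attributes,
# assumed to vary independently of one another.
population_frequency = {
    "blink_interval_in_normal_range": 0.30,
    "specular_highlights_consistent": 0.12,
    "lip_sync_error_below_threshold": 0.25,
}

# Probability that an unrelated object shows the same combination of attributes
random_match_probability = 1.0
for frequency in population_frequency.values():
    random_match_probability *= frequency

print(f"chance of a coincidental match: {random_match_probability:.4f}")  # ~0.009
```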

Commercial Deepfake detection software is currently not available, and experts have to be called in to apply their own techniques. A 2019 Deepfake Detection Challenge with a $1 million award met with only partial success. William Corvey of the renowned Defense Advanced Research Projects Agency (DARPA) admits that its MediFor (Media Forensics) program has not been able to develop end-to-end technology capable of performing a complete forensic analysis of Deepfake videos.
Yuezun Li and Siwei Lyu have shown that the very method a program uses to detect a Deepfake can in turn be used to “train” better Deepfake creation algorithms. In an article titled “Deepfake Detection Algorithms Will Never Be Enough,” James Vincent describes how, when researchers developed an algorithm to track the inconsistent eye blinking common in low-level Deepfakes, new generation algorithms producing more natural blinking soon appeared and the detection method was rendered obsolete.
One possible technology is a digital watermarking system: smartphones and other devices normally used to capture audio and video would automatically embed a metadata watermark or stamp indicating when and where the content was recorded. But only future devices could do this, whereas billions of older devices would continue to be in use.
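A minimal sketch of the idea, assuming a signing key baked into the capture device and purely illustrative metadata fields, might look like this in Python:

```python
import hashlib
import hmac
import json
import time

# Illustrative only: in a real device the key would live in secure hardware,
# and the metadata fields would follow an agreed industry standard.
DEVICE_KEY = b"secret-key-stored-in-device-hardware"

def stamp(video_bytes: bytes, latitude: float, longitude: float) -> dict:
    """Produce a signed when/where watermark for freshly captured content."""
    metadata = {
        "sha256": hashlib.sha256(video_bytes).hexdigest(),
        "recorded_at": time.time(),
        "location": [latitude, longitude],
    }
    payload = json.dumps(metadata, sort_keys=True).encode()
    metadata["signature"] = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    return metadata

def verify(video_bytes: bytes, metadata: dict) -> bool:
    """Check that neither the content nor its capture metadata has been altered."""
    claimed = dict(metadata)
    signature = claimed.pop("signature")
    payload = json.dumps(claimed, sort_keys=True).encode()
    untampered = hmac.compare_digest(
        signature, hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    )
    return untampered and claimed["sha256"] == hashlib.sha256(video_bytes).hexdigest()
```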
Problems Which the Courts Will Face

Nobody in India seems to foresee the extremely serious problems our criminal courts are going to face on account of Deepfakes. At present, exercised over fake indecent or pornographic videos, India seems content merely with preventing the proliferation of Deepfakes.
The government has advised victims to file an FIR, while social media companies such as Instagram, X and Facebook have been instructed to remove such content from their platforms within 24 hours of receiving a complaint, or risk censure under the provisions of the IT Rules. Presently, the government is treating it as cheating by personation using computer resources, punishable under Section 66D of the IT Act, 2000, which carries up to three years in jail and a fine of up to Rs. 1 lakh. Individuals can also sue for defamation.
The use of Deepfakes to create pornographic or defamatory videos is, at bottom, a prank; they could, however, figure in very serious cases, and there they will present insurmountable problems for the courts.

Hitherto, images and videos have been admissible in evidence under the ‘silent witness’ theory. In American law, since the 2001 judgment in United States v. Harris, the ‘silent witness’ theory has allowed authentication of photographs by demonstrating the reliability of the process that created them, without the need for a human witness to the events shown in the footage. This theory would fall apart with Deepfakes.
For example, suppose there is a terrorist attack at a public place or an important government facility. The attack is recorded by a bystander on his mobile phone or captured by CCTV cameras and flashed instantly across social media. When the footage is produced in court, the defence could argue that it is a Deepfake created by the prosecution to bolster its case. That is where the problem manifests itself in all its complexity: how does the prosecution prove that the video is not a Deepfake and that the person seen in it is none other than the accused in the dock?

Riana Pfefferkorn, associate director of surveillance and cybersecurity at the Stanford Center for Internet and Society, categorically states, “AI is now capable of generating fake human faces, which I for one cannot detect any tell-tale signs that it’s not a real photo.” The University of Washington’s WhichFaceIsReal.com has convincingly shown that, even with rudimentary Deepfakes, nearly half the people tested cannot distinguish a Deepfake from a genuine image. The same holds true for judges. Highly specialised and rarely found technical experts will have to be called in, and since authentication is not an exact science, even their testimony will be questioned despite their professional reputation. Dr. David Doermann, speaking about Deepfakes before the U.S. House Intelligence Committee, candidly admitted that we could throw absolutely everything we have at these types of techniques and there would still be some question about whether a video is authentic or not.
Eventually, the courts will have to fall back on the old system of eye-witnesses, for whom, as we know from our experience of the past 162 years in Indian criminal courts, telling lies under oath has been extremely common.
What the Future Holds
As the technology develops, it is only a matter of time before Deepfake videos become so good that, even for specialists, proving the existence of fakery in a ‘legally tenable’ manner would be no less a task than writing a PhD thesis in computer science. Since no court can afford that much time or those resources, the evidentiary value of videos will eventually fall to zero, because whichever side is implicated or exonerated by a video produced by the prosecution or the defence will claim that it is fake, and it will be extremely difficult, almost impossible, for the courts to disprove the contention. Courts will have no option but to reject videos as evidence because, as Andre Assumpcao, data scientist at the National Center for State Courts (NCSC), USA, says, they cannot run the risk of basing a conviction or acquittal on evidence of dubious value. Fred Lederer, chancellor professor of law and director of the Center for Legal and Court Technology at William and Mary Law School, says that from a court’s perspective it means you cannot believe what you see. ‘Seeing is believing’ is slated to become history.