In its most basic sense, a deepfake is a combination of face- and voice-cloning AI technologies that allows for the creation of life-like, computer-generated videos of a real person.
To develop a high-quality deepfake of an individual, developers need tens of hours of video footage of the person whose face and voice are to be cloned, as well as a human imitator who has learned the target's facial mannerisms and voice.
Two people are thus involved in creating a deepfake: the target, usually a famous person whose face and voice are cloned, and the imitator, generally an unknown individual closely associated with the project.
From tech to reality
From a technical standpoint, visual deepfakes are created with machine learning tools that decode and strip down images of both individuals' facial expressions into a matrix of key attributes, such as the position of the target's nose, eyes and mouth. Finer details, such as skin texture and facial hair, are given less weight and can be treated as secondary.
The deconstruction is generally performed in such a way that it is almost always possible to fully recreate the original image of each face from its stripped elements. One of the primary measures of a quality deepfake is how well the final image is reconstructed, such that any movements in the imitator's face are realized in the target's face as well.
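The encode-then-reconstruct approach described above is often implemented as a shared encoder with one decoder per identity. The following is a minimal, untrained sketch of that idea using toy numpy arrays; the vector sizes, layer shapes and variable names are illustrative assumptions, not a real deepfake pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

FACE_DIM = 8    # toy "face": a flattened image reduced to 8 values
LATENT_DIM = 4  # key attributes (e.g., nose, eye and mouth positions)

# Shared encoder: strips any face down to the same latent attributes.
encoder = rng.normal(size=(LATENT_DIM, FACE_DIM))

# One decoder per identity: rebuilds that person's face from the attributes.
decoder_target = rng.normal(size=(FACE_DIM, LATENT_DIM))
decoder_imitator = rng.normal(size=(FACE_DIM, LATENT_DIM))

def encode(face: np.ndarray) -> np.ndarray:
    """Reduce a face to its key-attribute vector."""
    return encoder @ face

def decode(latent: np.ndarray, decoder: np.ndarray) -> np.ndarray:
    """Reconstruct a face for a given identity from the attributes."""
    return decoder @ latent

# Face-swap step: encode the imitator's expression, then decode it with
# the target's decoder, so the target's face performs the movement.
imitator_frame = rng.normal(size=FACE_DIM)
latent = encode(imitator_frame)
fake_target_frame = decode(latent, decoder_target)
```

In a real system, the encoder and both decoders are neural networks trained jointly on the hours of footage mentioned above, so that each decoder learns to reproduce its own identity from the shared attribute representation.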
To elaborate on the matter, Matthew Dixon, an assistant professor and researcher at the Illinois Institute of Technology’s Stuart School of Business, told Cointelegraph that both face and voice can be easily reconstructed through certain programs and techniques, adding that:
“Once a person has been digitally cloned it is possible to then generate fake video footage of them saying anything, including speaking words of malicious propaganda on…