Detection of Deepfake Environmental Audio

Communications dans un congrès

Auteurs : Hafsa Ouajdi, Oussama Hadder, Modan Tailleur, Mathieu Lagrange, Laurie M Heller.

Conférence : EUSIPCO

Date de publication : 2024

Fake detectionEnvironmental soundDeep learningClassificationDeepfake audio
Lien vers le dépot HAL

Abstract


With the ever-rising quality of deep generative models, it is increasingly important to be able to discern whether the audio data at hand have been recorded or synthesized. Although the detection of fake speech signals has been studied extensively, this is not the case for the detection of fake environmental audio. We propose a simple and efficient pipeline for detecting fake environmental sounds based on the CLAP audio embedding. We evaluate this detector using audio data from the 2023 DCASE challenge task on Foley sound synthesis. Our experiments show that fake sounds generated by 44 stateof- the-art synthesizers can be detected on average with 98% accuracy. We show that using an audio embedding trained specifically on environmental audio is beneficial over a standard VGGish one as it provides a 10% increase in detection performance. The sounds misclassified by the detector were tested in an experiment on human listeners who showed modest accuracy with nonfake sounds, suggesting there may be unexploited audible features.