Detection of Deepfake Environmental Audio @ EUSIPCO

With the ever-rising quality of deep generative models, it is increasingly important to be able to discern whether the audio data at hand have been recorded or synthesized. Although the detection of fake speech signals has been studied extensively, this is not the case for the detection of fake environmental audio. We propose a simple and efficient pipeline for detecting fake environmental sounds based on the CLAP audio embedding. We evaluate this detector using audio data from the 2023 DCASE challenge task on Foley sound synthesis.

Our experiments show that fake sounds generated by 44 state-of-the-art synthesizers can be detected on average with 98\% accuracy. We show that using an audio embedding trained specifically on environmental audio is beneficial over a standard VGGish one as it provides a 10% increase in detection performance. The sounds misclassified by the detector were tested in an experiment on human listeners who showed modest accuracy with nonfake sounds, suggesting there may be unexploited audible features.

Correlation of Fréchet Audio Distance With Human Perception of Environmental Audio Is Embedding Dependent @ EUSIPCO

This paper explores whether considering alternative domain-specific embeddings to calculate the Fréchet Audio Distance (FAD) metric can help the FAD to correlate better with perceptual ratings of environmental sounds. We used embeddings from VGGish, PANNs, MS-CLAP, L-CLAP, and MERT, which are tailored for either music or environmental sound evaluation. The FAD scores were calculated for sounds from the DCASE 2023 Task 7 dataset. Using perceptual data from the same task, we find that PANNs-WGM-LogMel produces the best correlation between FAD scores and perceptual ratings of both audio quality and perceived fit with a Spearman correlation higher than 0.5. We also find that music-specific embeddings resulted in significantly lower results. Interestingly, VGGish, the embedding used for the original Fréchet calculation, yielded a correlation below 0.1. These results underscore the critical importance of the choice of embedding for the FAD metric design.

Foley sound synthesis at the DCASE 2023 challenge

The addition of Foley sound effects during post-production is a common technique used to enhance the perceived acoustic properties of multimedia content. Traditionally, Foley sound has been produced by human Foley artists, which involves manual recording and mixing of sound. However, recent advances in sound synthesis and generative models have generated interest in machine-assisted or… Continue reading Foley sound synthesis at the DCASE 2023 challenge

ReNAR: Reducing Noise with Augmented Reality

Noise pollution has a significant impact on quality of life. In the office, noise exposure creates stress that leads to reduced performance, provokes annoyance responses and changes in social behaviour. Headphones with excellent noise-cancelling processors can now be acquired in order to protect oneself from the noise exposure. While these techniques have reached a high… Continue reading ReNAR: Reducing Noise with Augmented Reality

Spectral trancoder: using pretrained urban sound classifiers on undersampled spectral representations @ DCASE

Slow or fast third-octave bands representations (with a frame resp. every 1-s and 125-ms) have been a de facto standard for urban acoustics, used for example in long-term monitoring applications. It has the advantages of requiring few storage capabilities and of preserving privacy. As most audio classification algorithms take Mel spectral representations with very fast… Continue reading Spectral trancoder: using pretrained urban sound classifiers on undersampled spectral representations @ DCASE

Perceptual musical similarity metric learning with graph neural networks @ IEEE WASPAA

Sound retrieval for assisted music composition depends on evaluating similarity between musical instrument sounds, which is partly influenced by playing techniques. Previous methods utilizing Euclidean nearest neighbours over acoustic features show some limitations in retrieving sounds sharing equivalent timbral properties, but potentially generated using a different instrument, playing technique, pitch or dynamic. In this paper,… Continue reading Perceptual musical similarity metric learning with graph neural networks @ IEEE WASPAA

An interactive bi-objective optimisation process to guide the design of electric vehicle warning sounds @ Design Science

Abstract Electric vehicles (EVs) are very quiet at low speed, which can be hazardous for pedestrians, especially visually impaired people. It is now mandatory (since mid-2019 in Europe) to add external warning sounds, but poor sound design can lead to noise pollution, and consequently annoyance. Moreover, it is possible that EVs are not sufficiently detectable… Continue reading An interactive bi-objective optimisation process to guide the design of electric vehicle warning sounds @ Design Science