Within the deep learning paradigm, finite impulse response (FIR) filters are often used to encode audio signals, yielding flexible and adaptive feature representations. We show that a stabilization of FIR filterbanks with fixed filter lengths (convolutional layers with 1-D filters) leads to encoders that are optimally robust against noise and can be inverted with perfect reconstruction by their transposes. To maintain their flexibility as regular neural network layers, we implement the stabilization via a computationally efficient regularizing term in the objective function of the learning problem. In this way, the encoder keeps its expressive power and remains optimally stable and noise-robust throughout the whole learning procedure. We show, in a denoising task where noise is present both in the input and in the encoder representation, that the proposed stabilization of the trainable filterbank encoder is decisive for significantly increasing the signal-to-noise ratio of the denoised signals compared to a model with a naively trained encoder.
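As a rough illustration of the idea, the numerical stability of a stride-1 FIR filterbank can be monitored through its frame bounds, i.e., the extrema of the Littlewood-Paley sum of the filters' power spectra. The sketch below (numpy; the function name, FFT size, and exact penalty form are illustrative assumptions, not the paper's implementation) penalizes the deviation of the filterbank from a tight frame:

```python
import numpy as np

def frame_bound_penalty(filters, n_fft=256):
    """Penalize a non-tight stride-1 FIR filterbank.

    filters: array of shape (num_filters, filter_length).
    Returns B/A - 1, where A and B are the lower and upper frame
    bounds, estimated from the Littlewood-Paley sum on an FFT grid.
    """
    spectra = np.abs(np.fft.rfft(filters, n=n_fft, axis=1)) ** 2
    littlewood_paley = spectra.sum(axis=0)  # total power response over frequency
    A, B = littlewood_paley.min(), littlewood_paley.max()
    return B / A - 1.0  # zero if and only if the frame is tight

# A trivially tight filterbank (shifted unit impulses) has near-zero penalty:
print(frame_bound_penalty(np.eye(4)))
```

During training, such a penalty would be added to the task loss with a weight hyperparameter, keeping the condition number B/A close to 1 throughout learning.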
Author: Vincent Lostanlen
International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR)
The 6th edition of the VIHAR workshop will be held in Kos, Greece, as a satellite event of INTERSPEECH. Vincent will chair one of the sessions and present a short paper titled Towards Differentiable Motor Control of Bird Vocalizations. Official website: http://vihar-2024.vihar.org
Journées du réseau “Capteurs en environnement” (annual meeting of the “Environmental Sensors” network)
The organizing committee is pleased to welcome you from November 13th to 15th, 2024, in Fréjus, in the Var, at the Villa Clythia vacation center.
These meeting days, reserved primarily for network members, will be an opportunity to discover the work of fellow members, to meet, and to exchange on the many topics addressed by the network, from sensor development to data acquisition and processing, by way of the constraints of in natura deployment.
Culture et patrimoine numérique @ Nantes Digital Week
From 9:30 am to 7 pm at the Château des ducs de Bretagne in Nantes.
11:15 am. Vincent LOSTANLEN. “Le streaming, est-ce que ça pollue ?” (“Does streaming pollute?”): towards an ecology of digital music.
Round table “IA en recherche” (AI in Research) @ Ifremer
This event, organized by Ifremer's doctoral students, has gathered around a hundred participants each year for oral presentations, round tables, and hands-on workshops. The day is open to anyone interested in AI in research, including its applications, limits, and ethical aspects.
Join us on Tuesday, October 8th, 2024, at Ifremer's Atlantic center in Nantes.
BirdVoxDetect: Large-Scale Detection and Classification of Flight Calls for Bird Migration Monitoring @ IEEE TASLP
Sound event classification has the potential to advance our understanding of bird migration. Although it has long been known that migratory species have a vocal signature of their own, previous work on automatic flight call classification has been limited in robustness and scope: e.g., covering few recording sites, short acquisition segments, and simplified biological taxonomies. In this paper, we present BirdVoxDetect (BVD), the first full-fledged solution to bird migration monitoring from acoustic sensor network data.
Loïc demonstrates his FTM Synthesizer on Erae Touch
A demonstration by independent researcher Loïc Jankowiak, who visited LS2N in 2023 and 2024 as part of Han Han's ApEx project.
This is an in-progress build of a physically-modelled drum synthesizer using the Functional Transformation Method (FTM), with external MIDI control. The Embodme Erae Touch control surface is used for the velocity and impact position on the virtual drum’s surface, and the Behringer X-Touch Mini is used for controlling the various parameters of the physical model, i.e. the material properties.
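For intuition, FTM-style synthesis of a struck drum boils down to a sum of damped modal sinusoids, with the impact position weighting how strongly each mode is excited. Below is a minimal modal sketch in Python, assuming an ideal, uniformly damped rectangular membrane; all names and constants are illustrative and unrelated to Loïc's actual implementation:

```python
import numpy as np

def drum_hit(x=0.5, y=0.5, velocity=1.0, sr=44100, dur=0.5,
             lx=1.0, ly=1.0, c=200.0, damping=8.0, n_modes=8):
    """Modal synthesis of a rectangular membrane struck at (x, y)."""
    t = np.arange(int(sr * dur)) / sr
    out = np.zeros_like(t)
    for m in range(1, n_modes + 1):
        for n in range(1, n_modes + 1):
            # modal frequency of the (m, n) mode of an ideal membrane
            f = 0.5 * c * np.sqrt((m / lx) ** 2 + (n / ly) ** 2)
            # impact position sets how strongly each mode is excited
            gain = np.sin(m * np.pi * x / lx) * np.sin(n * np.pi * y / ly)
            out += gain * np.exp(-damping * t) * np.sin(2 * np.pi * f * t)
    return velocity * out / np.max(np.abs(out))
```

In a full FTM model, the damping and stiffness terms derive from material properties, which is the kind of parameter the X-Touch Mini knobs map to in the demo, while the Erae Touch supplies (x, y) and velocity.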
Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds @ EUSIPCO
Multi-label imbalanced classification poses a significant challenge in machine learning, particularly in bioacoustics, where animal sounds often co-occur and certain sounds are much less frequent than others. This paper focuses on the specific case of classifying anuran species sounds using the AnuraSet dataset, which contains both class imbalance and multi-label examples. To address these challenges, we introduce Mixture of Mixups (Mix2), a framework that leverages the mixing regularization methods Mixup, Manifold Mixup, and MultiMix. Experimental results show that these methods, applied individually, may lead to suboptimal results; however, when one of them is selected at random at each training iteration, they prove effective in addressing the mentioned challenges, particularly for rare classes with few occurrences. Further analysis reveals that a model trained using Mix2 is also proficient in classifying sounds across various levels of class co-occurrence.
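To make the random-selection idea concrete, here is a minimal numpy sketch of input-space Mixup together with a per-iteration method draw. Manifold Mixup and MultiMix act on hidden representations inside the network, so only plain Mixup is spelled out; all names are illustrative, not the paper's code:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Input-space Mixup: convex combination of two labeled examples."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing coefficient in [0, 1]
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def mix2_step(example_a, example_b, methods, rng=None):
    """Mix2: draw one mixing regularizer uniformly at each training step."""
    rng = rng or np.random.default_rng()
    method = methods[rng.integers(len(methods))]
    return method(*example_a, *example_b, rng=rng)
```

In training, the `methods` list would also contain Manifold Mixup and MultiMix callables, so that each minibatch is regularized by exactly one of the three.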
Phantasmagoria: Sound Synthesis After the Turing Test @ S4
Sound synthesis with computers is often described as a Turing test or “imitation game”. In this context, a passing test is regarded by some as evidence of machine intelligence and by others as damage to human musicianship. Yet, both sides agree to judge synthesizers on a perceptual scale from fake to real. My article rejects this premise and borrows from philosopher Clément Rosset’s “L’Objet singulier” (1979) and “Fantasmagories” (2006) to affirm (1) the reality of all music, (2) the infidelity of all audio data, and (3) the impossibility of strictly repeating sensations. Compared to analog tape manipulation, deep generative models are neither more nor less unfaithful. In both cases, what is at stake is not to deny reality via illusion but to cultivate imagination as a “function of the unreal” (Bachelard); i.e., a precise aesthetic grip on reality. Meanwhile, I insist that digital music machines are real objects within real human societies: their performance on imitation games should not exonerate us from studying their social and ecological impacts.
Machine listening symposium at World Ecoacoustics Congress
The 10th edition of the World Ecoacoustics Congress was held in Madrid between July 8th and July 12th. In this context, Juan Sebastián Ulloa and I co-organized a special 3-hour symposium titled “Machine listening meets passive acoustic monitoring”. This event is supported by the CAPTEO and PETREL projects.