Photograph © Stellabrettiana

Welcome to our website. We are the special interest group on Audio at the Laboratoire des Sciences du Numérique de Nantes (France), or Audio@LS2N for short.


UPCOMING EVENTS

Round table “AI in Research” @ Ifremer

Vincent Lostanlen 08-10-2024

This event, organized by Ifremer’s doctoral students, has gathered around a hundred participants each year for talks, round tables, and hands-on workshops. The day is open to anyone interested in AI in research, including its applications, limits, and ethical aspects.
Join us on Tuesday, October 8th, 2024, at Ifremer’s Centre Atlantique in Nantes.


GdR IASIS workshop on audio synthesis at Ircam

Vincent Lostanlen 07-11-2024

As part of the CNRS special interest group on signal and image processing (GdR IASIS), we are organizing a 1-day workshop on audio synthesis at Ircam on November 7th, 2024.

Within the “Audio, Vision, Perception” track of GdR IASIS, we are organizing a one-day workshop dedicated to audio synthesis. It will take place on Thursday, November 7th, 2024, at Ircam (STMS lab) in Paris.


Meeting days of the “Capteurs en environnement” network

Vincent Lostanlen 13-11-2024

The organizing committee is pleased to welcome you from November 13th to 15th, 2024, at the Villa Clythia vacation center in Fréjus, in the Var.

These days, reserved primarily for network members, will be an opportunity to discover the members’ work, to meet, and to exchange on the many topics the network covers, from sensor development to data acquisition and processing, including the constraints of deployment in natura.


OTHER NEWS

Detection of Deepfake Environmental Audio @ EUSIPCO

Mathieu Lagrange 01-10-2024
Publications

Sound source classification for soundscape analysis using fast third-octave bands data from an urban acoustic sensor network @ JASA

Mathieu Lagrange 01-10-2024
Publications

EMVD dataset: a dataset of extreme vocal distortion techniques used in heavy metal @ CBMI

Mathieu Lagrange 01-10-2024
Publications

Correlation of Fréchet Audio Distance With Human Perception of Environmental Audio Is Embedding Dependent @ EUSIPCO

Mathieu Lagrange 01-10-2024
Publications

On the Robustness of Musical Timbre Perception Models: From Perceptual to Learned Approaches @ EUSIPCO

Mathieu Lagrange 01-10-2024
Publications

Hold Me Tight: Stable Encoder–Decoder Design for Speech Enhancement

Vincent Lostanlen 24-09-2024
Publications

Convolutional layers with 1-D filters are often used as a frontend to encode audio signals. Unlike fixed time-frequency representations, they can adapt to the local characteristics of input data. However, 1-D filters on raw audio are hard to train and often suffer from instabilities. In this paper, we address these problems with hybrid solutions, i.e., combining theory-driven and data-driven approaches. First, we preprocess the audio signals via an auditory filterbank, guaranteeing good frequency localization for the learned encoder. Second, we use results from frame theory to define an unsupervised learning objective that encourages energy conservation and perfect reconstruction. Third, we adapt mixed compressed spectral norms as learning objectives for the encoder coefficients. Using these solutions in a low-complexity encoder-mask-decoder model significantly improves the perceptual evaluation of speech quality (PESQ) in speech enhancement.
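
The frame-theoretic part of this recipe is compact enough to sketch. Below is a minimal illustration in PyTorch, assuming a strided 1-D convolutional encoder whose transpose acts as the decoder; the filter count, filter length, stride, and the weighting between the two terms are illustrative assumptions rather than the paper’s exact configuration.

```python
# Hypothetical sketch of an unsupervised objective that encourages
# energy conservation and perfect reconstruction for a 1-D conv encoder.
import torch
import torch.nn.functional as F

class ConvEncoder(torch.nn.Module):
    def __init__(self, n_filters=256, filter_len=32, stride=16):
        super().__init__()
        self.stride = stride
        self.weight = torch.nn.Parameter(0.01 * torch.randn(n_filters, 1, filter_len))

    def forward(self, x):                       # x: (batch, 1, time)
        return F.conv1d(x, self.weight, stride=self.stride)

    def decode(self, z):                        # transpose of the encoder
        return F.conv_transpose1d(z, self.weight, stride=self.stride)

def frame_objective(encoder, x, lam=1.0):
    z = encoder(x)
    x_hat = encoder.decode(z)
    n = min(x.shape[-1], x_hat.shape[-1])
    recon = F.mse_loss(x_hat[..., :n], x[..., :n])           # perfect reconstruction
    energy = (z.pow(2).sum() / x.pow(2).sum() - 1.0) ** 2    # energy conservation
    return recon + lam * energy
```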


Machine Listening in a Neonatal Intensive Care Unit @ DCASE

Vincent Lostanlen 19-09-2024
Publications

Oxygenators, alarm devices, and footsteps are some of the most common sound sources in a hospital. Detecting them has scientific value for environmental psychology but comes with challenges of its own: namely, privacy preservation and limited labeled data. In this paper, we address these two challenges via a combination of edge computing and cloud computing. For privacy preservation, we have designed an acoustic sensor which computes third-octave spectrograms on the fly instead of recording audio waveforms. For sample-efficient machine learning, we have repurposed a pretrained audio neural network (PANN) via spectral transcoding and label space adaptation. A small-scale study in a neonatological intensive care unit (NICU) confirms that the time series of detected events align with another modality of measurement: i.e., electronic badges for parents and healthcare professionals. Hence, this paper demonstrates the feasibility of polyphonic machine listening in a hospital ward while guaranteeing privacy by design.
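
The privacy-by-design step is easy to picture in code. Here is a minimal sketch in NumPy of a sensor loop that keeps only third-octave band energies per frame and discards the waveform; the band edges, frame size, and hop are illustrative assumptions, not the sensor’s exact specification.

```python
# Hypothetical on-the-fly third-octave spectrogram: only band energies
# leave this function; the raw waveform frames are never stored.
import numpy as np

def third_octave_frames(x, sr=32000, frame_len=4096, hop=4000):
    centers = 1000.0 * 2.0 ** (np.arange(-16, 13) / 3.0)   # ~25 Hz to 16 kHz
    centers = centers[centers < sr / 2]
    lo, hi = centers * 2 ** (-1 / 6), centers * 2 ** (1 / 6)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        power = np.abs(np.fft.rfft(x[start:start + frame_len])) ** 2
        bands = [power[(freqs >= l) & (freqs < h)].sum() for l, h in zip(lo, hi)]
        frames.append(10 * np.log10(np.maximum(bands, 1e-12)))  # dB per band
    return np.array(frames)   # (n_frames, n_bands): no waveform retained
```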


Content-Based Indexing for Audio and Music: From Analysis to Synthesis

Mathieu Lagrange 18-09-2024
Events and General

Audio has long been a key component of multimedia research. As far as indexing is concerned, the research and industrial context has changed drastically in the last 20 years or so. Today, applications of audio indexing range from karaoke applications to singing voice synthesis and creative audio design. This special session aims to bring together researchers proposing new tools or paradigms for audio and music processing in the context of indexing and corpus-based generation.


Trainable signal encoders that are robust against noise @ Inter-Noise

Vincent Lostanlen 17-09-2024
Publications

Within the deep learning paradigm, finite impulse response (FIR) filters are often used to encode audio signals, yielding flexible and adaptive feature representations. We show that a stabilization of FIR filterbanks with fixed filter lengths (convolutional layers with 1-D filters) leads to encoders that are optimally robust against noise and can be inverted with perfect reconstruction by their transposes. To maintain their flexibility as regular neural network layers, we implement the stabilization via a computationally efficient regularizing term in the objective function of the learning problem. In this way, the encoder keeps its expressive power and is optimally stable and noise-robust throughout the whole learning procedure. We show, in a denoising task where noise is present both in the input and in the encoder representation, that the proposed stabilization of the trainable filterbank encoder is decisive for significantly increasing the signal-to-noise ratio of the denoised signals compared to a model with a naively trained encoder.
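
One way to picture such a regularizer: for a stride-1 FIR filterbank, the frame operator is diagonalized by the Fourier transform, so pushing the summed squared magnitude responses toward one makes the encoder a tight frame, i.e., optimally noise-robust and invertible by its transpose. The sketch below (PyTorch) is a hedged illustration under that stride-1 assumption; the FFT size and the weight beta are placeholders.

```python
# Hypothetical tightness penalty for a bank of learnable FIR filters.
import torch

def tightness_penalty(filters, n_fft=1024):
    # filters: (n_filters, filter_len) real-valued encoder filters.
    response = torch.fft.rfft(filters, n=n_fft)      # frequency responses
    eigvals = response.abs().pow(2).sum(dim=0)       # frame-operator spectrum
    return ((eigvals - 1.0) ** 2).mean()             # drive frame bounds to 1

# total_loss = task_loss + beta * tightness_penalty(encoder_filters)
```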


BirdVoxDetect: Large-Scale Detection and Classification of Flight Calls for Bird Migration Monitoring @ IEEE TASLP

Vincent Lostanlen 23-08-2024
Publications

Sound event classification has the potential to advance our understanding of bird migration. Although it has long been known that migratory species have a vocal signature of their own, previous work on automatic flight call classification has been limited in robustness and scope: e.g., covering few recording sites, short acquisition segments, and simplified biological taxonomies. In this paper, we present BirdVoxDetect (BVD), the first full-fledged solution to bird migration monitoring from acoustic sensor network data.
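
BirdVoxDetect is distributed as a pip-installable Python package. A minimal usage sketch, assuming the single-call interface documented in its README (the exact signature and output columns may differ across versions):

```python
# pip install birdvoxdetect
import birdvoxdetect as bvd

# Detect and classify nocturnal flight calls in a field recording.
# The tabular output of timestamped detections is an assumption based
# on the package README; check your installed version for exact columns.
df = bvd.process_file("path/to/night_recording.wav")
print(df.head())
```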


Loïc demonstrates his FTM Synthesizer on the Erae Touch

Vincent Lostanlen 24-07-2024
Media

A demonstration by independent researcher Loïc Jankowiak, who visited LS2N in 2023 and 2024 as part of Han Han’s ApEx project.

This is an in-progress build of a physically modelled drum synthesizer based on the Functional Transformation Method (FTM), with external MIDI control. The Embodme Erae Touch control surface captures the velocity and impact position on the virtual drum’s surface, and the Behringer X-Touch Mini controls the various parameters of the physical model, i.e., the material properties.
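
For readers unfamiliar with FTM, the method solves the drum’s partial differential equation into a bank of damped sinusoidal modes whose frequencies and decay rates depend on the material parameters. The NumPy sketch below illustrates that modal idea with simplified, assumed parameter laws; it is not Loïc’s implementation.

```python
# Hypothetical modal drum hit: a sum of exponentially decaying partials
# whose tuning and damping stand in for FTM material parameters.
import numpy as np

def drum_hit(f0=120.0, tension=1.0, damping=3.0, n_modes=20, dur=1.5, sr=44100):
    t = np.arange(int(dur * sr)) / sr
    out = np.zeros_like(t)
    for m in range(1, n_modes + 1):
        freq = f0 * m * np.sqrt(tension)     # stiffer membrane, higher partials
        decay = damping * (1.0 + 0.1 * m)    # higher partials die out faster
        out += np.sin(2 * np.pi * freq * t) * np.exp(-decay * t) / m
    return out / np.abs(out).max()
```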


Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds @ EUSIPCO

Vincent Lostanlen 24-07-2024
Publications

Multi-label imbalanced classification poses a significant challenge in machine learning, particularly evident in bioacoustics, where animal sounds often co-occur and certain sounds are much less frequent than others. This paper focuses on the specific case of classifying anuran species sounds using the AnuraSet dataset, which contains both class imbalance and multi-label examples. To address these challenges, we introduce Mixture of Mixups (Mix2), a framework that leverages the mixing regularization methods Mixup, Manifold Mixup, and MultiMix. Experimental results show that these methods, applied individually, may lead to suboptimal results; however, when one of them is selected at random at each training iteration, they prove effective in addressing the mentioned challenges, particularly for rare classes with few occurrences. Further analysis reveals that the model trained with Mix2 is also proficient at classifying sounds across various levels of class co-occurrence.
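
The random selection at each iteration is simple to express in code. Below is a hedged PyTorch sketch of a Mix2-style training step; the backbone/head split, the Beta and Dirichlet hyperparameters, and the three-way mix standing in for MultiMix are illustrative assumptions.

```python
# Hypothetical Mix2 step: draw one mixing regularizer per iteration.
import random
import torch

def mixup(x, y, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    # Multi-label targets stay in [0, 1] under convex mixing.
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]

def mix2_step(model, x, y, criterion):
    choice = random.choice(["mixup", "manifold", "multimix"])
    if choice == "mixup":                        # mix raw inputs
        x, y = mixup(x, y)
        logits = model(x)
    elif choice == "manifold":                   # mix hidden features
        h, y = mixup(model.backbone(x), y)       # assumed backbone/head split
        logits = model.head(h)
    else:                                        # simplified MultiMix: >2 examples
        w = torch.distributions.Dirichlet(torch.ones(3)).sample()
        p1, p2 = torch.randperm(x.size(0)), torch.randperm(x.size(0))
        x = w[0] * x + w[1] * x[p1] + w[2] * x[p2]
        y = w[0] * y + w[1] * y[p1] + w[2] * y[p2]
        logits = model(x)
    return criterion(logits, y)
```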


Phantasmagoria: Sound Synthesis After the Turing Test @ S4

Vincent Lostanlen 21-07-2024
Publications

Sound synthesis with computers is often described as a Turing test or “imitation game”. In this context, a passing test is regarded by some as evidence of machine intelligence and by others as damage to human musicianship. Yet, both sides agree to judge synthesizers on a perceptual scale from fake to real. My article rejects this premise and borrows from philosopher Clément Rosset’s “L’Objet singulier” (1979) and “Fantasmagories” (2006) to affirm (1) the reality of all music, (2) the infidelity of all audio data, and (3) the impossibility of strictly repeating sensations. Compared to analog tape manipulation, deep generative models are neither more nor less unfaithful. In both cases, what is at stake is not to deny reality via illusion but to cultivate imagination as a “function of the unreal” (Bachelard); i.e., a precise aesthetic grip on reality. Meanwhile, I insist that digital music machines are real objects within real human societies: their performance on imitation games should not exonerate us from studying their social and ecological impacts.


WeAMEC PETREL project presented at Seanergy

Vincent Lostanlen 27-06-2024
Publications

Learning to Solve Inverse Problems for Perceptual Sound Matching @ IEEE TASLP

Vincent Lostanlen 20-06-2024
Publications

Perceptual sound matching (PSM) aims to find the input parameters to a synthesizer so as to best imitate an audio target. Deep learning for PSM optimizes a neural network to analyze and reconstruct prerecorded samples. In this context, our article addresses the problem of designing a suitable loss function when the training set is generated by a differentiable synthesizer. Our main contribution is perceptual–neural–physical loss (PNP), which aims at addressing a tradeoff between perceptual relevance and computational efficiency. The key idea behind PNP is to linearize the effect of synthesis parameters upon auditory features in the vicinity of each training sample. The linearization procedure is massively parallelizable, can be precomputed, and offers a 100-fold speedup during gradient descent compared to differentiable digital signal processing (DDSP). We show that PNP is able to accelerate DDSP with joint time–frequency scattering transform (JTFS) as auditory feature while preserving its perceptual fidelity. Additionally, we evaluate the impact of other design choices in PSM: parameter rescaling, pretraining, auditory representation, and gradient clipping. We report state-of-the-art results on both datasets and find that PNP-accelerated JTFS has greater influence on PSM performance than any other design choice.
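
The linearization at the heart of PNP admits a short sketch. Assuming a differentiable synthesizer g and an auditory feature map phi (both placeholders here), the metric M(theta) = JᵀJ is precomputed once per training sample and the loss is a quadratic form; this is a hedged illustration, not the paper’s exact implementation.

```python
# Hypothetical PNP loss: linearize phi(g(.)) around each training parameter.
import torch
from torch.autograd.functional import jacobian

def pnp_metric(g, phi, theta):
    # J: Jacobian of auditory features w.r.t. synthesis parameters,
    # evaluated once per sample (parallelizable, precomputable).
    J = jacobian(lambda p: phi(g(p)), theta)   # (n_features, n_params)
    return J.T @ J                             # Riemannian metric M(theta)

def pnp_loss(theta_hat, theta, M):
    d = theta_hat - theta
    return d @ M @ d    # quadratic form approximating perceptual distance
```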
