
Welcome to our website. We are the special interest group on Audio at the Laboratoire des Sciences du Numérique de Nantes (France), or Audio@LS2N for short.

UPCOMING EVENTS

Création “in earth we walk” @ Halle 6

Han Han 25-04-2025

A live performance for voice, live electronics, and double bass. Created by Han Han.

“In earth we walk” is a fleeting moment where voices become agents for constructing nature-inspired landscapes: voices utter semantically charged words conveying vivid scenarios; voices supply raw sonic material that is treated as pure sound. The libretto is a six-stanza poem that unfolds a series of pictorial and psychological scenes, exploring themes of longing, awe, and the reckoning with impermanence. Together, vocal emulations of clouds, torrents, winds, tides, and sands weave into a sonic experience that evokes one’s multifaceted relationship with the many wonders and situations earth puts one in.

Read More

“Sensing the City Using Sound Sources: Outcomes of the CENSE Project” @ Urban Sound Symposium

Mathieu Lagrange 28-04-2025
Read More

OTHER NEWS

Le streaming comme infrastructure et comme mode de vie @ RNRM

Vincent Lostanlen 26-03-2025
Publications

Investigating the ecological impact of music streaming reveals two angles of analysis: one based on material infrastructure, the other on changing ways of life. At a time when choice architectures are increasingly locked around a small number of digital giants, what is at stake in this inquiry is the complementarity between quantitative and qualitative methods, as well as an interdisciplinarity between computer science, the humanities and social sciences, and Earth system science. In this context, criticizing the unsustainability of streaming does not mean counting on a technological innovation that could suddenly “green” the sector as a whole. Rather, it means denouncing and contesting the utopia of a music that is fully available, to everyone, everywhere, right away. To be credible, alternative scenarios to the status quo must define, in a single technocritical gesture, which way of life they promote and which infrastructure they will maintain.

Read More

Robust Multicomponent Tracking of Ultrasonic Vocalizations @ IEEE ICASSP

Vincent Lostanlen 25-03-2025
Publications

Ultrasonic vocalizations (USV) convey information about individual identity and arousal status in mice. We propose to track USVs as ridges in the time-frequency domain via a variant of time-frequency reassignment (TFR). The key idea is to perform TFR with empirical Wiener shrinkage and multitapering to improve robustness to noise. Furthermore, we perform TFR over both the short-term Fourier transform and the constant-Q transform so as to detect both the fundamental frequency and its harmonic partial (if any). Experimental results show that our approach effectively estimates multicomponent ridges with high precision and low frequency deviation.
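
For readers who want to experiment, here is a minimal sketch of single-ridge extraction from a reassigned spectrogram with librosa. It does not reproduce the paper’s multitaper and empirical Wiener shrinkage steps, and the `fmin` cutoff, frame size, and file name are illustrative assumptions.

```python
import numpy as np
import librosa

def track_ridge(y, sr, fmin=30e3, n_fft=512):
    """Crude single-component ridge: per-frame argmax of a reassigned
    spectrogram, restricted to the ultrasonic range (assumed >= fmin)."""
    freqs, times, mags = librosa.reassigned_spectrogram(
        y, sr=sr, n_fft=n_fft, fill_nan=True
    )
    mags = np.where(freqs >= fmin, mags, 0.0)      # ignore audible-range energy
    idx = mags.argmax(axis=0)                      # dominant bin per frame
    frames = np.arange(freqs.shape[1])
    return times[idx, frames], freqs[idx, frames]  # ridge (time, frequency) pairs

# Usage (hypothetical file; USV recordings are typically sampled around 250 kHz):
# y, sr = librosa.load("usv.wav", sr=None)
# t, f = track_ridge(y, sr)
```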

Read More

S-KEY: Self-Supervised Learning of Major and Minor Keys from Audio @ IEEE ICASSP

Vincent Lostanlen 19-03-2025
Publications

STONE, the current method in self-supervised learning for tonality estimation in music signals, cannot distinguish relative keys, such as C major versus A minor. In this article, we extend the neural network architecture and learning objective of STONE to perform self-supervised learning of major and minor keys (S-KEY). Our main contribution is an auxiliary pretext task to STONE, formulated using transposition-invariant chroma features as a source of pseudo-labels. S-KEY matches the supervised state of the art in tonality estimation on the FMAKv2 and GTZAN datasets while requiring no human annotation and having the same parameter budget as STONE. We build upon this result and expand the training set of S-KEY to a million songs, thus showing the potential of large-scale self-supervised learning in music information retrieval.
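
As a rough illustration of how a chroma profile can yield a major/minor pseudo-label that does not depend on pitch transposition, here is a toy template-matching sketch. It is not the actual S-KEY pretext task; the Krumhansl-Schmuckler templates and the use of librosa’s chroma_cqt are stand-ins chosen for illustration.

```python
import numpy as np
import librosa

# Krumhansl-Schmuckler key profiles (major / minor), a standard reference.
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def mode_pseudolabel(y, sr):
    """Return 0 (major) or 1 (minor) by correlating the time-averaged chroma
    with all 12 rotations of each template. The decision depends only on
    which mode matches best, not on which rotation wins, so it is invariant
    to pitch transposition of the excerpt."""
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)
    score = lambda tpl: max(np.corrcoef(chroma, np.roll(tpl, k))[0, 1] for k in range(12))
    return int(score(MINOR) > score(MAJOR))
```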

Read More

Introducing: Clara Boukhemia

Vincent Lostanlen 04-02-2025
General

Clara is working on augmented reality approaches to improve sound comfort in indoor environments, specifically in shared workspaces. She is a PhD student supervised by Nicolas Misdariis from Ircam in Paris and Mathieu Lagrange from the SIMS team at LS2N.

Read More

Towards better visualizations of urban sound environments: insights from interviews @ Inter-Noise

Modan Tailleur 20-01-2025
Publications

Urban noise maps and noise visualizations traditionally provide macroscopic representations of noise levels across cities. However, those representations fail at accurately gauging the sound perception associated with these sound environments, as perception highly depends on the sound sources involved. This paper aims at analyzing the need for representations of sound sources, by identifying the urban stakeholders for whom such representations are assumed to be of importance. Through spoken interviews with various urban stakeholders, we have gained insight into current practices, the strengths and weaknesses of existing tools, and the relevance of incorporating sound sources into existing urban sound environment representations. Three distinct uses of sound source representations emerged in this study: 1) noise-related complaints, for industrial stakeholders and specialized citizens; 2) soundscape quality assessment, for citizens; and 3) guidance for urban planners. Findings also reveal diverse perspectives on the use of visualizations, which should rely on indicators adapted to the target audience and enable data accessibility.

Read More

Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation @ NeurIPS Audio Imagination workshop

Modan Tailleur 20-01-2025
Publications

Despite significant advancements in neural text-to-audio generation, challenges persist in controllability and evaluation. This paper addresses these issues through the Sound Scene Synthesis challenge held as part of the Detection and Classification of Acoustic Scenes and Events 2024. We present an evaluation protocol combining an objective metric, namely the Fréchet Audio Distance, with perceptual assessments, utilizing a structured prompt format to enable diverse captions and effective evaluation. Our analysis reveals varying performance across sound categories and model architectures, with larger models generally excelling but innovative lightweight approaches also showing promise. The strong correlation between objective metrics and human ratings validates our evaluation approach. We discuss outcomes in terms of audio quality, controllability, and architectural considerations for text-to-audio synthesizers, providing direction for future research.
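
For reference, the Fréchet Audio Distance compares Gaussians fitted to embeddings of reference and generated audio. Below is a minimal numpy/scipy sketch of that formula; the choice of embedding model is left open here, and the function name is ours.

```python
import numpy as np
from scipy import linalg

def frechet_audio_distance(emb_ref, emb_gen):
    """Fréchet distance between Gaussians fitted to reference and generated
    audio embeddings (rows = clips, columns = embedding dimensions)."""
    mu_r, mu_g = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    cov_r = np.cov(emb_ref, rowvar=False)
    cov_g = np.cov(emb_gen, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):  # numerical noise can yield tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```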

Read More

Podcast “Musique et IA” on France Musique

Vincent Lostanlen 15-01-2025
Media

Artificial intelligence is everywhere, including in music. Whether it is used to “generate” music from existing data, to create 100% original music, or to carry out practical tasks, the uses are numerous, and so are the concerns.

Read More

Model-based deep learning for music information research @ IEEE Signal Processing Magazine

Vincent Lostanlen 13-01-2025
Publications

We use the term model-based deep learning to refer to approaches that combine traditional knowledge-based methods with data-driven techniques, especially those based on deep learning, within a differentiable computing framework. In music, prior knowledge related, for instance, to sound production, music perception, or music composition theory can be incorporated into the design of neural networks and associated loss functions. We outline three specific scenarios to illustrate the application of model-based deep learning in music information research (MIR), demonstrating the implementation of such concepts and their potential.
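
As a toy example of the idea (not one of the article’s three scenarios), the PyTorch sketch below hard-codes a piece of sound-production knowledge, a bank of harmonically related partials, inside a differentiable module whose partial amplitudes are learned from data; all names and parameters are illustrative assumptions.

```python
import math
import torch

class HarmonicSynth(torch.nn.Module):
    """Differentiable harmonic oscillator bank: a DDSP-flavored illustration
    of embedding a source model (harmonically related partials) inside a
    network trained by gradient descent."""
    def __init__(self, n_harmonics=16, sr=16000):
        super().__init__()
        self.n_harmonics, self.sr = n_harmonics, sr
        # Learnable amplitudes of each partial (the data-driven part).
        self.log_amps = torch.nn.Parameter(torch.zeros(n_harmonics))

    def forward(self, f0, n_samples):
        # f0: fundamental frequency in Hz (the knowledge-based prior).
        t = torch.arange(n_samples) / self.sr                          # (T,)
        k = torch.arange(1, self.n_harmonics + 1).unsqueeze(1)         # (H, 1)
        partials = torch.sin(2 * math.pi * f0 * k * t.unsqueeze(0))    # (H, T)
        amps = torch.nn.functional.softmax(self.log_amps, dim=0).unsqueeze(1)
        return (amps * partials).sum(dim=0)                            # (T,)

# y = HarmonicSynth()(torch.tensor(220.0), 16000)  # one second of a harmonic tone
```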

Read More

BirdVox in MIT Technology Review

Vincent Lostanlen 10-01-2025
General

Vincent speaks to MIT Technology Review on the past, present, and future of machine learning for bird migration monitoring.

Read More

Introducing: Reyhaneh Abbasi

Vincent Lostanlen 20-11-2024
General

Reyhaneh is working on the generation of mouse ultrasonic vocalizations, with applications to animal behavior research.

Read More

Introducing: Matthieu Carreau

Vincent Lostanlen 08-11-2024
People

Matthieu is working on audio signal processing algorithms that can run on autonomous sensors subject to intermittent power supply. He is a PhD student, advised by Vincent Lostanlen, Pierre-Emmanuel Hladik, and Sébastien Faucou.

Read More

STONE: Self-supervised tonality estimator @ ISMIR

Yuexuan Kong 11-10-2024
Publications

Although deep neural networks can estimate the key of a musical piece, their supervision incurs a massive annotation effort. Against this shortcoming, we present STONE, the first self-supervised tonality estimator. The architecture behind STONE, named ChromaNet, is a convnet with octave equivalence which outputs a “key signature profile” (KSP) of 12 structured logits. First, we train ChromaNet to regress artificial pitch transpositions between any two unlabeled musical excerpts from the same audio track, as measured by the cross-power spectral density (CPSD) within the circle of fifths (CoF). We observe that this self-supervised pretext task leads the KSP to correlate with the tonal key signature. Based on this observation, we extend STONE to output a structured KSP of 24 logits, and introduce supervision so as to disambiguate major versus minor keys sharing the same key signature. Applying different amounts of supervision yields semi-supervised and fully supervised tonality estimators: i.e., Semi-TONEs and Sup-TONEs. We evaluate these estimators on FMAK, a new dataset of 5489 real-world musical recordings with expert annotation of 24 major and minor keys. We find that Semi-TONE matches the classification accuracy of Sup-TONE with reduced supervision and outperforms it with equal supervision.
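
To give a flavor of the CPSD pretext target, here is a simplified sketch that estimates the transposition between two excerpts by circular cross-correlation of their chroma profiles reindexed along the circle of fifths. It is an analogy to, not a reimplementation of, STONE’s training objective, and the feature choice (librosa’s chroma_cqt) is our assumption.

```python
import numpy as np
import librosa

# Reindex the 12 chroma bins along the circle of fifths: C, G, D, A, ...
COF = (np.arange(12) * 7) % 12

def estimated_transposition(y_a, y_b, sr):
    """Estimate how many semitones excerpt b is transposed relative to
    excerpt a, by circular cross-correlation of their 12-bin chroma
    profiles over the circle of fifths."""
    c_a = librosa.feature.chroma_cqt(y=y_a, sr=sr).mean(axis=1)[COF]
    c_b = librosa.feature.chroma_cqt(y=y_b, sr=sr).mean(axis=1)[COF]
    # Cross-power spectrum of the two profiles; its inverse DFT is the
    # circular cross-correlation over the circle of fifths.
    xcorr = np.fft.ifft(np.fft.fft(c_b) * np.conj(np.fft.fft(c_a))).real
    shift_on_cof = int(np.argmax(xcorr))  # rotation of b relative to a, in fifths
    return (shift_on_cof * 7) % 12        # same rotation expressed in semitones
```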

Read More

Detection of Deepfake Environmental Audio @ EUSIPCO

Mathieu Lagrange 01-10-2024
Publications

With the ever-rising quality of deep generative models, it is increasingly important to be able to discern whether the audio data at hand have been recorded or synthesized. Although the detection of fake speech signals has been studied extensively, this is not the case for the detection of fake environmental audio. We propose a simple and efficient pipeline for detecting fake environmental sounds based on the CLAP audio embedding. We evaluate this detector using audio data from the 2023 DCASE challenge task on Foley sound synthesis. Our experiments show that fake sounds generated by 44 state-of-the-art synthesizers can be detected on average with 98% accuracy. We show that using an audio embedding trained specifically on environmental audio is beneficial over a standard VGGish one, as it provides a 10% increase in detection performance. The sounds misclassified by the detector were tested in an experiment on human listeners, who showed modest accuracy with non-fake sounds, suggesting there may be unexploited audible features.
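
A minimal sketch of such a detection pipeline is shown below, assuming the CLAP (or other) embeddings have already been extracted as one vector per clip; whether the paper uses a linear probe or another classifier head is not specified in this abstract, so the logistic regression here is our assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def fake_audio_detector(real_embeddings, fake_embeddings):
    """Fit a linear probe on precomputed audio embeddings (one row per clip)
    to separate recorded (label 0) from synthesized (label 1) sounds."""
    X = np.vstack([real_embeddings, fake_embeddings])
    y = np.concatenate([np.zeros(len(real_embeddings)), np.ones(len(fake_embeddings))])
    clf = LogisticRegression(max_iter=1000)
    print("Cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
    return clf.fit(X, y)
```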

Read More

Sound source classification for soundscape analysis using fast third-octave bands data from an urban acoustic sensor network @ JASA

Mathieu Lagrange 01-10-2024
Publications

The exploration of the soundscape relies strongly on the characterization of the sound sources in the sound environment. Novel sound source classifiers, called pre-trained audio neural networks (PANNs), are capable of predicting the presence of more than 500 diverse sound sources. Nevertheless, PANNs models use fine Mel spectro-temporal representations as input, whereas sensors of an urban noise monitoring network often record fast third-octaves data, which have significantly lower spectro-temporal resolution. In a previous study, we developed a transcoder to transform fast third-octaves into the fine Mel spectro-temporal representation used as input of PANNs. In this paper, we demonstrate that employing PANNs with fast third-octaves data, processed through this transcoder, does not strongly degrade the classifier’s performance in predicting the perceived time of presence of sound sources. Through a qualitative analysis of a large-scale fast third-octave dataset, we also illustrate the potential of this tool in opening new perspectives and applications for monitoring the soundscapes of cities.
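
To make the input format concrete, here is a sketch of how “fast” (125 ms) third-octave band levels can be computed from a waveform with scipy. The band range (roughly 20 Hz to 12.5 kHz) and the fixed 125 ms hop are simplifying assumptions, since real sensors apply an exponential “fast” time weighting and their own calibration.

```python
import numpy as np
import scipy.signal

def fast_third_octaves(y, sr, bands=range(-17, 12)):
    """Approximate 'fast' third-octave band levels (in dB, uncalibrated).
    Band k is centered at 1 kHz * 2**(k/3); bands -17..11 span ~20 Hz to
    ~12.5 kHz, assuming sr of at least ~32 kHz."""
    hop = int(0.125 * sr)  # one frame every 125 ms
    # Long analysis window (for resolution at low frequencies), hopped every 125 ms.
    f, t, S = scipy.signal.stft(y, fs=sr, nperseg=8192, noverlap=8192 - hop)
    power = np.abs(S) ** 2
    levels = []
    for k in bands:
        fc = 1000.0 * 2 ** (k / 3)
        lo, hi = fc * 2 ** (-1 / 6), fc * 2 ** (1 / 6)  # band edges
        mask = (f >= lo) & (f < hi)
        band_power = power[mask].sum(axis=0) + 1e-12
        levels.append(10 * np.log10(band_power))
    return np.array(levels)  # shape (n_bands, n_frames)
```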

Read More

EMVD dataset: a dataset of extreme vocal distortion techniques used in heavy metal @ CBMI

Mathieu Lagrange 01-10-2024
Publications

In this paper, we introduce the Extreme Metal Vocals Dataset, which comprises a collection of recordings of extreme vocal techniques performed within the realm of heavy metal music. The dataset consists of 760 audio excerpts ranging from 1 to 30 seconds in length, totaling about 100 minutes of audio material, roughly composed of 60 minutes of distorted voices and 40 minutes of clear-voice recordings. These vocal recordings come from 27 different singers and are provided without accompanying musical instruments or post-processing effects. The distortion taxonomy within this dataset encompasses four distinct distortion techniques and three vocal effects, all performed in different pitch ranges. The performance of a state-of-the-art deep learning model is evaluated for two different classification tasks related to vocal techniques, demonstrating the potential of this resource for the audio processing community.

Read More
