in English – Audio @ LS2N

HybrA @ ICA

“Hybrid auditory filterbanks (HybrA): Learnable, interpretable, and stable filterbanks for feature extraction”. A talk by Peter Balazs (Austrian Academy of Sciences) at the International Congress on Acoustics in New Orleans, as part of the MuReNN project.

Unrolling and self-supervised learning for inverse problems

Inverse problems, where hidden variables are reconstructed from indirect measurements, often rely on iterative optimization methods that become computationally expensive as data size grows. This thematic day will focus on the emerging paradigm of algorithm unrolling as a tool for designing state-of-the-art deep neural network architectures. By unrolling the iterations of traditional optimization algorithms, we can learn their parameters as if they were neural network weights, allowing for faster, more efficient solutions that exploit the forward model. More generally, the program will cover the value of deep (un/self/*/supervised) learning for solving inverse problems.
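As a rough illustration of what unrolling means, the sketch below writes a fixed number of ISTA iterations (a classical sparse-recovery algorithm, chosen here only as an example) as a feed-forward pass with one step size and one threshold per layer. The function names, the toy forward model `A`, and all numerical values are assumptions made for this sketch. In a learned variant, the per-layer parameters would be trained by backpropagation instead of being set by hand.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def unrolled_ista(y, A, step_sizes, thresholds):
    """Run len(step_sizes) ISTA iterations as a feed-forward pass.

    Each layer k has its own step size and threshold. Here they are plain
    arrays; a learned version would treat them as trainable weights.
    """
    x = np.zeros(A.shape[1])
    for step, tau in zip(step_sizes, thresholds):
        residual = A @ x - y  # the forward model A appears in every layer
        x = soft_threshold(x - step * (A.T @ residual), step * tau)
    return x

# Toy problem (hypothetical values): recover a sparse vector from y = A x.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60)) / np.sqrt(30)
x_true = np.zeros(60)
x_true[[3, 17, 42]] = [1.0, -2.0, 1.5]
y = A @ x_true

n_layers = 200
lipschitz = np.linalg.norm(A, 2) ** 2  # squared spectral norm of A
lam = 0.01                             # sparsity weight (assumed)
x_hat = unrolled_ista(
    y, A, np.full(n_layers, 1.0 / lipschitz), np.full(n_layers, lam)
)
```

With identical parameters in every layer, this reduces to plain ISTA. The point of unrolling is that the layers no longer need to share parameters once they are learned from data.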

Introducing: Ainė Drėlingytė

Ainė is pursuing a PhD focused on developing speech masking strategies, including phone-level and frequency-level techniques.

FAVN @ Palazzina Appiani

Florian Hecker’s computer music piece ‘FAVN’, premiered at Alte Oper Frankfurt in 2016, will make a return on June 28–29, 2025, at Palazzina Appiani in Milan, Italy.

Postdoc offer: “Deep learning and multiresolution analysis for audio”

We are looking to recruit a postdoc as part of the ANR project on multi-resolution neural networks (MuReNN). The goal is to work towards more efficient and interpretable models for deep learning in audio.

Residual Hybrid Filterbanks @ IEEE SSP

A hybrid filterbank is a convolutional neural network (convnet) whose learnable filters operate over the subbands of a non-learnable filterbank, which is designed from domain knowledge. While hybrid filterbanks have found successful applications in speech enhancement, our paper shows that they remain susceptible to large deviations of the energy response due to randomness of convnet weights at initialization. To address this issue, we propose a variant of hybrid filterbanks inspired by residual neural networks (ResNets). The key idea is to introduce a shortcut connection at the output of each non-learnable filter, bypassing the convnet. We prove that the shortcut connection in a residual hybrid filterbank lowers the relative standard deviation of the energy response, while the pairwise cosine distances between non-learnable filters contribute to preventing duplicate features.
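The structure described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's implementation: the fixed filterbank is replaced by random filters, the learnable filters are drawn from a Gaussian initialization, and the names `hybrid_filters` and `energy_response` are hypothetical. The shortcut connection adds each non-learnable filter back to its filtered version, and the energy response is taken as the sum of squared magnitude responses over subbands.

```python
import numpy as np

def hybrid_filters(fixed, learned, residual=False):
    """Compose each fixed filter with its learnable filter.

    fixed:   array (J, T), stand-in for the non-learnable filterbank.
    learned: array (J, K), one short learnable filter per subband.
    With residual=True, the fixed filter is added back (shortcut connection).
    """
    out = np.stack([np.convolve(f, w) for f, w in zip(fixed, learned)])
    if residual:
        out[:, : fixed.shape[1]] += fixed  # shortcut bypasses the learned filter
    return out

def energy_response(filters, n_fft=512):
    """Sum over subbands of the squared magnitude frequency response."""
    spectra = np.fft.rfft(filters, n=n_fft, axis=1)
    return np.sum(np.abs(spectra) ** 2, axis=0)

def relative_std(values):
    """Standard deviation divided by the mean."""
    return np.std(values) / np.mean(values)

# Hypothetical sizes and random stand-ins for both sets of filters.
rng = np.random.default_rng(0)
J, T, K = 16, 64, 9
fixed = rng.standard_normal((J, T))
learned = rng.standard_normal((J, K)) / np.sqrt(K)  # assumed initialization

plain = energy_response(hybrid_filters(fixed, learned))
resid = energy_response(hybrid_filters(fixed, learned, residual=True))
print("relative std, plain hybrid:   ", relative_std(plain))
print("relative std, residual hybrid:", relative_std(resid))
```

Setting `learned` to zero makes the residual variant coincide with the fixed filterbank, which is the sense in which the shortcut bypasses the convnet.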

Robust Deconvolution with Parseval Filterbanks @ IEEE SampTA

This article introduces two contributions: Multiband Robust Deconvolution (Multi-RDCP), a regularization approach for deconvolution in the presence of noise; and Subband-Normalized Adaptive Kernel Evaluation (SNAKE), a first-order iterative algorithm designed to efficiently solve the resulting optimization problem. Multi-RDCP resembles Group LASSO in that it promotes sparsity across the subband spectrum of the solution. We prove that SNAKE enjoys fast convergence rates, and numerical simulations illustrate the efficiency of SNAKE for deconvolving noisy oscillatory signals.
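To make the Group LASSO analogy concrete, the sketch below implements the standard group soft-thresholding operator, which shrinks each subband as a whole and sets weak subbands exactly to zero. It illustrates ordinary Group LASSO only; it is not the Multi-RDCP regularizer or the SNAKE algorithm, whose details are not given here. The function name and the toy values are assumptions.

```python
import numpy as np

def group_soft_threshold(coeffs, tau):
    """Proximal operator of tau * sum_j ||coeffs[j]||_2 (Group LASSO penalty).

    coeffs: array (J, T), one row per subband.
    Rows whose Euclidean norm is below tau are set to zero;
    the other rows are shrunk towards zero by tau in norm.
    """
    norms = np.linalg.norm(coeffs, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * coeffs

# Toy subband coefficients (hypothetical): one strong and one weak subband.
subbands = np.array([[3.0, 4.0], [0.1, 0.1]])
shrunk = group_soft_threshold(subbands, tau=1.0)
```

Here the strong subband is kept and slightly shrunk, while the weak one is removed entirely, which is what sparsity across subbands means.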

Human Auditory Ecology @ MNHN

Can we hear “ecological processes” underlying natural habitats and ecosystems (i.e., the processes responsible for the dynamics and functions of ecological systems at multiple spatial and temporal scales)? If so, how do we hear such ecological processes?

Robust Multicomponent Tracking of Ultrasonic Vocalizations @ IEEE ICASSP

Ultrasonic vocalizations (USV) convey information about individual identity and arousal status in mice. We propose to track USV as ridges in the time-frequency domain via a variant of time-frequency reassignment (TFR). The key idea is to perform TFR with empirical Wiener shrinkage and multitapering to improve robustness to noise. Furthermore, we perform TFR over both the short-term Fourier transform and the constant-Q transform so as to detect both the fundamental frequency and its harmonic partial (if any). Experimental results show that our approach effectively estimates multicomponent ridges with high precision and low frequency deviation.

Invited talk: Constance Douwes

In this seminar, we propose a shift towards energy-aware model evaluation. Using a Pareto-optimal framework, we advocate for balancing performance with energy efficiency through an extended analysis of deep generative models for speech synthesis. Furthermore, we refine energy consumption measurements by studying elementary neural network architectures, highlighting complex relationships between energy consumption, the number of operations, and hardware dependencies. Finally, as organizers of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge, we analyze the impact of introducing an energy criterion on the challenge results and explore the evolution of system complexity and energy consumption over the years.
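As a minimal sketch of what a Pareto-optimal comparison between performance and energy can look like, the function below keeps only the models that no other model beats on both criteria at once. The function name, the quality scores, and the energy figures are invented for illustration and do not come from the seminar.

```python
import numpy as np

def pareto_front(quality, energy):
    """Return indices of models not dominated by any other model.

    A model is dominated if another model has quality >= and energy <=,
    with at least one of the two inequalities being strict.
    """
    quality = np.asarray(quality, dtype=float)
    energy = np.asarray(energy, dtype=float)
    keep = []
    for i in range(len(quality)):
        dominated = np.any(
            (quality >= quality[i])
            & (energy <= energy[i])
            & ((quality > quality[i]) | (energy < energy[i]))
        )
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical models: higher quality is better, lower energy is better.
quality = [0.90, 0.80, 0.85, 0.70]
energy = [10.0, 5.0, 12.0, 6.0]
front = pareto_front(quality, energy)
```

In this toy example, the third model is dominated by the first (lower quality, higher energy) and the fourth by the second, so only the first two remain on the front.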