Can we hear “ecological processes” underlying natural habitats and ecosystems (i.e., the processes responsible for the dynamics and functions of ecological systems at multiple spatial and temporal scales)? If so, how do we hear such ecological processes?
Author: Vincent Lostanlen
Streaming as infrastructure and as a way of life @ RNRM
The inquiry into the ecological impact of music streaming reveals two angles of analysis: one grounded in material infrastructure, the other in the evolution of ways of life. At a time when choice architectures are increasingly locked around a small number of digital giants, what is at stake in this inquiry is a complementarity between quantitative and qualitative methods, as well as an interdisciplinarity spanning computer science, the humanities and social sciences, and Earth system science. In this context, criticizing the unsustainability of streaming does not mean pinning our hopes on a technological innovation that could suddenly “green” the sector as a whole. Rather, it means denouncing and contesting the utopia of a music that is fully available, to everyone, everywhere, right away. To be credible, alternative scenarios to the status quo must define, in a single technocritical gesture, which way of life they promote and which infrastructure they will maintain.
Robust Multicomponent Tracking of Ultrasonic Vocalizations @ IEEE ICASSP
Ultrasonic vocalizations (USVs) convey information about individual identity and arousal status in mice. We propose to track USVs as ridges in the time-frequency domain via a variant of time-frequency reassignment (TFR). The key idea is to perform TFR with empirical Wiener shrinkage and multitapering to improve robustness to noise. Furthermore, we perform TFR over both the short-term Fourier transform and the constant-Q transform so as to detect both the fundamental frequency and its harmonic partial (if any). Experimental results show that our approach effectively estimates multicomponent ridges with high precision and low frequency deviation.
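Conceptually, the ridge-tracking step can be approximated in a few lines. The sketch below is a simplification, not the paper's method: it uses librosa's stock reassigned spectrogram (STFT only, no constant-Q pass) with a per-frame peak pick, and omits the multitapering and empirical Wiener shrinkage; the input file name is a placeholder.

```python
# Minimal sketch: ridge extraction from a reassigned spectrogram.
import numpy as np
import librosa

y, sr = librosa.load("mouse_usv.wav", sr=None)  # hypothetical recording

# Time-frequency reassignment: each STFT bin is relocated to its
# instantaneous frequency and group delay.
freqs, times, mags = librosa.reassigned_spectrogram(y, sr=sr, n_fft=1024)

ridge = []
threshold = 10 * np.median(mags)  # crude noise floor
for t in range(mags.shape[1]):
    k = np.argmax(mags[:, t])  # strongest component in this frame
    if mags[k, t] > threshold:
        ridge.append((times[k, t], freqs[k, t]))  # one ridge point (s, Hz)
```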
Invited talk: Constance Douwes
In this seminar, we propose a shift towards energy-aware model evaluation. Using a Pareto-optimal framework, we advocate for balancing performance with energy efficiency through an extended analysis of deep generative models for speech synthesis. Furthermore, we refine energy consumption measurements by studying elementary neural network architectures, highlighting complex relationships between energy consumption, the number of operations, and hardware dependencies. Finally, as organizers of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge, we analyze the impact of introducing an energy criterion on the challenge results and explore the evolution of system complexity and energy consumption over the years.
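To make the Pareto-optimal framing concrete, here is a minimal sketch of how a set of systems could be filtered down to its Pareto front over accuracy and energy; all system names and figures are illustrative, not results from the seminar.

```python
# A system is kept only if no other system is simultaneously at least as
# accurate and at most as energy-hungry, with strict improvement in one axis.
def pareto_front(systems):
    """systems: list of (name, accuracy, energy_kwh) tuples."""
    front = []
    for name, acc, kwh in systems:
        dominated = any(
            a >= acc and e <= kwh and (a > acc or e < kwh)
            for _, a, e in systems
        )
        if not dominated:
            front.append((name, acc, kwh))
    return front

systems = [("large", 0.92, 5.0), ("medium", 0.90, 1.2),
           ("small", 0.84, 0.3), ("wasteful", 0.89, 4.8)]
print(pareto_front(systems))  # "wasteful" is dominated by "medium"
```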
S-KEY: Self-Supervised Learning of Major and Minor Keys from Audio @ IEEE ICASSP
STONE, the current method for self-supervised tonality estimation in music signals, cannot distinguish relative keys, such as C major versus A minor. In this article, we extend the neural network architecture and learning objective of STONE to perform self-supervised learning of major and minor keys (S-KEY). Our main contribution is an auxiliary pretext task added to STONE, formulated using transposition-invariant chroma features as a source of pseudo-labels. S-KEY matches the supervised state of the art in tonality estimation on the FMAKv2 and GTZAN datasets while requiring no human annotation and having the same parameter budget as STONE. We build upon this result and expand the training set of S-KEY to a million songs, thus showing the potential of large-scale self-supervised learning in music information retrieval.
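As a hedged illustration of why transposition-invariant chroma features are plausible pseudo-label material: the magnitude of the DFT of a 12-bin chroma vector is unchanged under circular shifts, i.e. under transposition. This is a generic MIR trick, not necessarily the exact S-KEY recipe, and the audio file is a placeholder.

```python
# Transposition-invariant summary of a chroma profile via |DFT|.
import numpy as np
import librosa

y, sr = librosa.load("song.wav", sr=None)                     # placeholder input
chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)  # 12-bin profile

# |DFT| of the chroma vector: invariant to circular shifts (transpositions).
invariant = np.abs(np.fft.fft(chroma))

# Sanity check: transposing by any number of semitones leaves it unchanged.
shifted = np.roll(chroma, 3)  # transpose up a minor third
assert np.allclose(invariant, np.abs(np.fft.fft(shifted)))
```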
Premiere of “in earth we walk” @ Halle 6
A live performance for voice, live electronics, and double bass, created by Han Han.
In earth we walk is a fleeting moment where voices become agents for constructing nature-inspired landscapes: voices utter semantically charged words conveying vivid scenarios; voices supply raw sonic material that is treated as pure sound. The libretto is a six-stanza poem that unfolds a series of pictorial and psychological scenes, exploring themes of longing, awe, and the reckoning with impermanence. Together, vocal emulations of clouds, torrents, winds, tides, and sands weave into a sonic experience that evokes one’s multifaceted relationship with the many wonders and situations earth puts one in.
Introducing: Clara Boukhemia
Clara is working on augmented reality approaches to improving sound comfort in indoor environments, specifically in shared workspaces. She is a PhD student supervised by Nicolas Misdariis, from Ircam in Paris, and Mathieu Lagrange, from the SIMS team at LS2N.
Green Days @ IRISA
The “Green Days” are the French-speaking conference days on environmentally responsible computing, organized together with a number of GDRs and PEPR programs.
Podcast “Musique et IA” on France Musique
Artificial intelligence is everywhere, including in music. Whether it is used to “generate” music from existing data, to create 100% original music, or to carry out practical tasks, its uses are numerous, and so are the concerns it raises.
Model-based deep learning for music information research @ IEEE Signal Processing Magazine
We use the term model-based deep learning to refer to approaches that combine traditional knowledge-based methods with data-driven techniques, especially those based on deep learning, within a differentiable computing framework. In music, prior knowledge related, for instance, to sound production, music perception, or music composition theory can be incorporated into the design of neural networks and associated loss functions. We outline three specific scenarios to illustrate the application of model-based deep learning in MIR, demonstrating the implementation of such concepts and their potential.
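As a minimal sketch of this idea (not the article's own code), the PyTorch module below encodes prior knowledge of harmonic sound production as a differentiable synthesizer, so that gradients from an audio-domain loss can flow back to interpretable parameters; all module and parameter names are illustrative.

```python
# A differentiable harmonic synthesizer: domain knowledge as a network layer.
import torch

class HarmonicSynth(torch.nn.Module):
    def __init__(self, sr=16000, n_harmonics=8):
        super().__init__()
        self.sr, self.n_harmonics = sr, n_harmonics

    def forward(self, f0, amps, n_samples):
        # f0: (batch,) fundamental frequency in Hz (assumed constant here)
        # amps: (batch, n_harmonics) harmonic amplitudes
        t = torch.arange(n_samples) / self.sr              # time axis in seconds
        k = torch.arange(1, self.n_harmonics + 1)          # harmonic numbers
        phases = 2 * torch.pi * f0[:, None, None] * k[None, :, None] * t
        return (amps[..., None] * torch.sin(phases)).sum(dim=1)

synth = HarmonicSynth()
f0 = torch.tensor([220.0], requires_grad=True)
amps = torch.rand(1, 8, requires_grad=True)
audio = synth(f0, amps, n_samples=16000)
audio.pow(2).mean().backward()  # gradients flow back to f0 and amps
```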