On the Robustness of Musical Timbre Perception Models: From Perceptual to Learned Approaches

Conference paper

Authors: Barbara Pascal, Mathieu Lagrange.

Conference: 32nd European Signal Processing Conference (EUSIPCO)

Publication date: 2024

Keywords: Audio timbre perception, Distance metric learning, Time/frequency analysis, Scattering transform, Spectrotemporal modulations, Deep neural networks, Deep embeddings, Robustness analysis

Link to the HAL repository

Abstract

Timbre, encompassing an intricate set of acoustic cues, is key to identifying sound sources, and especially to discriminating between musical instruments and playing styles. Psychoacoustic studies of timbre devote massive efforts to explaining human timbre perception. To uncover the acoustic substrates of perceived timbre dissimilarity, a recent work leveraged metric learning strategies on different perceptual representations and performed a meta-analysis of seventeen musical audio datasets annotated with dissimilarity ratings. By learning salient patterns in very high-dimensional representations, metric learning accounts for a reasonably large part of the variance in human ratings. The present work shows that combining the most recent deep audio embeddings with a metric learning approach makes it possible to explain almost all the variance in human dissimilarity ratings. Furthermore, the robustness of the learning procedure against simulated human rating variability is thoroughly investigated. Intensive numerical experiments support both the explanatory power of the metric learning strategy using deep embeddings and its robustness against degraded dissimilarity ratings.
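
The general idea of fitting a metric to dissimilarity ratings and then degrading those ratings can be illustrated with a toy sketch. This is not the authors' procedure: the embeddings and "ratings" below are synthetic, the metric is assumed to be a diagonally weighted Euclidean distance, the fit is a plain least-squares solve on squared distances, and every name (`fit_diagonal_metric`, `explained_variance`, the noise scale, the dimensions) is hypothetical.

```python
import numpy as np


def pairwise_sq_diffs(X):
    """Per-dimension squared differences for all pairs i < j, shape (n_pairs, d)."""
    i, j = np.triu_indices(X.shape[0], k=1)
    return (X[i] - X[j]) ** 2


def fit_diagonal_metric(X, ratings):
    """Fit weights w >= 0 so that sqrt(sq_diffs @ w) approximates the ratings.

    Squared weighted distances are linear in w, so an ordinary least-squares
    solve is used, followed by clipping to keep the weights non-negative
    (a simplification chosen for this sketch).
    """
    F = pairwise_sq_diffs(X)
    w, *_ = np.linalg.lstsq(F, ratings ** 2, rcond=None)
    return np.clip(w, 0.0, None)


def explained_variance(X, w, ratings):
    """Squared Pearson correlation between learned distances and ratings."""
    pred = np.sqrt(pairwise_sq_diffs(X) @ w)
    r = np.corrcoef(pred, ratings)[0, 1]
    return r ** 2


# Synthetic stand-ins: random "embeddings" and ratings generated from a
# hidden weighted distance. Nothing here comes from real listening data.
rng = np.random.default_rng(0)
n_sounds, dim = 16, 8
X = rng.normal(size=(n_sounds, dim))
w_true = rng.uniform(0.1, 1.0, size=dim)
clean = np.sqrt(pairwise_sq_diffs(X) @ w_true)

# Fit on clean ratings and measure in-sample explained variance.
w_clean = fit_diagonal_metric(X, clean)
r2_clean = explained_variance(X, w_clean, clean)

# Simulate rating variability with additive Gaussian noise (assumed model),
# refit, and check how well the refit metric still matches the clean ratings.
noisy = np.clip(clean + rng.normal(scale=0.3, size=clean.shape), 0.0, None)
w_noisy = fit_diagonal_metric(X, noisy)
r2_noisy_vs_clean = explained_variance(X, w_noisy, clean)

print(f"explained variance, clean fit: {r2_clean:.4f}")
print(f"explained variance, noisy fit vs clean ratings: {r2_noisy_vs_clean:.4f}")
```

Both scores are computed in-sample on the same pairs used for fitting, which keeps the sketch short; a real study would presumably evaluate on held-out data and use a richer metric and noise model.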