Explainable Audio Classification of Playing Techniques with Layer-wise Relevance Propagation

Conference paper

Authors: Changhong Wang, Vincent Lostanlen, Mathieu Lagrange.

Conference: 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Publication date: 2023

Keywords: Layer-wise relevance propagation, Scattering transform, Playing technique recognition, Music signal analysis

Link to the HAL repository

Abstract

Deep convolutional networks (convnets) in the time-frequency domain can learn an accurate and fine-grained categorization of sounds. For example, in the context of music signal analysis, this categorization may correspond to a taxonomy of playing techniques: vibrato, tremolo, trill, and so forth. However, convnets lack an explicit connection with the neurophysiological underpinnings of musical timbre perception. In this article, we propose a data-driven approach to explain audio classification in terms of physical attributes in sound production. We borrow from current literature in "explainable AI" (XAI) to study the predictions of a convnet which achieves an almost perfect score on a challenging task: the classification of five comparable real-world playing techniques from 30 instruments spanning seven octaves. Mapping the signal into the carrier-modulation domain using the scattering transform, we decompose the network's predictions over this domain with layer-wise relevance propagation. We find that the regions highly relevant to the predictions are localized around the physical attributes with which the playing techniques are performed.
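To make the idea of decomposing a prediction with layer-wise relevance propagation more concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: it uses a small fully connected ReLU network rather than a convnet, random weights rather than a trained model, a random non-negative vector as a stand-in for scattering coefficients, and the epsilon rule, which is one common LRP variant (the abstract does not say which rule the paper uses). The function name `lrp_epsilon` and all shapes are hypothetical.

```python
import numpy as np


def lrp_epsilon(weights, biases, x, target=None, eps=1e-6):
    """Redistribute one output logit onto the input features (LRP epsilon rule).

    weights[i] has shape (n_in, n_out); hidden layers use ReLU and the last
    layer is linear. Returns (input_relevance, logits).
    """
    # Forward pass, keeping the input of every layer.
    activations = [np.asarray(x, dtype=float)]
    n_layers = len(weights)
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = activations[-1] @ W + b
        if i < n_layers - 1:
            z = np.maximum(z, 0.0)  # ReLU on hidden layers only
        activations.append(z)
    logits = activations[-1]

    if target is None:
        target = int(np.argmax(logits))

    # Start with all relevance on the explained class.
    relevance = np.zeros_like(logits)
    relevance[target] = logits[target]

    # Backward pass: each unit passes relevance to its inputs in proportion
    # to their contribution a_i * W_ij to its pre-activation z_j.
    layer_inputs = activations[:-1]
    for W, b, a in zip(reversed(weights), reversed(biases), reversed(layer_inputs)):
        z = a @ W + b
        z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilizer, avoids division by zero
        s = relevance / z
        relevance = a * (W @ s)
    return relevance, logits


# Toy setup (assumption): 8 input features, 16 hidden units, 5 classes.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 16)), rng.normal(size=(16, 5))]
biases = [np.zeros(16), np.zeros(5)]
x = rng.random(8)  # non-negative, like modulus-based coefficients

relevance, logits = lrp_epsilon(weights, biases, x)
target = int(np.argmax(logits))
print("explained class:", target)
print("relevance per input feature:", relevance)
```

With zero biases and a small `eps`, the relevance scores sum approximately to the explained logit, which is the conservation property that lets relevance be read as a decomposition of the prediction over the input domain.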