We will present a tutorial on Kymatio at the International Society for Music Information Retrieval (ISMIR) Conference, held in Milan on November 5-9, 2023.
Kymatio: Deep Learning meets Wavelet Theory for Music Signal Processing
Kymatio is a Python package for applications at the intersection of deep learning and wavelet scattering. Its v0.4 release provides an implementation of the joint time–frequency scattering transform (JTFS), an idealisation of the spectrotemporal receptive field (STRF), a neurophysiological model that is well established in musical timbre perception research (Patil et al., 2012).
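As a first taste of the transform, here is a minimal sketch of computing JTFS coefficients on a synthetic chirp. The import path and the keyword arguments (`J`, `Q`, `J_fr`, `Q_fr`) reflect our reading of the v0.4 release and should be treated as assumptions; consult the Kymatio documentation for the exact signature.

```python
import numpy as np
from kymatio.numpy import TimeFrequencyScattering  # assumed v0.4 import path

# Synthetic exponential chirp: a toy signal with salient joint
# time-frequency structure, standing in for a music excerpt.
N = 2 ** 16
t = np.arange(N) / 44100.0
x = np.sin(2 * np.pi * 512.0 * (2.0 ** (4.0 * t / t[-1])) * t).astype(np.float32)

# J, Q: octaves and wavelets per octave along time.
# J_fr, Q_fr: their counterparts along log-frequency, which give JTFS
# its spectrotemporal (STRF-like) selectivity.
jtfs = TimeFrequencyScattering(shape=(N,), J=8, Q=8, J_fr=3, Q_fr=1)
Sx = jtfs(x)
print(Sx.shape)  # output layout depends on the frontend and format; see the docs
```

The same construction is available through the PyTorch frontend (`kymatio.torch`), which is what makes JTFS usable as a differentiable loss or feature extractor in deep learning pipelines (Muradeli et al., 2022).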
In MIR research, scattering transforms have proven effective in musical instrument classification (Vahidi et al., 2022), neural audio synthesis (Andreux & Mallat, 2018), playing technique recognition and similarity (Lostanlen et al., 2021), acoustic modelling (Lostanlen et al., 2020), and synthesizer parameter estimation and objective audio similarity (Vahidi et al., 2023; Lostanlen et al., 2023).
We will present some applications of Kymatio to music science:
- Visualizations of music signals with the wavelet transform and scattering transforms (see the code sketch after this list)
- Music classification and segmentation with scattering transforms
- Scattering for generative evaluation of audio representations (GEAR) (Lostanlen et al., 2023)
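As a preview of the first item, the sketch below computes a 1-D scattering transform of a synthetic chirp and displays its first-order coefficients, which resemble a low-resolution scalogram. It follows the pattern of Kymatio's documented `Scattering1D` examples; the signal and parameter values are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt
from kymatio.numpy import Scattering1D

# Synthetic test signal: a linear chirp, standing in for a music excerpt.
N = 2 ** 13
t = np.linspace(0.0, 1.0, N, endpoint=False)
x = np.sin(2 * np.pi * (100.0 + 900.0 * t) * t).astype(np.float32)

# J controls the largest time scale (2**J samples); Q the wavelets per octave.
scattering = Scattering1D(J=6, shape=N, Q=8)
Sx = scattering(x)  # shape: (number of scattering paths, number of time frames)

# meta()['order'] indicates whether each path is zeroth-, first-, or second-order.
meta = scattering.meta()
order1 = np.where(meta['order'] == 1)

plt.imshow(Sx[order1], aspect='auto', origin='lower')
plt.xlabel('time frame')
plt.ylabel('first-order path index (log-frequency)')
plt.title('First-order scattering coefficients')
plt.show()
```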
We ask participants to have some prior knowledge of Python and NumPy programming, spectrogram visualization, and computer generation of sounds. Familiarity with PyTorch is a plus, but not essential. We assume no prior knowledge of wavelet theory.
Tutorial organizers
Cyrus Vahidi is a PhD researcher at the UKRI CDT in Artificial Intelligence and Music at the Centre for Digital Music, London, and a computer science graduate of Imperial College London. His research covers computational representations of auditory perception in machine listening and computer music. He is a core contributor to Kymatio, the open-source package for wavelet scattering. Previously, he was a visiting researcher at LS2N (CNRS, France) and worked on MIR/ML in ByteDance's SAMI group. He is the founder of Sonophase AI and performs experimental electronic music with Max/MSP and modular synthesis.
Christopher Mitcheltree is a PhD researcher at the UKRI CDT in Artificial Intelligence and Music at the Centre for Digital Music, London. He researches time-varying modulations of synthesizers and audio effects and is a founding developer of Neutone, an open-source neural audio plugin and SDK. He has previously worked on machine learning and art projects at a variety of companies and institutions, including Google, Airbnb, AI2, Keio University, and Qosmo.
Dr. Vincent Lostanlen obtained his PhD in 2017 from the École normale supérieure under the supervision of Stéphane Mallat. He is now a scientist (chargé de recherche) at CNRS and a visiting scholar at New York University. He is a founding member of the Kymatio consortium.
References
Andreux, M., Angles, T., Exarchakis, G., Leonarduzzi, R., Rochette, G., Thiry, L., … & Eickenberg, M. (2020). Kymatio: Scattering transforms in Python. The Journal of Machine Learning Research, 21(1), 2256-2261.
Andreux, M., & Mallat, S. (2018, September). Music Generation and Transformation with Moment Matching-Scattering Inverse Networks. In ISMIR (pp. 327-333).
Lostanlen, V., El-Hajj, C., Rossignol, M., Lafay, G., Andén, J., & Lagrange, M. (2021). Time–frequency scattering accurately models auditory similarities between instrumental playing techniques. EURASIP Journal on Audio, Speech, and Music Processing, 2021(1), 1-21.
Lostanlen, V., Cohen-Hadria, A., & Bello, J. P. (2020). One or two components? The scattering transform answers. arXiv preprint arXiv:2003.01037.
Lostanlen, V., Yan, L., & Yang, X. (2023). From HEAR to GEAR: Generative Evaluation of Audio Representations. Proceedings of Machine Learning Research, (166), 48-64.
Muradeli, J., Vahidi, C., Wang, C., Han, H., Lostanlen, V., Lagrange, M., & Fazekas, G. (2022, September). Differentiable Time-Frequency Scattering On GPU. In Digital Audio Effects Conference (DAFx).
Vahidi, C., Han, H., Wang, C., Lagrange, M., Fazekas, G., & Lostanlen, V. (2023). Mesostructures: Beyond spectrogram loss in differentiable time-frequency analysis. arXiv preprint arXiv:2301.10183.