Can machines learn filterbank design?


A talk by Vincent Lostanlen at the Acoustics Research Institute (ARI) of the Austrian Academy of Sciences (ÖAW) in Vienna.

Filterbank analysis is an essential component of machine listening, where it serves as a pre-processing step before pattern recognition in the time-frequency domain. In speech and music signal processing, filterbank design often relies on prior knowledge about auditory perception and the tuning of musical instruments. Yet, this kind of prior knowledge is not available in emerging domains of machine listening, such as bioacoustics, urban acoustics, industrial acoustics, and medical acoustics. In this context, one solution is to replace filterbank design with a data-driven procedure: training a neural network on the “raw waveform.” In this talk, I will outline an ongoing research program toward making this training procedure more stable, sample-efficient, and parameter-efficient. The key idea is to train separate convolutional operators over the subbands of a non-learned filterbank: typically, a discrete wavelet transform (DWT). This kind of “hybrid” approach, combining digital signal processing and machine learning, can be justified formally via simple techniques in linear algebra and probability theory. I will present numerical simulations and a real-world application to speech enhancement, conducted in collaboration with members of the ÖAW and the University of Vienna: Daniel Haider, Felix Perfler, Martin Ehler, and Peter Balazs.
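To make the “hybrid” idea concrete, here is a minimal sketch in Python (NumPy) of a fixed DWT followed by one convolution kernel per subband. It is an illustration under stated assumptions, not the implementation presented in the talk: the Haar wavelet, the kernel length of 3, and the names `haar_dwt` and `hybrid_forward` are all hypothetical choices, and the kernels are randomly initialized rather than trained.

```python
import numpy as np

def haar_dwt(x, levels):
    """Non-learned filterbank: multi-level Haar DWT (an assumed wavelet choice).

    Returns [d1, d2, ..., dL, aL]: detail subbands from finest to coarsest,
    followed by the final approximation. len(x) must be divisible by 2**levels.
    """
    approx = np.asarray(x, dtype=float)
    subbands = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        subbands.append((even - odd) / np.sqrt(2.0))  # high-pass, downsampled
        approx = (even + odd) / np.sqrt(2.0)          # low-pass, downsampled
    subbands.append(approx)
    return subbands

def hybrid_forward(x, kernels):
    """Apply a separate 1-D convolution kernel to each DWT subband.

    In a trainable model, `kernels` would be the learned parameters;
    here they are plain arrays supplied by the caller.
    """
    subbands = haar_dwt(x, levels=len(kernels) - 1)
    return [np.convolve(s, k, mode="same") for s, k in zip(subbands, kernels)]

rng = np.random.default_rng(0)
x = rng.standard_normal(64)   # stand-in for a short "raw waveform"
levels = 3
# One short kernel per subband (hypothetical length 3, random initialization).
kernels = [0.1 * rng.standard_normal(3) for _ in range(levels + 1)]
outputs = hybrid_forward(x, kernels)
print([len(o) for o in outputs])
```

In this sketch, each subband gets its own short kernel, so the number of trainable parameters grows with the number of subbands rather than with the length of a full-band filter. The DWT itself stays fixed.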

More information: https://www.oeaw.ac.at/mla2s/events-1/mla2s-presents-vincent-lostanlen-can-machines-learn-filterbank-design