Invited talk: Constance Douwes

Constance Douwes at École Centrale de Nantes, amphi E, 2 pm.

Towards Sustainable Evaluation of Deep Neural Networks for Audio

Deep learning models are now a key component of modern audio systems, and their use has increased significantly in recent years. Their flexibility and generalization capabilities make them powerful tools in various contexts, including speech synthesis, music generation, and machine listening. However, these advantages come at the cost of expensive training sessions that require large amounts of data and energy-intensive dedicated hardware, leading to significant greenhouse gas emissions. The metrics we use as a scientific community to evaluate our work are at the heart of this problem. Currently, deep learning research prioritizes performance and accuracy, often overshadowing the computational cost of these models. In this seminar, we propose a shift towards energy-aware model evaluation. Using a Pareto-optimal framework, we advocate for balancing performance with energy efficiency through an extended analysis of deep generative models for speech synthesis. Furthermore, we refine energy consumption measurements by studying elementary neural network architectures, highlighting complex relationships between energy consumption, the number of operations, and hardware dependencies. Finally, as organizers of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge, we analyze the impact of introducing an energy criterion on the challenge results and explore the evolution of system complexity and energy consumption over the years.