Assessing machine learning techniques for HRTF individualisation
Spatial audio relies on head-related transfer functions (HRTFs) to simulate how sound interacts with the human body in order to create realistic audio experiences. Since HRTFs are unique to each person, using non-personalised HRTFs can make audio experiences feel less realistic and less immersive.
Machine learning approaches have been shown to be more efficient and accurate than traditional methods for HRTF personalisation. However, these methods require careful training and validation to achieve good, unbiased performance, and there is as yet no standard approach to evaluating them.
New SONICOM research surveys machine learning-based approaches for HRTF personalisation in the literature, organising them according to the processing steps involved in the machine learning workflow.
In addition to categorising existing work, this survey discusses its achievements, identifies its limitations, and outlines aspects that require further investigation at the crossroads of the acoustics, audio signal processing, and machine learning research communities.
“Previous surveys have covered various aspects of HRTF personalization, but none have focused specifically on machine learning,” says Dr Davide Fantini, lead author of the paper, who is based at the University of Milan. “We hope that this survey will provide other researchers with a clear overview of the state of the art and encourage future standardization actions in the field of HRTF individualisation based on machine learning.”
Results
The analysis revealed the prevalent approaches for each step of the machine learning workflow, which include:
- CIPIC – one of the earliest HRTF datasets – as the dataset;
- Anthropometry as input;
- HRTF magnitude as output;
- Possibly principal component analysis (PCA) for HRTF preprocessing;
- Multivariate linear regression (MLR) or neural networks as the machine learning model; and
- Spectral distortion for evaluation.
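To make the workflow above concrete, here is a minimal sketch of how the prevalent choices might fit together: anthropometry as input, PCA-compressed HRTF log-magnitudes as output, multivariate linear regression as the model, and spectral distortion as the metric. It is an illustration only, not the method of the survey or of any surveyed paper. The data are randomly generated stand-ins (not CIPIC measurements), and every name, array size, and the number of components is a hypothetical choice made for this example. Spectral distortion is taken here in its common form, the root-mean-square difference in dB across frequency bins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption: NOT real CIPIC measurements).
# Rows are subjects; HRTF values are log-magnitudes in dB for one direction.
n_subjects, n_anthro, n_freq = 40, 8, 64
anthro = rng.normal(size=(n_subjects, n_anthro))
mixing = rng.normal(size=(n_anthro, n_freq))
hrtf_db = anthro @ mixing + 0.1 * rng.normal(size=(n_subjects, n_freq))

# Hold out the last 10 subjects for evaluation.
train, test = slice(0, 30), slice(30, 40)

# Preprocessing: PCA fitted on the training HRTFs via SVD.
mean_hrtf = hrtf_db[train].mean(axis=0)
_, _, vt = np.linalg.svd(hrtf_db[train] - mean_hrtf, full_matrices=False)
n_components = 5  # hypothetical choice
components = vt[:n_components]
weights_train = (hrtf_db[train] - mean_hrtf) @ components.T


def add_intercept(x):
    """Prepend a column of ones so the regression learns a bias term."""
    return np.hstack([np.ones((x.shape[0], 1)), x])


# Model: multivariate linear regression from anthropometry to PCA weights.
coef, *_ = np.linalg.lstsq(add_intercept(anthro[train]), weights_train, rcond=None)

# Prediction: regress the PCA weights, then reconstruct the dB magnitudes.
weights_pred = add_intercept(anthro[test]) @ coef
hrtf_pred_db = weights_pred @ components + mean_hrtf


def spectral_distortion(true_db, pred_db):
    """Root-mean-square dB difference across frequency bins, per subject."""
    return np.sqrt(np.mean((true_db - pred_db) ** 2, axis=-1))


sd = spectral_distortion(hrtf_db[test], hrtf_pred_db)
print(f"Mean spectral distortion on held-out subjects: {sd.mean():.2f} dB")
```

Fitting the PCA and the regression on the training subjects only, and scoring on held-out subjects, is one simple way to avoid the optimistic bias that careful validation is meant to guard against.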
The research then discusses the main gaps in the literature, which could become topics for future studies:
- The limitations of anthropometry-based methods;
- The reported performance, which still falls short of that obtained with individual HRTFs;
- The scarce applications of recent machine learning developments, including explainable AI;
- The lack of a standardised evaluation protocol; and
- The infrequent investigation of perceptual metrics, especially in ecologically valid experimental settings, which encompass multiple aspects of spatial audio beyond HRTF individualisation.
Read the full publication: doi.org/10.1109/OJSP.2025.3528330