This repository supports our research paper titled "Brain-controlled augmented hearing for spatially moving conversations in noisy environments". The main components of this repository are:
- Binaural Speech Separation Algorithm: Separates the speech streams of the moving talkers while preserving their location.
- Auditory Attention Decoding (AAD): Decodes which talker the listener is attending to by analyzing their brain signals.
🚨 Notice: The research paper has not yet been made public. This code is currently intended for paper review purposes only.
This section provides data, code for training separation models, pre-trained models, and a demo for inference.
- Ensure you have installed all the dependencies listed in the `requirements.txt` file.
- This codebase has been tested on Python 3.9.16.
- The Google Resonance Audio Software Development Kit was employed to spatialize the audio. For more details about spatializing sounds through HRTFs, adding reverb, and modeling shoebox environments, please refer to these scripts.
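For intuition, spatializing a mono source through an HRTF boils down to convolving it with a left/right head-related impulse response (HRIR) pair. Below is a minimal NumPy/SciPy sketch of that idea; the function and array names are hypothetical, and the repository itself relies on the Resonance Audio SDK (which also interpolates HRTFs along a moving trajectory and adds reverb) rather than this simplified static convolution.

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize_mono(source: np.ndarray, hrir_left: np.ndarray, hrir_right: np.ndarray) -> np.ndarray:
    """Render a mono signal to binaural audio by convolving it with an HRIR pair.

    Static simplification: a moving talker would require interpolating between
    HRIRs over time, which the Resonance Audio SDK handles internally.
    """
    left = fftconvolve(source, hrir_left, mode="full")
    right = fftconvolve(source, hrir_right, mode="full")
    return np.stack([left, right], axis=0)  # shape: (2, num_samples)
```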
- We provide pre-generated moving-speaker audio, which you can download instead of generating it yourself.
- Download the DEMAND dataset for acoustic noise in diverse environments.
We separately train three models: a separation model, a post-enhancement model, and a localization model.
- After downloading the pre-generated moving speaker audio and noise audio, set up the dataset:
python create_separation_dataset.py
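For a sense of what the dataset setup conceptually involves, the sketch below mixes a binaural speech signal with DEMAND noise at a target SNR. This is a generic illustration under assumed conventions (both signals shaped `(2, num_samples)`, noise at least as long as the speech); the actual recipe lives in `create_separation_dataset.py`.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix binaural speech with noise at a given SNR in dB.

    Assumes both arrays are shaped (2, num_samples) and the noise is at
    least as long as the speech.
    """
    noise = noise[:, : speech.shape[1]]  # trim the noise to the speech length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10 * log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```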
- Train the separation model:
python train_separation_model.py --training-file-path 'your_path' --validation-file-path 'your_path' --checkpoint-path 'your_path'
- After training the separation model, use it to separate the speakers and create the dataset for training the enhancement model, as sketched below.
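A hedged sketch of this step, assuming a PyTorch model whose forward pass maps a binaural mixture to per-speaker binaural estimates; the tensor shapes and the pairing of each estimate with its clean reference are assumptions, not the repository's exact interface.

```python
import torch

def separate_for_enhancement(model: torch.nn.Module, mixture: torch.Tensor) -> torch.Tensor:
    """Run the trained separation model on a binaural noisy mixture.

    Assumed (hypothetical) shapes: mixture is (batch, 2, num_samples) and the
    model returns (batch, num_speakers, 2, num_samples). Each separated stream
    can then be paired with its clean reference to form (separated, clean)
    training examples for the enhancement model.
    """
    model.eval()
    with torch.no_grad():
        return model(mixture)
```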
- Kick off the enhancement model training:
python train_enhancement_model.py --training-file-path 'your_path' --validation-file-path 'your_path' --checkpoint-path 'your_path'
- The localization model predicts the locations (moving trajectories) of the separated speakers.
- After training the enhancement model, use it to generate enhanced separated speech and create the dataset for training the localization model.
- Train the localization model:
python train_localization_model.py --training-file-path 'your_path' --validation-file-path 'your_path' --checkpoint-path 'your_path'
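Putting the three trained models together at inference time, the cascade described above (separate, then enhance, then localize) would look roughly like this sketch; the model objects and tensor shapes are hypothetical stand-ins for the repository's actual classes.

```python
import torch

def run_pipeline(separator, enhancer, localizer, mixture: torch.Tensor):
    """Cascade the three models on a binaural mixture of shape (batch, 2, samples).

    Hypothetical interfaces: the separator returns per-speaker binaural
    streams, the enhancer cleans one stream at a time, and the localizer
    predicts a moving trajectory per speaker.
    """
    with torch.no_grad():
        separated = separator(mixture)  # (batch, num_speakers, 2, samples)
        enhanced = torch.stack(
            [enhancer(separated[:, s]) for s in range(separated.shape[1])], dim=1
        )
        trajectories = torch.stack(
            [localizer(enhanced[:, s]) for s in range(enhanced.shape[1])], dim=1
        )
    return enhanced, trajectories
```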
This section contains resources and code for conducting AAD and relevant analyses.
- The script Step_15_Spec_SS_g_PCA_CCA_FINAL.m is used to train CCA models that learn the mapping between the neural responses and the attended stimuli.
- The script Step_15_Spec_SS_WinByWin_PCA_CCA_FINAL.m is used to evaluate the CCA models at various window sizes on a window-by-window basis, and to compute correlations between the neural responses and the attended and unattended stimuli.
We use the CCA implementation from the NoiseTools package developed by Dr. Alain de Cheveigné:
de Cheveigné, A., Wong, D. D. E., Di Liberto, G. M., Hjortkjær, J., Slaney, M., & Lalor, E. (2018). Decoding the auditory brain with canonical correlation analysis. NeuroImage, 172, 206-216. https://doi.org/10.1016/j.neuroimage.2018.01.033
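The repository's CCA analysis is implemented in MATLAB with NoiseTools, but for intuition, here is a minimal Python sketch of the window-by-window decoding logic using scikit-learn's CCA instead: project the EEG and the stimulus representations into the canonical space, then label each window by which stimulus correlates more strongly with the neural response. Variable names and dimensions are illustrative.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def decode_attention(eeg, attended, unattended, win_len, n_components=1):
    """Window-by-window AAD sketch.

    eeg: (samples, channels); attended/unattended: (samples, features) stimulus
    representations. Returns the fraction of windows in which the attended
    stimulus correlates more strongly with the projected EEG. In practice the
    CCA should be fit on held-out training data, not the evaluation recording.
    """
    cca = CCA(n_components=n_components)
    cca.fit(eeg, attended)  # learn the EEG-to-attended-stimulus mapping

    correct = 0
    n_windows = len(eeg) // win_len
    for w in range(n_windows):
        sl = slice(w * win_len, (w + 1) * win_len)
        x, a = cca.transform(eeg[sl], attended[sl])
        _, u = cca.transform(eeg[sl], unattended[sl])
        r_att = np.corrcoef(x[:, 0], a[:, 0])[0, 1]
        r_unatt = np.corrcoef(x[:, 0], u[:, 0])[0, 1]
        correct += int(r_att > r_unatt)
    return correct / n_windows
```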