The speech recognition group focuses on advancing the state of the art in automatic speech recognition (ASR) by combining signal processing and machine learning approaches. Automatic speech-to-text transcription is a ubiquitous technology intended to make human-machine interaction easier. Natural speech variability across speakers, background noise, and diverse recording conditions make the task challenging.
Using advanced signal processing techniques combined with state-of-the-art acoustic modeling approaches, the ASR team aims to build systems that are robust to speaker variability, noise and reverberation.
In collaboration with world-leading universities in the UK and Europe, we aim to push these boundaries and develop next-generation ASR systems.
We are working on end-to-end systems that combine signal processing with advanced deep learning architectures for processing single- and multichannel signals, extracting features that are robust to recording conditions, inspired by human auditory processing strategies, and building acoustic models. We have proposed novel deep learning approaches for adaptation to noise and speaker variability.
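The group's actual front end is not specified here, but auditory-inspired robust features of the kind mentioned above are commonly illustrated by a log-mel filterbank: frame the waveform, take magnitude spectra, and pool them with triangular filters spaced on the perceptual mel scale. The sketch below is a minimal NumPy illustration of that generic technique; all function names and parameter values (sample rate, FFT size, hop, number of filters) are assumptions for the example, not the group's system.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping (assumed convention for this sketch)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_features(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Frame the signal, take magnitude spectra, and apply a triangular
    mel filterbank -- a common auditory-inspired ASR front end."""
    # Frame into overlapping Hann-windowed segments
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Magnitude spectrum per frame: shape (n_frames, n_fft // 2 + 1)
    spec = np.abs(np.fft.rfft(frames, n=n_fft))
    # Filter center frequencies equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    # Build triangular filters between neighboring center bins
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    # Log compression, as in typical ASR front ends
    return np.log(spec @ fbank.T + 1e-10)

# Toy usage: one second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
feats = log_mel_features(np.sin(2 * np.pi * 440 * t), sr=sr)
print(feats.shape)  # one row of 40 log-mel values per frame
```

The log compression and mel spacing mimic aspects of human loudness and frequency perception, which is one reason such features tend to degrade more gracefully under noise than raw spectra.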