Language Interaction

Conversational AI to enhance/augment machine and human capabilities

Speech Technology Group

Design conversational AI to enhance/augment user capabilities

Infrastructure maintenance

Online education

Training & support

Remote meetings

Collaborative robots

Health care

The Speech Technology Group (STG) works at the heart of modern artificial intelligence, designing novel algorithms for automatic speech recognition and data-driven dialogue systems that enable advanced, natural, speech-enabled human-machine interfaces. STG was established in 2002 and has since worked on a wide range of speech technologies, including text-to-speech synthesis, speech intelligibility, automatic speech recognition (ASR) and dialogue modelling. Our focus is to develop advanced, natural spoken human-machine interfaces, along with products and services that give easy access to information, thereby improving productivity and quality of life.

STG has made significant contributions to the next generation of Toshiba’s speech recognition, HMM-based speech synthesis and statistical dialogue modelling. We work in collaboration with the speech R&D groups at the Knowledge Media Lab, Toshiba RDC, Kawasaki, Japan, and the Toshiba China R&D Centre, Beijing, China, as well as with business divisions of Toshiba Group, Japan. Working with these groups within Toshiba keeps our R&D efforts tightly coupled to current and future product development.

STG has a long history of collaborating with academia and constantly strives to forge new relationships. We fund research and maintain academic collaborations with groups at various universities and research centres in the UK and the European Union. Combining the strengths of our group with these collaborations, we address a range of research topics in speech technology for the future.

Automatic Speech Recognition

Automatic transcription of speech to text plays a critical role in human-machine interaction. Background noise, reverberation, competing speakers and natural variability in speech across speakers make the task challenging. Toshiba aims to advance the state of the art in automatic speech recognition by combining signal processing and machine learning approaches. Our research covers both the front end (signal enhancement) and the back end (acoustic modelling for end-to-end streaming ASR, and adaptation of end-to-end models).

Dialogue Modelling

The Vision & Learning Group (VLG) focuses on learning from interaction in physical environments. Complex and safe manipulation and navigation technologies leverage precise 3D geometry and scene understanding in conjunction with strong world-aware action selection frameworks. Learned concepts are effectively transferred to new domains.