Toshiba Cambridge Research Laboratory

Speech Technology Group

Toshiba is developing new services and products that use speech technology, as part of its effort to contribute to a sustainable society by creating new value through reliable technologies.

Advances in automatic speech recognition, text-to-speech synthesis and intelligent dialogue systems create a unique opportunity to build the next generation of human-machine interfaces. By developing speech interfaces that enable rich communication between humans and machines, modern Artificial Intelligence can improve both productivity and quality of life.

Toshiba is at the forefront of this wave of modern Artificial Intelligence. With more than 50 years of technology innovation in voice, video, language and knowledge, Toshiba has recently created an artificial intelligence platform for business-to-business applications and services, which forms part of SPINEX, the Toshiba Internet of Things architecture. We welcome enquiries from potential users of this system and from companies or organisations interested in a commercial partnership to further develop the technology and its applications.

Toshiba Research Europe Limited (TREL) is part of Toshiba’s global R&D activity. The Cambridge Research Laboratory of TREL conducts research on speech recognition and dialogue as well as quantum information technology and computer vision.

The Speech Technology Group (STG), led by Prof. Yannis Stylianou, is actively researching the application of speech technologies in advanced human-machine interfaces for modern Artificial-Intelligence-based applications. Our interest therefore lies not only in exploring new concepts, but also in their practical realization.

Recent Publications

2017
Domain Complexity and Policy Learning in Task-oriented Dialogue Systems
A. Papangelis, S. Ultes and Y. Stylianou
Proc. IWSDS 2017, Summit Inn, Farmington, Pennsylvania, USA, June 2017
Single-model Multi-domain Dialogue Management with Deep Learning
A. Papangelis and Y. Stylianou
Proc. IWSDS 2017, Summit Inn, Farmington, Pennsylvania, USA, June 2017
Adaptive Gain Control and Time Warp for Enhanced Speech Intelligibility under Reverberation
P. Petkov and Y. Stylianou
Proc. ICASSP 2017, New Orleans, USA, March 2017
Effective Emotion Recognition in Movie Audio Tracks
M. Kotti and Y. Stylianou
Proc. ICASSP 2017, New Orleans, USA, March 2017
Expressive Visual Text-To-Speech and Expression Adaptation using Deep Neural Networks
J. Parker, R. Maia, Y. Stylianou and R. Cipolla
Proc. ICASSP 2017, New Orleans, USA, March 2017
Predicting Dialogue Success, Naturalness, and Length with Acoustic Features
A. Papangelis, M. Kotti and Y. Stylianou
Proc. ICASSP 2017, New Orleans, USA, March 2017
Evaluation of Near-End Speech Enhancement under Equal-Loudness Constraint for Listeners with Normal-Hearing and Mild-to-Moderate Hearing Loss
T. C. Zorila, Y. Stylianou, S. Flanagan and B. C. J. Moore
Journal of the Acoustical Society of America vol 141 no 1, January 2017
2016
Adaptive Gain Control for Enhanced Speech Intelligibility Under Reverberation
P. Petkov and Y. Stylianou
IEEE Signal Processing Letters vol 23 no 10, October 2016
Near and Far Field Speech-in-Noise Intelligibility Improvements Based on a Time–Frequency Energy Reallocation Approach
T. C. Zorila, Y. Stylianou, T. Ishihara and M. Akamine
IEEE Trans. Audio, Speech and Language Processing vol 24 no 10, October 2016
Automated Pause Insertion for Improved Intelligibility Under Reverberation
P. Petkov, N. Braunschweiler and Y. Stylianou
Proc. Interspeech 2016, San Francisco, USA, September 2016
Effectiveness of Near-End Speech Enhancement Under Equal-Loudness and Equal-Level Constraints
T. C. Zorila, S. Flanagan, B. C. J. Moore and Y. Stylianou
Proc. Interspeech 2016, San Francisco, USA, September 2016
Enhancing the Intelligibility of Speech in Noise for Children Diagnosed with Auditory Processing Disorder
T. C. Zorila, S. Flanagan, B. C. J. Moore and Y. Stylianou
Proc. Basic Auditory Science 2016, Cambridge, UK, September 2016
Generalizing Steady State Suppression for Enhanced Intelligibility Under Reverberation
P. Petkov and Y. Stylianou
Proc. Interspeech 2016, San Francisco, USA, September 2016
Multi-domain Spoken Dialogue Systems using Domain-Independent Parameterisation
A. Papangelis and Y. Stylianou
Proc. DADA 2016, Riva del Garda, Italy, September 2016
Pause Prediction from Text for Speech Synthesis with User-Definable Pause Insertion Likelihood Threshold
N. Braunschweiler and R. Maia
Proc. Interspeech 2016, San Francisco, USA, September 2016
Global Variance in Speech Synthesis With Linear Dynamical Models
V. Tsiaras, R. Maia, V. Diakoloukas, Y. Stylianou and V. Digalakis
IEEE Signal Processing Letters vol 23 no 8, August 2016
Effectiveness of a Loudness Model for Time-Varying Sounds in Equating the Loudness of Sentences Subjected to Different Forms of Signal Processing
T. C. Zorila, Y. Stylianou, S. Flanagan and B. C. J. Moore
Journal of the Acoustical Society of America vol 140 no 1, July 2016
Expressive Visual Text-To-Speech as an Assistive Technology for Individuals with Autism Spectrum Conditions
S. A. Cassidy, B. Stenger, L. Van Dongen, K. Yanagisawa, R. Anderson, V. Wan, S. Baron-Cohen and R. Cipolla
Computer Vision and Image Understanding, Special Issue on Assistive Computer Vision and Robotics vol 148, July 2016
Initial Investigation of Speech Synthesis Based on Complex-Valued Neural Networks
Q. Hu, K. Richmond, J. Yamagishi, K. Subramanian and Y. Stylianou
Proc. ICASSP 2016, Shanghai, China, March 2016
Iterative Estimation of Phase using Complex Cepstrum Representation
R. Maia and Y. Stylianou
Proc. ICASSP 2016, Shanghai, China, March 2016
Multi-Stream Spectral Representation for Statistical Parametric Speech Synthesis
K. Yanagisawa, R. Maia and Y. Stylianou
Proc. ICASSP 2016, Shanghai, China, March 2016
Speaker Adaptive Training in Deep Neural Networks using Speaker Dependent Bottle Neck Features
R. Doddipatla
Proc. ICASSP 2016, Shanghai, China, March 2016
Voice Activity Detection: Merging Source and Filter-based Information
T. Drugman, Y. Stylianou, Y. Kida and M. Akamine
IEEE Signal Processing Letters vol 23 no 2, February 2016
2015
A Fast Algorithm for Improved Intelligibility of Speech-in-Noise Based on Frequency and Time Domain Energy Reallocation
T. C. Zorila and Y. Stylianou
Proc. Interspeech 2015, Dresden, Germany, September 2015
A Maximum Likelihood Approach to Detect Moments of Maximum Excitation and its Application to High-Quality Speech Parameterization
R. Maia, Y. Stylianou and M. Akamine
Proc. Interspeech 2015, Dresden, Germany, September 2015
Fast and Accurate Phase Unwrapping
T. Drugman and Y. Stylianou
Proc. Interspeech 2015, Dresden, Germany, September 2015
Fusion of Multiple Parametrization for DNN-Based Sinusoidal Speech Synthesis with Multi-Task Learning
Q. Hu, Z. Wu, K. Richmond, J. Yamagishi, Y. Stylianou and R. Maia
Proc. Interspeech 2015, Dresden, Germany, September 2015
Intelligibility Enhancement of Casual Speech for Reverberant Environments Inspired by Clear Speech Properties
M. Koutsogiannaki, P. Petkov and Y. Stylianou
Proc. Interspeech 2015, Dresden, Germany, September 2015
Learning Domain-Independent Dialogue Policies via Ontology Parameterisation
Z. Wang and Y. Stylianou
Proc. SIGDIAL 2015, Prague, Czech Republic, September 2015
Towards a Linear Dynamical Model Based Speech Synthesizer
V. Tsiaras, R. Maia, V. Diakoloukas, Y. Stylianou and V. Digalakis
Proc. Interspeech 2015, Dresden, Germany, September 2015
Improved Face-to-Face Communication Using Noise Reduction and Speech Intelligibility Enhancement
A. Griffin, T. C. Zorila and Y. Stylianou
Proc. ICASSP 2015, Brisbane, Australia, April 2015
Methods for Applying Dynamic Sinusoidal Models to Statistical Parametric Speech Synthesis
Q. Hu, Y. Stylianou, R. Maia, K. Richmond and J. Yamagishi
Proc. ICASSP 2015, Brisbane, Australia, April 2015
Robust Excitation-based Features for Automatic Speech Recognition
T. Drugman, Y. Stylianou, L. Chen, X. Chen and M. Gales
Proc. ICASSP 2015, Brisbane, Australia, April 2015
Speaker and Expression Factorization for Audiobook Data: Expressiveness and Transplantation
L. Chen, N. Braunschweiler and M. Gales
IEEE Trans. Audio, Speech and Language Processing vol 23, April 2015