ReCoRe: Regularized Contrastive Representation Learning of World Model

Rudra P.K. Poudel, Harit Pandya, Stephan Liwicki, Roberto Cipolla
Cambridge Research Laboratory Toshiba Europe Ltd, UK

DiaLoc: An Iterative Approach to Embodied Dialog Localization

Chao Zhang, Mohan Li, Ignas Budvytis, Stephan Liwicki
Toshiba Europe Ltd

Cumulative Attention based streaming transformer ASR with internal language model joint training and rescoring

M. Li, C-T Do and R. Doddipatla
Accepted at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes Island, Greece, June 2023

Frame-wise and overlap-robust speaker embeddings for meeting diarization

T. Cord-Landwehr, C. Boeddeker, T-C Zorilă, R. Doddipatla and R. Haeb-Umbach
Accepted at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes Island, Greece, June 2023

On the effectiveness of monoaural target source extraction for distant end-to-end automatic speech recognition

T-C Zorilă and R. Doddipatla
Accepted at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes Island, Greece, June 2023

Non-Autoregressive End-to-End Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding

M. Li and R. Doddipatla
Proc. IEEE Spoken Language Technology Workshop (SLT 2022), Doha, Qatar, January 2023

Multiple-hypothesis RNN-T loss for unsupervised fine-tuning and self-training of neural transducer

C-T Do, M. Li and R. Doddipatla
Proc. Interspeech 2022, Incheon, Korea, September 2022 / arXiv

Self-regularised minimum latency training for streaming transformer-based speech recognition

M. Li, R. Doddipatla and T-C Zorilă
Proc. Interspeech 2022, Incheon, Korea, September 2022

Combining structured and unstructured knowledge in an interactive search dialogue system

S. Stoyanchev, S. Pandey, S. Keizer, N. Braunschweiler and R. Doddipatla
Proc. 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2022), Edinburgh, UK, September 2022

On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training

J. Zhang, T-C Zorila, R. Doddipatla and J. Barker
Proc. Interspeech 2022, Incheon, Korea, September 2022

Comparing human emotion perception and automatic emotion recognition of user turns in human-machine dialogues

N. Braunschweiler, R. Doddipatla, S. Keizer, and S. Stoyanchev
Proc. UK Speech 2022, Edinburgh, UK, September 2022

Opening up minds with argumentative dialogues

Y. Farag, C. O. Brand, J. Amidei, P. Piwek, T. Stafford, S. Stoyanchev and A. Vlachos
Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, December 2022

Monoaural source separation: from anechoic to reverberant environments

T. Cord-Landwehr, C. Boeddeker, T. von Neumann, T-C Zorilă, R. Doddipatla and R. Haeb-Umbach
Proc. 2022 International Workshop on Acoustic Signal Enhancement (IWAENC), Bamberg, Germany

Transformer-based streaming ASR with cumulative attention

M. Li, S. Zhang, T.C. Zorila, and R. Doddipatla
Proc. IEEE ICASSP 2022, Singapore

Factors in Emotion Recognition with Deep Learning Models Using Speech and Text on Multiple Corpora

N. Braunschweiler, R. Doddipatla, S. Keizer, and S. Stoyanchev
IEEE Signal Processing Letters

Speaker reinforcement using target source extraction for robust automatic speech recognition

T.C. Zorila and R. Doddipatla
Proc. IEEE ICASSP 2022, Singapore

Towards handling unconstrained user preferences in dialogue

S. Pandey, S. Stoyanchev, and R. Doddipatla
Proc. IWSDS 2021, Singapore / arXiv

QTMM2012c+: a queryable empirically-grounded resource of dialogue with argumentation

J. Amidei, P. Piwek, and S. Stoyanchev
Proc. 5th Workshop on Advances In Argumentation In Artificial Intelligence, Virtual Event

End-to-end neural based modification of noisy speech for speech-in-noise intelligibility improvement

M. Shifas, T.C. Zorila and Y. Stylianou
IEEE/ACM Transactions on Audio, Speech, and Language Processing

Dialogue strategy adaptation to new action sets using multi-dimensional modelling

S. Keizer, N. Braunschweiler, S. Stoyanchev, and R. Doddipatla
Proc. IEEE ASRU 2021, Cartagena, Colombia / arXiv

A study on cross-corpus speech emotion recognition and data augmentation

N. Braunschweiler, R. Doddipatla, S. Keizer, and S. Stoyanchev
Proc. IEEE ASRU 2021, Cartagena, Colombia / arXiv

Improving HS-DACS based streaming transformer ASR with deep reinforcement learning

M. Li and R. Doddipatla
Proc. IEEE ASRU 2021, Cartagena, Colombia

Teacher-student MixIT for unsupervised and semi-supervised speech separation

J. Zhang, T.C. Zorila, R. Doddipatla and J. Barker
Proc. INTERSPEECH 2021, Brno, Czechia / arXiv

Head-Synchronous Decoding for Transformer-based Streaming ASR

M. Li, T.C. Zorila and R. Doddipatla
Proc. IEEE ICASSP 2021, Toronto, Canada / arXiv

Transformer-based Online Speech Recognition with Decoder-End Adaptive Computation Steps

M. Li, T.C. Zorila and R. Doddipatla
Proc. IEEE SLT 2021, Shenzhen, China / arXiv

An Investigation into the Multi-Channel Time Domain Speaker Extraction Network

T.C. Zorila, M. Li and R. Doddipatla
Proc. IEEE SLT 2021, Shenzhen, China

Multiple-Hypothesis CTC-based Semi-Supervised Adaptation of End-to-End Speech Recognition

C.T. Do, R. Doddipatla and T. Hain
Proc. IEEE ICASSP 2021, Toronto, Canada / arXiv

Action State Update Approach to Dialogue Management

S. Stoyanchev, S. Keizer and R. Doddipatla
Proc. IEEE ICASSP 2021, Toronto, Canada / arXiv

Train your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers

S. Zhang, C.T. Do, R. Doddipatla, E. Loweimi, P. Bell and S. Renals
Proc. IEEE ICASSP 2021, Toronto, Canada / arXiv

Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism

J. Zhang, T.C. Zorila, R. Doddipatla and J. Barker
Proc. IEEE ICASSP 2021, Toronto, Canada / arXiv

Selective Adaptation of End-to-End Speech Recognition using Hybrid CTC/Attention Architecture for Noise Robustness

C.-T. Do, S. Zhang and T. Hain
Proc. EUSIPCO 2020, Amsterdam, The Netherlands / arXiv

Open-domain Topic Identification of Out-of-domain Utterances using Wikipedia

A. Augustin, A. Papangelis, M. Kotti, P. Vougiouklis, J. Hare and N. Braunschweiler
In: Proc. of Human in the Loop Dialogue Systems Workshop of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

Toshiba’s Speech Recognition System for the CHiME 2020 Challenge

T.C. Zorila, M. Li, D. Hayakawa, M. Liu, N. Ding and R. Doddipatla
Proc. International Workshop on Speech Processing in Everyday Environments (CHiME 2020), Barcelona, Spain

Towards a speaker diarization system for the CHiME 2020 dinner party transcription

C. Boeddeker, T. Cord-Landwehr, J. Heitkaemper, T.C. Zorila, D. Hayakawa, M. Li, M. Liu, R. Doddipatla and R. Haeb-Umbach
Proc. International Workshop on Speech Processing in Everyday Environments (CHiME 2020), Barcelona, Spain

The ISO Standard for Dialogue Act Annotation, Second Edition

H. Bunt, V. Petukhova, E. Gilmartin, C. Pelachaud, A. Fang, S. Keizer and L. Prevot
Proc. Conference on Language Resources and Evaluation (LREC 2020), Marseille, France

Learning Noise Invariant Features through Transfer Learning for Robust End-to-End Speech Recognition

S. Zhang, C.-T. Do, R. Doddipatla and S. Renals
Proc. IEEE ICASSP 2020, Barcelona, Spain, May 2020

On End-to-End Multi-Channel Time Domain Speech Separation in Reverberant Environments

J. Zhang, T.C. Zorila, R. Doddipatla and J. Barker
Proc. IEEE ICASSP 2020, Barcelona, Spain, May 2020

Robust Belief State Space Representation for Statistical Dialogue Managers using Deep Autoencoders

F. Lygerakis, V. Diakoloukas, M. Lagoudakis and M. Kotti
Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU2019), Sentosa, Singapore, December 2019

An Investigation into the Effectiveness of Enhancement in ASR Training and Test for CHiME-5 Dinner Party Transcription

T.C. Zorila, C. Boeddeker, R. Doddipatla and R. Haeb-Umbach
Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU2019), Sentosa, Singapore, December 2019

Crowd-sourced Collection of Task-Oriented Human-Human Dialogues in a Multi-Domain Scenario

N. Braunschweiler, P. Papadakos, M. Kotti, Y. Marketakis and Y. Tzitzikas
Proc. International Conference on Text, Speech and Dialogue (TSD2019), Ljubljana, Slovenia, September 2019

Prediction of User Emotion and Dialogue Success Using Audio Spectrograms and Convolutional Neural Networks

A. Lykartsis and M. Kotti
Proc. SIGDIAL 2019, Stockholm, Sweden, September 2019

Subband Temporal Envelope Features and Data Augmentation for End-To-End Recognition of Distant Conversational Speech

C.T. Do
Proc. IEEE ICASSP 2019, Brighton, UK, May 2019

An Unsupervised Learning Approach to Neural-Net-Supported WPE Dereverberation

P. Petkov, V. Tsiaras, R. Doddipatla and Y. Stylianou
Proc. IEEE ICASSP 2019, Brighton, UK, May 2019

On Reducing the Effect of Speaker Overlap for CHiME-5

T.C. Zorila and R. Doddipatla
Proc. IEEE ICASSP 2019, Brighton, UK, May 2019

Prediction of Dialogue Success with Spectral and Rhythm Acoustic Features using DNNs and SVMs

A. Lykartsis, M. Kotti, A. Papangelis and Y. Stylianou
Proc. IEEE Spoken Language Technology Workshop (SLT) 2018, Athens, Greece, December 2018

Comparison of an End-To-End Trainable Dialogue System with a Modular Statistical Dialogue System

N. Braunschweiler and A. Papangelis
Proc. Interspeech 2018, Hyderabad, India, September 2018

Weighting Time-Frequency Representation of Speech Using Auditory Saliency for Automatic Speech Recognition

C. T. Do and Y. Stylianou
Proc. Interspeech 2018, Hyderabad, India, September 2018

The Toshiba entry to the CHiME 2018 Challenge

R. Doddipatla, T. Kagoshima, C.-T. Do, P. Petkov, T. C. Zorila, U. Kim, D. Hayakawa, H. Fujimura and Y. Stylianou
Proc. Workshop on Speech Processing in Everyday Environments (CHiME 2018), Hyderabad, India, September 2018

A Case Study on the Importance of Belief State Representation for Dialogue Policy Management

M. Kotti, V. Diakoloukas, A. Papangelis, M. G. Lagoudakis and Y. Stylianou
Proc. Interspeech 2018, Hyderabad, India, September 2018

Domain Complexity and Policy Learning in Task-oriented Dialogue Systems

A. Papangelis, S. Ultes and Y. Stylianou
Lecture Notes in Electrical Engineering, August 2018

Single-model Multi-domain Dialogue Management with Deep Learning

A. Papangelis and Y. Stylianou
Lecture Notes in Electrical Engineering, August 2018

Spoken Dialogue for Information Navigation

A. Papangelis, P. Papadakos, Y. Stylianou and Y. Tzitzikas
Proc. SIGDIAL 2018, Melbourne, Australia, July 2018

On Finding the Relevant User Reviews for Advancing Conversational Faceted Search

E. Dimitrakis, K. Sgontzos, P. Papadakos, Y. Marketakis, A. Papangelis, Y. Stylianou and Y. Tzitzikas
Proc. ESW 2018, Heraklion, Greece, June 2018

Towards Scalable Information-Seeking Multi-Domain Dialogue

A. Papangelis, M. Kotti and Y. Stylianou
Proc. ICASSP 2018, Calgary, Alberta, Canada, April 2018

Information Navigation via Spoken Dialogue and Linked Data

A. Papangelis, P. Papadakos, N. Braunschweiler, Y. Stylianou, Y. Marketakis and Y. Tzitzikas
Proc. ICASSP 2018 (Demo), Calgary, Alberta, Canada, April 2018

Adaptation of an Expressive Single Speaker Deep Neural Network Speech Synthesis System

J. Parker, Y. Stylianou and R. Cipolla
Proc. ICASSP 2018, Calgary, Alberta, Canada, April 2018

Speech Processing to Improve the Perception of Speech in Background Noise for Children with Auditory Processing Disorder and Typically Developing Peers

S. Flanagan, T. C. Zorila, Y. Stylianou and B. C. J. Moore
Trends in Hearing vol 22, February 2018

Will This Dialogue be Unsuccessful? Prediction Using Audio Features and CNNs

M. Kotti, A. Papangelis and Y. Stylianou
Proc. SCAI Workshop 2017, Amsterdam, The Netherlands, October 2017

LD-SDS: Towards an Expressive Spoken Dialogue System based on Linked-Data

A. Papangelis, P. Papadakos, M. Kotti, Y. Stylianou, Y. Tzitzikas and D. Plexousakis
Proc. SCAI Workshop 2017, Amsterdam, The Netherlands, October 2017 / arXiv

Improved Automatic Speech Recognition Using Subband Temporal Envelope Features and Time-delay Neural Network Denoising Autoencoder

C. T. Do and Y. Stylianou
Proc. Interspeech 2017, Stockholm, Sweden, August 2017

Speaker Adaptation in DNN-based Speech Synthesis using D-vectors

R. Doddipatla, N. Braunschweiler and R. Maia
Proc. Interspeech 2017, Stockholm, Sweden, August 2017

On the Quality and Intelligibility of Noisy Speech Processed for Near-end Listening Enhancement

T. C. Zorila and Y. Stylianou
Proc. Interspeech 2017, Stockholm, Sweden, August 2017

Adaptive Gain Control and Time Warp for Enhanced Speech Intelligibility under Reverberation

P. Petkov and Y. Stylianou
Proc. ICASSP 2017, New Orleans, USA, March 2017

Electrically driven and electrically tunable quantum light sources

J. P. Lee, E. Murray, A. J. Bennett, D. J. P. Ellis, C. Dangel, I. Farrer, P. Spencer, D. A. Ritchie and A. J. Shields
Appl. Phys. Lett., vol 110, no 7, 071102, 13 February 2017 / arXiv

Evaluation of Near-End Speech Enhancement under Equal-Loudness Constraint for Listeners with Normal-Hearing and Mild-to-Moderate Hearing Loss

T. C. Zorila, Y. Stylianou, S. Flanagan and B. C. J. Moore
Journal of the Acoustical Society of America vol 141 no 1, January 2017

Adaptive Gain Control for Enhanced Speech Intelligibility Under Reverberation

P. Petkov and Y. Stylianou
IEEE Signal Processing Letters vol 23 no 10, October 2016

Near and Far Field Speech-in-Noise Intelligibility Improvements Based on a Time–Frequency Energy Reallocation Approach

T. C. Zorila, Y. Stylianou, T. Ishihara and M. Akamine
IEEE Trans. Audio, Speech and Language Processing vol 24 no 10, October 2016

Pause Prediction from Text for Speech Synthesis with User-Definable Pause Insertion Likelihood Threshold

N. Braunschweiler and R. Maia
Proc. Interspeech 2016, San Francisco, USA, September 2016

Multi-domain Spoken Dialogue Systems using Domain-Independent Parameterisation

A. Papangelis and Y. Stylianou
Proc. DADA 2016, Riva del Garda, Italy, September 2016

Generalizing Steady State Suppression for Enhanced Intelligibility Under Reverberation

P. Petkov and Y. Stylianou
Proc. Interspeech 2016, San Francisco, USA, September 2016

Automated Pause Insertion for Improved Intelligibility Under Reverberation

P. Petkov, N. Braunschweiler and Y. Stylianou
Proc. Interspeech 2016, San Francisco, USA, September 2016

Enhancing the Intelligibility of Speech in Noise for Children Diagnosed with Auditory Processing Disorder

T. C. Zorila, S. Flanagan, B. C. J. Moore and Y. Stylianou
Proc. Basic Auditory Science 2016, Cambridge, UK, September 2016

Effectiveness of Near-End Speech Enhancement Under Equal-Loudness and Equal-Level Constraints

T. C. Zorila, S. Flanagan, B. C. J. Moore and Y. Stylianou
Proc. Interspeech 2016, San Francisco, USA, September 2016

Global Variance in Speech Synthesis With Linear Dynamical Models

V. Tsiaras, R. Maia, V. Diakoloukas, Y. Stylianou and V. Digalakis
IEEE Signal Processing Letters vol 23 no 8, August 2016

Expressive Visual Text-To-Speech as an Assistive Technology for Individuals with Autism Spectrum Conditions

S. A. Cassidy, B. Stenger, L. Van Dongen, K. Yanagisawa, R. Anderson, V. Wan, S. Baron-Cohen and R. Cipolla
Computer Vision and Image Understanding, Special Issue on Assistive Computer Vision and Robotics vol 148, July 2016

Effectiveness of a Loudness Model for Time-Varying Sounds in Equating the Loudness of Sentences Subjected to Different Forms of Signal Processing

T. C. Zorila, Y. Stylianou, S. Flanagan and B. C. J. Moore
Journal of the Acoustical Society of America vol 140 no 1, July 2016

Speaker Adaptive Training in Deep Neural Networks using Speaker Dependent Bottleneck Features

R. Doddipatla
Proc. ICASSP 2016, Shanghai, China, March 2016

Initial Investigation of Speech Synthesis Based on Complex-Valued Neural Networks

Q. Hu, K. Richmond, J. Yamagishi, K. Subramanian and Y. Stylianou
Proc. ICASSP 2016, Shanghai, China, March 2016

Iterative Estimation of Phase using Complex Cepstrum Representation

R. Maia and Y. Stylianou
Proc. ICASSP 2016, Shanghai, China, March 2016

Multi-Stream Spectral Representation for Statistical Parametric Speech Synthesis

K. Yanagisawa, R. Maia and Y. Stylianou
Proc. ICASSP 2016, Shanghai, China, March 2016

Voice Activity Detection: Merging Source and Filter-based Information

T. Drugman, Y. Stylianou, Y. Kida and M. Akamine
IEEE Signal Processing Letters vol 23 no 2, February 2016

Fast and Accurate Phase Unwrapping

T. Drugman and Y. Stylianou
Proc. Interspeech 2015, Dresden, Germany, September 2015

Fusion of Multiple Parametrization for DNN-Based Sinusoidal Speech Synthesis with Multi-Task Learning

Q. Hu, Z. Wu, K. Richmond, J. Yamagishi, Y. Stylianou and R. Maia
Proc. Interspeech 2015, Dresden, Germany, September 2015

Intelligibility Enhancement of Casual Speech for Reverberant Environments Inspired by Clear Speech Properties

M. Koutsogiannaki, P. Petkov and Y. Stylianou
Proc. Interspeech 2015, Dresden, Germany, September 2015

A Maximum Likelihood Approach to Detect Moments of Maximum Excitation and its Application to High-Quality Speech Parameterization

R. Maia, Y. Stylianou and M. Akamine
Proc. Interspeech 2015, Dresden, Germany, September 2015

Towards a Linear Dynamical Model Based Speech Synthesizer

V. Tsiaras, R. Maia, V. Diakoloukas, Y. Stylianou and V. Digalakis
Proc. Interspeech 2015, Dresden, Germany, September 2015

Learning Domain-Independent Dialogue Policies via Ontology Parameterisation

Z. Wang and Y. Stylianou
Proc. SIGDIAL 2015, Prague, Czech Republic, September 2015

A Fast Algorithm for Improved Intelligibility of Speech-in-Noise Based on Frequency and Time Domain Energy Reallocation

T. C. Zorila and Y. Stylianou
Proc. Interspeech 2015, Dresden, Germany, September 2015

A Prototype for AUV Post-mission Debrief Generation from Metadata

Z. Wang and H. Hastie
Proc. AAMAS 2015, Istanbul, Turkey, May 2015

Speaker and Expression Factorization for Audiobook Data: Expressiveness and Transplantation

L. Chen, N. Braunschweiler and M. Gales
IEEE Trans. Audio, Speech and Language Processing vol 23, April 2015

Robust Excitation-based Features for Automatic Speech Recognition

T. Drugman, Y. Stylianou, L. Chen, X. Chen and M. Gales
Proc. ICASSP 2015, Brisbane, Australia, April 2015

Improved Face-to-Face Communication Using Noise Reduction and Speech Intelligibility Enhancement

A. Griffin, T. C. Zorila and Y. Stylianou
Proc. ICASSP 2015, Brisbane, Australia, April 2015

Methods for Applying Dynamic Sinusoidal Models to Statistical Parametric Speech Synthesis

Q. Hu, Y. Stylianou, R. Maia, K. Richmond and J. Yamagishi
Proc. ICASSP 2015, Brisbane, Australia, April 2015

Enhancing the Intelligibility of Statistically Generated Synthetic Speech by Means of Noise-Independent Modifications

D. Erro, T. C. Zorila and Y. Stylianou
IEEE Trans. Audio, Speech and Language Processing vol 22, December 2014

Fast Inter-Harmonic Reconstruction for Spectral Envelope Estimation in High-Pitched Voices

T. Drugman and Y. Stylianou
IEEE Signal Processing Letters vol 21 no 11, November 2014

Maximum Voiced Frequency Estimation: Exploiting Amplitude and Phase Spectra

T. Drugman and Y. Stylianou
IEEE Signal Processing Letters vol 21 no 10, October 2014

Enabling Controllability for Continuous Expression Space

L. Chen and N. Braunschweiler
Proc. Interspeech 2014, Singapore, September 2014

An Investigation of the Application of Dynamic Sinusoidal Models to Statistical Parametric Speech Synthesis

Q. Hu, Y. Stylianou, R. Maia, K. Richmond, J. Yamagishi and J. Latorre
Proc. Interspeech 2014, Singapore, September 2014

Analysis of Emotional Speech using an Adaptive Sinusoidal Model

G. Kafentzis, T. Yakoumaki, A. Mouchtaris and Y. Stylianou
Proc. EUSIPCO 2014, Lisbon, Portugal, September 2014

Generating Multiple-Accent Pronunciations for TTS using Joint Sequence Model Interpolation

B. Kolluru, V. Wan, J. Latorre, K. Yanagisawa and M. Gales
Proc. Interspeech 2014, Singapore, September 2014

Speech Intonation for TTS: Study on Evaluation Methodology

J. Latorre, K. Yanagisawa, V. Wan, B. Kolluru and M. Gales
Proc. Interspeech 2014, Singapore, September 2014

Voice Expression Conversion with Factorised HMM-TTS Models

J. Latorre, V. Wan and K. Yanagisawa
Proc. Interspeech 2014, Singapore, September 2014

On the Impact of Excitation and Spectral Parameters for Expressive Statistical Parametric Speech Synthesis

R. Maia and M. Akamine
Computer Speech & Language vol 28 no 5, September 2014

Noise-robust TTS Speaker Adaptation with Statistics Smoothing

K. Yanagisawa, L. Chen and M. Gales
Proc. Interspeech 2014, Singapore, September 2014

On Spectral and Time Domain Energy Reallocation for Speech-in-noise Intelligibility Enhancement

T. C. Zorila and Y. Stylianou
Proc. Interspeech 2014, Singapore, September 2014

Speaker Dependent Expression Predictor From Text: Expressiveness and Transplantation

L. Chen, N. Braunschweiler and M. Gales
Proc. ICASSP 2014, Florence, Italy, May 2014

A Fixed Dimension and Perceptually Based Dynamic Sinusoidal Model of Speech

Q. Hu, Y. Stylianou, K. Richmond, R. Maia, J. Yamagishi and J. Latorre
Proc. ICASSP 2014, Florence, Italy, May 2014

Complex Cepstrum Factorization for Statistical Parametric Synthesis

R. Maia and Y. Stylianou
Proc. ICASSP 2014, Florence, Italy, May 2014

Real Time Speech-in-Noise Intelligibility Enhancement Based on Spectral Shaping and Dynamic Range Compression

V. Tsiaras, T. C. Zorila, Y. Stylianou and M. Akamine
Proc. ICASSP 2014 (Show & Tell), Florence, Italy, May 2014

Linear Dynamical Models in Speech Synthesis

V. Tsiaras, R. Maia, V. Diakoloukas, Y. Stylianou and V. Digalakis
Proc. ICASSP 2014, Florence, Italy, May 2014

Integrated Expression Prediction and Speech Synthesis from Text

L. Chen, M. Gales, N. Braunschweiler, M. Akamine and K. Knill
IEEE Journal Selected Topics in Signal Processing vol 8 no 2, April 2014

Building HMM-TTS Voices on Diverse Data

V. Wan, J. Latorre, K. Yanagisawa, N. Braunschweiler, L. Chen, M. Gales and M. Akamine
IEEE Journal Selected Topics in Signal Processing vol 8 no 2, April 2014

Intelligibility Enhancement of HMM-generated Speech in Additive Noise by Modifying Mel Cepstral Coefficients to Increase the Glimpse Proportion

C. Valentini-Botinhao, J. Yamagishi, S. King and R. Maia
Computer Speech & Language vol 28 no 2, March 2014

Automatic Detection of Inhalation Breath Pauses for Improved Pause Modelling in HMM-TTS

N. Braunschweiler and L. Chen
Proc. SSW8 2013, Barcelona, Spain, August 2013

Unsupervised Speaker and Expression Factorization for Multi-Speaker Expressive Synthesis of E-Books

L. Chen and N. Braunschweiler
Proc. Interspeech 2013, Lyon, France, August 2013

Minimum Mean Squared Error Based Warped Complex Cepstrum Analysis for Statistical Parametric Speech Synthesis

R. Maia, M. Gales, Y. Stylianou and M. Akamine
Proc. Interspeech 2013, Lyon, France, August 2013

Photo-Realistic Expressive Text to Talking Head Synthesis

V. Wan, R. Anderson, A. Blokland, N. Braunschweiler, L. Chen, B. Kolluru, J. Latorre, R. Maia, B. Stenger, K. Yanagisawa, Y. Stylianou, M. Akamine, M. Gales and R. Cipolla
Proc. Interspeech 2013, Lyon, France, August 2013

An Expressive Text-Driven 3D Talking Head

R. Anderson, B. Stenger, V. Wan and R. Cipolla
Proc. SIGGRAPH 2013, Anaheim, California, July 2013

Expressive Visual Text-to-Speech Using Active Appearance Models

R. Anderson, B. Stenger, V. Wan and R. Cipolla
Proc. CVPR 2013, Portland, Oregon, USA, June 2013

Complex Cepstrum for Statistical Parametric Speech Synthesis

R. Maia, M. Akamine and M. Gales
Speech Communication vol 55 no 5, June 2013

Integrated Automatic Expression Prediction and Speech Synthesis From Text

L. Chen, M. Gales, N. Braunschweiler, M. Akamine and K. Knill
Proc. ICASSP 2013, Vancouver, Canada, May 2013

Training a Super-Segmental Parametric F0 Model Without Interpolating F0

J. Latorre, M. Gales, K. Knill and M. Akamine
Proc. ICASSP 2013, Vancouver, Canada, May 2013

Complex Cepstrum Analysis Based on the Minimum Mean Squared Error

R. Maia, M. Akamine and M. Gales
Proc. ICASSP 2013, Vancouver, Canada, May 2013

Crowd Sourced Assessment of Speech Synthesis

S. Buchholz, J. Latorre and K. Yanagisawa
Crowdsourcing for Speech Processing: Applications to Data Collection, Transcription and Assessment, Chapter 7, March 2013

Exploring Rich Expressive Information from Audiobook Data Using Cluster Adaptive Training

L. Chen, M. Gales, V. Wan, J. Latorre and M. Akamine
Proc. Interspeech 2012, Portland, Oregon, USA, September 2012

Speech Factorization for HMM-TTS Based on Cluster Adaptive Training

J. Latorre, V. Wan, M. Gales, L. Chen, K. Chin, K. Knill and M. Akamine
Proc. Interspeech 2012, Portland, Oregon, USA, September 2012

Noise Compensation for Subspace Gaussian Mixture Models

L. Lu, K. Chin, A. Ghoshal and S. Renals
Proc. Interspeech 2012, Portland, Oregon, USA, September 2012

Analysis on the Importance of Short-Term Speech Parameterizations for Emotional Statistical Parametric Speech Synthesis

R. Maia and M. Akamine
Proc. Interspeech 2012, Portland, Oregon, USA, September 2012

C2H: a Computational Model of H&H-based Phonetic Contrast in Synthetic Speech

M. Nicolao, J. Latorre and R. Moore
Proc. Interspeech 2012, Portland, Oregon, USA, September 2012

Combining Multiple High Quality Corpora for Improving HMM-TTS

V. Wan, J. Latorre, K. Chin, L. Chen, M. Gales, H. Zen, K. Knill and M. Akamine
Proc. Interspeech 2012, Portland, Oregon, USA, September 2012

Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization

H. Zen, N. Braunschweiler, S. Buchholz, M. Gales, K. Knill, S. Krstulovic and J. Latorre
IEEE Trans. Audio, Speech and Language Processing vol 20 no 6, August 2012

Unsupervised Clustering of Emotion and Voice Styles for Expressive TTS

F. Eyben, S. Buchholz and N. Braunschweiler
Proc. ICASSP 2012, Kyoto, Japan, March 2012

Complex Cepstrum as Phase Information in Statistical Parametric Speech Synthesis

R. Maia, M. Akamine and M. Gales
Proc. ICASSP 2012, Kyoto, Japan, March 2012

Cepstral Analysis Based on the Glimpse Proportion Measure for Improving the Intelligibility of HMM-Based Synthetic Speech in Noise

C. Valentini-Botinhao, R. Maia, J. Yamagishi, S. King and H. Zen
Proc. ICASSP 2012, Kyoto, Japan, March 2012

Product of Experts for Statistical Parametric Speech Synthesis

H. Zen, M. Gales, Y. Nankaku and K. Tokuda
IEEE Trans. Audio, Speech and Language Processing vol 20 no 3, March 2012

Automatic Sentence Selection from Speech Corpora Including Diverse Speech for Improved HMM-TTS Synthesis Quality

N. Braunschweiler and S. Buchholz
Proc. Interspeech 2011, Florence, Italy, August 2011

Integrated Online Speaker Clustering and Adaptation

C. Breslin, K. Chin, M. Gales and K. Knill
Proc. Interspeech 2011, Florence, Italy, August 2011

Crowdsourcing Preference Tests, and How to Detect Cheating

S. Buchholz and J. Latorre
Proc. Interspeech 2011, Florence, Italy, August 2011

Multipulse Sequences for Residual Signal Modeling

R. Maia, H. Zen, K. Knill, M. Gales and S. Buchholz
Proc. Interspeech 2011, Florence, Italy, August 2011

Gaussian Process Experts for Voice Conversion

N. Pilkington, H. Zen and M. Gales
Proc. Interspeech 2011, Florence, Italy, August 2011

Switchable Primaries Using Shiftable Layers of Colour Filter Arrays

B. Sajadi, A. Majumder, K. Hiwada, A. Maki and R. Raskar
SIGGRAPH, August 2011

The Effect of Using Normalized Models in Statistical Speech Synthesis

M. Shannon, H. Zen and W. Byrne
Proc. Interspeech 2011, Florence, Italy, August 2011

Joint Uncertainty Decoding with Predictive Methods for Noise Robust Speech Recognition

H. Xu, M. Gales and K. Chin
IEEE Trans. Audio, Speech and Language Processing vol 19 no 6, August 2011

Context Adaptive Training with Factorised Decision Trees for HMM-Based Statistical Parametric Speech Synthesis

K. Yu, H. Zen, F. Mairesse and S. Young
Speech Communication vol 53 no 6, July 2011

Constrained Discriminative Mapping Transforms for Unsupervised Speaker Adaptation

L. Chen, M. Gales and K. Chin
Proc. ICASSP 2011, Prague, Czech Republic, May 2011

Rapid Joint Speaker and Noise Compensation for Robust Speech Recognition

K. Chin, H. Xu, M. Gales, C. Breslin and K. Knill
Proc. ICASSP 2011, Prague, Czech Republic, May 2011

Decision Tree-Based Context Clustering based on Cross Validation and Hierarchical Priors

H. Zen and M. Gales
Proc. ICASSP 2011, Prague, Czech Republic, May 2011

Development of US English Text-to-Speech Synthesizer using HMM-based Speech

M. Tamura, S. Krstulovic, T. Morinaka, R. Tokuda, H. Zen, M. Morita, T. Kagoshima and M. Akamine
Proc. Spring Meeting of the Acoustical Society of Japan, March 2011

Lightly Supervised Recognition for Automatic Alignment of Large Coherent Speech Recordings

N. Braunschweiler, M. Gales and S. Buchholz
Proc. Interspeech 2010, Makuhari, Chiba, Japan, September 2010

Prior Information for Rapid Speaker Adaptation

C. Breslin, K. Chin, M. Gales, K. Knill and H. Xu
Proc. Interspeech 2010, Makuhari, Chiba, Japan, September 2010

An Open Source HMM-based Text-to-Speech System for Brazilian Portuguese

I. Couto, N. Neto, V. Tadaiesky, A. Klautau and R. Maia
Proc. International Telecommunications Symposium (ITS 2010), September 2010

Training a Parametric-Based LogF0 Model with the Minimum Generation Error Criterion

J. Latorre, M. Gales and H. Zen
Proc. Interspeech 2010, Makuhari, Chiba, Japan, September 2010

Statistical Parametric Speech Synthesis with Joint Estimation of Acoustic and Excitation Model Parameters

R. Maia, H. Zen and M. Gales
Proc. Speech Synthesis Workshop SSW7, Kyoto, Japan, September 2010

Synthesis of Emotional Speech

M. Schröder, F. Burkhardt and S. Krstulovic
A Blueprint for Affective Computing: A sourcebook and manual, Chapter 5.2, September 2010

A Comparison of Pronunciation Modelling Approaches for HMM TTS

G. Webster, S. Krstulovic and K. Knill
Proc. Interspeech 2010, Makuhari, Chiba, Japan, September 2010

Context Adaptive Training with Factorized Decision Trees for HMM-Based Speech Synthesis

K. Yu, H. Zen, F. Mairesse and S. Young
Proc. Interspeech 2010, Makuhari, Chiba, Japan, September 2010

Speaker and Language Adaptive Training for HMM-Based Polyglot Speech Synthesis

H. Zen
Proc. Interspeech 2010, Makuhari, Chiba, Japan, September 2010

HMM-based Polyglot Speech Synthesis by Speaker and Language adaptive Training

H. Zen, N. Braunschweiler, S. Buchholz, K. Knill, S. Krstulovic and J. Latorre
Proc. Speech Synthesis Workshop SSW7, Kyoto, Japan, September 2010

Usage of an External Duration Model for HMM-Based Speech Synthesis

J. Latorre, S. Buchholz and M. Akamine
Proc. Speech Prosody 2010, Chicago, USA, May 2010

Annotating the Enron Corpus with Number Senses

S. Moore, S. Buchholz and A. Korhonen
Proc. Int. Conf. on Language Resources and Evaluation (LREC) 2010, Malta, May 2010

Automatic Feature Selection from a Large Number of Features for Phone Duration Prediction

G. Webster, S. Buchholz and J. Latorre
Proc. Speech Prosody 2010, Chicago, USA, May 2010

Statistical Parametric Speech Synthesis Based on Product of Experts

H. Zen, M. Gales, Y. Nankaku and K. Tokuda
Proc. ICASSP 2010, Dallas, Texas, USA, March 2010

Improving Joint Uncertainty Decoding Performance by Predictive Methods for Noise Robust Speech Recognition

H. Xu, M. Gales and K. Chin
Proc. ASRU 2009, Merano, Italy, December 2009

Compression Techniques Applied to Multiple Speech Recognition Systems

C. Breslin, M. Stuttle and K. Knill
Proc. Interspeech 2009, Brighton, United Kingdom, September 2009

Improved Language Modelling Using Bag of Word Pairs

L. Chen, K. Chin and K. Knill
Proc. Interspeech 2009, Brighton, United Kingdom, September 2009

Number Sense Disambiguation

S. Moore, A. Korhonen and S. Buchholz
Proc. Pacling 2009, Sapporo, Japan, September 2009

Comparison of Estimation Techniques in Joint Uncertainty Decoding for Noise Robust Speech Recognition

H. Xu and K. Chin
Proc. Interspeech 2009, Brighton, United Kingdom, September 2009

Context-Dependent Additive Log F0 Model for HMM-Based Speech Synthesis

H. Zen and N. Braunschweiler
Proc. Interspeech 2009, Brighton, United Kingdom, September 2009

Techware: HMM-Based Speech Synthesis Resources

H. Zen and K. Tokuda
Signal Processing Magazine vol 26 no 4, July 2009

Joint Uncertainty Decoding with the Second Order Approximation for Noise Robust Speech Recognition

H. Xu and K. Chin
Proc. ICASSP 2009, Taipei, Taiwan, April 2009

Improving Japanese Language Models Using POS Information

L. Chen, H. Nagae and M. Stuttle
Proc. Interspeech 2008, Brisbane, Australia, September 2008

An Evaluation of Non-standard Features for Grapheme-to-Phoneme Conversion

G. Webster and N. Braunschweiler
Proc. Interspeech 2008, Brisbane, Australia, September 2008

Sentence-Based Emotion Classification for Text-to-Speech

E. Spyropoulou, S. Buchholz and S. Teufel
International Workshop on Computational Aspects of Affective and Emotional Interaction, July 2008

Comparing QMT1 and HMMs for the Synthesis of American English Prosody

S. Krstulovic, J. Latorre and S. Buchholz
Proc. Speech Prosody 2008, Campinas, Brazil, May 2008

Efficient Language Model Look-Ahead Probabilities Generation Using Lower Order LM Look-Ahead Information

L. Chen and K. Chin
Proc. ICASSP 2008, Las Vegas, USA, April 2008

The Toshiba Entry for the 2007 Blizzard Challenge

S. Buchholz, N. Braunschweiler, M. Morita and G. Webster
Proc. Blizzard Challenge 2007, Bonn, Germany, August 2007

How (Not) to Select Your Voice Corpus: Random Selection vs. Phonologically Balanced

T. Lambert, N. Braunschweiler and S. Buchholz
Proc. ISCA Workshop on Speech Synthesis 2007, Bonn, Germany, August 2007

Sentence Level Intelligibility Evaluation for Mandarin Text-to-Speech Systems Using Semantically Unpredictable Sentences

J. Li, D. Sityaev and J. Hao
Proc. Interspeech 2007, Antwerp, Belgium, August 2007

Some Aspects of Prosody of Friendly Formal and Friendly Informal Speaking Styles

D. Sityaev, G. Webster, N. Braunschweiler, S. Buchholz and K. Knill
Proc. ICPhS XVI, Saarbrücken, Germany, August 2007

Acoustic Model Development Using HLDA for Robust Embedded In-car Speech Recognition

A. Abella, J. Nealand and K. Knill
Proc. One Day Meeting for Young Speech Researchers, London, UK, April 2007

Comparison of the ITU-T P.85 Standard to Other Methods for the Evaluation of Text-to-Speech Systems

D. Sityaev, K. Knill and T. Burrows
Proc. Interspeech 2006, Pittsburgh, PA, USA, September 2006

CoNLL-X Shared Task on Multilingual Dependency Parsing

S. Buchholz and E. Marsi
Proc. Conf. Computational Natural Language Learning (CoNLL-X), New York City, USA, June 2006

Reconstruction in the Round Using Photometric Normals and Silhouettes

G. Vogiatzis, C. Hernández and R. Cipolla
CVPR, June 2006

Adaptation of Prosodic Phrasing Models

P. Bell, T. Burrows and P. Taylor
Proc. Speech Prosody 2006, Dresden, Germany, May 2006

The Prosodizer - Automatic Prosodic Annotations of Speech Synthesis Databases

N. Braunschweiler
Proc. Speech Prosody 2006, Dresden, Germany, May 2006

Quality Control of Treebanks: Documenting, Converting and Patching

S. Buchholz and D. Green
Proc. LREC Workshop 2006, Genoa, Italy, May 2006

Analysis and Modelling of Question Intonation in American English

D. Sityaev, T. Burrows, P. Jackson and K. Knill
Proc. Speech Prosody 2006, Dresden, Germany, May 2006

Robust Endpoint Detection for Speech Recognition Based on Discriminative Feature Extraction

K. Yamamoto, F. Jabloun, K. Reinhard and A. Kawamura
Proc. ICASSP 2006, Toulouse, France, May 2006

Investigating Prosodic Modifications for Polyglot Text-to-Speech Synthesis

P. Olaszi, T. Burrows and K. Knill
Proc. Multiling 2006, Stellenbosch, South Africa, April 2006

A Study on Endpoint Detection for Speech Recognition Based on Discriminative Feature Extraction

K. Yamamoto, F. Jabloun, K. Reinhard and A. Kawamura
Information Processing Society of Japan Audio Language Information Processing, December 2005

Intonational Sequences in Tuscan Italian

J. Bishop, M. Peake and D. Sityaev
Proc. Interspeech 2005, Lisbon, Portugal, September 2005

Combining Models of Prosodic Phrasing and Pausing

T. Burrows, P. Jackson, K. Knill and D. Sityaev
Proc. Interspeech 2005, Lisbon, Portugal, September 2005

Influence of Syntax on Prosodic Boundary Prediction

T. Ingulfsen, T. Burrows and S. Buchholz
Proc. Interspeech 2005, Lisbon, Portugal, September 2005

A Comparison of Methods for Speaker-Dependent Pronunciation Tuning for Text-to-Speech Synthesis

G. Webster, T. Burrows and K. Knill
Proc. Interspeech 2005, Lisbon, Portugal, September 2005

Improving Letter-to-Pronunciation Accuracy with Automatic Morphologically Based Stress Prediction

G. Webster
Proc. Interspeech 2004, Jeju Island, Korea, October 2004

EAI Publications

2024

ReCoRe: Regularized Contrastive Representation Learning of World Model

DiaLoc: An Iterative Approach to Embodied Dialog Localization

2023

Cumulative Attention based streaming transformer ASR with internal language model joint training and rescoring

Frame-wise and overlap-robust speaker embeddings for meeting diarization

On the effectiveness of monoaural target source extraction for distant end-to-end automatic speech recognition

Non-Autoregressive End-to-End Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding

2022

Multiple-hypothesis RNN-T loss for unsupervised fine-tuning and self-training of neural transducer

Self-regularised minimum latency training for streaming transformer-based speech recognition

Combining structured and unstructured knowledge in an interactive search dialogue system

On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training

Comparing human emotion perception and automatic emotion recognition of user turns in human-machine dialogues

Opening up minds with argumentative dialogues

Monoaural source separation: from anechoic to reverberant environments

Transformer-based streaming ASR with cumulative attention

Factors in Emotion Recognition with Deep Learning Models Using Speech and Text on Multiple Corpora

Transformer-based streaming ASR with cumulative attention

Speaker reinforcement using target source extraction for robust automatic speech recognition

2021

Towards handling unconstrained user preferences in dialogue

QTMM2012c+: a queryable empirically-grounded resource of dialogue with argumentation

End-to-end neural based modification of noisy speech for speech-in-noise intelligibility improvement

Dialogue strategy adaptation to new action sets using multi-dimensional modelling

A study on cross-corpus speech emotion recognition and data augmentation

Improving HS-DACS based streaming transformer ASR with deep reinforcement learning

Teacher-student MixIT for unsupervised and semi-supervised speech separation

Head-Synchronous Decoding for Transformer-based Streaming ASR

Transformer-based Online Speech Recognition with Decoder-End Adaptive Computation Steps

An Investigation into the Multi-Channel Time Domain Speaker Extraction Network

Multiple-Hypothesis CTC-based Semi-Supervised Adaptation of End-to-End Speech Recognition

Action State Update Approach to Dialogue Management

Train your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers

Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism

2020

Selective Adaptation of End-to-End Speech Recognition using Hybrid CTC/Attention Architecture for Noise Robustness

Open-domain Topic Identification of Out-of-domain Utterances using Wikipedia

Toshiba’s Speech Recognition System for the CHiME 2020 Challenge

Towards a speaker diarization system for the CHiME 2020 dinner party transcription

The ISO Standard for Dialogue Act Annotation, Second Edition

Learning Noise Invariant Features through Transfer Learning for Robust End-to-End Speech Recognition

On End-to-End Multi-Channel Time Domain Speech Separation in Reverberant Environments

2019

Robust Belief State Space Representation for Statistical Dialogue Managers using Deep Autoencoders

An Investigation into the Effectiveness of Enhancement in ASR Training and Test for CHiME-5 Dinner Party Transcription

Crowd-sourced Collection of Task-Oriented Human-Human Dialogues in a Multi-Domain Scenario

Prediction of User Emotion and Dialogue Success Using Audio Spectrograms and Convolutional Neural Networks

Subband Temporal Envelope Features and Data Augmentation for End-To-End Recognition of Distant Conversational Speech

An Unsupervised Learning Approach to Neural-Net-Supported WPE Dereverberation

On Reducing the Effect of Speaker Overlap for CHiME-5

2018

Prediction of Dialogue Success with Spectral and Rhythm Acoustic Features using DNNs and SVMs

Comparison of an End-To-End Trainable Dialogue System with a Modular Statistical Dialogue System

Weighting Time-Frequency Representation of Speech Using Auditory Saliency for Automatic Speech Recognition