Dialogue Systems

The Dialogue group works on fundamental research related to the modelling of human-machine communication. Our aim is to develop methods that enable users to complete tasks in collaboration with informational and embodied automatic agents through natural spoken and multimodal interfaces. Our current research areas include natural language interpretation for dialogue, statistical dialogue management, emotion detection from multimodal input, domain adaptation, and the use of unstructured data in dialogue. In our research we explore supervised, unsupervised, and reinforcement learning methods, currently focusing on the application of generative adversarial networks to dialogue tasks.

We work on all aspects of statistical spoken dialogue systems. For interpretation in dialogue, we developed the Action State Update (ASU) approach, a statistical method that handles references in user utterances without the need for a domain-specific Natural Language Understanding component. We use a multi-dimensional approach to dialogue management, aiming to support more natural interactions and to enable more efficient adaptation to new domains. We detect user emotions and consider them in dialogue response generation.

Interpretation without a domain-specific Natural Language Understanding (NLU) component

Interpretation in a dialogue system processes a user utterance and updates the dialogue state, which is used by the policy to decide on the next system action. Traditionally, interpretation relies on the detection of domain-specific semantics, including intents, entities, and relations, requiring a task-specific annotated dataset for training an NLU model. In contrast, the Action State Update (ASU) approach is centred on user actions. We discretize user actions based on the domain structure and train a binary action detection classifier, eliminating the need for costly domain-specific semantic annotations.
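To make the idea concrete, here is a minimal sketch of ASU-style binary action detection. It is not the group's implementation: the feature set, the perceptron learner, and the toy domain are all illustrative assumptions. The key property it demonstrates is that candidate user actions are enumerated from the domain structure, and a single binary classifier scores every (utterance, action) pair, so no domain-specific semantic annotation scheme is needed.

```python
# Illustrative sketch of ASU-style binary action detection (not the
# actual ASU implementation): candidate user actions are discretized
# from the domain structure, and one shared binary classifier decides,
# per (utterance, action) pair, whether the action was performed.

def candidate_actions(domain):
    """Enumerate discretized user actions from the domain structure."""
    return [("inform", slot, value)
            for slot, values in domain.items() for value in values]

def features(utterance, action):
    """Toy features: lexical overlap between utterance and action parts."""
    tokens = set(utterance.lower().split())
    _, slot, value = action
    return [1.0 if slot in tokens else 0.0,
            1.0 if value.lower() in tokens else 0.0,
            1.0]  # bias term

def train_perceptron(examples, epochs=10):
    """Train one binary detector shared across all candidate actions."""
    w = [0.0] * 3
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if pred != y:
                w = [wi + (y - pred) * xi for wi, xi in zip(w, x)]
    return w

def detect(utterance, domain, w):
    """Return every candidate action the binary classifier fires on."""
    return [a for a in candidate_actions(domain)
            if sum(wi * xi for wi, xi in zip(w, features(utterance, a))) > 0]
```

Because the classifier conditions only on generic utterance-action features, it can generalize to slot values it has never seen labelled, which is where the annotation saving comes from.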

Multi-dimensional dialogue management

The action selection component of a dialogue system decides on the most appropriate response to the user, given the current dialogue state. This decision is driven by a hand-coded or trained dialogue policy. In statistical dialogue systems, the dialogue policy is typically trained using Reinforcement Learning, often in interaction with a simulated user. In our approach to action selection, we distinguish between different aspects, or ‘dimensions’, of the dialogue process that can be addressed simultaneously. We therefore implement multiple agents that each focus on one of these dimensions, and train their associated policies accordingly. As some of these dimensions can be considered task- and/or domain-independent, their policies can be re-used and adapted to new tasks and/or domains.
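The structure described above can be sketched as follows. This is a simplified illustration, not the group's system: the dimension names (task, feedback, social) and the hand-coded policies standing in for trained ones are assumptions. It shows one agent per dimension, each consulting its own policy, with the per-dimension actions combined into a single multi-functional system turn.

```python
# Illustrative sketch of multi-dimensional action selection: one agent
# per dialogue dimension, each with its own policy. Dimension names and
# the rule-based policies are placeholders for trained RL policies.

class DimensionAgent:
    def __init__(self, name, policy):
        self.name = name
        self.policy = policy  # maps dialogue state -> action or None

    def select(self, state):
        return self.policy(state)

def task_policy(state):
    # Task/domain-specific dimension: fill slots, then make an offer.
    if state.get("unfilled_slots"):
        return ("request", state["unfilled_slots"][0])
    return ("offer", state.get("best_match"))

def feedback_policy(state):
    # Domain-independent dimension: ground low-confidence input.
    if state.get("asr_confidence", 1.0) < 0.5:
        return ("confirm", state.get("last_user_act"))
    return None

def social_policy(state):
    # Domain-independent dimension: handle social obligations.
    if state.get("user_greeted") and not state.get("system_greeted"):
        return ("greet", None)
    return None

def select_turn(agents, state):
    """Combine the non-empty actions of all dimension agents."""
    return [a for a in (ag.select(state) for ag in agents) if a is not None]
```

Because the feedback and social agents never inspect domain-specific state beyond generic fields, exactly these policies could be carried over unchanged to a new domain, while only the task policy needs retraining.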

User state estimation in dialogue systems

User state estimation is increasingly important for modern dialogue systems that aim to adapt to the user and the situation. By considering user emotion and ongoing interaction quality during response generation, dialogue systems can be more engaging and more user-friendly. To estimate emotion and interaction quality, we use deep learning methods and combine multiple input modalities, such as speech, text, and video, for improved estimation performance.
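One common way to combine modalities for this kind of estimation is late fusion, sketched below. This is a generic illustration under stated assumptions, not the group's model: the emotion labels, fixed fusion weights, and raw scores stand in for the outputs of per-modality deep networks. Each modality model emits class scores; the scores are normalized and merged with per-modality weights before taking the argmax.

```python
# Illustrative late-fusion sketch for multimodal user-state estimation:
# per-modality models (speech, text, video) each produce class scores,
# which are softmax-normalized and combined with fixed weights. Labels
# and weights are placeholder assumptions, not the group's actual setup.
import math

LABELS = ["neutral", "happy", "angry"]

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fuse(modality_scores, weights):
    """Weighted late fusion of per-modality class probabilities."""
    fused = [0.0] * len(LABELS)
    for name, scores in modality_scores.items():
        for i, p in enumerate(softmax(scores)):
            fused[i] += weights[name] * p
    return fused

def predict(modality_scores, weights):
    """Return the label with the highest fused probability."""
    fused = fuse(modality_scores, weights)
    return LABELS[fused.index(max(fused))]
```

A practical advantage of late fusion is robustness to a missing modality: if the camera feed drops out, the remaining modalities can still be fused by re-normalizing the weights.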

Latest Publications

Cumulative Attention based streaming transformer ASR with internal language model joint training and rescoring

M. Li, C-T Do and R. Doddipatla / Accepted at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes Island, Greece, June 2023

Frame-wise and overlap-robust speaker embeddings for meeting diarization

T. Cord-Landwehr, C. Boeddeker, T-C Zorilă, R. Doddipatla and R. Haeb-Umbach / Accepted at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes Island, Greece, June 2023

On the effectiveness of monoaural target source extraction for distant end-to-end automatic speech recognition

T-C Zorilă and R. Doddipatla / Accepted at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes Island, Greece, June 2023
