Embodied AI

Embodied AI is at the forefront of building an ever-improving foundation of automated industrial Artificial Intelligence through Natural Interaction in Physical Space and with Language.

We target the creation of AI that can be operated easily by general users and continuously pushes the boundary of task complexity. We facilitate easy deployment of equipment to new sites and tasks by learning and using problem-, task-, and environment-specific knowledge and skills.

Fast Adaptation allows systems to be deployed in new environments with minimal effort and cost, through interaction with humans and the environment.

Continuous Learning expands functionality by generalizing past experiences into “common sense” through knowledge extraction.

Embodied AI Introduction

Embodied AI refers to agent-based AI systems that can manipulate objects and communicate with people, assisting us in physical tasks. These agents learn interactively from users in their environment, adapting quickly and continuously expanding their capabilities. CRL’s research strives to yield technologies capable of natural interaction with the physical environment and human operators, building a dynamic catalog of skills that encompasses perception, reasoning, and action.

The classical paradigm of perception and manipulation systems is to understand in order to act – e.g. segmentation is used to detect pedestrians so that ADAS can operate. In the new framework of Embodied AI, we instead act to understand. Specifically, Embodied AI has fast adaptation and continuous learning through task execution at its core. The system interacts to improve its perception, reasoning, and actions toward future task completion. Human interaction enables seamless reprogramming (teaching) of physical systems, while autonomous actions in physical space enable the system to learn about new environments. The need for expert knowledge to redeploy classical AI to new tasks is eliminated.

Embodied AI is critical to addressing the real-world challenges of next-generation industry. Adaptation and resilience are at the core of Embodied AI, ensuring a workforce that can truly leverage the power of AI through simple human–machine interaction across ever-changing tasks and deployments. Adaptation and learning also ensure sustainability, as ever-learning software algorithms constantly improve the applicability of existing hardware. CRL’s Embodied AI will enable a versatile assistive system for a multitude of tasks. Since Embodied AI focuses on the software, we can learn from many modalities and deployments, ensuring a seamless integration of disparate systems.

Language & Interaction Group

The Language & Interaction Group (LIG) focuses on learning from interaction with natural language. Dialogue-driven, deployment-specific interaction with human operators allows for long-horizon planning, refinement, and learning with a human in the loop. Fast adaptation utilizes local knowledge bases to refine task execution.

Vision & Learning Group

The Vision & Learning Group (VLG) focuses on learning from interaction in physical environments. Complex and safe manipulation and navigation technologies leverage precise 3D geometry and scene understanding in conjunction with strong world-aware action selection frameworks. Learned concepts are effectively transferred to new domains.

Latest Publications


2025

F Logothetis, I Budvytis, R Cipolla

WACV 2025
2025

S Morad, C Lu, R Kortvelesy, S Liwicki, J Foerster, A Prorok

NeurIPS 2024
2025

Norbert Braunschweiler, Rama Doddipatla, Tudor-Catalin Zorila

Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM) at ACL 2025, Vienna, Austria
2025

Youmna Farag, Svetlana Stoyanchev, Mohan Li, Simon Keizer, Rama Doddipatla

First Workshop for Research on Agent Language Models (REALM) at ACL 2025, Vienna, Austria
2025

Cong Thanh Do, Yuan Li, Simon Keizer, Mohan Li, Rama Doddipatla, Kate Knill

Proc. UK and Ireland Speech Workshop 2025, June 16-17, 2025, York, UK
2024

Cong Thanh Do, Shuhei Imai, Rama Doddipatla, Thomas Hain

Proc. UK and Ireland Speech Workshop 2024, July 01-02, 2024, Cambridge, UK
2024

A. Frummet, A. Papenmeier, M. Fröbe, J. Kiesel, V. Adlakha, N. Braunschweiler, M. Dubiel, S. Ghosh, M. Gohsen, C. Kreutz, M. Momeni, M. Nilles, S. P. Cherumanal, A. Pirmoradi, P. Thomas, J. R. Trippas, I. Zelch, and O. Zendel.

Report published in SIGIR Forum 58, 1 (June 2024), 1–12.
2024

Abigail Sticha, Norbert Braunschweiler, Rama Sanand Doddipatla, Kate M Knill

CUI ’24: Proceedings of the 6th ACM Conference on Conversational User Interfaces
2024

Improving Retrieval-Augmented Response Generation in Goal-Oriented Dialogue Question Answering

N. Braunschweiler, A. Sticha, R. Doddipatla, K. Knill

Presented at the UK-Ireland Speech Workshop (UKISpeech 2024) in Cambridge, UK
2024

Lexin Zhou, Youmna Farag, Andreas Vlachos.

Proc. of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), July 2024
2024

L. Domingo, P. Piwek, M. Wermelinger, S. Stoyanchev

28th Workshop on the Semantics and Pragmatics of Dialogue (SEMDIAL 2024).
2024

M. Li, S. Keizer and R. Doddipatla

Proc. of Interspeech 2024, Kos Island, Greece, September 2024
2024

C. Li, C. Zhang, S. Teufel, R. Doddipatla, S. Stoyanchev

Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
2024

T. Cord-Landwehr, C. Boeddeker, T.-C. Zorila, R. Doddipatla and R. Haeb-Umbach

2024 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024), Seoul, Korea, April 2024
