Distant conversational speech recognition: Challenges and Opportunities

Oct 17, 2025
Data from YouTube Data API v3

Video Overview

Video Details

Published: 7 months ago
Duration: 1:28:41
Video ID: Vn7EunGIObE
Language: en
Category: Science & Technology
Privacy: Public
Made for Kids: No
Video Type: Regular Video

Performance Metrics

Views: 150
Likes: 1
Comments: 0
Engagement Rate: 0.67%
Likes per 100 views: 0.67
Comments per 1K views: 0.00
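
The derived figures above are consistent with simple ratios of the raw counts. The page does not state its formulas, so the sketch below is an assumption: it treats the engagement rate as (likes + comments) / views, expressed as a percentage. The function name and dictionary keys are hypothetical and chosen only for illustration.

```python
def engagement_metrics(views: int, likes: int, comments: int) -> dict[str, float]:
    """Compute derived metrics from raw counts.

    Assumed formulas (not confirmed by the source page):
      engagement rate  = 100 * (likes + comments) / views
      likes per 100    = 100 * likes / views
      comments per 1K  = 1000 * comments / views
    """
    if views <= 0:
        raise ValueError("views must be positive")
    return {
        "engagement_rate_pct": round(100 * (likes + comments) / views, 2),
        "likes_per_100_views": round(100 * likes / views, 2),
        "comments_per_1k_views": round(1000 * comments / views, 2),
    }


# Raw counts taken from the metrics listed above.
metrics = engagement_metrics(views=150, likes=1, comments=0)
print(metrics)
```

With 150 views, 1 like, and 0 comments, these assumed formulas reproduce the listed values of 0.67%, 0.67, and 0.00.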

Description

Host: Sunit Sivasankaran, Microsoft Research
Speaker: Dr. Samuele Cornell, Carnegie Mellon University

State-of-the-art ASR systems excel on close-talk benchmarks but struggle with far-field conversational speech, where error rates remain above 20%. Current benchmark datasets inadequately assess generalization across domains and real-world conditions, often relying on oracle segmentation that yields overly optimistic results. Distant ASR (DASR) faces unique challenges including overlapping speech, varied recording setups, and dynamic speaker interactions that significantly complicate system development. Despite these difficulties, spontaneous conversational speech represents the next frontier for developing more human-like AI agents capable of natural multi-party communication.

This talk presents recent advances in DASR through three interconnected efforts:
(1) the CHiME-7 and CHiME-8 DASR challenges, which established rigorous benchmarks for generalizable robust meeting transcription;
(2) end-to-end joint modeling that unifies speaker diarization and speech recognition into a single framework, moving beyond traditional pipeline approaches; and
(3) synthetic data generation leveraging large language models and text-to-speech systems to create realistic multi-speaker training data at scale.
