Distant conversational speech recognition: Challenges and Opportunities
Oct 17, 2025•Channel
AI Analysis
Data from YouTube Data API v3•Updated Just now
Video Overview
Video Details
Published7 months ago
Duration1:28:41
Video IDVn7EunGIObE
Languageen
CategoryScience & Technology
PrivacyPublic
Made for KidsNo
Video TypeRegular Video
Performance Metrics
Views150
Likes1
Comments0
Engagement Rate0.67%
Likes per 100 views0.67
Comments per 1K views0.00
Description
Host: Sunit Sivasankaran, Microsoft Research
Speaker: Dr. Samuele Cornell, Carnegie Mellon University
State-of-the-art ASR systems excel on close-talk benchmarks but struggle with far-field conversational speech, where error rates remain above 20%. Current benchmark datasets inadequately assess generalization across domains and real-world conditions, often relying on oracle segmentation that yields overly optimistic results. Distant ASR (DASR) faces unique challenges including overlapping speech, varied recording setups, and dynamic speaker interactions that significantly complicate system development. Despite these difficulties, spontaneous conversational speech represents the next frontier for developing more human-like AI agents capable of natural multi-party communication. This talk presents recent advances in DASR through three interconnected efforts: (1) the CHiME-7 and CHiME-8 DASR challenges, which established rigorous benchmarks for generalizable robust meeting transcription, (2) end-to-end joint modeling that unifies speaker diarization and speech recognition into a single framework, moving beyond traditional pipeline approaches, and (3) synthetic data generation leveraging large language models and text-to-speech systems to create realistic multi-speaker training data at scale.