Leveraging Loanword Constraints for Improving Machine Translation in Low-resource Settings

Dec 2, 2025
Data from YouTube Data API v3

Video Overview

Video Details

Published: 6 months ago
Duration: 45:03
Video ID: pqClPCtIvQ0
Language: en
Category: Science & Technology
Privacy: Public
Made for Kids: No
Video Type: Regular Video

Performance Metrics

Views: 177
Likes: 8
Comments: 0
Engagement Rate: 4.52%
Likes per 100 views: 4.52
Comments per 1K views: 0.00

Description

Translating from high-resource languages into low-resource languages such as Emakhuwa remains a challenge because of limited parallel data, orthographic variation, and frequent loanwords and code-switching. In this talk, Felermino discusses how to apply lexicon-guided neural machine translation, which integrates bilingual dictionaries and loanword mappings into the training process, to address this challenge. Our method uses over 8,000 dictionary entries and 12,000 loanword mappings to build sentence-specific glossaries that are incorporated via input augmentation. Experiments on FLORES+ show improved lexical coverage, reduced inconsistencies, and more contextually accurate translations, suggesting a promising direction for low-resource MT that bridges data scarcity and vocabulary gaps through structured lexical integration. Learn more about Microsoft Research Lab – Africa, Nairobi: https://www.microsoft.com/en-us/research/lab/microsoft-research-lab-africa-nairobi/seminars/
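
The description mentions sentence-specific glossaries incorporated via input augmentation. The Python sketch below is one possible reading of that idea, not the method presented in the talk. The function names, the separator string, the "source = target" hint format, and the rule that loanword mappings take priority over dictionary entries are all assumptions made for illustration. The lexicon entries are placeholders, not real Emakhuwa vocabulary.

```python
import re

# Hypothetical toy lexicons: source word -> target-side form.
# The target strings are placeholders, not real Emakhuwa words.
DICTIONARY = {"water": "TGT_WATER", "school": "TGT_SCHOOL"}
LOANWORDS = {"hospital": "LOAN_HOSPITAL"}


def build_glossary(sentence, dictionary, loanwords):
    """Collect lexicon entries whose source word occurs in the sentence."""
    glossary = []
    seen = set()
    for token in re.findall(r"\w+", sentence.lower()):
        if token in seen:
            continue  # list each source word at most once
        seen.add(token)
        # Assumption: a loanword mapping wins over a dictionary entry.
        if token in loanwords:
            glossary.append((token, loanwords[token]))
        elif token in dictionary:
            glossary.append((token, dictionary[token]))
    return glossary


def augment_input(sentence, glossary, sep=" ||| "):
    """Append 'source = target' hints to the source sentence."""
    if not glossary:
        return sentence  # nothing matched, leave the input unchanged
    hints = " ; ".join(f"{src} = {tgt}" for src, tgt in glossary)
    return f"{sentence}{sep}{hints}"


source = "The school is near the hospital."
glossary = build_glossary(source, DICTIONARY, LOANWORDS)
augmented = augment_input(source, glossary)
print(augmented)
```

In a setup like this, the augmented string would replace the plain source sentence at both training and inference time, so the model can learn to copy the hinted target forms. How the hints are actually formatted and selected in the talk is not stated in the description.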
