Efficient Distributed Orthonormal Optimizers for Large-Scale Training

Mar 6, 2026
Data from YouTube Data API v3

Video Details

Published: 2 months ago
Duration: 56:20
Video ID: Y6l_7SdVDX4
Language: en
Category: Science & Technology
Privacy: Public
Made for Kids: No
Video Type: Regular Video

Performance Metrics

Views: 173
Likes: 10
Comments: 1
Engagement Rate: 6.36%
Likes per 100 views: 5.78
Comments per 1K views: 5.78

Description

Speaker: Kwangjun Ahn, Microsoft Research

I delivered a 50-minute technical talk on recent advances in orthonormal update methods for large-scale AI model training. The topic has been gaining attention rapidly in the community: orthonormal optimizers are emerging as a strong successor to AdamW, following their success in training production-scale models such as Kimi-K2 and GLM-4.5. The talk centered on the design and practice of orthonormal updates, focusing on optimizers such as Muon and Dion2. I briefly discussed their theoretical foundations, but the emphasis was on practical usage: how to integrate these optimizers into modern training pipelines, how to interpret their algorithmic components, and how to apply the implementation guidelines provided in our open-source codebase at https://github.com/microsoft/dion
