From Batch to AI-Native: How Volcano 1.14 Unifies Training, Inference & Agent Workloads

Apr 7, 2026
Data from YouTube Data API v3

Video Details

Published: 2 months ago
Duration: 7:34
Video ID: 1E9drfYvHBg
Language: en
Category: Science & Technology
Privacy: Public
Made for Kids: No
Video Type: Regular Video

Performance Metrics

Views: 392
Likes: 11
Comments: 0
Engagement Rate: 2.81%
Likes per 100 views: 2.81
Comments per 1K views: 0.00
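The page does not say how these ratios are derived. Assuming engagement rate means (likes + comments) / views, the Python sketch below reproduces the figures from the raw counts. `engagement_metrics` is a hypothetical helper, and because this video has zero comments, a likes-only definition would give the same 2.81%.

```python
def engagement_metrics(views: int, likes: int, comments: int) -> dict:
    """Derive the page's ratios from raw counts (engagement formula is an assumption)."""
    if views <= 0:
        raise ValueError("views must be positive")
    return {
        # Assumed definition: (likes + comments) as a percentage of views.
        "engagement_rate_pct": round((likes + comments) / views * 100, 2),
        "likes_per_100_views": round(likes / views * 100, 2),
        "comments_per_1k_views": round(comments / views * 1000, 2),
    }


print(engagement_metrics(views=392, likes=11, comments=0))
# → {'engagement_rate_pct': 2.81, 'likes_per_100_views': 2.81, 'comments_per_1k_views': 0.0}
```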

Description

Running massive AI training jobs, LLM inference workloads, and bursty AI agents on the same Kubernetes cluster is a recipe for wasted GPU capacity, fragmented resource allocation, and skyrocketing cloud costs. The problem isn't just deployment; it's intelligent scheduling that prevents idle resources while keeping latency low for unpredictable agent workloads.

Jesse Stutler, a maintainer of Volcano, explains how Volcano 1.14 is evolving from a batch scheduling tool into an AI-native unified scheduling platform. With its new multi-scheduler architecture, topology-aware scheduling, and KV cache awareness, Volcano handles the full AI lifecycle (training, inference, and agents) on a single cluster without sacrificing performance or burning through GPU budgets.

Key Topics Covered:
- Multi-scheduler architecture with dynamic sharding for batch and agent workloads
- Topology-aware scheduling for hyper-node bin packing and network domain optimization
- AgentCube: Kubernetes-native platform for bursty, short-lived AI agent sessions
- Katana: AI inference routing with KV cache awareness, prefix caching, and speculative decoding
- Colocation strategies using cgroup v2 to increase deployment density and GPU utilization

Read the full story & transcript at www.tfir.io

#Kubernetes #AIScheduling #Volcano #GPUOptimization #KubeCon #LLMInference #AIAgents #CloudCost #MachineLearning #OpenSource
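The description names KV-cache-aware routing with prefix caching but does not explain it. The idea in general: send each request to the replica that already holds the longest cached prefix of its prompt, so prefill for that shared prefix (for example, a common system prompt) can be skipped. The Python sketch below is a hypothetical illustration of that idea, not Volcano's or Katana's implementation. `Pod`, `PrefixAwareRouter`, `block_hashes`, and `BLOCK_SIZE` are illustrative names. The sketch assumes prompts are split into fixed-size token blocks identified by chained hashes (Python's built-in `hash`, stable within one process) and does not model eviction.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set

BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative value)


def block_hashes(tokens: Sequence[int], block_size: int = BLOCK_SIZE) -> List[int]:
    """Chain-hash each full block so a hash identifies the whole prefix up to it."""
    hashes: List[int] = []
    parent = 0
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        parent = hash((parent, tuple(tokens[start:start + block_size])))
        hashes.append(parent)
    return hashes


@dataclass
class Pod:
    name: str
    active_requests: int = 0
    cached_blocks: Set[int] = field(default_factory=set)


class PrefixAwareRouter:
    def __init__(self, pods: List[Pod]) -> None:
        self.pods = pods

    @staticmethod
    def _matched_prefix(pod: Pod, hashes: List[int]) -> int:
        """Count the leading blocks of the request already cached on this pod."""
        n = 0
        for h in hashes:
            if h not in pod.cached_blocks:
                break
            n += 1
        return n

    def route(self, tokens: Sequence[int]) -> Pod:
        hashes = block_hashes(tokens)
        # Prefer the longest cached prefix; break ties with the lower load.
        best = max(
            self.pods,
            key=lambda p: (self._matched_prefix(p, hashes), -p.active_requests),
        )
        best.active_requests += 1
        best.cached_blocks.update(hashes)  # assume the pod now caches these blocks
        return best
```

Two requests sharing an 8-token system prompt land on the same pod, while an unrelated request goes to the less loaded one. Ranking purely by cache hit first can overload a single hot replica, so practical routers generally balance the cache-hit score against load rather than using a strict ordering.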