AI Inference Pipelines – Building Low-Latency Systems With gRPC - Akshat Sharma, Deskree

Feb 5, 2026
Channel: CNCF [Cloud Native Computing Foundation]
Data from YouTube Data API v3

Video Overview

Video Details

Published: 4 months ago
Duration: 18:41
Video ID: ISLGPZ493MI
Language: en
Category: Science & Technology
Privacy: Public
Made for Kids: No
Video Type: Regular Video

Performance Metrics

Views: 68
Likes: 2
Comments: 0
Engagement Rate: 2.94%
Likes per 100 views: 2.94
Comments per 1K views: 0.00
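As a quick check on the derived figures, the sketch below recomputes them from the raw counts. It assumes engagement rate = (likes + comments) / views × 100, which the page does not state, and the function name `engagement_metrics` is hypothetical.

```python
def engagement_metrics(views: int, likes: int, comments: int) -> dict:
    """Recompute the ratio metrics from raw counts.

    Assumption: engagement rate = (likes + comments) / views * 100;
    the source page does not document its formula.
    """
    if views <= 0:
        raise ValueError("views must be positive")
    return {
        # Interactions (likes + comments) per 100 views, as a percentage
        "engagement_rate_pct": round((likes + comments) / views * 100, 2),
        # Likes normalized to 100 views
        "likes_per_100_views": round(likes / views * 100, 2),
        # Comments normalized to 1,000 views
        "comments_per_1k_views": round(comments / views * 1000, 2),
    }


print(engagement_metrics(views=68, likes=2, comments=0))
```

With 68 views, 2 likes, and 0 comments, this yields 2.94, 2.94, and 0.0, which matches the listed values under the assumed formula.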

Description

Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon events in Amsterdam, The Netherlands (23-26 March, 2026). Connect with our current graduated, incubating, and sandbox projects as the community gathers to further the education and advancement of cloud native computing. Learn more at https://kubecon.io

Ever tried running an AI model in production, only to see it slow down when every millisecond matters? From fraud detection to medical imaging, real-time AI systems can’t afford delays, and that’s where gRPC shines. In this session, I’ll share how we built AI inference pipelines using gRPC to handle low-latency, high-throughput communication across services. I’ll walk through the journey: what worked, what didn’t, and the lessons learned along the way. We’ll cover the architecture, the tricky performance bottlenecks, and how we scaled inference so it could keep up with real-world demand. By the end, you’ll leave with practical tips on designing fast, reliable, and production-ready AI systems powered by gRPC.
