Deploying Local LLM but It Is Slow? Here's How to Fix It (Hopefully) | LLMOps with vLLM

Nov 7, 2025
Data source: YouTube Data API v3

Video Overview

Video Details

Published: 7 months ago
Duration: 6:56
Video ID: BnZoF0oPILU
Language: en
Category: Education
Privacy: Public
Made for Kids: No
Video Type: Regular Video

Performance Metrics

Views: 119
Likes: 16
Comments: 2
Engagement Rate: 15.13%
Likes per 100 views: 13.45
Comments per 1K views: 16.81
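The three derived metrics above are consistent with simple ratios of the raw counts. The sketch below recomputes them from views, likes, and comments. It assumes the page defines engagement rate as (likes + comments) / views. That definition is inferred from the numbers and is not stated on the page.

```python
from dataclasses import dataclass


@dataclass
class VideoStats:
    views: int
    likes: int
    comments: int

    def _per(self, count: int, scale: float) -> float:
        # Ratio of `count` to views, scaled (100 = percent, 1000 = per mille).
        return scale * count / self.views if self.views else 0.0

    def engagement_rate(self) -> float:
        # Assumed definition: (likes + comments) / views, as a percentage.
        return self._per(self.likes + self.comments, 100.0)

    def likes_per_100_views(self) -> float:
        return self._per(self.likes, 100.0)

    def comments_per_1k_views(self) -> float:
        return self._per(self.comments, 1000.0)


stats = VideoStats(views=119, likes=16, comments=2)
print(f"Engagement rate: {stats.engagement_rate():.2f}%")        # → Engagement rate: 15.13%
print(f"Likes per 100 views: {stats.likes_per_100_views():.2f}")  # → Likes per 100 views: 13.45
print(f"Comments per 1K views: {stats.comments_per_1k_views():.2f}")  # → Comments per 1K views: 16.81
```

Because all three figures match when computed this way, the assumed formula is at least consistent with the reported values.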

Description

Ever tried running a Large Language Model (LLM) on your server, only to be disappointed by slow performance even though you have good-enough GPUs? In this video, you'll learn about vLLM, a library used for production deployments of LLMs. We'll do a head-to-head comparison: a Hugging Face transformers pipeline versus a model running on vLLM. You'll see the difference in inference speed and understand why tools like vLLM are essential for building real-world, scalable AI applications.

vLLM: https://docs.vllm.ai/en/latest/
AI Academy: https://www.mlexpert.io/
LinkedIn: https://www.linkedin.com/in/venelin-valkov/
Follow me on X: https://twitter.com/venelin_valkov
Discord: https://discord.gg/UaNPxVD6tv
Subscribe: http://bit.ly/venelin-subscribe
GitHub repository: https://github.com/curiousily/AI-Bootcamp

👍 Don't Forget to Like, Comment, and Subscribe for More Tutorials!

Join this channel to get access to the perks and support my work: https://www.youtube.com/channel/UCoW_WzQNJVAjxo4osNAxd_g/join
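The head-to-head comparison described in the video comes down to timing the same batch of prompts through two backends. The Python harness below is one backend-agnostic way to do that. It is a sketch and does not reproduce the code from the video. The `MODEL_ID` placeholder and the `hf_generate`/`vllm_generate` wrappers in the comments are hypothetical. Running them requires a GPU and the `transformers` and `vllm` packages. A stub backend is included so the harness runs on its own.

```python
import time
from typing import Callable, List, Sequence


def benchmark(
    generate: Callable[[Sequence[str]], List[str]],
    prompts: Sequence[str],
    warmup: int = 1,
) -> dict:
    """Time one call of `generate` over all prompts and report throughput."""
    # Warm-up calls are excluded from timing: the first call can include
    # one-off setup costs that would distort the measurement.
    for _ in range(warmup):
        generate(list(prompts[:1]))
    start = time.perf_counter()
    outputs = generate(list(prompts))
    elapsed = time.perf_counter() - start
    return {
        "num_prompts": len(prompts),
        "seconds": elapsed,
        "prompts_per_second": len(prompts) / elapsed if elapsed > 0 else float("inf"),
        "outputs": outputs,
    }


# Hypothetical wiring for the two backends (needs a GPU plus the
# `transformers` and `vllm` packages; MODEL_ID is a placeholder):
#
# from transformers import pipeline
# hf = pipeline("text-generation", model=MODEL_ID)
# def hf_generate(ps):
#     return [r[0]["generated_text"] for r in hf(list(ps), max_new_tokens=128)]
#
# from vllm import LLM, SamplingParams
# llm = LLM(model=MODEL_ID)
# params = SamplingParams(max_tokens=128)
# def vllm_generate(ps):
#     return [o.outputs[0].text for o in llm.generate(list(ps), params)]

if __name__ == "__main__":
    # Stand-in backend so the harness runs without a GPU or any model.
    def fake_generate(ps):
        return [p.upper() for p in ps]

    result = benchmark(fake_generate, ["hello", "world"])
    print(result["num_prompts"], result["outputs"])
```

To compare backends, pass each wrapper the same prompts with matching generation lengths (`max_new_tokens` vs `max_tokens`), then compare `prompts_per_second`. Repeating each run a few times gives a steadier number than a single measurement.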

Related Videos

More videos from Venelin Valkov