Deploying Local LLM but It Is Slow? Here's How to Fix It (Hopefully) | LLMOps with vLLM
Nov 7, 2025•Channel
Data from YouTube Data API v3
Video Overview
Video Details
Published: 7 months ago
Duration: 6:56
Video ID: BnZoF0oPILU
Language: en
Category: Education
Privacy: Public
Made for Kids: No
Video Type: Regular Video
Performance Metrics
Views: 119
Likes: 16
Comments: 2
Engagement Rate: 15.13%
Likes per 100 Views: 13.45
Comments per 1K Views: 16.81
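The three ratios above can be derived from the raw counts. The page does not document its formulas, so the ones below are an assumption, inferred because they reproduce the listed values (15.13%, 13.45, and 16.81) from 119 views, 16 likes, and 2 comments. `engagement_metrics` is a hypothetical helper name.

```python
def engagement_metrics(views: int, likes: int, comments: int) -> dict[str, float]:
    """Derive per-view ratios from raw counts (formulas inferred, not documented by the page)."""
    if views <= 0:
        raise ValueError("views must be positive")
    return {
        # Assumed: (likes + comments) as a percentage of views
        "engagement_rate_pct": round((likes + comments) / views * 100, 2),
        # Likes scaled to a base of 100 views
        "likes_per_100_views": round(likes / views * 100, 2),
        # Comments scaled to a base of 1,000 views
        "comments_per_1k_views": round(comments / views * 1000, 2),
    }


print(engagement_metrics(views=119, likes=16, comments=2))
```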
Description
Ever tried running a Large Language Model (LLM) on your server, only to be disappointed by slow performance even though you have good-enough GPUs? In this video, you'll learn about vLLM, a library used for production deployments of LLMs.
We'll do a head-to-head comparison: a Hugging Face transformers pipeline versus a model running on vLLM. You'll see the difference in inference speed and understand why tools like vLLM are essential for building real-world, scalable AI applications.
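A head-to-head comparison like the one described can be sketched as a small timing harness. This is a minimal sketch, not the code used in the video: `time_backend`, `hf_pipeline_backend`, and `vllm_backend` are hypothetical helper names, and the model ID and prompts you pass in are placeholders. The backend wrappers assume `transformers` (plus `accelerate` for `device_map="auto"`) and `vllm` are installed on a machine with a supported GPU; the timing function itself uses only the standard library.

```python
import time
from statistics import median
from typing import Callable, Sequence

# A generate function takes a batch of prompts and returns one completion per prompt.
GenerateFn = Callable[[Sequence[str]], list[str]]


def time_backend(generate: GenerateFn, prompts: Sequence[str], runs: int = 3) -> float:
    """Return the median wall-clock seconds to generate completions for all prompts."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    generate(prompts[:1])  # warm-up call so one-time setup is not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        outputs = generate(prompts)
        timings.append(time.perf_counter() - start)
        if len(outputs) != len(prompts):
            raise ValueError("expected one completion per prompt")
    return median(timings)


def hf_pipeline_backend(model_id: str, max_new_tokens: int = 128) -> GenerateFn:
    """Wrap a Hugging Face transformers text-generation pipeline (hypothetical helper)."""
    from transformers import pipeline  # imported lazily so the harness runs without it

    pipe = pipeline("text-generation", model=model_id, device_map="auto")

    def generate(prompts: Sequence[str]) -> list[str]:
        results = pipe(list(prompts), max_new_tokens=max_new_tokens, return_full_text=False)
        # For a list input, the pipeline returns a list of candidate dicts per prompt.
        return [r[0]["generated_text"] for r in results]

    return generate


def vllm_backend(model_id: str, max_new_tokens: int = 128) -> GenerateFn:
    """Wrap vLLM's offline LLM engine (hypothetical helper)."""
    from vllm import LLM, SamplingParams  # imported lazily so the harness runs without it

    llm = LLM(model=model_id)
    params = SamplingParams(max_tokens=max_new_tokens)

    def generate(prompts: Sequence[str]) -> list[str]:
        outputs = llm.generate(list(prompts), params)
        # Each RequestOutput holds one or more completions; take the first.
        return [o.outputs[0].text for o in outputs]

    return generate
```

To compare, time one backend at a time, e.g. `time_backend(vllm_backend("<model-id>"), prompts)`, ideally in separate processes: by default vLLM preallocates most of the GPU's memory (`gpu_memory_utilization=0.9`), so loading both models in one process may run out of memory. Keep the model, prompt set, and `max_new_tokens` identical across runs so the timings are comparable.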
vLLM: https://docs.vllm.ai/en/latest/
AI Academy: https://www.mlexpert.io/
LinkedIn: https://www.linkedin.com/in/venelin-valkov/
Follow me on X: https://twitter.com/venelin_valkov
Discord: https://discord.gg/UaNPxVD6tv
Subscribe: http://bit.ly/venelin-subscribe
GitHub repository: https://github.com/curiousily/AI-Bootcamp
👍 Don't Forget to Like, Comment, and Subscribe for More Tutorials!
Join this channel to get access to the perks and support my work:
https://www.youtube.com/channel/UCoW_WzQNJVAjxo4osNAxd_g/join