Machine Learning Project For Beginners with XGBoost + NVIDIA GPU 🤖🧠

Oct 19, 2025
Data from YouTube Data API v3

Video Overview

Video Details

Published: 8 months ago
Duration: 34:08
Video ID: F_8RKstP2X8
Language: en
Category: Science & Technology
Privacy: Public
Made for Kids: No
Video Type: Regular Video

Performance Metrics

Views: 4.7K
Likes: 384
Comments: 60
Engagement Rate: 9.48%
Likes per 100 views: 8.20
Comments per 1K views: 12.81

Description

Ever wondered what makes people tip more in taxis? 🚕💵 In this hands-on machine learning project, we’ll build a complete workflow on real-world NYC data: cleaned, engineered, and trained entirely on GPU using XGBoost CUDA and cuDF Pandas! 🐼 (🚨No GPU?🚨 I’ll show you how to use one for free on Google Colab! 😉)

You’ll see how professionals approach problems, handle massive data, and fix memory errors, designing real data-science pipelines step by step! 😎 By the end, you’ll have a meaningful project that’s fun to build, technically impressive, and looks perfect on your portfolio!! 🤩 Join me on this adventure and learn how to think like a pro-level data scientist.

💡 What You’ll Learn
- Handling Real-World Datasets: Cleanup, Missing Values, Anomalies, Aggregation. 📊
- Solving memory limitations and runtime crashes with cuDF Pandas + RMM. 💾
- Accelerating machine learning with XGBoost on NVIDIA GPUs. 🤖
- Evaluating your model’s performance, and making it smarter with every run! 💪🤓
- And most importantly: developing the mindset of a data scientist, solving problems instead of guessing. 🔎

🧠 What Makes This Project Different
This isn’t another “beginner demo”; it’s a real workflow based on real data and real problems. You’ll experience the same challenges professionals face: huge sloppy datasets, missing labels, CPU and GPU memory limits, all explained step by step, in simple terms. I’ll show you why we make each decision, not just how to code it, so you learn to think, debug, and reason like a pro.
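As a rough illustration of the missing-value handling listed under What You’ll Learn, here is a minimal sketch in plain pandas. The toy values, and the choice of which column gets zeros and which gets the mean, are assumptions made for this example and are not taken from the video. The same pandas code can be accelerated by cuDF Pandas (for example by running `%load_ext cudf.pandas` in a notebook before importing pandas), but nothing below needs a GPU.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the taxi data; values are made up for illustration.
trips = pd.DataFrame({
    "fare_amount": [12.5, 8.0, 20.0, 15.0],
    "tip_amount": [2.0, 3.0, np.nan, 4.0],
    "trip_distance": [1.5, 2.0, 3.5, np.nan],
})

# Detect: count the missing values in each column.
missing_per_column = trips.isna().sum()
print(missing_per_column)

filled = trips.copy()

# Replace with zero (assumption: a missing tip means no tip was recorded).
filled["tip_amount"] = filled["tip_amount"].fillna(0)

# Replace with the column mean (assumption: distance is missing at random).
mean_distance = filled["trip_distance"].mean()
filled["trip_distance"] = filled["trip_distance"].fillna(mean_distance)

print(filled)
```

Whether zero or the mean is the right replacement depends on what a missing value means in each column, so it is worth checking that per column instead of applying one rule everywhere.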
🔗 Important Links
------------------------------------------------
🔹 Download Tutorial Code and Smaller Dataset from GitHub: https://github.com/MariyaSha/nyc_taxi_xgboost_lab
🔹 Download Full Dataset from NYC Open Data: https://data.cityofnewyork.us/Transportation/2023-Yellow-Taxi-Trip-Data/4b4i-vvec/about_data
🔹 RAPIDS Installation Guide: https://docs.rapids.ai/install/
🔹 Official NVIDIA Google Colab Notebook - 🧐 VERY ADVANCED 🧐: https://colab.research.google.com/drive/1vlzvB981pej2RlKmXBUF1CNzyxl8YpJg

📽️ Important Tutorials
------------------------------------------------
⭐ WSL + Conda Setup: https://youtu.be/luM5kwH6tjQ
⭐ Machine Learning with Scikit-Learn: https://youtu.be/-IvNzmrcyUM
⭐ cuDF Pandas For Beginners: https://youtu.be/9KsJRyZJ0vo
⭐ What is CUDA? https://youtu.be/r9IqwpMR9TE

⏰ Time Stamps
------------------------------------------------
01:08 - Download Dataset
01:43 - Solving Big Data Problems with GPU Processing
02:46 - Google Colab Setup with Free T4 GPU
03:02 - Local Setup with NVIDIA GPU
03:43 - RAPIDS Installation Guide
05:07 - Solving Jupyter Kernel Crash with cuDF Pandas
05:29 - Handling Missing Values
05:53 - Detect Missing Values
06:29 - Replace with Zero
07:31 - Replace with Mean
08:57 - Investigate Columns with Ambiguous Names
11:21 - Drop Columns (If No Other Option)
12:01 - Split Data For Training & Testing
12:07 - Shuffle Data
13:39 - Features & Targets Split
14:02 - Train & Test Split
16:20 - Load XGBoost Model on GPU
17:55 - Train XGBoost Model
18:08 - Test XGBoost Model and Get Predictions
18:45 - Solve ValueError : DataFrame.dtypes must be int float bool or category
20:15 - Evaluate Trained Model
22:39 - Data Optimization & Anomalies
22:41 - Detect Data Anomalies with Aggregation
23:47 - Solve XGBoostError : No GPU Memory Left with RMM
25:04 - Handle Negative Charges and Unrealistic Distances
28:19 - Detect and Handle Unrealistic Transactions
30:28 - Second Train Run on Optimized Data
31:45 - Best Practices
31:45 - Plot Training Results & Feature Importance
32:17 - Hyperparameter Tuning
32:49 - Date Extraction : From String to Int or Category
33:05 - K-Fold Validation
33:45 - Thanks for Watching!

🚀 Environment Setup
------------------------------------------------
You can run this project in two ways, coding along with me:
1️⃣ Google Colab:
- Change your runtime to T4 GPU.
- Use the smaller version of the NYC Taxi dataset (5 million rows). Download above 👆
2️⃣ Local setup:
- Make sure you have a CUDA-compatible GPU.
- Use WSL and Miniforge/Conda (⚠️MUST! ⚠️).
- Use the current command from the RAPIDS Installation Guide for your setup (⚠️MUST! ⚠️).
- Use the full version of the NYC Taxi dataset (38 million rows). Download above 👆

💻 Tutorial Code
------------------------------------------------
📌 Remove all the rows that have negative numbers:
data = data[~data.select_dtypes("number").lt(0).any(axis=1)]

📌 Solve "XGBoostError: No GPU memory is left" and kernel crashes:
import rmm
rmm.reinitialize(pool_allocator=True, initial_pool_size="8GB")

#MachineLearning #DataScience #Python #BigData #GPU #NVIDIA #RAPIDS #DataAnalysis #DataCleaning #PythonTutorial #AI #pythonprogramming
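To see what the negative-number filter from the Tutorial Code section does, here is a small sketch that applies the same expression to a toy DataFrame, split into named steps. The column names and values are assumptions chosen for illustration, and plain pandas is used so the example runs without a GPU.

```python
import pandas as pd

# Toy data: rows 1 and 2 each contain a negative number in a numeric column.
data = pd.DataFrame({
    "fare_amount": [12.5, -3.0, 8.0, 20.0],
    "tip_amount": [2.0, 0.5, -1.0, 4.0],
    "trip_distance": [1.2, 0.8, 2.5, 6.1],
    "store_and_fwd_flag": ["N", "N", "Y", "N"],
})

# Step 1: look only at numeric columns, so text columns are ignored.
numeric = data.select_dtypes("number")

# Step 2: mark every row where at least one numeric value is below zero.
has_negative = numeric.lt(0).any(axis=1)

# Step 3: keep the rows that are NOT marked.
clean = data[~has_negative]

print(clean)
```

Note that this filter drops a row if any numeric column is negative, so it is only appropriate when no numeric column in the dataset can legitimately hold negative values.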

Related Videos

More videos from Python Simplified