Deep Dive: Teaching Arcee Trinity Mini to Read Medical Research with RLVR and GRPO
Mar 3, 2026 • Channel
Data from YouTube Data API v3
Video Overview
Video Details
Published: 3 months ago
Duration: 44:09
Video ID: zP7s_IdrVRs
Language: en
Category: Science & Technology
Privacy: Public
Made for Kids: No
Video Type: Regular Video
Performance Metrics
Views: 307
Likes: 5
Comments: 0
Engagement Rate: 1.63%
Likes per 100 views: 1.63
Comments per 1K views: 0.00
Description
Bojan Jakimovski, an ML engineer, took Arcee AI's open-source Trinity Mini model and turned it into a biomedical specialist — extracting drug-protein relationships from scientific papers. No massive team. No million-dollar budget. Just open weights, a clever training technique called RLVR, and a weekend of GPU time.
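In RLVR, the reward comes from a check that can be computed exactly rather than from a learned reward model. For drug-protein relation extraction, a minimal sketch of such a check parses the model's final label and compares it to the gold DrugProt label. The `<answer>` tag convention, the function name `verifiable_reward`, and the five-label subset are assumptions for illustration, not the reward actually used in the Trinity-Mini-DrugProt-Think repo.

```python
import re

# Hypothetical subset of DrugProt relation labels, for illustration only.
LABELS = {"INHIBITOR", "ACTIVATOR", "AGONIST", "ANTAGONIST", "SUBSTRATE"}

# Assumed output convention: reasoning first, final label inside <answer>...</answer>.
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL | re.IGNORECASE)


def verifiable_reward(completion: str, gold_label: str) -> float:
    """Return 1.0 if the final answer exactly matches the gold label, else 0.0."""
    matches = ANSWER_RE.findall(completion)
    if not matches:
        return 0.0  # no parseable answer, so nothing to verify and no reward
    predicted = matches[-1].strip().upper()  # score the last answer the model committed to
    if predicted not in LABELS:
        return 0.0  # labels outside the allowed set earn nothing
    return 1.0 if predicted == gold_label.upper() else 0.0


print(verifiable_reward("It blocks the enzyme. <answer>INHIBITOR</answer>", "INHIBITOR"))  # 1.0
```

Because the score is binary and deterministic, the model cannot earn reward by sounding plausible; only the correct label counts. Many RLVR setups also add a small format reward, but that is a separate design choice not shown here.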
In this video, I break down exactly how it works: the Mixture of Experts architecture behind Trinity Mini, why Reinforcement Learning with Verifiable Rewards (RLVR) beats traditional fine-tuning for domain specialization, how the GRPO algorithm (the same one behind DeepSeek-R1) trains a model to reason step by step, and how LoRA makes it possible to specialize a 26B-parameter model for under $50.
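GRPO, introduced in the DeepSeekMath paper and later used for DeepSeek-R1, removes PPO's learned value network: it samples a group of completions for each prompt and scores each completion against its own group's mean and standard deviation. The sketch below shows only that advantage step; the clipped policy-gradient loss and KL penalty that consume these advantages are omitted, and the `eps` value and the use of the population standard deviation are assumptions (implementations differ on both).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantages for one prompt's group of sampled completions.

    Each completion is scored relative to its siblings: (r_i - mean) / (std + eps),
    so no critic network is needed to estimate a baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to the same prompt, scored by a 0/1 verifiable reward.
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```

If every completion in a group earns the same reward (all right or all wrong), every advantage is zero and that prompt contributes no learning signal, which is why some RLVR setups filter out prompts that are too easy or too hard for the current model.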
Whether you're an ML engineer, a researcher, or just curious about where open-source AI is headed, this is a practical, no-hype walkthrough of a pattern you can replicate in your own domain.
Bojan Jakimovski's blog → https://shekswess.github.io
Bojan's LinkedIn → https://linkedin.com/in/bojan-jakimovski
*** MODELS
Trinity-Mini-DrugProt-Think (LoRA adapter) → https://huggingface.co/lokahq/Trinity-Mini-DrugProt-Think
Arcee Trinity Mini (base model) → https://huggingface.co/arcee-ai/Trinity-Mini
Arcee Trinity Mini Base (pre-SFT) → https://huggingface.co/arcee-ai/Trinity-Mini-Base
Trinity Mini on OpenRouter (free tier) → https://openrouter.ai/arcee-ai/trinity-mini:free
Trinity Mini on OpenRouter (paid API) → https://openrouter.ai/arcee-ai/trinity-mini
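Trinity Mini, the base model listed above, uses a Mixture of Experts architecture: a router sends each token to only a few expert sub-networks, so only a fraction of the 26B parameters is active for any given token. The toy sketch below uses four scalar-scaling "experts", a dot-product router, and top-k softmax gating; these sizes, the function name `top_k_moe`, and the gating scheme are illustrative assumptions, not Trinity Mini's actual configuration.

```python
import math
from typing import Callable


def top_k_moe(x: list[float],
              router: list[list[float]],
              experts: list[Callable[[list[float]], list[float]]],
              k: int = 2) -> list[float]:
    """Route one token vector x to its top-k experts and mix their outputs.

    router[e] is a (hypothetical) router weight vector for expert e.
    Only the k selected experts are evaluated.
    """
    # Router score for each expert: dot product of its weight vector with x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    # Indices of the k highest-scoring experts.
    chosen = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    # Softmax over the chosen experts only, giving the mixing weights.
    m = max(logits[e] for e in chosen)
    exps = {e: math.exp(logits[e] - m) for e in chosen}
    total = sum(exps.values())
    out = [0.0] * len(x)
    for e in chosen:
        y = experts[e](x)  # unselected experts are never run
        gate = exps[e] / total
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


# Toy experts: expert e simply scales its input by a different factor.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
print(top_k_moe([1.0, 0.5], router, experts, k=2))
```

Here experts 2 and 0 win the routing, so experts 1 and 3 cost nothing for this token. That gap between total and per-token active parameters is what keeps inference and LoRA training on a 26B MoE model comparatively cheap.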
*** CODE & CONFIGS
Full training repo (configs, metrics, deployment) → https://github.com/LokaHQ/Trinity-Mini-DrugProt-Think
12 experiment TOML configs → https://github.com/LokaHQ/Trinity-Mini-DrugProt-Think/tree/main/experiments/configs/rl
Training metrics CSVs → https://github.com/LokaHQ/Trinity-Mini-DrugProt-Think/tree/main/data
Deploying on Amazon SageMaker (Loka blog) → https://medium.com/loka-engineering/deploying-trinity-mini-drugprot-think-on-amazon-sagemaker-ai-9e1c1c430ce9
*** DATASETS
DrugProt on Hugging Face (bigbio) → https://huggingface.co/datasets/bigbio/drugprot
DrugProt Parquet (OpenMed) → https://huggingface.co/datasets/OpenMed/drugprot-parquet
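DrugProt annotates chemical and gene/protein mentions in PubMed abstracts together with the relation type for each chemical-gene pair. Turning one annotated pair into an RL prompt with a checkable target might look like the sketch below. The record layout (a sentence plus character offsets), the `<CHEM>`/`<GENE>` tags, the prompt wording, the `answer` field, and the label subset are all hypothetical, and the example sentence is invented; the real preprocessing lives in the training repo and may differ.

```python
# Hypothetical subset of DrugProt relation labels.
LABELS = ["INHIBITOR", "ACTIVATOR", "AGONIST", "ANTAGONIST", "SUBSTRATE"]


def mark_entities(text: str, chem: tuple[int, int], gene: tuple[int, int]) -> str:
    """Wrap the chemical and gene mentions, given as [start, end) offsets, in tags.

    Assumes the two spans do not overlap.
    """
    spans = sorted([(chem, "CHEM"), (gene, "GENE")], key=lambda s: s[0][0], reverse=True)
    for (start, end), tag in spans:  # right to left, so earlier offsets stay valid
        text = f"{text[:start]}<{tag}>{text[start:end]}</{tag}>{text[end:]}"
    return text


def build_example(text: str, chem: tuple[int, int], gene: tuple[int, int], label: str) -> dict:
    """Build a prompt plus the gold label a verifiable reward can check against."""
    prompt = (
        "Classify the relation between the marked chemical and gene.\n"
        f"Allowed labels: {', '.join(LABELS)}.\n"
        "Think step by step, then give the label inside <answer></answer>.\n\n"
        f"{mark_entities(text, chem, gene)}"
    )
    return {"prompt": prompt, "answer": label}


sentence = "Aspirin irreversibly inhibits COX-1."
ex = build_example(sentence, chem=(0, 7), gene=(30, 35), label="INHIBITOR")
print(ex["prompt"])
```

Keeping the gold label in its own field, separate from the prompt, is what lets a reward function score each sampled completion without any human or model judge in the loop.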
*** TOOLS & LIBRARIES
Hugging Face Transformers → https://github.com/huggingface/transformers
PEFT (LoRA & adapters) → https://github.com/huggingface/peft
TRL (GRPOTrainer) → https://github.com/huggingface/trl
Prime Intellect (hosted GRPO training) → https://primeintellect.ai
Prime Intellect Verifiers (RL environments) → https://github.com/PrimeIntellect-ai/verifiers
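PEFT, listed above, implements LoRA. LoRA freezes a weight matrix W and learns a low-rank update, so the layer computes h = Wx + (alpha / r) * B(Ax), where A is r × d_in and B is d_out × r. The pure-Python sketch below shows that forward pass and the trainable-parameter arithmetic; the 4096 × 4096 layer and rank 16 are hypothetical sizes, not Trinity Mini's, and PEFT itself works on tensors and offers further options (dropout, rsLoRA scaling) not shown here.

```python
def matvec(M: list[list[float]], v: list[float]) -> list[float]:
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W: list[list[float]], A: list[list[float]], B: list[list[float]],
                 x: list[float], alpha: float, r: int) -> list[float]:
    """h = W x + (alpha / r) * B (A x). W stays frozen; only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_counts(d_in: int, d_out: int, r: int) -> tuple[int, int]:
    """Trainable parameters: full fine-tuning of W vs. a rank-r LoRA adapter."""
    return d_out * d_in, r * (d_in + d_out)


# With B initialized to zeros, the adapted layer starts out identical to the base layer.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]    # r = 1
B = [[0.0], [0.0]]   # zero init
print(lora_forward(W, A, B, [2.0, 3.0], alpha=2.0, r=1))  # [2.0, 3.0]

full, lora = lora_param_counts(4096, 4096, r=16)
print(full, lora, f"{lora / full:.2%}")  # 16777216 131072 0.78%
```

Because B starts at zero (PEFT's default initialization for linear layers), the adapter is a no-op before training, and only the small A and B matrices need to be saved. That is why the DrugProt adapter can be published separately from the Trinity Mini base weights.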