Homunculus 12B and GLM-4-32B-Base-32K: 2 new Arcee AI research-oriented models
Jul 3, 2025
Video Details
Published: 12 months ago
Duration: 9:04
Video ID: mhrcPviW-MU
Language: en
Category: Science & Technology
Privacy: Public
Made for Kids: No
Video Type: Regular Video
Performance Metrics
Views: 22.4K
Likes: 370
Comments: 2
Engagement Rate: 1.66%
Likes per 100 views: 1.65
Comments per 1K views: 0.09
Description
In this video, I introduce two new research-oriented models that Arcee AI recently released on Hugging Face.
Homunculus is a 12-billion-parameter instruction model distilled from Qwen3-235B onto the Mistral-Nemo backbone. It was purpose-built to preserve Qwen’s two-mode interaction style, with /think for deliberate chain-of-thought and /nothink for concise answers, while running on a single consumer GPU, or even on a CPU, as demonstrated in the video.
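The description does not say how the distillation was done. As a generic illustration only, here is a minimal sketch of logit-level knowledge distillation, where a student is trained to match the teacher's temperature-softened next-token distribution. It assumes teacher and student logits are over one shared vocabulary (if the two tokenizers differ, the vocabularies must first be aligned, which is not shown). `softmax`, `distillation_loss`, and the temperature of 2.0 are hypothetical choices, not Arcee's recipe.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Sequence[float],
                      student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) for one token position, scaled by T^2.

    The T^2 factor is the usual convention that keeps gradient magnitudes
    comparable when the temperature changes. Assumes a shared vocabulary.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [2.0, 1.0, 0.1]  # hypothetical teacher logits for one token
student = [1.5, 1.2, 0.3]  # hypothetical student logits for the same token
print(distillation_loss(teacher, student))
```

In a training loop, this per-token loss would typically be averaged over all positions and is often mixed with the ordinary cross-entropy loss on the ground-truth tokens.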
GLM-4-32B-Base-32K is an enhanced version of THUDM's GLM-4-32B-Base-0414, engineered to perform robustly over an extended context window. Whereas the original model's performance degraded beyond 8,192 tokens, this version maintains strong performance up to a 32,000-token context, making it well suited to tasks that require long-context understanding and processing.
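The description does not say how the context window was extended (the Arcee blog post linked below covers a related extension of AFM-4.5B). For models that use rotary position embeddings (RoPE), one widely used family of techniques rescales positions. Below is a minimal sketch of linear position interpolation, assuming a RoPE base of 10,000, interleaved dimension pairs, and an 8,192 to 32,768 token range; the function names are hypothetical, and this is not a claim about the method Arcee actually used.

```python
import math


def rope_inv_freqs(head_dim: int, base: float = 10000.0) -> list[float]:
    """Standard RoPE inverse frequencies: one per pair of dimensions."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rope_angles(position: int, head_dim: int, original_ctx: int = 8192,
                target_ctx: int = 32768, base: float = 10000.0) -> list[float]:
    """Rotation angles for one token position under linear position interpolation.

    Positions are multiplied by original_ctx / target_ctx, so a 32K-token
    sequence spans the same angle range the model saw with 8K tokens.
    """
    scale = original_ctx / target_ctx
    return [position * scale * f for f in rope_inv_freqs(head_dim, base)]


def apply_rope(vec: list[float], position: int, original_ctx: int = 8192,
               target_ctx: int = 32768, base: float = 10000.0) -> list[float]:
    """Rotate consecutive (x, y) pairs of a query/key vector by their angles."""
    angles = rope_angles(position, len(vec), original_ctx, target_ctx, base)
    out: list[float] = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out


# Highest-frequency angle for the last position of a 32K window.
print(rope_angles(32767, head_dim=128)[0])  # prints 8191.75
```

Scaling by 8192 / 32768 = 0.25 maps position 32,767 to an effective position of 8,191.75, inside the range seen during pretraining. The trade-off is coarser positional resolution, which is why such extensions are usually followed by some training on long sequences.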
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. You can also follow me on Medium at https://julsimon.medium.com or Substack at https://julsimon.substack.com. ⭐️⭐️⭐️
** Homunculus
- https://huggingface.co/arcee-ai/Homunculus
- https://huggingface.co/arcee-ai/Homunculus-GGUF
Local inference with llama.cpp (Q4_K_M GGUF):
bin/llama-cli -m ~/models/homunculus/Homunculus-Q4_K_M.gguf --color -c 65535
Example prompt:
"Looking at multi-head attention, group-query attention, multi-query attention, and multi-head latent attention, which method would optimize inference latency for a small language model with 32 attention layers running on a 64-core Intel CPU?"
** GLM-4-32B-Base-32K
- https://huggingface.co/arcee-ai/GLM-4-32B-Base-32K
- https://huggingface.co/bartowski/arcee-ai_GLM-4-32B-Base-32K-GGUF
- https://www.arcee.ai/blog/extending-afm-4-5b-to-64k-context-length