Homunculus 12B and GLM-4-32B-Base-32K: 2 new Arcee AI research-oriented models

Jul 3, 2025
Channel: Julien Simon

503K subscribers

Video Overview

Video Details

Published: 12 months ago
Duration: 9:04
Video ID: mhrcPviW-MU
Language: en
Category: Science & Technology
Privacy: Public
Made for Kids: No
Video Type: Regular Video

Performance Metrics

Views: 22.4K
Likes: 370
Comments: 2
Engagement Rate: 1.66%
Likes per 100 views: 1.65
Comments per 1K views: 0.09
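
The derived ratios are consistent with the raw counts. The following sketch recomputes them under two assumptions: the page does not state its formula, so engagement rate is assumed to be (likes + comments) / views, and the view count is only shown rounded to 22.4K, so 22,400 is used as an approximation. The function name `engagement_metrics` is hypothetical.

```python
def engagement_metrics(views: int, likes: int, comments: int) -> dict:
    """Recompute the page's ratio metrics from raw counts (assumed definitions)."""
    return {
        # Assumed definition: (likes + comments) / views, as a percentage
        "engagement_rate_pct": round(100 * (likes + comments) / views, 2),
        "likes_per_100_views": round(100 * likes / views, 2),
        "comments_per_1k_views": round(1000 * comments / views, 2),
    }


# Views are displayed rounded ("22.4K"), so 22_400 is an approximation.
metrics = engagement_metrics(views=22_400, likes=370, comments=2)
print(metrics)
```

With these inputs the sketch reproduces the displayed values (1.66%, 1.65, 0.09), which suggests the assumed definitions match the page's.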

Description

In this video, I introduce two new research-oriented models that Arcee AI recently released on Hugging Face.

Homunculus is a 12-billion-parameter instruction model distilled from Qwen3-235B onto the Mistral-Nemo backbone. It was purpose-built to preserve Qwen's two-mode interaction style, /think (deliberate chain-of-thought) and /nothink (concise answers), while running on a single consumer GPU, and even on a CPU, as demonstrated in the video.

GLM-4-32B-Base-32K is an enhanced version of THUDM's GLM-4-32B-Base-0414, engineered to perform robustly over an extended context window. While the original model's performance degraded beyond 8,192 tokens, this version maintains strong performance up to a 32,000-token context, making it well suited to tasks that require long-context understanding and processing.

⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. You can also follow me on Medium at https://julsimon.medium.com or Substack at https://julsimon.substack.com. ⭐️⭐️⭐️

** Homunculus
- https://huggingface.co/arcee-ai/Homunculus
- https://huggingface.co/arcee-ai/Homunculus-GGUF

bin/llama-cli -m ~/models/homunculus/Homunculus-Q4_K_M.gguf --color -c 65535 -p "Looking at multi-head attention, group-query attention, multi-query attention, and multi-head latent attention, which method would optimize inference latency for a small language model with 32 attention layers running on a 64-core Intel CPU?"

** GLM-4-32B-Base-32K
- https://huggingface.co/arcee-ai/GLM-4-32B-Base-32K
- https://huggingface.co/bartowski/arcee-ai_GLM-4-32B-Base-32K-GGUF
- https://www.arcee.ai/blog/extending-afm-4-5b-to-64k-context-length
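
As a rough illustration of why a 12B model can run on a single consumer GPU or even a CPU host, the sketch below estimates the memory taken by the weights alone at a few precisions. This is back-of-the-envelope arithmetic, not a measurement from the video. It uses the "12 billion parameters" figure from the description and assumes roughly 4.8 bits per weight for Q4_K_M. That average exceeds 4 bits because k-quants store per-block scales alongside the 4-bit weights. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 12e9  # "12 billion parameters", per the description

# Bits per weight for Q4_K_M is an approximation (assumed ~4.8).
for label, bits in [("BF16", 16), ("8-bit", 8), ("Q4_K_M (approx.)", 4.8)]:
    print(f"{label:>18}: {weight_memory_gb(N_PARAMS, bits):5.1f} GB")
```

This works out to about 24 GB in BF16, 12 GB at 8 bits, and roughly 7 GB for the Q4_K_M GGUF used in the llama-cli command. The estimate excludes the KV cache and runtime overhead, which grow with the context size (`-c 65535` in the command), so real memory use will be higher.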