MOE

4 items

2 posts, 2 tools

ToolJun 11, 2026

DeepSeek's open-weights frontier family, previewed April 24, 2026. V4-Pro is 1.6T total / 49B active params; V4-Flash is 284B / 13B. 1M context standard. Weights on Hugging Face.

ai model open-source deepseek moe 1m-context cost-effective

BlogApr 29, 2026

NVIDIA Nemotron 3 Super: A Developer's Guide to the 120B Hybrid MoE

A practical walkthrough of Nemotron 3 Super: latent mixture of experts, hybrid Mamba transformer architecture, 1M context, reasoning modes, and the code you actually need to run it on NVIDIA hardware.

NVIDIA Nemotron MoE Mamba Open Source AI Models Triton Transformers

ToolApr 23, 2026

Qwen3-Coder

Alibaba's flagship open-weight coding model. 480B total parameters, 35B active (MoE). Native 256K context, scales to 1M. Apache 2.0 license. State-of-the-art agentic coding.

ai model open-source alibaba qwen coding moe apache-2

BlogMar 13, 2026

NVIDIA's Nemotron 3 Super in 6 Minutes

NVIDIA's Nemotron 3 Super combines latent mixture of experts with hybrid Mamba architecture - 120B total parameters, 12B active per token, 1M context, and up to 4x more experts at the same cost.

NVIDIA Nemotron MoE Mamba Open Source AI Models

Get Smarter About AI Dev

New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.

One email per weekReal code, not theoryFree forever

Browse All Tags