All blog posts, tools, and guides about Quantization from Developers Digest.
2 resources - 1 post, 1 tool
Unsloth's dynamic quantization makes GLM-5.2 runnable on a 256GB Mac or a 24GB GPU with CPU offloading. Here is the hardware math, the quantization tradeoffs, and what the HN community learned from actually running it.
C++ inference engine for LLMs. GGUF format, quantization, CPU and Metal/CUDA support. The foundation most local tools build on.
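As a rough illustration of the kind of hardware math the post summary refers to, the sketch below estimates the memory needed for model weights at different quantization levels. The parameter count is a hypothetical placeholder, not a figure from the post, and the estimate covers weights only: it ignores the KV cache, activations, and the per-block scale metadata that real quantized formats carry.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical 100-billion-parameter model (an assumption for illustration only).
N_PARAMS = 100e9

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4), ("2-bit", 2)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB for weights")
```

Halving the bits per weight halves the weight footprint, which is why aggressive quantization, combined with offloading some layers to CPU memory, can bring a large model within reach of a single machine.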