GGUF
A binary file format for storing quantized language models, designed for efficient local inference with llama.cpp and tools built on it. GGUF files contain model weights, tokenizer data, and metadata in a single file. Both Ollama and LM Studio use GGUF as their primary model format. When you download a model from Hugging Face for local use, you are typically downloading a GGUF file.
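Because everything lives in one file, a GGUF model is easy to inspect. As a minimal sketch (not an official API), the fixed-size header at the start of every GGUF file can be read with a few bytes of parsing: a `GGUF` magic string, a version number, and counts of the tensors and metadata key-value pairs that follow.

```python
import struct

def read_gguf_header(path):
    """Read the fixed-size GGUF header: magic, version, tensor and KV counts."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError("not a GGUF file")
        # version: uint32; tensor_count and metadata_kv_count: uint64,
        # all little-endian per the GGUF specification
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}
```

Running this on a downloaded model tells you at a glance how many tensors it contains and how much metadata (tokenizer vocabulary, architecture hints) is embedded alongside the weights.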
In practice, developers reach for GGUF when they need to run a model locally: quantization shrinks the weights to a fraction of their full-precision size, letting models fit in consumer RAM or VRAM as part of an AI feature or workflow.
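As a back-of-the-envelope sketch of why quantization matters (the function name and the exact bits-per-weight figures are illustrative; common 4-bit schemes land in the 4-5 bits-per-weight range once scaling metadata is included):

```python
def approx_model_size_gb(n_params_billions, bits_per_weight):
    """Rough file size: parameter count times bits per weight, ignoring overhead."""
    return n_params_billions * 1e9 * bits_per_weight / 8 / 1e9

# A 7B-parameter model stored as 16-bit floats vs. a ~4.5-bit quantization:
fp16_gb = approx_model_size_gb(7, 16)    # 14.0 GB
q4_gb = approx_model_size_gb(7, 4.5)     # ~3.9 GB
```

The roughly 3.5x reduction is what makes a 7B model practical on a laptop with 8 GB of memory.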
GGUF sits in the Inference part of the AI stack. Understanding it helps you make better decisions when building, debugging, and shipping AI features.
Developers Digest publishes tutorials and videos that cover Inference topics including GGUF. Check the blog and YouTube channel for hands-on walkthroughs.
Related terms:
- Weights: The numerical parameters inside a neural network that are learned during training.
- Open-weight models: AI models released with publicly available weights that anyone can download, run, fine-tune, and deploy.
- Knowledge distillation: A training technique where a smaller "student" model learns to replicate the behavior of a larger "teacher" model.
