Ollama
Single-binary local model runner with a simple CLI and HTTP API.
Pros
- Easiest local setup
- Huge model library
- Cross-platform
Cons
- Single-node only
- Less tunable than vLLM
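Because Ollama speaks plain HTTP, a minimal client needs nothing beyond the standard library. A sketch assuming the server is running on its default port 11434 and that a model named `llama3` has already been pulled with `ollama pull llama3`:

```python
# Minimal sketch: call a locally running Ollama server over its HTTP API.
# Assumes the default port (11434) and an already-pulled "llama3" model.
import json
import urllib.request

def generate(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return a single JSON object instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Explain GGUF in one sentence."))
```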
Local LLM runtimes let you run open-weight models on your own hardware for privacy, cost, and offline work. The main trade-offs are ergonomics, throughput, and how much GPU memory you actually have.
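A rough way to answer the GPU-memory question is to estimate weight memory as parameter count times bytes per parameter at a given quantization. The sketch below is that back-of-envelope arithmetic only; real usage is higher once the KV cache, activations, and runtime overhead are added.

```python
# Back-of-envelope estimate of weight memory for a quantized model.
# Rule of thumb only: actual usage is higher with KV cache and overhead.
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 7B model at 4-bit quantization needs roughly 3.5 GB for weights alone;
# the same model at fp16 needs about 14 GB.
print(f"7B @ 4-bit: {weight_memory_gb(7, 4):.1f} GB")
print(f"7B @ fp16 : {weight_memory_gb(7, 16):.1f} GB")
```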
- Ollama: Single-binary local model runner with a simple CLI and HTTP API.
- LM Studio: Desktop GUI for downloading, chatting with, and serving local models.
- llama.cpp: The C++ runtime that powers most local model setups, with GGUF support.
- vLLM: High-throughput inference server with PagedAttention for production.
- MLX: Apple's array framework with first-class LLM examples for Apple Silicon.
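Several of these runtimes expose OpenAI-compatible HTTP endpoints (vLLM's server, llama.cpp's `llama-server`, and Ollama all do), so one client snippet can target whichever you are running. A sketch assuming a vLLM server on its default port 8000; the model name is a placeholder for whatever you actually served:

```python
# Minimal sketch: talk to a local vLLM server through its OpenAI-compatible API.
# Assumes something like `vllm serve meta-llama/Llama-3.1-8B-Instruct` is running
# on port 8000; swap in the model name you actually served.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed-locally",  # a local vLLM server does not require a real key by default
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Why would I run a model locally?"}],
)
print(response.choices[0].message.content)
```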
Concepts you will run into when working with local LLM runtimes.
