5 items
5 tools
High-throughput inference server for LLMs. PagedAttention memory management. The go-to for serious local or self-hosted serving.
The easiest way to run LLMs locally. One command to pull and run any model. OpenAI-compatible API. 52M+ monthly downloads. Supports GGUF, Safetensors, and custom Modelfiles.
Open-source OpenAI API replacement. Runs LLMs, vision, voice, image, and video models on any hardware - no GPU required. 35+ backends. Distributed mode for scaling.
Meta's open-source model family. Llama 4 available in Scout (17B active) and Maverick (17B active, 128 experts). Free to use, modify, and deploy commercially.
Self-hosted PaaS for deploying apps, databases, and services. Git-based deploys, Docker support, preview environments, and a clean UI.

New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.