Running AI models locally - Ollama, open-source models, and on-device inference.
14 resources - 6 posts, 8 tools

Cline is a free, open-source VS Code extension that brings autonomous AI coding to your editor. It works with local models or cloud APIs, handles multi-file changes, and runs terminal commands without proprietary lock-in.

A Show HN PDF form demo points at a bigger architecture shift: keep sensitive documents local, expose narrow browser tools to the model, and make AI assistance inspectable.

DeepSeek's R1 and V3 models deliver frontier-level performance under an MIT license. Here's how to use them through the API, run them locally with Ollama, and decide when they beat closed-source alternatives.

Meta's Llama 4 family brings mixture-of-experts to open source with Scout and Maverick. Here's how to run them locally, access them through APIs, and decide when they beat the competition.

NVIDIA's Nemotron Nano 9B V2 delivers something rare: a small language model that doesn't trade capability for speed. This 9B parameter model outperforms Qwen 3B across instruction following, math,...

Microsoft's PHI-4 is an MIT-licensed 14 billion parameter model that matches Llama 3.3 70B and Qwen 2.5 72B on key benchmarks. Here is what makes it special, how to run it locally, and why small language models are increasingly practical for real development work.
The easiest way to run LLMs locally. One command to pull and run any model. OpenAI-compatible API. 52M+ monthly downloads. Supports GGUF, Safetensors, and custom Modelfiles.
Local AIDesktop app for discovering, downloading, and running local LLMs. Clean chat UI, OpenAI-compatible API server, and automatic GPU detection. MLX engine optimized for Apple Silicon.
Local AIOpen-source ChatGPT alternative that runs 100% offline. Desktop app with local models, cloud API connections, custom assistants, and MCP integration. AGPLv3 licensed.
Local AIPrivate local AI chatbot by Nomic. 250K+ monthly users, 65K GitHub stars. LocalDocs feature lets you chat with your own files. Runs on Windows, macOS, and Linux.
Local AIOpen-source OpenAI API replacement. Runs LLMs, vision, voice, image, and video models on any hardware - no GPU required. 35+ backends. Distributed mode for scaling.
Local AIC++ inference engine for LLMs. GGUF format, quantization, CPU and Metal/CUDA support. The foundation most local tools build on.
Local AIHigh-throughput inference server for LLMs. PagedAttention memory management. The go-to for serious local or self-hosted serving.
Local AIApple's array framework for machine learning on Apple Silicon. Native Metal support, unified memory, first-class LLM inference.
Local AI
New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.
Explore 351 topics
Browse All Topics