
TL;DR
DeepSeek's R1 and V3 models deliver frontier-level performance under an MIT license. Here's how to use them through the API, run them locally with Ollama, and decide when they beat closed-source alternatives.
DeepSeek changed the economics of AI. When the Chinese research lab released R1 in January 2025, it demonstrated that a model trained for a fraction of the cost of GPT-4 could match or exceed it on reasoning benchmarks. The AI industry took notice. OpenAI reportedly accelerated their plans. Meta adjusted their roadmap. And developers gained access to genuinely frontier-class models under an MIT license.
The story has two main characters: DeepSeek V3, a general-purpose model built for speed and breadth, and DeepSeek R1, a reasoning-focused model that thinks step by step before answering. Together, they cover most of what developers need from an LLM - and they do it at a price point that makes closed-source APIs look expensive.
DeepSeek V3 is a mixture-of-experts (MoE) model with 671 billion total parameters and 37 billion active per forward pass. This architecture is the key to its efficiency: instead of running every token through the full parameter count, V3 routes each token to a subset of specialized expert networks. You get the knowledge of a massive model with the inference cost of a much smaller one.
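The routing idea can be sketched in a few lines. This toy gate is purely illustrative - real MoE layers use a learned gating network over hundreds of neural experts - but it shows the mechanic: score the experts, keep the top-k, and mix only their outputs.

```python
# Toy mixture-of-experts routing: activate only the top-k experts per token.
# Illustrative sketch - real models use learned gating over neural experts.

def route(gate_scores, k=2):
    """Pick the k highest-scoring experts and normalize their weights."""
    top = sorted(enumerate(gate_scores), key=lambda p: p[1], reverse=True)[:k]
    total = sum(score for _, score in top)
    return [(idx, score / total) for idx, score in top]

def moe_forward(x, experts, gate_scores, k=2):
    """Run x through only the selected experts; the rest stay idle."""
    return sum(weight * experts[idx](x) for idx, weight in route(gate_scores, k))

# Eight tiny "experts" - each is just a scalar function here.
experts = [lambda x, m=m: m * x for m in range(1, 9)]
gate_scores = [0.1, 0.05, 0.6, 0.02, 0.9, 0.01, 0.3, 0.08]

print(route(gate_scores))                      # experts 4 and 2 win
print(moe_forward(2.0, experts, gate_scores))
```

Only two of the eight experts run per "token", which is exactly why a 671B-parameter model can cost like a 37B one at inference time.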
V3 handles the tasks you throw at a general assistant: code generation, summarization, translation, analysis, and multi-turn conversation. It supports a 128K token context window, which is enough for most codebases and documents. The model was updated several times through 2025, with each revision closing gaps against GPT-4o and Claude Sonnet on standard benchmarks.
For day-to-day coding tasks - generating boilerplate, explaining code, writing tests, refactoring functions - V3 is the model to reach for. It responds fast and handles breadth well.
R1 is the model that made headlines. Built on top of V3's architecture, R1 adds a chain-of-thought reasoning process that unfolds before the final answer. When you give R1 a math problem, a logic puzzle, or a complex debugging task, it works through the problem step by step in a visible thinking trace before producing its response.
The reasoning approach means R1 is slower than V3 - it generates more tokens per request because it thinks out loud. But for problems that require multi-step logic, the tradeoff is worth it. R1 matched OpenAI's o1 on math and coding benchmarks at launch, and subsequent updates have kept it competitive with o3 and Claude's extended thinking mode.
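When you run R1 or its distills through Ollama, the thinking trace typically arrives wrapped in `<think>...</think>` tags ahead of the final answer (the official API returns it in a separate `reasoning_content` field instead). A small helper to split the two, assuming the tag-based format:

```python
import re

# Split an R1-style response into (thinking trace, final answer).
# Assumes the <think>...</think> convention used by the open weights.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw: str) -> tuple[str, str]:
    match = THINK_RE.search(raw)
    if not match:
        return "", raw.strip()          # no visible trace, plain answer
    thinking = match.group(1).strip()
    answer = raw[match.end():].strip()  # everything after the closing tag
    return thinking, answer

raw = "<think>2 + 2: add the units digits.</think>The answer is 4."
thinking, answer = split_reasoning(raw)
print(answer)  # The answer is 4.
```

Stripping the trace before storing or displaying output keeps downstream consumers from seeing the model's scratch work.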
R1 shares the same 671B/37B MoE architecture as V3. The difference is in the training: R1 was fine-tuned with reinforcement learning that rewards correct reasoning chains, not just correct final answers. This produces a model that is better at catching its own mistakes and working through ambiguous problems.
The mixture-of-experts design is central to understanding DeepSeek's cost advantage. Traditional dense models like Llama activate every parameter for every token. A 70B dense model uses 70 billion parameters per forward pass. DeepSeek V3 and R1 have 671 billion parameters total but only activate 37 billion per token - roughly the compute cost of a 37B dense model, with the knowledge capacity of something much larger.
This has direct consequences for developers: inference cost scales with the 37 billion active parameters rather than the full 671 billion, which translates into lower API prices and makes self-hosting feasible without datacenter-scale hardware.

DeepSeek also pioneered multi-head latent attention (MLA), which compresses the key-value cache during inference. This reduces memory usage further and allows longer context windows without proportional memory growth. It is one of the reasons DeepSeek models punch above their weight on efficiency metrics.
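Back-of-the-envelope numbers show why cache compression matters. A standard attention cache stores a key and a value vector per token, per layer, across all heads; MLA caches one small latent vector per token per layer from which keys and values are reconstructed. The dimensions below are representative, not DeepSeek's actual configuration:

```python
# Rough KV-cache sizing: standard multi-head attention vs a compressed
# latent cache. Dimensions are illustrative, not DeepSeek's real config.

def kv_cache_bytes(tokens, layers, heads, head_dim, bytes_per=2):
    # 2 vectors (K and V) per token, per layer, across all heads
    return tokens * layers * 2 * heads * head_dim * bytes_per

def latent_cache_bytes(tokens, layers, latent_dim, bytes_per=2):
    # one shared latent vector per token per layer
    return tokens * layers * latent_dim * bytes_per

ctx, layers, heads, head_dim, latent = 128_000, 60, 128, 128, 512

std = kv_cache_bytes(ctx, layers, heads, head_dim)
mla = latent_cache_bytes(ctx, layers, latent)
print(f"standard: {std / 1e9:.1f} GB, latent: {mla / 1e9:.1f} GB, "
      f"{std // mla}x smaller")
```

With these toy dimensions the latent cache is 64x smaller, which is the kind of headroom that makes a 128K context window practical.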
Benchmarks shift constantly, but DeepSeek's positioning has remained consistent: competitive with frontier closed-source models at a fraction of the cost.
| Benchmark | DeepSeek R1 | Claude Opus 4 | GPT-5 | Llama 4 Maverick |
|---|---|---|---|---|
| MATH-500 | 97.3 | 96.4 | 97.8 | 91.2 |
| AIME 2024 | 79.8 | 78.2 | 83.6 | 62.4 |
| GPQA Diamond | 71.5 | 72.8 | 75.1 | 61.3 |
| LiveCodeBench | 65.9 | 69.4 | 72.1 | 55.8 |
| SWE-bench Verified | 49.2 | 70.4 | 68.7 | 42.1 |
R1 leads on pure math and holds its own on science reasoning. It trails Claude and GPT-5 on agentic software engineering tasks (SWE-bench), where tool use and multi-turn planning matter more than raw reasoning. For single-turn problem solving, R1 remains one of the strongest options available.
| Benchmark | DeepSeek V3 | Claude Sonnet 4.6 | GPT-5 | Llama 4 Maverick |
|---|---|---|---|---|
| MMLU-Pro | 81.2 | 84.1 | 85.3 | 78.6 |
| HumanEval+ | 82.4 | 85.7 | 87.2 | 79.1 |
| MT-Bench | 9.1 | 9.3 | 9.4 | 8.8 |
V3 sits just below the top closed-source models on general benchmarks. The gap is real but narrow, and V3's speed and cost advantages often make it the practical choice for high-volume workloads.
The official API at api.deepseek.com is the simplest path. It follows the OpenAI API format, so any client library or tool that works with OpenAI's API works with DeepSeek by changing the base URL.
```bash
export OPENAI_API_KEY="your-deepseek-api-key"
export OPENAI_BASE_URL="https://api.deepseek.com"
```
From Python:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # R1
    messages=[{"role": "user", "content": "Explain the CAP theorem with examples"}]
)

print(response.choices[0].message.content)
```
Switch `deepseek-reasoner` to `deepseek-chat` for V3. The API supports streaming, function calling, and JSON mode.
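Streaming follows the standard OpenAI client pattern. Here is a sketch; `pick_model` is a hypothetical convenience helper of mine, not part of any SDK, and the SDK import sits inside the function so the helper works without `openai` installed:

```python
# Stream tokens from DeepSeek as they arrive, via the OpenAI SDK
# pointed at DeepSeek's OpenAI-compatible endpoint.

def pick_model(reasoning: bool) -> str:
    """Hypothetical helper: R1 for hard problems, V3 for everything else."""
    return "deepseek-reasoner" if reasoning else "deepseek-chat"

def stream_answer(prompt: str, reasoning: bool = False) -> None:
    from openai import OpenAI  # deferred so pick_model works without the SDK

    client = OpenAI(api_key="your-deepseek-api-key",
                    base_url="https://api.deepseek.com")
    stream = client.chat.completions.create(
        model=pick_model(reasoning),
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # reasoning-only chunks can have empty content
            print(delta, end="", flush=True)
```

Call `stream_answer("Explain the CAP theorem")` with a valid key and tokens print as they arrive instead of after the full response completes.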
DeepSeek models are available on most major inference platforms, including OpenRouter and Together AI.
Third-party providers often offer better availability than the official API, which has experienced capacity constraints during peak demand. OpenRouter is particularly useful because it routes to the fastest available provider automatically.
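Because every provider speaks the same OpenAI-compatible dialect, switching is a matter of swapping a base URL and a model name. A sketch of a provider table - the official entries match the examples above, while the OpenRouter values reflect its usual slug conventions and should be verified before use:

```python
# One place to swap inference providers for the same models.
# OpenRouter entries are my assumption of its conventions - verify first.
PROVIDERS = {
    "deepseek": {
        "base_url": "https://api.deepseek.com",
        "r1": "deepseek-reasoner",
        "v3": "deepseek-chat",
    },
    "openrouter": {
        "base_url": "https://openrouter.ai/api/v1",
        "r1": "deepseek/deepseek-r1",
        "v3": "deepseek/deepseek-chat",
    },
}

def endpoint(provider: str, model: str) -> tuple[str, str]:
    """Return (base_url, model_name) for a provider/model pair."""
    cfg = PROVIDERS[provider]
    return cfg["base_url"], cfg[model]

base_url, model = endpoint("openrouter", "r1")
```

Feed the returned pair into any OpenAI-compatible client and the rest of your code stays untouched when you move between providers.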
Running DeepSeek locally eliminates API costs, removes rate limits, and keeps your data on your machine. Ollama makes this straightforward.
```bash
# Install Ollama (macOS)
brew install ollama

# Pull and run DeepSeek R1 distilled models
ollama pull deepseek-r1:8b    # 4.9 GB - runs on most laptops
ollama pull deepseek-r1:14b   # 9.0 GB - good balance
ollama pull deepseek-r1:32b   # 20 GB - needs 32GB+ RAM
ollama pull deepseek-r1:70b   # 43 GB - needs 64GB+ RAM

# Pull DeepSeek V3 (requires significant resources)
ollama pull deepseek-v3:671b  # Full model - needs multi-GPU setup

# Run interactively
ollama run deepseek-r1:14b
```
The distilled R1 models deserve attention. DeepSeek distilled the reasoning capabilities of the full 671B R1 into smaller models based on Qwen and Llama architectures. The 14B distilled model outperforms many 70B general-purpose models on reasoning tasks while running comfortably on a MacBook Pro with 32GB of memory.
For API-style access to your local model:
# Ollama exposes an OpenAI-compatible API on port 11434
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1:14b",
"messages": [{"role": "user", "content": "Write a binary search in Rust"}]
}'
This means any tool that supports custom OpenAI-compatible endpoints works with your local DeepSeek instance. Point your editor, your scripts, or your agents at `http://localhost:11434/v1` and go.
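From Python you can hit the same local endpoint with nothing but the standard library (or point the `openai` client at it). A minimal sketch:

```python
import json
import urllib.request

# Query a local Ollama model through its OpenAI-compatible endpoint.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Assemble the chat-completions POST without sending it."""
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def ask(model: str, prompt: str) -> str:
    """Send the request to the local server and return the answer text."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With Ollama running, `ask("deepseek-r1:14b", "Write a binary search in Rust")` returns the model's reply with no third-party dependencies.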
| Model | Parameters (Active) | Quantization | RAM Required | GPU VRAM |
|---|---|---|---|---|
| R1 8B distilled | 8B | Q4_K_M | 6 GB | 6 GB |
| R1 14B distilled | 14B | Q4_K_M | 10 GB | 10 GB |
| R1 32B distilled | 32B | Q4_K_M | 22 GB | 22 GB |
| R1 70B distilled | 70B | Q4_K_M | 44 GB | 44 GB |
| V3/R1 Full | 37B active | Q4_K_M | 300+ GB | Multi-GPU |
The sweet spot for most developers is the 14B or 32B distilled R1. These models offer strong reasoning performance at sizes that fit on consumer hardware. The full 671B model requires serious infrastructure - multiple A100s or equivalent - and is better accessed through an API.
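The table's numbers work out to roughly 0.6-0.75 GB per billion parameters at Q4_K_M, which suggests a rule of thumb for sanity-checking whether a model fits before pulling it. The constants below (about 4.5 bits per weight plus 25% headroom for the KV cache and runtime) are my ballpark, not an official formula:

```python
# Rough RAM estimate for a Q4_K_M-quantized model. Ballpark only:
# ~4.5 bits per weight plus ~25% headroom for KV cache and runtime.
BYTES_PER_PARAM = 4.5 / 8
OVERHEAD = 1.25

def est_ram_gb(params_billions: float) -> float:
    return params_billions * BYTES_PER_PARAM * OVERHEAD

for size in (8, 14, 32, 70):
    print(f"{size}B -> ~{est_ram_gb(size):.0f} GB")
```

The estimates land within a few gigabytes of the table above; when in doubt, trust the table and leave extra headroom for long contexts.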
DeepSeek's pricing is aggressively low compared to closed-source alternatives:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| DeepSeek V3 | $0.27 | $1.10 |
| DeepSeek R1 | $0.55 | $2.19 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| GPT-5 | $2.50 | $10.00 |
| Llama 4 (via Together) | $0.80 | $0.80 |
DeepSeek V3 is roughly 10x cheaper than Claude Sonnet on input tokens and over 13x cheaper on output. R1 is about 5x cheaper than Claude while delivering competitive reasoning performance. For high-volume applications - RAG pipelines, batch processing, CI/CD integrations - this pricing difference compounds fast.
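To see how it compounds, price out a month of a moderately busy pipeline at the table's rates - say one billion input tokens and 200 million output tokens (the volumes are invented for illustration):

```python
# Monthly cost at the per-million-token rates from the table above.
RATES = {  # (input $/1M tokens, output $/1M tokens)
    "deepseek-v3": (0.27, 1.10),
    "deepseek-r1": (0.55, 2.19),
    "claude-sonnet-4.6": (3.00, 15.00),
    "gpt-5": (2.50, 10.00),
}

def monthly_cost(model, input_m=1_000, output_m=200):
    """Dollars for input_m / output_m million tokens per month."""
    inp, out = RATES[model]
    return input_m * inp + output_m * out

for model in RATES:
    print(f"{model:18s} ${monthly_cost(model):,.0f}/month")
```

At this volume V3 comes out around $490 for the month against roughly $6,000 for Claude Sonnet - the same order-of-magnitude gap the per-token table implies, now measured in real dollars.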
The MIT license adds another dimension to the cost story. You can self-host DeepSeek models without licensing fees, fine-tune them for your domain, or embed them in commercial products. There are no usage restrictions, no phone-home requirements, and no vendor lock-in.
DeepSeek is not the best choice for everything. Be honest about the tradeoffs: R1 trails Claude and GPT-5 on agentic, multi-turn software engineering work, its thinking traces add latency and output tokens, and the official API has hit capacity constraints during peak demand.

The decision framework is straightforward:

Choose DeepSeek when:

- Cost dominates: high-volume workloads like RAG pipelines, batch processing, and CI/CD integrations
- The task is single-turn reasoning: math, logic, debugging
- You want to self-host, fine-tune, or embed the model under the MIT license
- Data needs to stay on your own infrastructure

Choose Claude or GPT-5 when:

- The work is agentic: multi-turn planning and heavy tool use, where the SWE-bench gap is real
- Output quality on hard edge cases justifies the higher price
- You need the availability guarantees of a managed first-party API
The hybrid approach works best for most teams. Use DeepSeek for high-volume, cost-sensitive workloads and closed-source models for tasks where the quality gap justifies the price. Many developers run R1 locally for quick reasoning tasks and route complex agentic work to Claude. The OpenAI-compatible API format makes switching between providers trivial.
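The hybrid pattern is easy to encode. A sketch of a task router - the categories, keyword hints, and the Claude model name are invented for illustration, not a benchmark-backed policy:

```python
# Route each task to the cheapest model that handles it well:
# V3 for general chat, R1 for single-turn reasoning, Claude for agentic work.
# Categories, hints, and model names are illustrative assumptions.

ROUTES = {
    "general": "deepseek-chat",
    "reasoning": "deepseek-reasoner",
    "agentic": "claude-sonnet-4-6",
}

AGENTIC_HINTS = ("run tests", "apply patch", "multi-file", "use tool")
REASONING_HINTS = ("prove", "debug", "why does", "step by step")

def classify(prompt: str) -> str:
    text = prompt.lower()
    if any(hint in text for hint in AGENTIC_HINTS):
        return "agentic"
    if any(hint in text for hint in REASONING_HINTS):
        return "reasoning"
    return "general"           # default to the cheap path

def route_task(prompt: str) -> str:
    return ROUTES[classify(prompt)]
```

In production you would replace the keyword heuristic with a classifier or explicit task metadata, but the shape stays the same: default cheap, escalate only when the task demands it.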
The fastest path from zero to running DeepSeek:

1. **Try the API.** Sign up at platform.deepseek.com, grab an API key, and point any OpenAI-compatible client at api.deepseek.com. You will have working inference in under five minutes.
2. **Run locally.** Install Ollama, pull `deepseek-r1:14b`, and start experimenting. No API key needed, no usage limits, no data leaving your machine.
3. **Integrate with your tools.** Any editor or CLI that supports custom OpenAI endpoints works with DeepSeek. Set the base URL and model name, and your existing workflows adapt without code changes.
4. **Evaluate against your workload.** Run your actual prompts against DeepSeek and your current model. Measure quality, latency, and cost across your real use cases - not synthetic benchmarks.
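The evaluation step can be as simple as a loop over your real prompts with a per-model scorer. A sketch with stubbed functions - swap `call_model` for a real API call and `score` for your own quality check:

```python
import time

# Minimal A/B eval: run the same prompts through multiple models and
# compare average quality and latency. Both inner functions are stubs.

def call_model(model: str, prompt: str) -> str:
    return f"[{model}] answer to: {prompt}"   # stub: replace with an API call

def score(prompt: str, answer: str) -> float:
    return float(len(answer) > 0)             # stub: replace with a real check

def evaluate(models, prompts):
    results = {}
    for model in models:
        start = time.perf_counter()
        scores = [score(p, call_model(model, p)) for p in prompts]
        results[model] = {
            "avg_score": sum(scores) / len(scores),
            "latency_s": (time.perf_counter() - start) / len(prompts),
        }
    return results

prompts = ["Explain the CAP theorem", "Write a binary search in Rust"]
report = evaluate(["deepseek-chat", "claude-sonnet"], prompts)
```

Even a crude harness like this, run over a few dozen real prompts, tells you more about fit than any leaderboard.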
The open-source AI ecosystem has reached a point where frontier-level reasoning is accessible to any developer with a laptop and an internet connection. DeepSeek did not just contribute to that shift. It accelerated it.
DeepSeek R1 and V3 are available under the MIT license. Visit github.com/deepseek-ai for model weights, documentation, and research papers.