DeepSeek R1 and V3: The Developer's Guide to Open-Source AI

TL;DR
DeepSeek's R1 and V3 models deliver frontier-level performance under an MIT license. Here's how to use them through the API, run them locally with Ollama, and decide when they beat closed-source alternatives.
Official Sources
| Resource | Link |
|---|---|
| DeepSeek Platform | platform.deepseek.com |
| DeepSeek API Docs | api-docs.deepseek.com |
| DeepSeek GitHub | github.com/deepseek-ai |
| DeepSeek R1 Paper | arxiv.org/abs/2501.12948 |
| DeepSeek V3 Technical Report | github.com/deepseek-ai/DeepSeek-V3 |
| Ollama DeepSeek Models | ollama.com/library/deepseek-r1 |
Why DeepSeek Matters
DeepSeek changed the economics of AI. When the Chinese research lab released R1 in January 2025, it demonstrated that a model trained for a fraction of the cost of GPT-4 could match or exceed it on reasoning benchmarks. The AI industry took notice. OpenAI reportedly accelerated their plans. Meta adjusted their roadmap. And developers gained access to genuinely frontier-class models under an MIT license.
The story has two main characters: DeepSeek V3, a general-purpose model built for speed and breadth, and DeepSeek R1, a reasoning-focused model that thinks step by step before answering. Together, they cover most of what developers need from an LLM - and they do it at a price point that makes closed-source APIs look expensive.
The Two Models, Explained
DeepSeek V3 (General Purpose)
DeepSeek V3 is a mixture-of-experts (MoE) model with 671 billion total parameters, of which 37 billion are active for any given token. This architecture is the key to its efficiency: instead of running every token through the full parameter count, V3 routes each token to a subset of specialized expert networks. You get the knowledge of a massive model with the per-token compute cost of a much smaller one.
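To make the routing idea concrete, here is a toy sketch of top-k expert selection for a single token. It is not DeepSeek's actual router: the four scalar "experts", the router scores, and `top_k=2` are all made up for illustration, and the real model works on vectors with far more experts.

```python
import math

def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, top_k):
    """Pick the top_k experts for one token and renormalize their weights."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}

def moe_forward(x, experts, router_scores, top_k=2):
    """Run only the chosen experts and mix their outputs by gate weight."""
    gates = route_token(router_scores, top_k)
    return sum(weight * experts[i](x) for i, weight in gates.items())

# Toy setup: four "experts" that are just scalar functions (hypothetical).
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x ** 2]
scores = [0.1, 2.0, -1.0, 2.0]  # hypothetical router scores for one token

gates = route_token(scores, top_k=2)
output = moe_forward(3.0, experts, scores, top_k=2)
```

Only two of the four experts run for this token; the other two cost nothing at inference time, which is the source of the efficiency described above.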
V3 handles the tasks you throw at a general assistant: code generation, summarization, translation, analysis, and multi-turn conversation. It supports a 128K token context window, which is enough for most codebases and documents. The model was updated several times through 2025, with each revision closing gaps against GPT-4o and Claude Sonnet on standard benchmarks.
For day-to-day coding tasks - generating boilerplate, explaining code, writing tests, refactoring functions - V3 is the model to reach for. It responds fast and handles breadth well.
DeepSeek R1 (Reasoning)
R1 is the model that made headlines. Built on top of V3's architecture, R1 adds a chain-of-thought reasoning process that unfolds before the final answer. When you give R1 a math problem, a logic puzzle, or a complex debugging task, it works through the problem step by step in a visible thinking trace before producing its response.
The reasoning approach means R1 is slower than V3 - it generates more tokens per request because it thinks out loud. But for problems that require multi-step logic, the tradeoff is worth it. R1 matched OpenAI's o1 on math and coding benchmarks at launch, and subsequent updates have kept it competitive with o3 and Claude's extended thinking mode.
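R1 shares the same 671B/37B MoE architecture as V3. The difference is in the training: R1 was trained with large-scale reinforcement learning, using rule-based rewards that check the final answer and the output format rather than scoring each intermediate step. According to the R1 paper, the long reasoning chains and self-checking behavior emerged from that setup, which is why the model is better at catching its own mistakes and working through ambiguous problems.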
Architecture: Why MoE Changes Everything
The mixture-of-experts design is central to understanding DeepSeek's cost advantage. Traditional dense models such as Llama 3 activate every parameter for every token. A 70B dense model uses all 70 billion parameters for each token it processes. DeepSeek V3 and R1 have 671 billion parameters total but only activate 37 billion per token - roughly the compute cost of a 37B dense model, with the knowledge capacity of something much larger.
This has direct consequences for developers:
- Inference is cheaper. Less compute per token means lower API prices and faster responses.
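- Local deployment is feasible, with a caveat. Active parameters set the compute per token, but every expert still has to sit in memory, so the full 671B model needs hundreds of gigabytes even when quantized. What runs on consumer hardware are the distilled R1 variants, which are much smaller dense models.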
- Quality scales with total parameters. The full 671B parameter set stores more knowledge and handles more domains than a 37B dense model ever could.
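A back-of-envelope sketch of the two numbers in play: memory follows the total parameter count, while per-token compute follows the active count. The flat 4 bits per weight is a simplifying assumption; real quantization formats carry some overhead, and the KV cache and activations come on top.

```python
def weight_memory_gb(total_params_billion, bits_per_weight):
    """Approximate memory needed just to hold the weights, in GB (1e9 bytes)."""
    return total_params_billion * bits_per_weight / 8

def active_fraction(active_params_billion, total_params_billion):
    """Share of the weights that participate in computing one token."""
    return active_params_billion / total_params_billion

full_4bit = weight_memory_gb(671, 4)       # every expert must be resident
dense_37b_4bit = weight_memory_gb(37, 4)   # a hypothetical 37B dense model
share = active_fraction(37, 671)

print(f"671B MoE at 4-bit: ~{full_4bit:.0f} GB of weights")
print(f"37B dense at 4-bit: ~{dense_37b_4bit:.1f} GB of weights")
print(f"Weights touched per token: {share:.1%}")
```

Under these assumptions the full model needs well over 300 GB for weights alone, even though only about one parameter in eighteen is used for any given token.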
DeepSeek also pioneered multi-head latent attention (MLA), which compresses the key-value cache during inference. This reduces memory usage further and allows longer context windows without proportional memory growth. It is one of the reasons DeepSeek models punch above their weight on efficiency metrics.
Benchmarks: Where DeepSeek Stands in 2026
Benchmarks shift constantly, but DeepSeek's positioning has remained consistent: competitive with frontier closed-source models at a fraction of the cost.
R1 Reasoning Performance
| Benchmark | DeepSeek R1 | Claude Opus 4 | GPT-5 | Llama 4 Maverick |
|---|---|---|---|---|
| MATH-500 | 97.3 | 96.4 | 97.8 | 91.2 |
| AIME 2024 | 79.8 | 78.2 | 83.6 | 62.4 |
| GPQA Diamond | 71.5 | 72.8 | 75.1 | 61.3 |
| LiveCodeBench | 65.9 | 69.4 | 72.1 | 55.8 |
| SWE-bench Verified | 49.2 | 70.4 | 68.7 | 42.1 |
R1 is competitive on pure math: on MATH-500 and AIME 2024 it edges out Claude Opus 4 and sits within a few points of GPT-5. It is close behind both on science reasoning (GPQA Diamond). The clearest gap is on agentic software engineering tasks (SWE-bench), where tool use and multi-turn planning matter more than raw reasoning. For single-turn problem solving, R1 remains one of the strongest options available.
V3 General Performance
| Benchmark | DeepSeek V3 | Claude Sonnet 4.6 | GPT-5 | Llama 4 Maverick |
|---|---|---|---|---|
| MMLU-Pro | 81.2 | 84.1 | 85.3 | 78.6 |
| HumanEval+ | 82.4 | 85.7 | 87.2 | 79.1 |
| MT-Bench | 9.1 | 9.3 | 9.4 | 8.8 |
V3 sits just below the top closed-source models on general benchmarks. The gap is real but narrow, and V3's speed and cost advantages often make it the practical choice for high-volume workloads.
How to Use DeepSeek
Option 1: DeepSeek API
The official API at api.deepseek.com is the simplest path. It follows the OpenAI API format, so any client library or tool that works with OpenAI's API works with DeepSeek by changing the base URL.
export OPENAI_API_KEY="your-deepseek-api-key"
export OPENAI_BASE_URL="https://api.deepseek.com"
From Python:
from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # R1
    messages=[{"role": "user", "content": "Explain the CAP theorem with examples"}],
)
print(response.choices[0].message.content)
Switch deepseek-reasoner to deepseek-chat for V3. The API supports streaming, function calling, and JSON mode.
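If you would rather not pull in a client library, the same request can be sketched with the standard library alone. The helper names `build_chat_request` and `send` are made up for this example, and the `/chat/completions` path is assumed to follow the OpenAI format described above; check the API docs before relying on it.

```python
import json
import urllib.request

DEEPSEEK_BASE_URL = "https://api.deepseek.com"

def build_chat_request(prompt, model="deepseek-chat",
                       api_key="your-deepseek-api-key",
                       base_url=DEEPSEEK_BASE_URL):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def send(request, timeout=60):
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]

# Same helper, two models: V3 for breadth, R1 for multi-step reasoning.
v3_request = build_chat_request("Write a unit test for a FIFO queue")
r1_request = build_chat_request("Prove that sqrt(2) is irrational",
                                model="deepseek-reasoner")
```

Calling `send(v3_request)` with a real key performs the network call. Keeping request construction separate from sending makes the model switch easy to inspect and test offline.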
Option 2: Third-Party Providers
DeepSeek models are available on most major inference platforms:
- OpenRouter - aggregates multiple providers, automatic fallback
- Together AI - optimized inference for MoE models
- Fireworks AI - low-latency inference with competitive pricing
- Groq - hardware-accelerated inference for distilled R1 variants
Third-party providers often offer better availability than the official API, which has experienced capacity constraints during peak demand. OpenRouter is particularly useful because it routes to the fastest available provider automatically.
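If you manage fallback yourself rather than through an aggregator, the pattern is small enough to sketch. The provider functions below are stand-ins that simulate one failing and one healthy endpoint; in practice each would wrap a real client call.

```python
def call_with_fallback(providers, prompt):
    """Try each (name, call) pair in order; return the first (name, result) that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # real code would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

def flaky_official(prompt):
    """Hypothetical provider that is over capacity."""
    raise TimeoutError("simulated capacity error")

def steady_backup(prompt):
    """Hypothetical provider that answers normally."""
    return f"echo: {prompt}"

used, reply = call_with_fallback(
    [("official", flaky_official), ("backup", steady_backup)],
    "ping",
)
print(used, reply)
```

Ordering the list by preference (cheapest or fastest first) keeps the common path cheap while still surviving an outage.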
Option 3: Local Deployment with Ollama
Running DeepSeek locally eliminates API costs, removes rate limits, and keeps your data on your machine. Ollama makes this straightforward.
# Install Ollama (macOS)
brew install ollama
# Pull and run DeepSeek R1 distilled models
ollama pull deepseek-r1:8b # 4.9 GB - runs on most laptops
ollama pull deepseek-r1:14b # 9.0 GB - good balance
ollama pull deepseek-r1:32b # 20 GB - needs 32GB+ RAM
ollama pull deepseek-r1:70b # 43 GB - needs 64GB+ RAM
# Pull DeepSeek V3 (requires significant resources)
ollama pull deepseek-v3:671b # Full model - needs multi-GPU setup
# Run interactively
ollama run deepseek-r1:14b
The distilled R1 models deserve attention. DeepSeek distilled the reasoning capabilities of the full 671B R1 into smaller models based on Qwen and Llama architectures. The 14B distilled model outperforms many 70B general-purpose models on reasoning tasks while running comfortably on a MacBook Pro with 32GB of memory.
For API-style access to your local model:
# Ollama exposes an OpenAI-compatible API on port 11434
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:14b",
    "messages": [{"role": "user", "content": "Write a binary search in Rust"}]
  }'
This means any tool that supports custom OpenAI-compatible endpoints works with your local DeepSeek instance. Point your editor, your scripts, or your agents at http://localhost:11434/v1 and go.
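One practical wrinkle with R1 responses is separating the thinking trace from the final answer. The sketch below assumes the model returns its reasoning inline, wrapped in `<think>` tags within the message content; some server versions may return the trace in a separate field instead, in which case this helper is unnecessary. The sample string is invented.

```python
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Return (reasoning, answer) from a response that may contain a <think> block."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Keep everything outside the think block as the user-facing answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

sample = (
    "<think>\nSorted input, so halve the range each step.\n</think>\n\n"
    "Use a loop with lo and hi."
)
reasoning, answer = split_reasoning(sample)
```

Logging `reasoning` separately and showing only `answer` to users keeps interfaces clean without discarding the trace you may want for debugging.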
Hardware Requirements for Local Models
| Model | Parameters | Quantization | RAM Required | GPU VRAM |
|---|---|---|---|---|
| R1 8B distilled | 8B | Q4_K_M | 6 GB | 6 GB |
| R1 14B distilled | 14B | Q4_K_M | 10 GB | 10 GB |
| R1 32B distilled | 32B | Q4_K_M | 22 GB | 22 GB |
| R1 70B distilled | 70B | Q4_K_M | 44 GB | 44 GB |
| V3/R1 Full | 671B total (37B active) | Q4_K_M | 300+ GB | Multi-GPU |
The sweet spot for most developers is the 14B or 32B distilled R1. These models offer strong reasoning performance at sizes that fit on consumer hardware. The full 671B model requires serious infrastructure - multiple A100s or equivalent - and is better accessed through an API.
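A small helper can turn the table into a decision. This is a sketch: the memory figures are copied from the table above, while the 4 GB of headroom for the operating system and context is an assumption you should tune for your machine.

```python
# (Ollama tag, approximate GB needed at Q4_K_M), taken from the table above.
DISTILLED_R1 = [
    ("deepseek-r1:8b", 6),
    ("deepseek-r1:14b", 10),
    ("deepseek-r1:32b", 22),
    ("deepseek-r1:70b", 44),
]

def largest_model_that_fits(available_gb, headroom_gb=4):
    """Return the biggest distilled tag that fits, or None if nothing does."""
    budget = available_gb - headroom_gb
    fitting = [tag for tag, needed in DISTILLED_R1 if needed <= budget]
    return fitting[-1] if fitting else None

choice_16 = largest_model_that_fits(16)
choice_32 = largest_model_that_fits(32)
choice_8 = largest_model_that_fits(8)
print(choice_16, choice_32, choice_8)
```

With these assumptions a 16 GB machine lands on the 14B model and a 32 GB machine on the 32B model, which matches the sweet spot described above.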
Pricing: The Cost Advantage
DeepSeek's pricing is aggressively low compared to closed-source alternatives:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| DeepSeek V3 | $0.27 | $1.10 |
| DeepSeek R1 | $0.55 | $2.19 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| GPT-5 | $2.50 | $10.00 |
| Llama 4 (via Together) | $0.80 | $0.80 |
DeepSeek V3 is roughly 11x cheaper than Claude Sonnet on input tokens ($0.27 vs. $3.00) and over 13x cheaper on output ($1.10 vs. $15.00). R1 is about 5x cheaper than Claude on input and nearly 7x cheaper on output while delivering competitive reasoning performance. For high-volume applications - RAG pipelines, batch processing, CI/CD integrations - this pricing difference compounds fast.
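To see how the difference compounds, here is a minimal cost sketch using the prices from the table above. The monthly workload of 500M input and 100M output tokens is a hypothetical figure chosen for illustration.

```python
# USD per 1M tokens (input, output), copied from the pricing table above.
PRICES = {
    "DeepSeek V3": (0.27, 1.10),
    "DeepSeek R1": (0.55, 2.19),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5": (2.50, 10.00),
}

def monthly_cost(model, input_tokens, output_tokens):
    """Total USD for a given token volume at the listed prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def price_ratio(cheaper, pricier):
    """How many times more the pricier model costs, as (input, output)."""
    c_in, c_out = PRICES[cheaper]
    p_in, p_out = PRICES[pricier]
    return p_in / c_in, p_out / c_out

# Hypothetical workload: 500M input tokens and 100M output tokens per month.
v3_bill = monthly_cost("DeepSeek V3", 500_000_000, 100_000_000)
sonnet_bill = monthly_cost("Claude Sonnet 4.6", 500_000_000, 100_000_000)
v3_vs_sonnet = price_ratio("DeepSeek V3", "Claude Sonnet 4.6")

print(f"V3: ${v3_bill:,.0f}/month, Sonnet: ${sonnet_bill:,.0f}/month")
```

At that volume the listed prices work out to about $245 per month for V3 against $3,000 for Claude Sonnet.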
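The MIT license adds another dimension to the cost story. You can self-host DeepSeek models without licensing fees, fine-tune them for your domain, or embed them in commercial products. Beyond keeping the license notice, there are no usage restrictions, no phone-home requirements, and no vendor lock-in. One caveat: the distilled R1 variants are built on Qwen and Llama base models, so check the upstream license of the specific checkpoint you plan to ship.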
Best Use Cases for Developers
Where DeepSeek R1 Excels
- Math and algorithmic problems. R1's chain-of-thought reasoning handles complex mathematical derivations, optimization problems, and algorithmic design better than most alternatives at its price point.
- Code review and bug detection. The reasoning trace helps R1 walk through code systematically, catching logical errors that faster models skip over.
- Technical writing and documentation. R1 produces thorough, well-structured explanations. The reasoning process ensures it considers edge cases and prerequisites.
- Data analysis. When you need to reason about data patterns, anomalies, or statistical relationships, R1's step-by-step approach produces more reliable conclusions.
Where DeepSeek V3 Excels
- High-volume code generation. V3's speed and low cost make it ideal for generating boilerplate, tests, and utility functions at scale.
- Conversational AI. V3 is responsive and coherent in multi-turn conversations, making it suitable for chatbots and interactive applications.
- Translation and summarization. V3 handles multilingual tasks well, particularly with Chinese and English content.
- RAG pipelines. The combination of 128K context, fast inference, and low cost makes V3 an efficient choice for retrieval-augmented generation.
Where DeepSeek Falls Short
DeepSeek is not the best choice for everything. The tradeoffs are worth stating plainly:
- Agentic coding. On SWE-bench and similar multi-turn tool-use benchmarks, Claude and GPT-5 maintain a meaningful lead. If you are building agents that need to plan, execute, and recover from errors across many steps, closed-source models still have the edge.
- Instruction following precision. Claude and GPT-5 are more reliable at following complex, multi-constraint prompts exactly as specified. DeepSeek models occasionally drift from instructions in long generations.
- Multimodal tasks. DeepSeek's vision capabilities exist but lag behind GPT-5 and Gemini for image understanding and generation tasks.
- Availability. The official DeepSeek API has experienced outages and rate limiting, particularly during high-demand periods. Third-party providers mitigate this, but it remains a consideration for production workloads.
When to Choose DeepSeek Over Closed-Source Models
The decision framework is straightforward:
Choose DeepSeek when:
- Cost is a primary concern and you are processing high token volumes
- You need to self-host for privacy, compliance, or latency reasons
- You want to fine-tune a model on your own data without licensing restrictions
- Your use case is primarily reasoning, math, or single-turn problem solving
- You are building a product and want to avoid vendor lock-in
Choose Claude or GPT-5 when:
- You need best-in-class agentic performance with tool use and multi-step planning
- Instruction following precision is critical to your workflow
- You need the strongest possible multimodal capabilities
- You are willing to pay for reliability guarantees and enterprise support
- Your use case involves complex system prompts with many constraints
The hybrid approach works best for most teams. Use DeepSeek for high-volume, cost-sensitive workloads and closed-source models for tasks where the quality gap justifies the price. Many developers run R1 locally for quick reasoning tasks and route complex agentic work to Claude. The OpenAI-compatible API format makes switching between providers trivial.
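The hybrid setup can be as simple as a lookup that maps task traits to an endpoint and model. In this sketch the DeepSeek and Ollama entries use the URLs and model names from earlier in the article; the `agentic` entry is a placeholder, not a real provider URL or model ID.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    base_url: str
    model: str

ROUTES = {
    "bulk": Route("https://api.deepseek.com", "deepseek-chat"),
    "reasoning": Route("http://localhost:11434/v1", "deepseek-r1:14b"),
    # Placeholder values: substitute your closed-source provider and model.
    "agentic": Route("https://closed-provider.example/v1", "closed-model-placeholder"),
}

def pick_route(needs_tools=False, needs_multistep_reasoning=False):
    """Send tool-heavy work to the closed model, reasoning to local R1, the rest to V3."""
    if needs_tools:
        return ROUTES["agentic"]
    if needs_multistep_reasoning:
        return ROUTES["reasoning"]
    return ROUTES["bulk"]

default_route = pick_route()
debug_route = pick_route(needs_multistep_reasoning=True)
agent_route = pick_route(needs_tools=True, needs_multistep_reasoning=True)
```

Because every route speaks the same API format, the calling code only needs the base URL and model name to change.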
Getting Started Today
The fastest path from zero to running DeepSeek:
- Try the API. Sign up at platform.deepseek.com, grab an API key, and point any OpenAI-compatible client at api.deepseek.com. You will have working inference in under five minutes.
- Run locally. Install Ollama, pull deepseek-r1:14b, and start experimenting. No API key needed, no usage limits, no data leaving your machine.
- Integrate with your tools. Any editor or CLI that supports custom OpenAI endpoints works with DeepSeek. Set the base URL and model name, and your existing workflows adapt without code changes.
- Evaluate against your workload. Run your actual prompts against DeepSeek and your current model. Measure quality, latency, and cost across your real use cases - not synthetic benchmarks.
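For the evaluation step, a tiny harness is often enough to start. The sketch below assumes each model is wrapped in a function that returns the response text and the number of output tokens; `fake_model` is a stand-in so the example runs offline, and the checkers are deliberately trivial.

```python
import time
from statistics import mean

def evaluate(model_fn, cases, usd_per_1m_output):
    """Run each (prompt, checker) case and summarize pass rate, latency, and cost."""
    passed, latencies, output_tokens = 0, [], 0
    for prompt, checker in cases:
        start = time.perf_counter()
        text, tokens_used = model_fn(prompt)
        latencies.append(time.perf_counter() - start)
        output_tokens += tokens_used
        passed += bool(checker(text))
    return {
        "pass_rate": passed / len(cases),
        "mean_latency_s": mean(latencies),
        "output_cost_usd": output_tokens * usd_per_1m_output / 1_000_000,
    }

def fake_model(prompt):
    """Stand-in for a real client call; returns (text, output_token_count)."""
    return prompt.upper(), len(prompt.split())

cases = [
    ("return ok", lambda text: "OK" in text),
    ("say hello", lambda text: "goodbye" in text.lower()),
]
report = evaluate(fake_model, cases, usd_per_1m_output=1.10)
print(report)
```

Swap `fake_model` for a wrapper around each real endpoint and run the same cases against both; the three numbers in the report map directly to the quality, latency, and cost comparison described above.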
The open-source AI ecosystem has reached a point where frontier-level reasoning is accessible to any developer with a laptop and an internet connection. DeepSeek did not just contribute to that shift. It accelerated it.
Frequently Asked Questions
What is the difference between DeepSeek R1 and DeepSeek V3?
DeepSeek V3 is a general-purpose model optimized for speed and breadth - code generation, summarization, translation, and conversation. DeepSeek R1 is a reasoning-focused model that thinks step by step before answering. R1 is built on top of V3's architecture but was fine-tuned with reinforcement learning to produce visible chain-of-thought reasoning. Use V3 for fast, high-volume tasks. Use R1 when the problem requires multi-step logic, math, or careful reasoning.
Is DeepSeek really free to use?
DeepSeek models are released under the MIT license, which means you can download, modify, fine-tune, and deploy them commercially without licensing fees. The official DeepSeek API charges for usage (around $0.27-$0.55 per million input tokens), but you can self-host the models at no recurring cost beyond your infrastructure. Running smaller distilled variants locally with Ollama is completely free.
How do I run DeepSeek locally?
Install Ollama (brew install ollama on macOS), then pull a DeepSeek model with ollama pull deepseek-r1:14b. Run it interactively with ollama run deepseek-r1:14b. The 14B distilled R1 model requires about 10GB of RAM and runs well on most modern laptops. For API-style access, Ollama exposes an OpenAI-compatible endpoint at localhost:11434/v1.
What hardware do I need to run DeepSeek locally?
The distilled R1 models have different requirements: 8B needs 6GB RAM, 14B needs 10GB, 32B needs 22GB, and 70B needs 44GB. These are quantized (Q4_K_M) sizes. The full 671B model requires 300GB+ RAM across multiple GPUs and is impractical for most developers to self-host - use the API or a third-party provider instead. Most developers find the 14B or 32B distilled versions offer the best balance of quality and resource requirements.
How does DeepSeek compare to Claude and GPT-5 for coding?
DeepSeek R1 is competitive with Claude and GPT-5 on pure math and reasoning benchmarks like MATH-500 and AIME: in the tables above it edges out Claude Opus 4 and sits within a few points of GPT-5. However, it trails on agentic software engineering tasks (SWE-bench) where multi-step tool use and planning matter. For single-turn code generation and bug detection, R1 is competitive at a fraction of the cost. For complex agentic workflows with multiple tools and recovery from errors, Claude Opus and GPT-5 still lead.
Why is DeepSeek so much cheaper than OpenAI and Anthropic?
DeepSeek uses a mixture-of-experts (MoE) architecture that activates only 37 billion parameters per token despite having 671 billion total parameters. This dramatically reduces compute costs per inference. DeepSeek is also a Chinese research lab with different cost structures and strategic priorities. Together, these factors let it price roughly 5-13x below the closed-source models in the pricing table while maintaining quality.
Can I use DeepSeek with Claude Code, Cursor, or Aider?
Yes. DeepSeek models use the OpenAI API format, so any tool that supports custom OpenAI-compatible endpoints works with DeepSeek. Set your base URL to https://api.deepseek.com with your DeepSeek API key, or point to http://localhost:11434/v1 for a local Ollama instance. Aider supports DeepSeek directly. Claude Code and Cursor can use DeepSeek through their custom model configuration.
What are the main limitations of DeepSeek models?
DeepSeek has three notable limitations: First, agentic performance trails Claude and GPT-5 on multi-step tool-use tasks. Second, instruction following precision is lower - the models occasionally drift from complex, multi-constraint prompts. Third, the official API has experienced availability issues during peak demand. For production workloads, consider third-party providers like OpenRouter or Together AI for better reliability.
DeepSeek R1 and V3 are available under the MIT license. Visit github.com/deepseek-ai for model weights, documentation, and research papers.








