Model Training

1 article | 12 min total read time

1 of 1 posts published

DeepSeek R1, PPO, and GRPO Explained for Devs

GRPO has quickly become a standard RL recipe for reasoning models. A mental model, assuming no prior knowledge, of PPO, GRPO, and how DeepSeek R1's training works under the hood.

12 min read
