PPO

7 items

1 post, 6 guides

Blog - Apr 29, 2026

DeepSeek R1, PPO, and GRPO Explained for Devs

GRPO is suddenly the standard RL recipe for reasoning models. A no-prior-knowledge mental model of PPO, GRPO, and how DeepSeek R1's training works under the hood.

DeepSeek, GRPO, PPO, RLHF, Reinforcement Learning

Guide - Apr 23, 2026

Read Tool - Claude Code

Read file contents with line limiting, offset, and binary support.

Guide - Apr 23, 2026

Grep Tool - Claude Code

Search file contents by pattern with regex support.

Guide - Apr 23, 2026

gh CLI Integration - Claude Code

Full GitHub CLI support for automated PR and issue workflows.

Guide - Apr 23, 2026

CLAUDE.md Files - Claude Code

Persistent project instructions loaded every session; supports nested directories.

Guide - Apr 23, 2026

1M Token Context - Claude Code

Extended context window for Opus and Sonnet on supported plans.

Guide - Apr 23, 2026

Skill Arguments - Claude Code

Pass arguments to skills with string substitution support.

