AutoTrainess: Teaching Language Models to Improve Language Models Autonomously
Authors: Zhaojian Yu, Penghao Yin, Shuzheng Gao, Shilin He, Kai Cai, Xiao-Ping Zhang
arXiv ID: 2606.31551
Problem: Training language models remains a human-intensive process. Autonomous post-training requires an LM agent to plan iterations, construct benchmark-aligned data, run stable training jobs, evaluate checkpoints, and preserve experiment state. This is a long-horizon task that underspecified CLI environments fail to support.
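The requirement to preserve experiment state across a long-horizon loop can be illustrated with a resumable plan-train-evaluate cycle. This is a minimal sketch under our own assumptions: `ExperimentState`, `run_iteration`, the JSON state file, and the random "benchmark score" are hypothetical placeholders, not AutoTrainess code.

```python
import json
import random
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class ExperimentState:
    """Hypothetical record of what the agent must remember across iterations."""
    iteration: int = 0
    best_score: Optional[float] = None
    best_checkpoint: Optional[str] = None
    history: list = field(default_factory=list)


def load_state(path: Path) -> ExperimentState:
    # Resume from disk if an earlier session left state behind.
    if path.exists():
        return ExperimentState(**json.loads(path.read_text()))
    return ExperimentState()


def save_state(state: ExperimentState, path: Path) -> None:
    path.write_text(json.dumps(asdict(state), indent=2))


def run_iteration(state: ExperimentState, rng: random.Random) -> None:
    # Stubs standing in for planning, data construction, training, and evaluation.
    plan = f"iteration {state.iteration}: adjust data mix"
    checkpoint = f"ckpt-{state.iteration}"
    score = rng.uniform(0.0, 100.0)  # placeholder for a real benchmark score
    state.history.append({"plan": plan, "checkpoint": checkpoint, "score": score})
    if state.best_score is None or score > state.best_score:
        state.best_score, state.best_checkpoint = score, checkpoint
    state.iteration += 1


def run(path: Path, max_iterations: int, seed: int = 0) -> ExperimentState:
    state = load_state(path)
    rng = random.Random(seed + state.iteration)
    while state.iteration < max_iterations:
        run_iteration(state, rng)
        save_state(state, path)  # persist every step: a crash loses at most one iteration
    return state


def demo() -> tuple:
    """Run two iterations, simulate a restart, then resume the same experiment to five."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "experiment_state.json"
        first = run(path, max_iterations=2)
        resumed = run(path, max_iterations=5)
    return first, resumed
```

Writing state after every iteration is the design choice that matters here: a restart or context reset costs at most one iteration, and the agent re-reads its own history from disk instead of relying on what it remembers.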
Key Methodology:
- Introduces AutoTrainess, an LM agent whose training operations are exposed as a structured repository of agent-computer interfaces (planning, data preparation, training, evaluation, logging) instead of a raw CLI environment.
- Externalizes prior human experience as explicit workflows, rules, and execution constraints that guide the agent toward reliable training behavior.
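One way to picture these two ideas together is a small registry of named interfaces with externalized rules checked before any call runs. The sketch below is hypothetical: `Interface`, `Toolbox`, the `train`/`evaluate` stubs, and the two example rules (no evaluation before a checkpoint exists, and a GPU-hour budget) are our own illustrative choices, not the interfaces or rules AutoTrainess actually ships, and only two of the five interface families are stubbed.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Interface:
    """One agent-computer interface: a named, documented operation."""
    name: str
    description: str
    action: Callable[[dict, dict], str]  # (experiment state, arguments) -> observation


# A rule inspects a pending call and returns a violation message, or None if allowed.
Rule = Callable[[str, dict, dict], Optional[str]]


@dataclass
class Toolbox:
    interfaces: dict[str, Interface] = field(default_factory=dict)
    rules: list[Rule] = field(default_factory=list)
    log: list[tuple[str, str, str]] = field(default_factory=list)

    def register(self, iface: Interface) -> None:
        self.interfaces[iface.name] = iface

    def call(self, name: str, state: dict, **args) -> str:
        if name not in self.interfaces:
            # An invalid call returns the list of valid interfaces so the agent can recover.
            return f"error: unknown interface {name!r}; available: {sorted(self.interfaces)}"
        for rule in self.rules:
            violation = rule(name, state, args)
            if violation:
                self.log.append((name, "blocked", violation))
                return f"blocked: {violation}"
        observation = self.interfaces[name].action(state, args)
        self.log.append((name, "ok", observation))
        return observation


# Illustrative externalized rules (not taken from the paper).
def require_checkpoint_before_eval(name: str, state: dict, args: dict) -> Optional[str]:
    if name == "evaluate" and not state.get("checkpoints"):
        return "train a checkpoint before evaluating"
    return None


def enforce_gpu_budget(name: str, state: dict, args: dict) -> Optional[str]:
    if name == "train" and state.get("gpu_hours", 0) + args.get("hours", 0) > state["budget_hours"]:
        return "training run would exceed the GPU-hour budget"
    return None


# Stubbed interface actions; real ones would launch jobs and run benchmarks.
def train(state: dict, args: dict) -> str:
    checkpoints = state.setdefault("checkpoints", [])
    checkpoints.append(f"ckpt-{len(checkpoints)}")
    state["gpu_hours"] = state.get("gpu_hours", 0) + args.get("hours", 0)
    return f"saved {checkpoints[-1]}"


def evaluate(state: dict, args: dict) -> str:
    return f"evaluated {state['checkpoints'][-1]}"


def build_toolbox() -> Toolbox:
    box = Toolbox(rules=[require_checkpoint_before_eval, enforce_gpu_budget])
    box.register(Interface("train", "Run a training job and save a checkpoint.", train))
    box.register(Interface("evaluate", "Score the latest checkpoint on the target benchmark.", evaluate))
    return box
```

With `state = {"budget_hours": 10}`, calling `evaluate` first is blocked, a 6-hour `train` call succeeds, and a second 6-hour `train` call is blocked by the budget rule. Every decision lands in `box.log`, which doubles as the logging interface.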
Key Results:
- On PostTrainBench, AutoTrainess achieves an average score of 26.94 with GPT-5.4 (Codex), vs. 23.21 for CLI-only baselines.
- It also improves DeepSeek-V4-Flash (OpenCode) from 12.13 to 19.58, indicating that the gains generalize across backbone models and agent harnesses.
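To compare the two reported improvements on a common footing, the absolute and relative gains can be computed directly. The scores below are exactly the numbers stated in the results; the `gains` helper is our own, and the relative percentages are derived arithmetic rather than figures reported by the paper.

```python
# Reported PostTrainBench average scores: (CLI-only baseline, with AutoTrainess).
RESULTS = {
    "GPT-5.4 (Codex)": (23.21, 26.94),
    "DeepSeek-V4-Flash (OpenCode)": (12.13, 19.58),
}


def gains(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent), rounded to 2 decimals."""
    absolute = improved - baseline
    return round(absolute, 2), round(100 * absolute / baseline, 2)


if __name__ == "__main__":
    for setup, (base, new) in RESULTS.items():
        abs_gain, rel_gain = gains(base, new)
        print(f"{setup}: +{abs_gain:.2f} points ({rel_gain:.2f}% relative)")
```

This gives +3.73 points (16.07% relative) for GPT-5.4 and +7.45 points (61.42% relative) for DeepSeek-V4-Flash, so the larger gain comes with the weaker starting baseline.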
Applied Context: For builders, AutoTrainess shows that wrapping LM training infrastructure in structured agent-computer interfaces, rather than raw CLIs, can unlock autonomous self-improvement loops. The next generation of coding agents may increasingly train and fine-tune themselves without a human in the loop.