Discrete Diffusion Language Models for Interactive Radiology Report Drafting

Authors: Max Van Puyvelde, Halil Ibrahim Gulluk, Wim Van Criekinge, Olivier Gevaert

arXiv ID: 2607.01436

Problem: Can a discrete diffusion language model match or exceed autoregressive models on medical report generation while offering capabilities autoregressive models lack?

Key Methodology:

Adapted DiffusionGemma-26B (a mixture-of-experts diffusion LM) and benchmarked it against the autoregressive Gemma-4-26B under an identical LoRA recipe on medical VQA datasets
Scored outputs using a verbosity-robust LLM judge to avoid length-biased evaluation
Leveraged diffusion's inherent any-order infill: a radiologist can pin report fragments in place and have the model fill in the text between them, a workflow that AR models handle poorly
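
The any-order infill idea can be illustrated with a toy sketch. It assumes a confidence-ordered unmasking loop, a scheme often used with masked diffusion LMs, which is not confirmed to be the decoder used in the paper. The names `MASK`, `toy_denoiser`, and `infill` are hypothetical, and the denoiser returns random proposals in place of a real model. Only the control flow is the point: pinned tokens are never overwritten, and gaps are filled in whatever order the confidences dictate rather than left to right.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

MASK = None  # hypothetical marker for a position the model still has to fill

Tokens = List[Optional[str]]
Proposals = Dict[int, Tuple[str, float]]


def toy_denoiser(tokens: Tokens, rng: random.Random) -> Proposals:
    """Stand-in for a diffusion LM (assumption, not a real model).

    Returns a (token, confidence) proposal for every masked position.
    """
    vocab = ["clear", "no", "mild", "focal", "consolidation", "or"]
    return {
        i: (rng.choice(vocab), rng.random())
        for i, tok in enumerate(tokens)
        if tok is MASK
    }


def infill(
    tokens: Tokens,
    denoiser: Callable[[Tokens, random.Random], Proposals],
    steps: int = 4,
    seed: int = 0,
) -> Tokens:
    """Fill masked positions over several steps, most confident first."""
    rng = random.Random(seed)
    tokens = list(tokens)  # work on a copy; the caller's draft is untouched
    for step in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok is MASK]
        if not masked:
            break
        proposals = denoiser(tokens, rng)
        # Commit an even share of the remaining gaps at each step
        # (ceiling division), so the last step fills whatever is left.
        k = -(-len(masked) // (steps - step))
        most_confident = sorted(
            proposals, key=lambda i: proposals[i][1], reverse=True
        )[:k]
        for i in most_confident:
            tokens[i] = proposals[i][0]
    return tokens


# The radiologist pins some fragments and leaves gaps between them.
draft: Tokens = ["Lungs", "are", MASK, ".", MASK, MASK, "pleural", "effusion", "."]
report = infill(draft, toy_denoiser)
print(" ".join(tok for tok in report if tok is not None))
```

In a real system the denoiser would condition on the image and on the pinned text on both sides of each gap. That two-sided conditioning is what a left-to-right decoder cannot provide directly.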

Key Results:

The diffusion model matches or exceeds its AR counterpart on all medical VQA benchmarks evaluated
The finetuned model (3.8B active parameters) is competitive with frontier vision-language models
Decoding is 3.5–4.4× faster than the AR counterpart

Applied Context: In this comparison the diffusion LM decodes 3.5–4.4× faster than its AR counterpart and supports interactive infill workflows, which suits clinical drafting where users iteratively refine structured text. The any-order generation paradigm also points toward new UI patterns for document editing beyond radiology.