LLM / Multimodal / Diffusion
Discrete Diffusion Language Models for Interactive Radiology Report Drafting
Authors: Max Van Puyvelde, Halil Ibrahim Gulluk, Wim Van Criekinge, Olivier Gevaert
arXiv ID: 2607.01436
Problem: Can a discrete diffusion language model match or exceed autoregressive (AR) models on medical report generation while offering capabilities that AR models lack?
Key Methodology:
- Adapted DiffusionGemma-26B (mixture-of-experts diffusion LM) and benchmarked against Gemma-4-26B under an identical LoRA recipe on medical VQA datasets
- Scored outputs using a verbosity-robust LLM judge to avoid length-biased evaluation
- Leveraged diffusion's inherent any-order infill: a radiologist can lock in fragments of a report and have the model generate the text between them, a task AR models handle poorly
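
To make the any-order infill idea concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: `infill`, `toy_denoiser`, and the `(token, confidence)` denoiser interface are hypothetical names chosen for illustration, and the "model" is a stand-in that simply looks tokens up in a canned sentence. It only shows the control flow commonly used in masked discrete diffusion decoding: pinned tokens stay fixed, and masked positions are committed over a few steps, most confident first.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # placeholder for a position the model still has to fill

# Hypothetical denoiser interface (an assumption, not a real library API):
# given the partially filled sequence, return one (token, confidence)
# proposal per position.
Denoiser = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def infill(template: List[Optional[str]], denoiser: Denoiser, steps: int = 4) -> List[str]:
    """Fill MASK positions in any order while leaving pinned tokens untouched."""
    seq = list(template)
    masked = [i for i, tok in enumerate(seq) if tok is MASK]
    # Number of positions committed per step (ceiling division, at least 1).
    per_step = max(1, -(-len(masked) // steps))
    while masked:
        proposals = denoiser(seq)
        # Commit the most confident proposals first; the rest stay masked.
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:per_step]:
            seq[i] = proposals[i][0]
        masked = masked[per_step:]
    return seq


def toy_denoiser(target: List[str]) -> Denoiser:
    """Stand-in for a real model: proposes tokens from a canned sentence."""
    def denoise(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
        out = []
        for i, tok in enumerate(target):
            # Toy confidence: fraction of the two neighbours already filled.
            neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(seq)]
            filled = sum(seq[j] is not MASK for j in neighbours)
            out.append((tok, filled / 2))
        return out
    return denoise


# The radiologist pins "No", "pneumothorax", "effusion" and the final period.
template = ["No", MASK, MASK, "pneumothorax", MASK, MASK, MASK, "effusion", "."]
target = "No evidence of pneumothorax or large pleural effusion .".split()
draft = infill(template, toy_denoiser(target), steps=3)
print(" ".join(draft))
```

In this sketch the positions next to pinned fragments are filled before the one in the middle of the gap, which is the kind of ordering a left-to-right AR decoder cannot express directly. A real system would replace `toy_denoiser` with the finetuned diffusion model's forward pass.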
Key Results:
- Diffusion matches or exceeds AR on all medical VQA benchmarks
- The finetuned model (3.8B active parameters) is competitive with frontier vision-language models
- Decoding is 3.5–4.4× faster than the AR counterpart
Applied Context: In this comparison the diffusion LM decodes 3.5–4.4× faster than its AR counterpart and unlocks interactive infill workflows, a good fit for clinical drafting where users iteratively refine structured text. The any-order generation paradigm also points toward new UI patterns for document editing beyond radiology.