
TL;DR
A developer fed 266MB of DICOM MRI data to Claude Code (running Opus) for a second opinion on a shoulder diagnosis. The AI disagreed with the doctor. HN radiologists weighed in.
A developer named Antoine recently published an experiment that caught fire on Hacker News: he fed 266MB of DICOM MRI data from his right shoulder into Claude Code (Opus 4.8) to get a second opinion on his orthopedist's diagnosis.
The result? The AI disagreed with the human doctor. And the ensuing HN discussion - with actual radiologists weighing in - reveals a lot about where AI medical imaging stands today.
The setup was straightforward. Antoine had been dealing with right shoulder pain for two to three weeks. His doctor diagnosed a "Grade III (>50%-width) partial-thickness tear at the apical insertion" of the subscapularis tendon - a significant rotator cuff injury that typically leads to aggressive treatment.
Rather than accept this at face value, Antoine handed the raw MRI study (266MB of DICOM files) to Claude Code for an independent reading.
Claude developed a methodical analysis strategy, writing code to process the imaging data and examining it from multiple perspectives.
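The post doesn't reproduce Claude's code. What follows is only a hypothetical sketch of a typical first step in this kind of analysis: read a DICOM series, order the slices along the slice normal, and measure the spacing between them. It assumes the pydicom package (`pydicom.dcmread` and the standard `ImagePositionPatient`, `ImageOrientationPatient`, and `SliceThickness` attributes). It also assumes one series per folder, with files ending in `.dcm` and all slices sharing one orientation. The helper names are made up for illustration.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def slice_normal(orientation: Sequence[float]) -> Tuple[float, float, float]:
    """Cross product of the row and column direction cosines (ImageOrientationPatient)."""
    rx, ry, rz, cx, cy, cz = orientation
    return (ry * cz - rz * cy, rz * cx - rx * cz, rx * cy - ry * cx)


def slice_spacings(positions: Sequence[Sequence[float]], orientation: Sequence[float]) -> List[float]:
    """Sort slices by depth along the normal and return center-to-center distances in mm."""
    n = slice_normal(orientation)
    depths = sorted(sum(p * q for p, q in zip(pos, n)) for pos in positions)
    return [round(b - a, 4) for a, b in zip(depths, depths[1:])]


def slice_gaps(spacings: Sequence[float], thickness: float) -> List[float]:
    """Positive values are unimaged gaps between slices; negative values mean overlap."""
    return [round(s - thickness, 4) for s in spacings]


def load_series(folder: str):
    """Hypothetical loader: reads one series with pydicom (assumed installed)."""
    import pydicom  # imported here so the pure helpers above work without pydicom

    slices = [pydicom.dcmread(p) for p in sorted(Path(folder).glob("*.dcm"))]
    positions = [[float(v) for v in s.ImagePositionPatient] for s in slices]
    orientation = [float(v) for v in slices[0].ImageOrientationPatient]
    return positions, orientation, float(slices[0].SliceThickness)


# Synthetic axial slices, deliberately out of order: 2.0 mm thick, centers 2.2 mm apart
spacings = slice_spacings([[0, 0, 0], [0, 0, 4.4], [0, 0, 2.2]], [1, 0, 0, 0, 1, 0])
print(spacings)                   # [2.2, 2.2]
print(slice_gaps(spacings, 2.0))  # [0.2, 0.2]
```

Sorting by projected position, rather than by file name or `InstanceNumber`, is a defensive choice. The geometry attributes describe where each slice actually sits in the patient.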
Here's where it gets interesting. The human radiologist saw a significant partial tear. Claude Code reported an "intact tendon" - essentially no tear at all.
When Antoine had Claude arbitrate between the two readings (providing both reports plus clinical test results), the AI concluded with "moderate-to-high confidence" that the evidence favored its own reading: "Mild insertional tendinosis; NO discrete partial- or full-thickness tear."
Antoine was left in diagnostic limbo. As he put it, the AI second opinion suggested the human-recommended treatment plan was "premature and more intervention-heavy than the facts seemed to justify." But he also acknowledged uncertainty about fully trusting AI for medical interpretation.
The thread exploded with 368+ comments, and the discussion divided into several camps.
Radiologists pushed back hard. One actual radiologist commented: "I can't really weigh in without seeing the full 3D MRI dataset." They pointed out a critical technical detail - ultrasound (which Antoine had also gotten) isn't great for detecting calcification and will miss small calcifications that would show on X-ray or MRI.
Multiple commenters noted that MRI is a 3D medium, and slicing it incorrectly can miss features entirely: "I would not be at all surprised if one could slice an MRI the wrong way to produce a 2D image that fails to show a feature that exists in the source data."
The "Claude is bad at images" camp appeared. Several commenters argued that Claude specifically underperforms on image understanding compared to other frontier models. One wrote: "Claude is the worst FM at image understanding. Prior to gpt-5.4 the only usable models were Gemini and Qwen."
Others countered that Claude handles some image types well, particularly PDF-to-markdown conversion and document understanding - but medical imaging is a different beast.
The sonography vs radiology distinction came up. A cardiac sonographer offered perspective: "Medical imaging is one of those things everyone thinks is simple because they don't know what they don't know. Any comment that doesn't start with 'I'm a radiologist' should be taken with a grain of salt."
The "AI second opinions help catch missed things" camp. Some shared stories of AI helping catch procedural errors or outdated treatment plans. One person described using AI-generated questions to push a GP who was mishandling their mother's care - and it worked.
The "this is a nightmare for doctors" camp. Multiple commenters argued that patients approaching doctors with AI-generated diagnoses creates friction: "Nightmare because users approach LLMs with the false confidence that they're always right, and present LLM outputs as fact to Doctors who have to waste time explaining that it's wrong most of the time."
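The radiologists' point is that a 2D image cut from 3D data can miss a feature that exists in the source volume. A toy example makes it concrete. This is only an illustration: a hypothetical 8×8×8 volume with a single bright voxel, nothing like real anatomy or how radiologists review a study. The helpers `make_volume`, `axial`, `sagittal`, and `shows_feature` are made up for this sketch.

```python
def make_volume(size, feature):
    """Toy volume indexed [z][y][x]: all zeros except one bright voxel at feature=(x, y, z)."""
    vol = [[[0] * size for _ in range(size)] for _ in range(size)]
    x, y, z = feature
    vol[z][y][x] = 1
    return vol


def axial(vol, z):
    """2D plane at a fixed z (rows are y, columns are x)."""
    return vol[z]


def sagittal(vol, x):
    """2D reformat at a fixed x (rows are z, columns are y)."""
    return [[plane[y][x] for y in range(len(plane))] for plane in vol]


def shows_feature(image):
    """True if any pixel in the 2D image is nonzero."""
    return any(any(row) for row in image)


vol = make_volume(8, feature=(5, 2, 6))
print(shows_feature(axial(vol, 4)))     # False: this axial plane misses the feature
print(shows_feature(sagittal(vol, 5)))  # True: a sagittal reformat through x=5 catches it
print([z for z in range(8) if shows_feature(axial(vol, z))])  # [6]: only 1 of 8 axial planes
```

Any single plane is a small subset of the volume. Whether a finding appears depends on which plane, and which orientation, you happened to look at.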
Several important technical points emerged from the discussion:
MRI complexity matters. 2D MRI scans have gaps between slices (typically 10% of slice thickness). 3D scans don't have gaps but are slower and more prone to movement artifacts. The voxels in 3D scans might be 1mm x 1mm x 1mm - which sounds precise until you realize subtle tears can be smaller than that.
Prompting affects diagnosis. One researcher noted: "Subtle changes in prompts can cause different diagnosis." The exact wording you use when asking an AI about medical images meaningfully changes the output.
Modality matters. When a radiology report says something "isn't present," there's always an implicit caveat that the finding isn't present within the context of that specific imaging modality. An ultrasound saying "no calcifications" and an X-ray showing calcifications can both be correct - the ultrasound just can't see small ones.
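The slice-gap and voxel-size numbers from the MRI complexity point reduce to simple arithmetic. The sketch below assumes a hypothetical 3 mm 2D slice with the "typically 10%" gap quoted in the thread. It also uses a deliberately simplified partial-volume model, in which a feature's share of a voxel scales linearly with how much of the voxel width it fills. Real MR signal behaves less simply than that.

```python
def uncovered_fraction(thickness_mm: float, gap_mm: float) -> float:
    """Share of the scanned range along the slice axis that lies in inter-slice gaps."""
    return gap_mm / (thickness_mm + gap_mm)


def voxel_fill(feature_mm: float, voxel_mm: float) -> float:
    """Simplified linear partial-volume model: fraction of a voxel's width the feature fills."""
    return min(feature_mm / voxel_mm, 1.0)


thickness = 3.0          # hypothetical 2D slice thickness in mm
gap = 0.1 * thickness    # the "typically 10% of slice thickness" figure from the thread
print(f"{uncovered_fraction(thickness, gap):.1%} of the range is never imaged")  # 9.1% ...
print(f"a feature up to {gap:.1f} mm along the slice axis can fall entirely inside one gap")
print(voxel_fill(0.5, 1.0))  # 0.5: a 0.5 mm defect fills half of a 1 mm voxel
```

A gap of 10% of slice thickness leaves about 9.1% of the range unsampled, because the gap is divided by thickness plus gap, not by thickness alone.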
This isn't really a story about whether you should trust AI for medical diagnosis (you shouldn't, not yet, not without human verification). It's a story about the current frontier of multimodal AI and where the edges are.
A few takeaways:
The capability gap is real but narrowing. Two years ago, asking any LLM to analyze raw DICOM files would have been absurd. Now Claude Code can install packages, write analysis code, and produce a structured medical reading. The reading might be wrong, but the workflow exists.
Domain expertise still matters. The radiologists in the thread could immediately identify limitations that a non-specialist wouldn't know to ask about - 2D vs 3D acquisition, slice gaps, modality-specific blind spots. AI doesn't yet surface these caveats reliably.
Second opinions have value, even imperfect ones. Antoine's doctor recommended shockwave therapy for a condition that recent clinical guidelines say doesn't respond to it (rotator cuff tendinopathy without calcification). Even if Claude's diagnosis is wrong, the friction of having a second opinion made Antoine dig deeper.
The probabilistic nature cuts both ways. As one commenter put it: "Not quite. An LLM generates text that would likely follow... A patient in pain with a bone protruding from their shin has a... 'broken leg.' The more training data, the more questions it can answer with a reasonable degree of probability of accuracy."
The counterpoint: "It can be helpful in your understanding the choices made by asking questions and thus in reassurance, but it requires something most people lack: understanding you are likely wrong since you are just collecting information without understanding it."
What's notable about this story isn't that Claude Code can read MRIs (it can, sort of). It's that the experiment is now cheap and accessible enough that a solo developer can run it on a weekend, publish results, and get hundreds of HN comments including feedback from actual radiologists.
That feedback loop - AI output, expert critique, public discussion - is how capabilities actually improve. The radiologist comments are training data for the next iteration of these models, whether directly or through the discourse they generate.
For now, the prudent approach is obvious: AI as a thinking aid, not a replacement for professional judgment. But the gap is closing faster than the medical establishment is adapting.
Antoine ended his post in diagnostic limbo, uncertain whether to trust the AI or the doctor. That uncertainty is probably the healthiest response right now.