Medical AI · May 2026 · 14 min read

Radiology AI Annotation: DICOM, MRI, CT, X-Ray — HIPAA-Grade Training Data

Radiology AI training data has the highest stakes and the highest documentation bar of anything we annotate. A misread X-ray that propagates through a model becomes a misread X-ray at scale. This is the practical guide to radiology annotation that actually clears regulator review — DICOM workflows, modality specifics, certified-radiologist oversight, and the HIPAA-grade handling that's non-negotiable.

Radiology AI is one of the most clinically established corners of medical AI. FDA-cleared products exist for chest X-ray triage, mammography screening, lung CT nodule detection and intracranial haemorrhage flagging on emergency head CT, and all of them live or die on training-data quality and provenance documentation. The annotation behind a radiology model isn't just labelling. It's clinical work supported by labelling tools.

This guide is what we'd hand a team scoping their first radiology AI training-data contract: what the work actually involves, the modality-specific differences between MRI, CT, X-ray and ultrasound, who has to do the labelling, the regulatory paperwork the annotation has to support, and the realistic cost. It is written from our experience running clinical-grade annotation workflows for diagnostic and research clients.

DICOM Isn't Just a File Format

Every CT, MRI, X-ray and ultrasound from clinical equipment lands as DICOM. Each file typically holds a single image (multi-frame objects hold several) plus a structured header carrying modality, acquisition parameters, patient context, and the study and series identifiers (UIDs) that link slices together into 3D volumes. The header is what makes radiology data clinical-grade.
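To make the linking role of the header concrete, here is a minimal sketch of grouping slices into volumes. It assumes the headers have already been parsed into plain dictionaries (in practice a DICOM library would do the parsing). StudyInstanceUID, SeriesInstanceUID, InstanceNumber and Modality are standard DICOM attribute keywords; the function name and the sample values are hypothetical.

```python
from collections import defaultdict


def group_slices_into_volumes(headers):
    """Group parsed DICOM headers by (study UID, series UID) and order each stack."""
    volumes = defaultdict(list)
    for header in headers:
        key = (header["StudyInstanceUID"], header["SeriesInstanceUID"])
        volumes[key].append(header)
    for stack in volumes.values():
        # InstanceNumber is a simple ordering key for this sketch; pipelines that
        # need geometric ordering usually sort on ImagePositionPatient instead.
        stack.sort(key=lambda h: h["InstanceNumber"])
    return dict(volumes)


# Hypothetical headers: two CT slices from one series, one X-ray from another study.
example_headers = [
    {"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.1", "InstanceNumber": 2, "Modality": "CT"},
    {"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.1", "InstanceNumber": 1, "Modality": "CT"},
    {"StudyInstanceUID": "4.5.6", "SeriesInstanceUID": "4.5.6.1", "InstanceNumber": 1, "Modality": "CR"},
]

volumes = group_slices_into_volumes(example_headers)
for (study_uid, series_uid), stack in volumes.items():
    print(study_uid, series_uid, [h["InstanceNumber"] for h in stack])
```

A flat image export discards exactly the keys this grouping depends on, so the slices can no longer be reassembled into their series.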

Generic image labellers typically strip the DICOM header and treat each slice as a flat PNG. The result: you lose volumetric context, the acquisition metadata the model needs for normalisation, and the audit trail the regulator wants. DICOM-native tooling is non-negotiable on clinical projects. The tooling we use preserves the DICOM structure end-to-end and exports annotations linked to the original study and series IDs.
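As an illustration of what "linked to the original study and series IDs" can look like in an export, the sketch below uses a hypothetical JSON record schema. The field names, the bounding-box convention and the reader ID are assumptions for illustration only, not the format of any particular annotation tool.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AnnotationRecord:
    """One finding, tied back to the DICOM objects it was drawn on (hypothetical schema)."""
    study_instance_uid: str
    series_instance_uid: str
    sop_instance_uid: str   # the specific slice or image the annotation sits on
    finding: str
    bbox_xywh: tuple        # pixel coordinates: x, y, width, height
    reader_id: str          # pseudonymous ID of the annotating reader


def export_annotations(records):
    """Serialise records to JSON so every label stays traceable to its source study."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)


# Hypothetical example record.
record = AnnotationRecord(
    study_instance_uid="1.2.3",
    series_instance_uid="1.2.3.1",
    sop_instance_uid="1.2.3.1.7",
    finding="nodule",
    bbox_xywh=(10, 20, 30, 40),
    reader_id="reader-A",
)
exported = export_annotations([record])
print(exported)
```

Carrying the UIDs on every record is what lets an auditor walk from a label in the training set back to the exact image it came from.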

Modality-Specific Annotation Notes

Who Actually Does the Annotation

Layered teams, not single annotator types. What works in practice:

The wrong structure — generalist annotators with no medical training, no radiologist adjudication, “HIPAA-compliant because we have a BAA template” — produces datasets that look fine on the delivery report and fail at the FDA submission. We've been brought in to fix several of these and it's genuinely expensive.

HIPAA-Grade Handling: The Bar That Isn't Optional

For US-bound work, HIPAA isn't a checkbox — it's the floor. What that actually means in practice:

Australian TGA-bound work and EU MDR/IVDR work have parallel requirements with their own specifics. The annotation contract has to support whichever regulatory regime your AI ships under — and if you might ship internationally, the strictest applicable regime usually wins. Build the documentation from day one; retrofitting it is painful and sometimes impossible.

Quality: Consensus Gold Standards, Per-Class Metrics

Radiologists genuinely disagree: inter-reader agreement on borderline lesions or BI-RADS calls often sits at a weighted kappa of around 0.6–0.8. That's not a failure; it's the nature of the task. A one-reader gold standard understates real-world variance, and a model trained against it tends to fail in clinical deployment. Consensus gold standards (two or three readers, blinded, with a defined resolution rule) are the bar. Reported metrics are concordance against consensus, per modality, per finding class, every batch: weighted kappa for grading tasks, Dice and Hausdorff distance for segmentation, and sensitivity and specificity at agreed operating points, plus AUC, for detection. Per-class. Always per-class.
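A few of these measures can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production QA pipeline: the consensus rule shown is a simple majority vote (one possible resolution rule), the kappa uses quadratic weights (one common weighting), masks are represented as sets of voxel coordinates, and all function names and sample data are hypothetical. Hausdorff distance and AUC are left out for brevity.

```python
from collections import Counter


def consensus_label(reads):
    """Majority vote across readers; None means the case needs adjudication."""
    label, count = Counter(reads).most_common(1)[0]
    return label if count > len(reads) / 2 else None


def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Quadratic-weighted Cohen's kappa for ordinal grades 0..n_categories-1."""
    n, k = len(ratings_a), n_categories
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagreement = expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            disagreement += weight * observed[i][j]
            expected += weight * row[i] * col[j]
    # expected is zero only when both readers used one identical grade throughout.
    return 1.0 if expected == 0 else 1.0 - disagreement / expected


def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks given as sets of voxel coordinates."""
    if not mask_a and not mask_b:
        return 1.0
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


def sensitivity_specificity(predicted, consensus, positive_class):
    """Per-class sensitivity and specificity against the consensus labels."""
    tp = fp = tn = fn = 0
    for pred, truth in zip(predicted, consensus):
        if truth == positive_class:
            tp += pred == positive_class
            fn += pred != positive_class
        else:
            fp += pred == positive_class
            tn += pred != positive_class
    sensitivity = tp / (tp + fn) if tp + fn else None
    specificity = tn / (tn + fp) if tn + fp else None
    return sensitivity, specificity


# Hypothetical data for illustration only.
print(consensus_label(["nodule", "nodule", "normal"]))
print(weighted_kappa([0, 1, 2, 1], [0, 2, 2, 1], n_categories=3))
print(dice_coefficient({(0, 0), (0, 1), (1, 1)}, {(0, 1), (1, 1), (2, 2)}))
print(sensitivity_specificity(
    ["nodule", "normal", "nodule", "normal"],
    ["nodule", "nodule", "normal", "normal"],
    positive_class="nodule",
))
```

Running the per-class function once per finding class, rather than pooling all classes, is what keeps a rare but clinically important class from being hidden by a common one.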

Scoping a radiology AI training-data project?

Send 5–10 representative studies from your hardest modality. We'll deliver radiologist-adjudicated annotation with consensus QA and a per-class concordance report in 72 hours. HIPAA-grade workflow, BAA on request.

See our radiology annotation service

What It Costs

Radiology annotation is genuinely one of the most expensive categories in commercial annotation. Board-certified radiologist time is the single biggest cost driver; volumetric work (CT and MRI 3D) is slower per study than 2D X-ray; consensus protocols multiply that by the size of the reader panel. Pricing is per study, per slice, per finding, or per organ depending on the task — and the only number that matters is the one a pilot on your studies produces. A flat per-study rate quoted sight-unseen is a guess.

Neel Bennett

AI Annotation Specialist at AI Taggers

Neel has over 8 years of experience in AI training data and machine learning operations. He specializes in helping enterprises build high-quality datasets for computer vision and NLP applications across healthcare, automotive, and retail industries.
