What is radiology image annotation?

Radiology image annotation is the expert labelling of medical imaging — CT, MRI, X-ray, ultrasound and mammography — so AI models can detect findings, classify pathology, segment anatomy and grade severity. The unit of work is the DICOM study, often a 3D volume with hundreds of slices, and the annotation has to clear a clinical-grade bar because the models trained on it support diagnostic decisions. Generalist annotation does not meet that bar.

What annotation tasks run on radiology datasets?

Six common ones: region-of-interest selection on 2D and 3D studies, organ and structure segmentation (per-slice or volumetric), lesion and finding detection (bounding boxes, polygons or segmentation depending on use), severity grading and response assessment (BI-RADS for mammography, Lung-RADS for lung CT screening, RECIST for tumour response, etc.), classification and triage labels for normal-vs-abnormal models, and report-level structured tagging that ties imaging findings to structured EHR fields.

Which vendors offer HIPAA-compliant medical AI training datasets?

A working answer rather than a brochure one. The vendor needs four things: secure infrastructure (encrypted at rest and in transit, VPC deployment available, access controls), a documented HIPAA-compliant workflow with a BAA in place, radiologist-grade annotators or radiologist adjudication on every clinical-grade label, and per-record audit trails for the regulatory submission. At AI Taggers we deliver radiology annotation under all four; other vendors meet a subset and call it HIPAA-compliant. Ask for a copy of the BAA and the technical safeguards documentation up front.

Do you need certified radiologists for radiology annotation?

For ROI selection on clear cases and basic structure labelling, trained imaging annotators (radiology technologists or annotators with imaging-science backgrounds) working from radiologist-built reference sets can do high-quality work. For lesion detection, grading and any clinical-grade finding, you need board-certified radiologists, or trained annotators with radiologist adjudication on every borderline case. Skipping this is the single most common reason radiology AI datasets fail to clear regulator review.

What is DICOM and why does it matter for annotation?

DICOM is the medical imaging standard: every CT, MRI, X-ray and ultrasound study from clinical equipment lands as DICOM files. Each file is typically a single image (multi-frame objects exist, for example ultrasound clips) plus a structured header carrying modality, acquisition parameters, patient context (which must be de-identified before annotation), and the study and series IDs that link slices together. DICOM-native annotation tools preserve this structure; generic image labellers strip it, and you lose the volume context linking the slices, which volumetric segmentation depends on. Use DICOM-native tooling, no exceptions on clinical work.

How is radiology annotation quality measured?

Per-class accuracy against a consensus gold standard — typically two or three radiologists, blinded, with a defined resolution rule for disagreement. Dice coefficient and Hausdorff distance for segmentation tasks. Cohen's kappa or weighted kappa for grading tasks. Sensitivity and specificity at agreed operating points. Reported per modality and per finding class, every batch. Single-radiologist gold standards understate real-world disagreement and produce models that fail in clinical deployment.
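For readers who want to see what those segmentation and detection metrics compute, here is a minimal sketch in plain Python. It assumes masks are flattened to 0/1 sequences, contours are lists of coordinate tuples, and labels are binary; the function names are our own illustrations, not part of any particular toolkit, and a production pipeline would normally use an established imaging library instead.

```python
import math


def dice_coefficient(pred, truth):
    """Dice = 2|A and B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets (brute force)."""
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(points_a, points_b), directed(points_b, points_a))


def sensitivity_specificity(pred_labels, truth_labels):
    """Sensitivity and specificity for binary labels at one fixed operating point."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred_labels, truth_labels):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Reporting per finding class then means calling these once per class and per modality, not once over the pooled dataset.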

What regulatory considerations apply to radiology AI training data?

If the AI is heading toward clinical deployment, the regulator wants paperwork. US FDA (510(k) or De Novo for diagnostic AI), Australian TGA, EU MDR/IVDR all require documented annotation provenance — annotator credentials, protocol versions with clinical sign-off, inter-annotator agreement per class, adjudication trail, per-study audit log. The annotation contract has to support this from day one; bolting documentation on retrospectively is genuinely painful and often impossible.
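To make "per-study audit log" concrete, the sketch below shows one possible shape for a single provenance record, serialised as a JSON line. The field names and the `make_record` and `to_json_line` helpers are hypothetical illustrations; no regulator prescribes this schema, and a real log would carry whatever your quality system and submission require.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AnnotationAuditRecord:
    # All field names are illustrative assumptions, not a mandated schema.
    study_uid: str             # de-identified study identifier
    annotator_id: str          # internal ID, resolvable to credentials on file
    annotator_credential: str  # e.g. "board-certified radiologist"
    protocol_version: str      # version of the clinically signed-off protocol
    action: str                # e.g. "annotate", "review", "adjudicate"
    timestamp_utc: str         # ISO 8601 timestamp in UTC


def make_record(study_uid, annotator_id, annotator_credential,
                protocol_version, action, now=None):
    """Build an immutable audit record, stamping it with the current UTC time."""
    now = now or datetime.now(timezone.utc)
    return AnnotationAuditRecord(
        study_uid=study_uid,
        annotator_id=annotator_id,
        annotator_credential=annotator_credential,
        protocol_version=protocol_version,
        action=action,
        timestamp_utc=now.isoformat(),
    )


def to_json_line(record):
    """Serialise a record as one JSON line, suitable for an append-only log file."""
    return json.dumps(asdict(record), sort_keys=True)
```

Writing one such line per action, from the first study onward, is what makes the trail available at submission time without reconstruction.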

Radiology AI Annotation: DICOM, MRI, CT, X-Ray — HIPAA-Grade Training Data (2026)

Radiology AI is one of the most clinically established corners of medical AI. FDA-cleared products for chest X-ray triage, mammography screening, lung CT nodule detection, intracranial haemorrhage flagging on emergency head CT — they all exist, they all work, and they all live or die on training-data quality and provenance documentation. The annotation behind a radiology model isn't labelling. It's clinical work supported by labelling tools.

This guide is what we'd hand a team scoping their first radiology AI training-data contract — what the work actually involves, the modality-specific differences between MRI, CT, X-ray and ultrasound, who has to do the labelling, the regulatory paperwork the annotation has to support, and the realistic cost. Written from our experience running clinical-grade annotation workflows for diagnostic and research clients.

DICOM Isn't Just a File Format

Every CT, MRI, X-ray and ultrasound study from clinical equipment lands as DICOM. Each file is typically a single image plus a structured header carrying modality, acquisition parameters, patient context, and the study/series IDs that link slices together into 3D volumes. The header is what makes radiology data clinical-grade.

Generic image labellers strip the DICOM header and treat each slice as a flat PNG. The result — you lose volumetric context, lose acquisition metadata that the model needs for normalisation, and lose the audit trail the regulator wants. DICOM-native tooling is non-negotiable on clinical projects. The tooling we use preserves the DICOM structure end-to-end and exports annotations linked to the original study and series IDs.
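The study and series linkage is easy to see in code. The sketch below assumes each slice header has already been read into a plain dictionary keyed by standard DICOM attribute keywords (`StudyInstanceUID`, `SeriesInstanceUID`, `InstanceNumber`); the `group_into_volumes` helper is our own illustration. Ordering by `InstanceNumber` is a simplification: careful pipelines usually order slices by their spatial position instead.

```python
from collections import defaultdict


def group_into_volumes(headers):
    """Group per-slice headers into volumes keyed by (study UID, series UID).

    Each header is assumed to be a dict of DICOM keywords to values.
    Slices within a series are ordered by InstanceNumber (a simplification).
    """
    volumes = defaultdict(list)
    for header in headers:
        key = (header["StudyInstanceUID"], header["SeriesInstanceUID"])
        volumes[key].append(header)
    for slices in volumes.values():
        slices.sort(key=lambda h: int(h["InstanceNumber"]))
    return dict(volumes)


# Usage: three slices, two series within one study.
example_headers = [
    {"StudyInstanceUID": "1.1", "SeriesInstanceUID": "1.1.1", "InstanceNumber": "2"},
    {"StudyInstanceUID": "1.1", "SeriesInstanceUID": "1.1.1", "InstanceNumber": "1"},
    {"StudyInstanceUID": "1.1", "SeriesInstanceUID": "1.1.2", "InstanceNumber": "1"},
]
example_volumes = group_into_volumes(example_headers)
```

A labeller that flattens slices to PNG discards exactly the keys this grouping depends on, which is why the volume cannot be rebuilt afterwards.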

Modality-Specific Annotation Notes

X-ray: usually 2D, often paired projections (PA and lateral chest). Annotation is finding-level (bounding boxes or polygons around nodules, opacities, fractures) plus classification (normal vs abnormal, severity grading). Lung-RADS-style grading where the use case calls for it; Lung-RADS itself is defined for low-dose CT screening, not radiographs.
CT — volumetric. Per-slice or per-volume segmentation for organs (liver, spleen, kidneys), tumour-region masks for oncology AI, intracranial haemorrhage detection on head CT, pulmonary nodule detection on chest CT. The volume context matters; per-slice labelling without volume-level QA is the single most common quality failure.
MRI — multi-sequence (T1, T2, FLAIR, DWI, contrast-enhanced). Annotation has to be sequence-aware — a lesion may appear on FLAIR but not T1. Brain tumour segmentation, MS lesion tracking, knee and spine segmentation, prostate lesion grading (PI-RADS).
Ultrasound: operator-dependent acquisition, single-frame images or sweep clips. Annotation is finding-level (boxes or polygons) plus classification, with strong inter-operator variability that QA has to account for.
Mammography — BI-RADS classification, microcalcification clusters, mass margins. Sub-specialty radiologist oversight is essential here.
Nuclear medicine and PET — quantitative analysis, SUV thresholds, fused PET-CT annotation. Specialist sub-domain.

Who Actually Does the Annotation

Layered teams, not single annotator types. What works in practice:

Board-certified radiologists at the top of the loop. Build the spec, build the gold-standard reference set, adjudicate every borderline call, sign off on the QA process. Their time is the most expensive and where it counts most.
Trained imaging annotators in the middle. Radiology technologists, imaging-science graduates, or medically-trained reviewers who have passed project-specific calibration. They handle the bulk of the volume on clearly-defined tasks.
Specialist radiologists where the modality calls for it. Neuro-radiologists on brain MRI, breast radiologists on mammography, paediatric radiologists on paediatric work. Sub-specialty matters more than generalist seniority.
QA on top. Double-annotation, gold-set checks every batch, adjudication queue back to the radiologists. See our broader annotation QA playbook and our clinical expert annotation service.

The wrong structure — generalist annotators with no medical training, no radiologist adjudication, “HIPAA-compliant because we have a BAA template” — produces datasets that look fine on the delivery report and fail at the FDA submission. We've been brought in to fix several of these and it's genuinely expensive.

HIPAA-Grade Handling: The Bar That Isn't Optional

For US-bound work, HIPAA isn't a checkbox — it's the floor. What that actually means in practice:

De-identification before annotation. DICOM headers stripped of PHI per the HIPAA Safe Harbor method (18 identifier categories) or the Expert Determination method.
Business Associate Agreement (BAA) in place between the AI developer and the annotation vendor. Real BAA, not a templated checkbox.
Encrypted in transit and at rest — TLS 1.2+ on the wire, AES-256 at rest, key management documented.
Access controls and audit logging — who annotated which study, when, from where, under which protocol version.
Secure infrastructure — VPC or private-cloud deployment available on request; some clients require on-prem.
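As an illustration of the de-identification step in the list above, here is a minimal header-only sketch. It assumes the header is a plain dictionary keyed by DICOM attribute keywords, and `PHI_KEYWORDS` is a deliberately small, illustrative subset. It is not a complete Safe Harbor implementation, and it does not touch identifiers elsewhere in the file.

```python
# Illustrative subset only: a real profile covers far more attributes.
PHI_KEYWORDS = frozenset({
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
    "InstitutionName",
    "AccessionNumber",
})


def strip_phi(header):
    """Return (cleaned_header, removed_keywords) without mutating the input.

    The list of removed keywords can be written to the audit log so the
    de-identification step itself is documented.
    """
    cleaned = {k: v for k, v in header.items() if k not in PHI_KEYWORDS}
    removed = sorted(k for k in header if k in PHI_KEYWORDS)
    return cleaned, removed


# Usage with a made-up header.
raw_header = {
    "PatientName": "DOE^JANE",
    "PatientID": "12345",
    "Modality": "CT",
    "SeriesInstanceUID": "1.2.3",
}
clean_header, removed_keys = strip_phi(raw_header)
```

Returning the removed keywords alongside the cleaned header keeps the acquisition metadata the model needs while leaving a record of what was taken out.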

Australian TGA-bound work and EU MDR/IVDR work have parallel requirements with their own specifics. The annotation contract has to support whichever regulatory regime your AI ships under — and if you might ship internationally, the strictest applicable regime usually wins. Build the documentation from day one; retrofitting it is painful and sometimes impossible.

Quality: Consensus Gold Standards, Per-Class Metrics

Radiologists genuinely disagree: weighted kappa between readers on borderline lesions or BI-RADS calls often sits around 0.6–0.8. That's not a failure; it's the nature of the task. A one-reader gold standard understates real-world variance, and a model trained against it tends to underperform in clinical deployment. Consensus gold standards (two or three readers, blinded, defined resolution rule) are the bar. Reported metrics are concordance against consensus, per modality, per finding class, every batch. For grading tasks: weighted kappa. For segmentation: Dice and Hausdorff distance. For detection: sensitivity and specificity at agreed operating points, plus AUC across thresholds. Per-class. Always per-class.
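Two pieces of that paragraph are easy to sketch: weighted kappa for ordinal grades and a resolution rule for multi-reader consensus. The code below assumes grades are integers from 0 to n_categories - 1, uses quadratic weights (one common choice, not the only one), and treats "strict majority, otherwise send to adjudication" as the resolution rule. Both function names and the rule itself are illustrative assumptions; your protocol defines the real one.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Quadratic-weighted Cohen's kappa for two readers' ordinal grades."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    k, n = n_categories, len(ratings_a)

    # Observed confusion matrix between the two readers.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement penalty grows with the squared distance between grades.
        return (i - j) ** 2 / (k - 1) ** 2

    num = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(weight(i, j) * row[i] * col[j] / n for i in range(k) for j in range(k))
    # Assumption: if no disagreement is expected by chance, report perfect agreement.
    return 1.0 if den == 0 else 1.0 - num / den


def consensus_label(reads):
    """Return the strict-majority label, or None to route the case to adjudication."""
    label, count = Counter(reads).most_common(1)[0]
    return label if count > len(reads) / 2 else None
```

With three blinded readers, `consensus_label([4, 4, 3])` returns 4, while `consensus_label([2, 3, 4])` returns None and the study goes to the adjudication queue.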

Scoping a radiology AI training-data project?

Send 5–10 representative studies from your hardest modality. We'll deliver radiologist-adjudicated annotation with consensus QA and a per-class concordance report in 72 hours. HIPAA-grade workflow, BAA on request.

See our radiology annotation service

What It Costs

Radiology annotation is genuinely one of the most expensive categories in commercial annotation. Board-certified radiologist time is the single biggest cost driver; volumetric work (CT and MRI 3D) is slower per study than 2D X-ray; consensus protocols multiply that by the size of the reader panel. Pricing is per study, per slice, per finding, or per organ depending on the task — and the only number that matters is the one a pilot on your studies produces. A flat per-study rate quoted sight-unseen is a guess.
