How Is CT Scan Annotation Done for Radiology AI?

Quick answer

CT scan annotation is the process of labelling computed tomography volumes for AI model training — drawing bounding boxes or segmentation masks on DICOM slices, applying Hounsfield windowing to reveal pathology, maintaining consistency across the full slice stack, and recording structured finding metadata. Production workflows require radiologist involvement at the QA stage at minimum; complex pathology characterisation tasks require radiologists as primary annotators under FDA and TGA regulatory frameworks.

What Makes CT Annotation Fundamentally Different From Other Medical Imaging

A CT scan is not a single image — it is a three-dimensional volume reconstructed from hundreds or thousands of individual axial slices, each 0.5–5 mm thick. A chest CT typically contains 300–600 slices. An annotator who labels only the slice where a nodule appears largest has not annotated a CT scan; they have annotated one frame of it. Production CT annotation requires either 3D volumetric segmentation tools that operate across the full slice stack, or systematic multi-slice labelling with explicit cross-referencing between adjacent slices to maintain spatial continuity.

The second distinguishing feature is Hounsfield unit windowing. CT pixel values encode radiodensity on the Hounsfield scale, and a single CT volume contains tissue types spanning roughly 3,000 HU, from air at -1000 HU through soft tissue, blood, muscle, calcification, and cortical bone. No single display window makes all pathologies visible simultaneously. An annotator working on pulmonary nodule detection must apply the lung window (typically level -600 HU, width 1500 HU, which displays roughly -1350 to +150 HU). Switching to the mediastinal window (level +40 HU, width 400 HU, or -160 to +240 HU) for the same dataset renders the lung parenchyma uniformly black, so small and low-density nodules become difficult or impossible to see. Annotation performed in the wrong window systematically misses findings and produces corrupted training labels.
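
The effect of window choice can be sketched in a few lines. This is a minimal illustration, not a viewer implementation: the preset values are commonly used level/width pairs assumed here, and the two HU values (a faint ground-glass voxel and the surrounding aerated lung) are illustrative numbers rather than measurements.

```python
def window_bounds(level, width):
    """Return the (low, high) HU range displayed by a window level/width pair."""
    half = width / 2
    return level - half, level + half


def apply_window(hu, level, width):
    """Map one HU value to an 8-bit grey level (0-255), clipping outside the window."""
    low, high = window_bounds(level, width)
    clipped = min(max(hu, low), high)
    return round((clipped - low) / (high - low) * 255)


# Assumed presets as (level, width) in HU; real protocols vary by site and task.
PRESETS = {"lung": (-600, 1500), "mediastinal": (40, 400)}

for name, (level, width) in PRESETS.items():
    low, high = window_bounds(level, width)
    # Illustrative values: -650 HU ground-glass voxel vs -850 HU aerated lung.
    contrast = apply_window(-650, level, width) - apply_window(-850, level, width)
    print(f"{name}: {low:.0f} to {high:.0f} HU, grey-level contrast = {contrast}")
```

Under these assumptions the lung preset separates the two voxels by several dozen grey levels, while the mediastinal preset clips both to black, which is why a finding of this kind cannot be annotated in that window.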

Our CT scan annotation service covers thoracic (chest CT, pulmonary nodule, lung cancer screening), abdominal (liver, kidney, pancreas, colon), musculoskeletal (fracture, joint pathology), and neuroradiology (brain lesion, stroke, haemorrhage) modalities. Each task type has its own windowing protocol, annotator qualification standard, and QA framework.

The CT Annotation Workflow: Four Stages That Determine Data Quality

A well-designed CT annotation workflow is sequential and non-negotiable. Compressing or skipping stages to meet volume targets is the most common root cause of CT training datasets that fail to improve model performance.

Stage 1: DICOM ingestion and pre-processing

Source CT volumes arrive in DICOM format. Pre-processing validates DICOM headers (slice thickness, pixel spacing, patient orientation, reconstruction kernel), reconstructs 3D volumes, and applies institution-specific anonymisation before annotation begins. Corrupt or non-standard DICOMs — particularly from older scanners — must be identified and handled before annotators encounter them; DICOM errors discovered mid-annotation propagate as systematic label errors across dozens of affected volumes.
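
A header check of this kind can be sketched as follows. The attribute names are standard DICOM keywords, but everything else is an assumption for illustration: headers are represented as plain dicts (a real pipeline would read them with a DICOM library), the series is assumed to be axial, and `validate_series` and its tolerance are hypothetical.

```python
REQUIRED = ("SliceThickness", "PixelSpacing", "ImagePositionPatient", "ConvolutionKernel")


def validate_series(headers, tolerance_mm=0.01):
    """Return a list of problems found in one CT series.

    `headers` is a list of dicts, one per slice, keyed by DICOM keyword.
    """
    problems = []
    for i, header in enumerate(headers):
        missing = [key for key in REQUIRED if key not in header]
        if missing:
            problems.append(f"slice {i}: missing {', '.join(missing)}")
    if problems:
        return problems  # later checks need every required field

    if len({h["SliceThickness"] for h in headers}) > 1:
        problems.append("inconsistent SliceThickness")
    if len({h["ConvolutionKernel"] for h in headers}) > 1:
        problems.append("mixed reconstruction kernels")

    # Sort by z position (assumes axial slices) and check the spacing is uniform.
    z = sorted(h["ImagePositionPatient"][2] for h in headers)
    gaps = [b - a for a, b in zip(z, z[1:])]
    if gaps and max(gaps) - min(gaps) > tolerance_mm:
        problems.append("non-uniform slice spacing (possible missing or duplicate slice)")
    return problems


# Hypothetical 1 mm series with the slice at z = 3 mm missing.
series = [
    {"SliceThickness": 1.0, "PixelSpacing": [0.7, 0.7],
     "ImagePositionPatient": [0.0, 0.0, float(z)], "ConvolutionKernel": "B70f"}
    for z in (0, 1, 2, 4, 5)
]
print(validate_series(series))
```

Running a check like this at ingestion means a series with a missing slice is quarantined before any annotator opens it.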

Stage 2: Windowed annotation with multi-slice review

Annotators work in a DICOM viewer (RadiAnt, 3D Slicer, OHIF v3, Mint Medical) with pre-configured window/level presets for each task type. For each target finding, they annotate on the most diagnostic slice, then verify presence and extend annotations across adjacent slices using the scrolling view. 3D segmentation tools (ITK-SNAP, 3D Slicer with its Segment Editor module) propagate initial contours to adjacent slices via semi-automatic algorithms, reducing manual effort by 40–60% on structures with smooth boundaries.

Stage 3: Radiologist QA review and adjudication

Every CT annotation requires at least one radiologist QA review pass. The reviewing radiologist checks for missed findings (false negatives), incorrect classifications (wrong pathology label), boundary errors in segmentations, and windowing protocol violations. For contested findings, such as nodules at the classification borderline or lesions with ambiguous characteristics, a senior radiologist adjudication session resolves the disagreement and documents the reasoning. Adjudication records are part of the annotation provenance log required for FDA submissions.

Stage 4: Inter-rater agreement measurement

A random 10–15% sample of annotated volumes is independently re-annotated by a second radiologist. Cohen's kappa or Dice similarity coefficient (for segmentation) is computed per finding category. Kappa below 0.6 on a target finding class indicates the annotation guidelines are insufficiently specific and require revision before the remaining volume is annotated. This feedback loop prevents systematic label noise from scaling to the full dataset.
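
Both agreement metrics are simple to compute. The sketch below uses the standard definitions of Cohen's kappa and the Dice coefficient; the reader labels and voxel coordinates are made-up example data, and segmentation masks are represented as sets of (z, y, x) voxel coordinates purely for brevity.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two readers labelling the same cases."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each reader's marginal label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both readers used a single identical label throughout
    return (observed - expected) / (1 - expected)


def dice(mask_a, mask_b):
    """Dice similarity coefficient for two masks given as sets of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Made-up example: two readers disagree on 2 of 10 nodules.
reader_1 = ["benign", "benign", "malignant", "malignant", "benign",
            "malignant", "benign", "benign", "malignant", "benign"]
reader_2 = ["benign", "benign", "malignant", "benign", "benign",
            "malignant", "benign", "malignant", "malignant", "benign"]
kappa = cohens_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.2f}, guideline revision needed: {kappa < 0.6}")

seg_1 = {(10, 5, 5), (10, 5, 6), (11, 5, 5), (11, 5, 6)}
seg_2 = {(10, 5, 5), (10, 5, 6), (11, 5, 5)}
print(f"dice = {dice(seg_1, seg_2):.2f}")
```

Note that the two example readers agree on 80% of cases yet still fall just under the 0.6 kappa threshold, because kappa discounts the agreement expected by chance.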

Multi-Slice Consistency: The Hardest Part of CT Annotation to Get Right

The most common quality failure in CT annotation is slice-to-slice inconsistency: a nodule annotated as 8.3 mm on slice 142 but 6.1 mm on slice 141, a tumour boundary that jumps 3 mm between adjacent slices without biological justification, or a structure annotated as present on even-numbered slices and absent on odd-numbered slices due to annotator fatigue. These inconsistencies do not prevent the annotation from appearing visually acceptable in spot-check reviews — they only manifest when the 3D volume is rendered or when the training pipeline extracts slice-level crops.

Three practices enforce multi-slice consistency reliably. First, annotation must proceed in sequential slice order rather than jumping to the "most obvious" slice. Second, inter-slice measurement consistency must be quantitatively verified — lesion diameter variance across contiguous slices should not exceed the annotated structure's biological variability at the imaging resolution. Third, 3D rendering must be performed on every annotated volume before QA sign-off, since 3D renders reveal topological errors (disconnected segmentations, slice-boundary artefacts) that 2D slice review misses.
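
The quantitative part of these checks can be automated before a volume reaches QA. The following is a minimal sketch under stated assumptions: per-slice diameters are supplied as a dict keyed by slice index, `check_slice_consistency` is a hypothetical helper, and the 1.5 mm jump threshold is a placeholder that a real protocol would set per finding type and slice thickness.

```python
def check_slice_consistency(diameters_mm, max_jump_mm=1.5):
    """Flag gaps and abrupt size changes in a per-slice lesion annotation.

    `diameters_mm` maps slice index -> annotated lesion diameter on that slice.
    """
    issues = []
    slices = sorted(diameters_mm)
    for prev, cur in zip(slices, slices[1:]):
        if cur - prev > 1:
            # The lesion is annotated on both sides of an unannotated run.
            issues.append(f"gap: {cur - prev - 1} unannotated slice(s) after slice {prev}")
        elif abs(diameters_mm[cur] - diameters_mm[prev]) > max_jump_mm:
            issues.append(
                f"jump: {diameters_mm[prev]} mm to {diameters_mm[cur]} mm "
                f"between slices {prev} and {cur}"
            )
    return issues


# Example data: a 2.2 mm jump between slices 141 and 142, and slice 143 skipped.
nodule = {140: 5.8, 141: 6.1, 142: 8.3, 144: 6.0}
for issue in check_slice_consistency(nodule):
    print(issue)
```

A check like this does not replace the 3D render, but it catches skipped slices and implausible jumps cheaply enough to run on every annotated finding.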

The global AI in radiology market is projected to reach USD 4.5 billion by 2030 (MarketsandMarkets, 2024), with CT-based applications, including lung cancer screening, cardiac risk assessment, and polyp detection, representing the largest segment. The data quality requirements for these applications are correspondingly high: a 2023 meta-analysis in Radiology: Artificial Intelligence found that AI models for pulmonary nodule detection trained on consistently windowed and inter-rater-validated CT datasets achieved 93% sensitivity at the 6 mm detection threshold, compared to 81% sensitivity for models trained on inconsistently annotated data from the same patient population.

Need CT scan annotation for a radiology AI project?

AI Taggers provides board-certified radiologist-in-the-loop CT scan annotation for thoracic, abdominal, and musculoskeletal imaging. HIPAA-compliant, FDA 21 CFR Part 11-aligned provenance, and DICOM-SEG output as standard.

Regulatory Compliance: FDA 21 CFR Part 11 and TGA Requirements

CT annotation for diagnostic AI that will be submitted to the FDA as Software as a Medical Device (SaMD) requires annotation provenance that satisfies FDA 21 CFR Part 11. This regulation governs electronic records and electronic signatures and mandates: tamper-evident audit trails recording who annotated each finding, when, and what changes were made; electronic signature controls preventing annotation identity spoofing; system access logs; and record retention for the duration of device clearance plus applicable post-market surveillance.
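
One common way to make an audit trail tamper-evident is to chain each entry to the hash of the previous one. The sketch below illustrates that idea only; Part 11 does not prescribe this mechanism, the field names and functions are hypothetical, and a compliant system would also need signature controls, access management, and validated storage.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash that anchors the first entry


def append_entry(log, user, action, finding_id, timestamp):
    """Append an audit record whose hash covers its content and the previous hash."""
    record = {
        "user": user,
        "action": action,
        "finding_id": finding_id,
        "timestamp": timestamp,
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(record)
    return record


def verify_chain(log):
    """Return True only if no record has been altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for record in log:
        body = {key: value for key, value in record.items() if key != "hash"}
        if body["prev_hash"] != prev_hash:
            return False
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if hashlib.sha256(payload).hexdigest() != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True


audit_log = []
append_entry(audit_log, "rad_017", "create_segmentation", "nodule-0042", "2024-03-01T09:12:00Z")
append_entry(audit_log, "rad_004", "qa_approve", "nodule-0042", "2024-03-02T14:30:00Z")
print(verify_chain(audit_log))          # chain is intact

audit_log[0]["user"] = "someone_else"   # simulate an after-the-fact edit
print(verify_chain(audit_log))          # the edit is detected
```

Because each hash depends on everything before it, changing who annotated a finding invalidates every later record, which is the property an auditor looks for.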

Beyond Part 11, FDA's 2021 guidance on AI/ML-based SaMD requires that the training dataset reference standard (ground truth) methodology be documented in the submission, including annotator qualification criteria, inter-rater agreement statistics, and the process for resolving disagreements. This means teams cannot retrospectively reconstruct annotation provenance — the documentation must be built into the annotation workflow from day one.

In Australia, the Therapeutic Goods Administration (TGA) applies similar requirements under the Software as a Medical Device framework for Class III devices. Australian medical AI teams should expect TGA to request annotation methodology documentation as part of conformity assessment for high-risk radiology AI. Our post on FDA 21 CFR Part 11 annotation documentation covers the specific provenance log fields required for regulatory submissions.

Case Study: Pulmonary Nodule Detection Model for Australian Hospital Networks

A radiology AI company developing a pulmonary nodule detection and risk stratification model for deployment across Australian hospital networks engaged AI Taggers to annotate their CT chest dataset. The client had initially attempted annotation with an offshore crowdsourced vendor; preliminary model benchmarking revealed sensitivity of 74% at the 3–6 mm nodule threshold — the most clinically critical size range for early lung cancer detection — which was insufficient for intended clinical use.

Project parameters

Parameter	Detail
Dataset volume	12,400 CT chest volumes from 8 hospital sites
Annotation tasks	Nodule detection, 3D segmentation (≥4 mm), Lung-RADS categorisation, nodule characterisation (LIDC-IDRI protocol)
Regulatory target	FDA 510(k) and TGA Class IIb clearance
Timeline	14 weeks to full dataset completion

Root cause of the original quality problem: The offshore vendor had used non-radiologist annotators working in lung window only. Subsolid and ground-glass nodules — which require mediastinal window cross-referencing to characterise correctly — had been systematically mislabelled or missed. Inter-rater agreement on nodule malignancy classification was kappa = 0.41, indicating poor reliability on the most clinically significant task.

Our approach: We deployed a radiologist-led annotation team using OHIF v3 with custom window preset configurations for lung, mediastinal, and bone windows across every volume. Each CT volume was annotated by one radiologist and QA-reviewed by a second. Contested nodule characterisations — approximately 8% of all nodules — were adjudicated by a senior thoracic radiologist. Annotation provenance was logged in a Part 11-compliant audit system with electronic signature controls. Inter-rater agreement measurement was performed on a rolling 12% sample.

Before and after

Metric	Before (offshore vendor)	After (AI Taggers, Week 14)
Model sensitivity at 3–6 mm nodules	74%	91%
False positives per CT volume	4.2	1.6
Nodule characterisation kappa	0.41	0.79
Subsolid nodule miss rate	31%	6%
Part 11-compliant audit trail	absent	complete

The sensitivity improvement from 74% to 91% on the 3–6 mm range — the most diagnostically critical — was driven primarily by the multi-window annotation protocol and the adjudication process for subsolid nodules. The false positive reduction from 4.2 to 1.6 per volume came from the radiologist QA layer catching non-nodule structures (vessel cross-sections, scar tissue, mucus plugging) that non-radiologist annotators had incorrectly flagged. The complete Part 11 audit trail made the 510(k) submission documentation straightforward; the client received FDA clearance 11 months after dataset completion.

CT Annotation Cost and Realistic Throughput

CT annotation cost is highly task-dependent. Detection tasks on straightforward findings — fracture presence in a long bone CT, pneumothorax in chest CT — can be performed by trained annotators under radiologist QA at reasonable throughput. Complex volumetric segmentation with characterisation requires radiologist primary annotation, which is slower and more expensive but unavoidable for regulatory-grade data.

Task type	Typical cost (AUD, per CT volume)	Throughput (volumes/radiologist-day)
Detection only (presence/absence)	$8 – $18	35–50
Detection + 2D bounding box	$18 – $30	20–35
3D volumetric segmentation	$60 – $150	8–15
Characterisation (LIDC/Lung-RADS)	$35 – $90	15–25
Longitudinal (multi-timepoint)	Add 20–35% per timepoint	—
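
For rough planning, the table can be turned into a small estimator. This is a back-of-envelope sketch only: the figures are copied from the table above, `estimate` is a hypothetical helper, and it ignores QA passes, adjudication, and the longitudinal surcharge.

```python
import math

# (low, high) cost in AUD per volume and (low, high) volumes per radiologist-day.
RATES = {
    "detection": {"cost": (8, 18), "per_day": (35, 50)},
    "detection_bbox": {"cost": (18, 30), "per_day": (20, 35)},
    "segmentation_3d": {"cost": (60, 150), "per_day": (8, 15)},
    "characterisation": {"cost": (35, 90), "per_day": (15, 25)},
}


def estimate(task, volumes, radiologists=1):
    """Return cost and duration ranges for annotating `volumes` CT volumes."""
    low_cost, high_cost = RATES[task]["cost"]
    slow, fast = RATES[task]["per_day"]
    return {
        "cost_aud": (volumes * low_cost, volumes * high_cost),
        # Best case uses the fast rate, worst case the slow rate.
        "working_days": (
            math.ceil(volumes / (fast * radiologists)),
            math.ceil(volumes / (slow * radiologists)),
        ),
    }


print(estimate("segmentation_3d", volumes=1000, radiologists=4))
```

With these inputs, 1,000 volumes of 3D segmentation across four radiologists come out at AUD 60,000 to 150,000 and roughly 17 to 32 working days.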

For context on how CT annotation fits into broader radiology AI data programmes, our radiology AI annotation guide covers multi-modality workflows including MRI, X-ray, and pathology. Teams building multi-organ models may also need organ segmentation annotation services, which we offer in parallel with CT-level pathology labelling. For whole-slide histopathology workflows that complement CT-based cancer staging, see our overview in histopathology whole-slide annotation.

Pilot annotations — typically 100–300 CT volumes — are available within two to four weeks and allow teams to evaluate annotation quality and inter-rater agreement before committing to full-dataset production. Our CT scan annotation service includes a detailed annotation guideline co-development session before any production annotation begins, ensuring windowing protocols, finding definitions, and characterisation criteria are specified precisely enough to achieve consistent results at scale. We also support broader radiology annotation programmes across modalities.

Frequently Asked Questions

What is CT scan annotation?

CT scan annotation is the process of labelling computed tomography volumes for AI training — including finding detection, bounding box or segmentation marking across DICOM slices, Hounsfield-windowed review, structured finding classification, and provenance documentation. It requires DICOM-compatible tools and radiologist involvement at minimum at the QA review stage.

What is Hounsfield windowing and why does it matter for CT annotation?

Hounsfield units measure radiodensity. Windowing maps a chosen HU range to the display greyscale, making specific tissue types visible. The lung window (typically level -600 HU, width 1500 HU, or roughly -1350 to +150 HU) shows nodules and airways; the mediastinal window (-160 to +240 HU) reveals soft tissue. Wrong windowing causes annotators to miss or mislabel findings systematically, which makes it one of the most common technical errors in CT annotation.

Do radiologists need to annotate CT scans, or can trained annotators do it?

Trained annotators can perform detection tasks (presence/absence) on well-characterised findings under radiologist QA review. Characterisation, malignancy risk, or subtle pathology tasks require radiologist primary annotation. FDA and TGA regulatory submissions for diagnostic AI typically require radiologist-annotated ground truth regardless of task difficulty.

What file format is used for CT scan annotation?

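CT source images are stored in DICOM format. Annotations are exported as DICOM-SEG (segmentation), NIfTI (.nii.gz for volumetric segmentation), or NRRD. Avoid converting DICOM to 8-bit PNG or JPEG for annotation: an 8-bit export collapses the full Hounsfield range into 256 grey levels and discards the rescale metadata, and JPEG adds lossy compression artefacts on top, so the resulting pixel values no longer correspond to HU and are unsuitable for clinical AI training.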
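
The loss from an 8-bit export is easy to demonstrate. In this sketch, `RescaleSlope` and `RescaleIntercept` are the standard DICOM attributes that map stored pixel values to HU; the default values of 1 and -1024 and the -1024 to 3071 HU export range are assumed typical values, not taken from any particular scanner.

```python
def stored_to_hu(stored_value, rescale_slope=1.0, rescale_intercept=-1024.0):
    """Convert a stored CT pixel value to Hounsfield units."""
    return stored_value * rescale_slope + rescale_intercept


def to_8bit(hu, low=-1024, high=3071):
    """Squeeze an HU value into 0-255, as a naive full-range image export would."""
    clipped = min(max(hu, low), high)
    return round((clipped - low) / (high - low) * 255)


# Two soft-tissue voxels 10 HU apart become indistinguishable after export.
hu_a = stored_to_hu(1054)   # 30 HU
hu_b = stored_to_hu(1064)   # 40 HU
print(hu_a, hu_b, to_8bit(hu_a), to_8bit(hu_b))
```

Each 8-bit grey level covers about 16 HU in this setup, so density differences that matter for soft-tissue characterisation disappear before any compression is applied.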

How much does CT scan annotation cost per volume?

Detection only: AUD $8–$18 per volume. Detection plus 2D bounding box: $18–$30. 3D volumetric segmentation: $60–$150. Characterisation (LIDC/Lung-RADS protocol): $35–$90. Longitudinal multi-timepoint annotation adds 20–35% per timepoint.

What are the regulatory requirements for CT annotation in medical AI?

FDA 21 CFR Part 11 requires tamper-evident audit trails, electronic signature controls, and record retention for annotations used in SaMD submissions. FDA guidance on AI/ML-based SaMD additionally requires annotator qualification documentation and inter-rater agreement statistics. Australian TGA requirements under the SaMD framework are broadly equivalent for Class III devices.

Neel Bennett

AI Annotation Specialist at AI Taggers

Neel has over 8 years of experience in AI training data and machine learning operations. He specializes in helping enterprises build high-quality datasets for computer vision and NLP applications across healthcare, automotive, and retail industries.
