This guide does not cover the basics of what whole-slide imaging is. It covers the operational decisions that separate production annotation programmes from research pilots: tile-level vs slide-level task architecture, platform selection for gigapixel-scale workflows, pathologist credentialing requirements, adjudication mechanics, and the provenance documentation that FDA 21 CFR Part 11 and Australia's TGA SaMD framework require before a dataset supports a regulatory submission.
If you are planning a histopathology AI dataset for clinical deployment and have not yet mapped these decisions, this is where the production problems start — not at model training.
Tile-Level vs Slide-Level Task Architecture
The first architectural decision on any WSI annotation project: do you annotate at tile level, slide level, or — as almost all production projects require — both?
Tile-level annotation extracts fixed-size patches from the WSI and labels each patch independently. Standard tile sizes are 256×256 or 512×512 pixels at 20× magnification (approximately 0.5 μm/pixel). Tumour-present / tumour-absent classification, mitosis detection, and patch-level grading all operate this way. The advantage: parallelisable, model-friendly, and compatible with patch-based deep learning architectures (ResNet, EfficientNet, ViT). The limitation: patch-level annotations lose spatial context — a patch at the tumour margin looks different from a patch at the tumour centre, and the model needs to know that.
Slide-level annotation draws directly on the full pyramidal image in global WSI coordinate space. An annotator opens the multi-resolution pyramid, pans and zooms between thumbnail and cellular resolution, and draws annotations that persist across zoom levels in absolute slide coordinates. Tumour-region polygon masks, IHC grading scores, and Gleason pattern territory maps are slide-level tasks. These annotations export as GeoJSON or tool-native project files and remain aligned regardless of which tile extraction grid the downstream model uses.
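As an illustration of what "absolute slide coordinates" means on export, here is a minimal sketch that wraps one polygon as a GeoJSON Feature. The function name and the property keys (`classification`, `annotator_id`) are assumptions made for this example, not the schema of QuPath or any other tool.

```python
import json

def polygon_to_geojson(vertices, label, annotator_id):
    """Wrap a polygon given in level-0 slide pixel coordinates as a GeoJSON Feature.

    `vertices` is a list of (x, y) pairs. GeoJSON polygons need a closed ring,
    so the first vertex is repeated at the end if it is not already there.
    """
    ring = [[float(x), float(y)] for x, y in vertices]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        # Property names are illustrative only, not a tool-specific schema.
        "properties": {"classification": label, "annotator_id": annotator_id},
    }

feature = polygon_to_geojson(
    [(10240, 20480), (12288, 20480), (12288, 22528), (10240, 22528)],
    label="tumour",
    annotator_id="path-001",
)
geojson_text = json.dumps({"type": "FeatureCollection", "features": [feature]})
```

Because the vertices are stored in level-0 pixels rather than tile-relative offsets, the same file stays valid whichever tile grid is chosen later.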
The production workflow almost always layers both: slide-level ROI annotation defines the clinically relevant region, tile extraction occurs within that region (typically with a stride of half the tile size, so adjacent tiles overlap by 50% and each structure appears at several positions), and tile-level classification runs on the extracted patches. Maintaining clean coordinate alignment between layers, so that tile-level predictions can be stitched back into slide-level probability maps, is where generic annotation pipelines break down and WSI-native tooling earns its cost.
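The coordinate bookkeeping can be sketched in a few lines. This is a simplified example under stated assumptions: the ROI is an axis-aligned rectangle in level-0 pixels, its width and height are multiples of the stride, and the tile size is a multiple of the stride. The function names are hypothetical.

```python
def tile_origins(roi, tile=512, stride=256):
    """Top-left corners of tiles fully inside an ROI (x0, y0, x1, y1), level-0 pixels."""
    x0, y0, x1, y1 = roi
    return [
        (x, y)
        for y in range(y0, y1 - tile + 1, stride)
        for x in range(x0, x1 - tile + 1, stride)
    ]

def stitch(origins, probs, roi, tile=512, cell=256):
    """Average overlapping tile probabilities onto a grid of `cell`-pixel squares."""
    x0, y0, x1, y1 = roi
    cols, rows = (x1 - x0) // cell, (y1 - y0) // cell
    total = [[0.0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]
    span = tile // cell  # number of grid cells one tile covers per axis
    for (x, y), p in zip(origins, probs):
        c0, r0 = (x - x0) // cell, (y - y0) // cell
        for r in range(r0, r0 + span):
            for c in range(c0, c0 + span):
                total[r][c] += p
                count[r][c] += 1
    return [
        [t / n if n else None for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(total, count)
    ]

roi = (1000, 2000, 2024, 3024)      # a 1024 x 1024 pixel region
origins = tile_origins(roi)          # 3 x 3 grid of half-overlapping tiles
probs = [0.0] * len(origins)
probs[4] = 1.0                       # only the centre tile predicts tumour
heat = stitch(origins, probs, roi)
```

Each tile keeps its absolute origin, so predictions map back to slide space by arithmetic alone. Cells covered by several tiles receive the mean of their predictions.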
Tile extraction strategy also requires deliberate choices before annotation starts. Otsu thresholding for tissue masking removes background glass and cuts annotation surface area by 30–60% on typical surgical specimens. Colour normalisation (Macenko, Reinhard, or stain-vector estimation) before tiling standardises what annotators see — inconsistency here means annotators are unknowingly labelling stain artefacts as biological signal.
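For reference, Otsu's method picks the grey level that maximises between-class variance on the intensity histogram. The sketch below is a dependency-free version operating on a 256-bin histogram of a low-resolution thumbnail. The pixel values are synthetic, and it assumes tissue is darker than glass, as on brightfield H&E.

```python
def otsu_threshold(hist):
    """Return the grey level t that maximises between-class variance.

    `hist` is a 256-bin histogram of 8-bit greyscale values.
    Pixels with value <= t form the darker class.
    """
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t, h in enumerate(hist):
        w_dark += h
        sum_dark += t * h
        if w_dark == 0:
            continue            # no pixels in the dark class yet
        w_bright = total - w_dark
        if w_bright == 0:
            break               # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        var_between = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Synthetic thumbnail: stained tissue around 130-150, bright glass around 235-240.
pixels = [130] * 300 + [150] * 200 + [235] * 400 + [240] * 100
hist = [0] * 256
for value in pixels:
    hist[value] += 1

threshold = otsu_threshold(hist)
tissue_fraction = sum(hist[: threshold + 1]) / len(pixels)
```

In this toy case half the thumbnail is background glass, so masking before tiling would halve the area sent to annotators.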
WSI Annotation Platform Selection
Platform choice is not a tooling preference — it is a regulatory decision.
QuPath is the open-source gold standard. It is scriptable via Groovy and Python, supports all major pyramidal formats (.svs, .ndpi, .tiff, DICOM-WSI), has an active community, and is genuinely good for slide-level segmentation and cell-detection tasks. Its limitation at production scale: collaboration and version control are not built in. Multi-annotator workflows require wrapping QuPath in project-management infrastructure, and Part 11-aligned audit trails must be implemented externally.
Commercial platforms address this. Aperio eSlide Manager (Leica Biosystems), PathAI Palette, and Proscia Concentriq all provide workload management, role-based access control, and electronic record infrastructure that maps to 21 CFR Part 11 requirements. The licence cost is real but it eliminates the engineering work of building compliant audit infrastructure on top of QuPath.
Generic CV annotation platforms — CVAT, Labelbox, Scale — have limited or no native WSI support. CVAT handles tile-level classification tasks on pre-extracted patches effectively. Labelbox has experimental WSI functionality but is not stable at gigapixel scale. If your task includes anything requiring global slide coordinates (tumour boundary segmentation, spatial co-registration, multi-resolution annotator navigation), use a WSI-native platform. If you are doing tile classification on pre-extracted patches only, CVAT or equivalent is adequate.
Pathologist Credentialing: Subspecialty Matters More Than Seniority
Credentialing for pathology AI annotation is not a binary pathologist-vs-not decision. It is a two-dimensional problem: specialty match and workflow role.
Subspecialty alignment to tissue type is more important than seniority. A gastrointestinal pathologist is not the correct adjudicator for Gleason grading. A dermatopathologist should not be the primary reviewer on diffuse large B-cell lymphoma margin annotation. The spec must explicitly map each task to the required subspecialty — discordance here is the single most common reason pathology AI datasets fail initial regulatory review.
For FDA 510(k) and PMA submissions, annotator credentials must be documented per annotation, not just per project. The standard documentation pack: specialist medical registration (board-certified anatomical pathologist in the US; FRCPA for Australia), subspecialty fellowship certificate where the tissue type requires it, years of relevant diagnostic experience, and a signed protocol attestation confirming familiarity with the specific disease and annotation specification before work begins.
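One way to make the task-to-subspecialty mapping enforceable is to check it in code before work is assigned. The sketch below is illustrative only: the `Credential` fields, the task names, and the `REQUIRED_SUBSPECIALTY` map are hypothetical and would come from your own annotation specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Credential:
    annotator_id: str
    registration: str          # e.g. board certification or FRCPA
    subspecialty: str
    attestation_signed: bool   # protocol attestation signed before work began

# Hypothetical task-to-subspecialty map; a real spec defines its own.
REQUIRED_SUBSPECIALTY = {
    "gleason_grading": "genitourinary",
    "dlbcl_margin": "haematopathology",
}

def credential_gaps(task, cred):
    """List the reasons this annotator should not adjudicate this task (empty if none)."""
    gaps = []
    required = REQUIRED_SUBSPECIALTY.get(task)
    if required is None:
        gaps.append(f"no subspecialty mapped for task '{task}'")
    elif cred.subspecialty != required:
        gaps.append(f"subspecialty '{cred.subspecialty}' does not match '{required}'")
    if not cred.attestation_signed:
        gaps.append("protocol attestation not signed")
    return gaps

ok = Credential("path-001", "FRCPA", "genitourinary", True)
mismatch = Credential("path-002", "FRCPA", "dermatopathology", False)
```

Storing the `annotator_id` on every annotation record then lets each annotation be traced back to a credential entry of this kind.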
The team structure that holds up in production: trained biomedical annotators (typically with a histotechnician or biomedical science background) handle volume tasks within a tightly scoped reference set built by pathologists. All borderline, uncertain, or flagged annotations route to pathologist adjudication. Credentialed pathologists spend their time on the tasks only they can do, not on routine background segmentation. See clinical expert annotation for how this team model is structured operationally.
Multi-Pathologist Adjudication: The Protocol Details
Having three pathologists is not the same as having a valid adjudication protocol.
The standard protocol: each pathologist annotates independently, without access to any other reviewer's labels, using the same version of the annotation guidelines. After all reviewers complete a slide, a comparison run surfaces disagreements. Simple majority vote handles categorical grading calls (Gleason 3 vs 4, Nottingham grade boundary) when two of three agree. Cases with strong dissent — where two reviewers agree but a third disagrees with documented reasoning — escalate to a senior subspecialist rather than resolving by automatic vote.
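The voting and escalation rule described above can be expressed as a small function. This is a sketch of that rule under simple assumptions: the function name and status strings are hypothetical, and any non-empty note from a dissenting reviewer counts as "documented reasoning".

```python
from collections import Counter

def adjudicate(votes, dissent_notes=None):
    """Resolve one categorical call from independent reviewers.

    `votes` maps reviewer id -> label. `dissent_notes` maps reviewer id -> free text.
    Returns (label, status); label is None when the case is escalated.
    """
    dissent_notes = dissent_notes or {}
    label, n = Counter(votes.values()).most_common(1)[0]
    if n == len(votes):
        return label, "unanimous"
    if n * 2 <= len(votes):
        return None, "escalate: no majority"
    dissenters = [r for r, v in votes.items() if v != label]
    if any(dissent_notes.get(r) for r in dissenters):
        return None, "escalate: documented dissent"
    return label, "majority"

plain = adjudicate({"p1": "G3", "p2": "G3", "p3": "G4"})
strong = adjudicate(
    {"p1": "G3", "p2": "G3", "p3": "G4"},
    dissent_notes={"p3": "fused glands in upper-left field"},
)
```

The first call resolves to Gleason 3 by majority. The second escalates, because the dissenting reviewer recorded a reason.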
Report inter-annotator agreement at the class level, not just overall. A weighted κ of 0.74 across all Gleason patterns can conceal κ of 0.48 on the 3/4 boundary — the clinically critical decision point. The class-level kappa distribution belongs in the dataset documentation; regulators and model teams both need it. The QA methodology for generating these numbers is covered in detail in the Cohen's kappa annotation quality guide.
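To see how an acceptable overall figure can hide a weak class, here is a small worked sketch. It uses unweighted Cohen's kappa for two raters and a one-vs-rest split per class, which is a simplification of the weighted kappa mentioned above. The labels are synthetic.

```python
def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    n = len(a)
    labels = set(a) | set(b)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if p_expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)

def per_class_kappa(a, b):
    """One-vs-rest kappa for each label: 'is this class' versus 'is not'."""
    labels = sorted(set(a) | set(b))
    return {l: cohen_kappa([x == l for x in a], [y == l for y in b]) for l in labels}

# Synthetic panel: raters agree on benign and G5, but swap G3/G4 on 8 of 20 cases.
rater_a = ["benign"] * 20 + ["G3"] * 6 + ["G4"] * 6 + ["G3"] * 4 + ["G4"] * 4 + ["G5"] * 10
rater_b = ["benign"] * 20 + ["G3"] * 6 + ["G4"] * 6 + ["G4"] * 4 + ["G3"] * 4 + ["G5"] * 10

overall = cohen_kappa(rater_a, rater_b)
by_class = per_class_kappa(rater_a, rater_b)
```

On this data the overall kappa is about 0.78, while the G3 and G4 classes each come out at 0.50, which is the pattern the paragraph above warns about.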
Disagreement pattern tracking is systematically underused. If the same pair of pathologists repeatedly disagrees on a specific tissue pattern, that is a calibration signal — run a targeted re-calibration session focused on that tissue before proceeding with the next batch. If disagreements cluster on slides from a specific contributing lab, the stain protocol is likely the cause. The adjudication log is a diagnostic instrument, not just a compliance artefact.
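A minimal sketch of mining the adjudication log for those two signals follows. The log entry keys (`lab`, `pattern`, `votes`) are hypothetical; adapt them to whatever your adjudication tool records.

```python
from collections import Counter
from itertools import combinations

def disagreement_counts(log):
    """Count disagreements per (reviewer pair, adjudicated pattern) and per source lab.

    Each entry is a dict with hypothetical keys: 'lab', 'pattern'
    (the adjudicated label) and 'votes' (reviewer id -> label).
    """
    by_pair, by_lab = Counter(), Counter()
    for entry in log:
        votes = entry["votes"]
        disagreed = False
        for r1, r2 in combinations(sorted(votes), 2):
            if votes[r1] != votes[r2]:
                by_pair[(r1, r2, entry["pattern"])] += 1
                disagreed = True
        if disagreed:
            by_lab[entry["lab"]] += 1
    return by_pair, by_lab

log = [
    {"lab": "A", "pattern": "G4", "votes": {"p1": "G3", "p2": "G4", "p3": "G4"}},
    {"lab": "A", "pattern": "G4", "votes": {"p1": "G3", "p2": "G4", "p3": "G4"}},
    {"lab": "B", "pattern": "G3", "votes": {"p1": "G3", "p2": "G3", "p3": "G3"}},
]
by_pair, by_lab = disagreement_counts(log)
```

Sorting `by_pair.most_common()` gives the reviewer pair and tissue pattern to target in the next calibration session; `by_lab` points at sites whose stain protocol deserves a look.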
Track kappa trend across batches rather than measuring once at project end. Downward drift in IAA over time indicates annotator fatigue or protocol drift — both are correctable early, neither is correctable at project close.
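A simple drift check over per-batch kappa values might look like the sketch below. The thresholds (a drop of more than 0.05 from the first batch, or three consecutive declines) are assumptions chosen for illustration, not regulatory values.

```python
def kappa_drift(kappas, max_drop=0.05, run=3):
    """Return 1-based batch numbers where agreement drift should be investigated.

    A batch is flagged when kappa has fallen more than `max_drop` below the
    first batch, or has declined for `run` consecutive batches.
    """
    flagged = []
    declining = 0
    for i in range(1, len(kappas)):
        declining = declining + 1 if kappas[i] < kappas[i - 1] else 0
        if kappas[0] - kappas[i] > max_drop or declining >= run:
            flagged.append(i + 1)
    return flagged

batch_kappas = [0.72, 0.73, 0.71, 0.70, 0.69, 0.64]
flagged_batches = kappa_drift(batch_kappas)
```

Here the slow decline is flagged at batch 5, before the larger drop at batch 6 would have made it obvious.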
FDA 21 CFR Part 11 Provenance: What Your Annotation Workflow Must Produce
21 CFR Part 11 governs electronic record integrity. It is distinct from HIPAA, which governs PHI security. For AI/ML SaMD submissions under 21 CFR Part 820 and FDA's Predetermined Change Control Plans guidance, the annotation dataset supporting the submission must carry Part 11-aligned provenance from the first annotation event.
The specific requirements as they apply to annotation workflows:
- Audit trail. Every annotation event — creation, modification, deletion — is timestamped and attributed to a uniquely identified user. The audit trail must be non-editable and separate from the annotation data itself.
- Electronic signatures. Pathologist adjudicators sign off per slide, not per batch. The signature must link to the annotator's identity record and the specific annotation version being approved.
- Access controls. Role-based, with annotator, reviewer, and administrative privileges separated. Escalation and approval actions must be restricted to the reviewer role by system enforcement, not convention.
- System validation. The annotation platform must have IQ/OQ/PQ documentation or equivalent validation records. Commercial platforms with existing Part 11 documentation avoid building this from scratch.
- Backup and recovery. A defined recovery point objective, maintained backup logs, and a tested recovery procedure. Regulators ask for evidence of testing, not just policy.
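To make the audit-trail requirement concrete, here is a sketch of an append-only, hash-chained event log in which any later edit to a stored event is detectable. It illustrates tamper evidence only. It is not a validated Part 11 system, and the field names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first event

def _digest(body):
    """SHA-256 over a canonical JSON encoding of the event body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

def append_event(trail, user_id, action, annotation_id, timestamp):
    """Append one annotation event, chained to the hash of the previous event."""
    body = {
        "user_id": user_id,
        "action": action,                  # "create", "modify" or "delete"
        "annotation_id": annotation_id,
        "timestamp": timestamp,            # ISO 8601 string supplied by the caller
        "prev_hash": trail[-1]["hash"] if trail else GENESIS,
    }
    trail.append({**body, "hash": _digest(body)})

def verify(trail):
    """Return True only if every event is intact and correctly chained."""
    prev = GENESIS
    for event in trail:
        body = {k: v for k, v in event.items() if k != "hash"}
        if body["prev_hash"] != prev or _digest(body) != event["hash"]:
            return False
        prev = event["hash"]
    return True

trail = []
append_event(trail, "path-001", "create", "ann-17", "2026-06-01T09:00:00Z")
append_event(trail, "path-001", "modify", "ann-17", "2026-06-01T09:05:00Z")
append_event(trail, "path-002", "create", "ann-18", "2026-06-01T09:07:00Z")
intact_before = verify(trail)
trail[1]["action"] = "delete"              # simulate after-the-fact tampering
intact_after = verify(trail)
```

Keeping this log in a store separate from the annotation data, with write access limited to the platform itself, is what turns the chain into usable evidence.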
The Australian TGA expects equivalent provenance under the SaMD guidelines aligned with IMDRF/SaMD N23. For EU IVDR Class IIb and III devices, MDCG 2021-6 requires equivalent training data traceability. The documentation requirements converge on the same functional requirements across jurisdictions even if the specific regulatory text differs.
Build this infrastructure before annotating the first production slide. Retrofitting Part 11-aligned provenance onto a completed annotation dataset is expensive and usually incomplete — regulators can tell when audit trails were reconstructed rather than captured in real time.
QC Gates Between Tiling and Model Training
Production WSI annotation programmes run at least four mandatory QC gates before training data is released to the model team.
- Slide quality screening. Before annotation begins, every WSI passes automated quality assessment with HistoQC (Madabhushi Lab, open-source). HistoQC reports blur scores, coverage metrics, stain quality indicators, and artefact flags. Slides with failing scores are manually reviewed before annotation commences — annotating a blurred slide produces training data that teaches the model to fail on poor-quality acquisitions.
- Batch concordance checks. Every annotation batch includes 5–10% carry-over slides also annotated in the preceding batch. Per-class kappa between batches measures annotation drift. If kappa on a class is declining batch-on-batch, investigate annotator fatigue or guideline ambiguity before continuing. Catching drift at batch 3 costs one re-calibration session; catching it at batch 12 means re-annotating thousands of slides.
- Spatial consistency at tile boundaries. For segmentation tasks, post-processing scripts check that annotation boundaries are consistent across adjacent tile seams. Boundary inconsistencies — where the same tumour edge is labelled differently on overlapping tiles — create training artefacts that make the model unreliable at exactly the spatial scale where clinical decisions happen.
- Pre-release dataset audit. Before the dataset is frozen, run a class balance audit (especially for cancer/non-cancer splits, where imbalance is common), a source-lab diversity check (stain protocol coverage across contributing sites), and a magnification-consistency verification (confirm that all tiles were extracted at the specified objective, not mixed resolutions).
These gates catch problems that per-annotation QA misses. They are cheap to run programmatically and expensive to discover post-training. The general framework for annotation quality management at scale is documented in the annotation QA playbook; the WSI-specific gates above extend that framework for pathology workflows.
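The pre-release audit in particular is easy to script. The sketch below assumes each tile carries a label, a source lab, and the microns-per-pixel value it was extracted at. The key names, the 10% minimum class share, and the resolution tolerance are assumptions for illustration.

```python
from collections import Counter

def prerelease_audit(tiles, expected_mpp=0.5, mpp_tol=0.02, min_class_share=0.10):
    """Summarise class balance, source-lab coverage and resolution consistency.

    `tiles` is a list of dicts with hypothetical keys 'label', 'lab' and 'mpp'.
    """
    n = len(tiles)
    class_share = {k: v / n for k, v in Counter(t["label"] for t in tiles).items()}
    return {
        "class_share": class_share,
        "under_represented": sorted(k for k, s in class_share.items() if s < min_class_share),
        "labs": sorted({t["lab"] for t in tiles}),
        # Indices of tiles extracted at the wrong resolution.
        "mpp_mismatches": [
            i for i, t in enumerate(tiles) if abs(t["mpp"] - expected_mpp) > mpp_tol
        ],
    }

tiles = (
    [{"label": "non_cancer", "lab": "A", "mpp": 0.5}] * 10
    + [{"label": "non_cancer", "lab": "B", "mpp": 0.5}] * 9
    + [{"label": "cancer", "lab": "B", "mpp": 0.25}]
)
report = prerelease_audit(tiles)
```

On this toy set the report flags the cancer class as under-represented and the last tile as extracted at 40× resolution rather than the specified 20×.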
What Production WSI Annotation Costs in 2026
Reference ranges for production annotation with clinically validated quality controls, June 2026. These are not research-grade rates.
- Tissue presence / background segmentation: $0.50–$2.00 per slide. Largely automatable via HistoQC; human annotation is verification, not primary labelling.
- Tumour-region polygon annotation — common cancers (clear-cell renal carcinoma, ductal breast carcinoma): $15–$50 per slide.
- Tumour-region polygon annotation — complex histology (diffuse glioma margin, invasive lobular carcinoma, mixed-type tumours): $40–$120 per slide.
- Gleason or Nottingham grading with credentialed pathologist annotation: $60–$180 per slide.
- Multi-pathologist adjudication (three-pathologist panel): 2×–3× multiplier on the per-slide rate above.
- Cell-level nuclear segmentation: $80–$250 per mm² of annotated tissue, varying with cell density and tissue type.
Research-grade annotation without regulatory documentation runs at 40–60% of these figures. That dataset cannot be used directly in a device submission without a documented retrospective validation exercise — which typically costs more than the savings made by skipping production controls in the first place. Full pricing transparency across annotation task types is covered in the 2026 annotation pricing breakdown.
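As a worked example of how the ranges above combine, the sketch below estimates a budget envelope. It assumes the adjudication multiplier applies to the whole per-slide rate, which is one reading of the list above. The function name and the 500-slide project are hypothetical.

```python
def budget_range(n_slides, rate_low, rate_high, mult_low=1.0, mult_high=1.0):
    """Return (low, high) cost in dollars for a per-slide task with an optional multiplier."""
    return n_slides * rate_low * mult_low, n_slides * rate_high * mult_high

# 500 slides of credentialed Gleason grading ($60-$180 per slide)
# with a three-pathologist panel (2x-3x multiplier).
low, high = budget_range(500, 60, 180, mult_low=2.0, mult_high=3.0)
```

That gives an envelope of $60,000 to $270,000, which shows how much of the final cost is driven by the adjudication design rather than the base rate.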
Frequently Asked Questions
What is the difference between tile-level and slide-level WSI annotation?
Tile-level annotation extracts fixed patches (typically 256×256 or 512×512 px at 20×) and labels each independently — used for tumour/no-tumour classification and mitosis detection. Slide-level annotation draws directly on the full pyramidal image in global WSI coordinate space — used for tumour-region segmentation, Gleason mapping, and IHC grading. Production projects layer both: slide-level ROI defines the region, tile extraction occurs within it, and tile-level classification runs on the patches. Coordinate alignment between layers is the critical engineering discipline.
Do I need a WSI-native annotation platform, or will CVAT or Labelbox work?
For any task requiring global slide coordinates — tumour boundary segmentation, spatial co-registration, multi-resolution zoom context — you need a WSI-native platform (QuPath, Aperio eSlide Manager, Proscia Concentriq, PathAI Palette). For tile-level classification on pre-extracted patches only, CVAT is adequate. The regulatory dimension matters separately: for FDA 21 CFR Part 11 submissions, the annotation platform must provide electronic record and audit trail capability that CVAT and Labelbox do not offer natively.
What credentials do pathologists need for FDA-submission histopathology annotation?
For 510(k) and PMA submissions, annotator credentials must be documented per annotation. The standard pack: specialist medical registration (board-certified anatomical pathologist in the US, FRCPA for Australia), subspecialty fellowship where the tissue warrants it, years of relevant diagnostic experience, and a protocol attestation signed before work begins. Subspecialty match to tissue type matters more than seniority — a dermatopathologist should not adjudicate diffuse large B-cell lymphoma.
How many pathologists are needed for clinical-grade WSI adjudication?
Two is the minimum; three is the standard for FDA submissions because an odd-sized panel can settle categorical calls by majority vote without a separate tiebreaker. Each annotator works independently, without seeing others' labels. Disagreements are reviewed by the panel, and strong dissents (where the dissenting reviewer documents their reasoning) escalate to a senior subspecialist. Report inter-annotator agreement per class, because overall kappa can mask poor agreement on the clinically significant boundaries (e.g. Gleason 3/4). Track kappa trends across batches to catch annotator drift early.
What does FDA 21 CFR Part 11 require for annotation provenance?
Part 11 requires: a non-editable audit trail attributing every annotation event to a specific user with a timestamp; per-slide electronic signatures from pathologist adjudicators; role-based access controls enforced by the system; platform IQ/OQ/PQ validation documentation; and defined backup and recovery procedures, with maintained backup logs and evidence that recovery has been tested. Build this infrastructure before the first annotation event. Retrofitting it to a completed dataset is expensive and rarely accepted by regulators without gaps.
What does production WSI annotation cost per slide in 2026?
Background segmentation: $0.50–$2.00/slide. Tumour-region polygons (common cancers): $15–$50/slide; complex histology: $40–$120/slide. Credentialed Gleason/Nottingham grading: $60–$180/slide. Three-pathologist adjudication panel adds 2×–3×. Nuclear segmentation: $80–$250 per mm² of tissue. Research-grade annotation runs 40–60% cheaper but requires a retrospective validation exercise before it can support a device submission — which usually costs more than the upfront saving.
Neel Bennett
AI Annotation Specialist at AI Taggers
Neel has over 8 years of experience in AI training data and machine learning operations. He specializes in helping enterprises build high-quality datasets for computer vision and NLP applications across healthcare, automotive, and retail industries.