Digital pathology is one of the most exciting corners of medical AI and one of the easiest to do badly. The slides are enormous — billions of pixels — the diseases live in tiny regions, the stain colour changes lab to lab, and the people qualified to call it are pathologists, who don't come cheap. Get any of that wrong and the model trains on something, but it doesn't train on the disease.
This guide is what we'd hand a team starting a histopathology or biopsy AI project — written from a few years of running these workflows for diagnostic and research clients. It covers what makes WSI different, the tasks you'll actually annotate, who has to do it, what it costs, and the regulatory bar your dataset has to clear if you ever want to ship in a clinical setting.
Whole-Slide Imaging: Why It's Not Just a Big Photo
A whole-slide image (WSI) is a scan of a microscope slide at clinical resolution: typically captured with a 40x objective, often around 100,000 × 50,000 pixels per slide. It is stored in a pyramidal multi-resolution format (.svs, .ndpi, pyramidal .tiff, increasingly DICOM-WSI) so a viewer can show you the full slide at once or zoom down to individual nuclei without re-rendering the whole image.
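To make the pyramid idea concrete, here is a minimal sketch that models the levels of a slide and picks a level for a requested zoom. It assumes a fixed 4x step between levels and a 1,024-pixel floor for the smallest level; real files store their own level sizes and downsample factors, which a reader such as OpenSlide exposes via `level_dimensions`, `level_downsamples` and `get_best_level_for_downsample()`. The class and function names below are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PyramidLevel:
    index: int          # 0 = full resolution
    width: int          # pixels at this level
    height: int
    downsample: float   # level-0 pixels per pixel at this level


def build_pyramid(width: int, height: int, factor: int = 4,
                  min_side: int = 1024) -> List[PyramidLevel]:
    """Hypothetical layout: each level shrinks by `factor` (assumed 4x) until
    the longer side is at most `min_side`. Real files store their own factors."""
    levels = []
    w, h, ds, i = width, height, 1.0, 0
    while True:
        levels.append(PyramidLevel(i, w, h, ds))
        if max(w, h) <= min_side:
            break
        w, h = max(1, w // factor), max(1, h // factor)
        ds *= factor
        i += 1
    return levels


def best_level_for_downsample(levels: List[PyramidLevel],
                              downsample: float) -> PyramidLevel:
    """Pick the coarsest level that is still at least as detailed as requested,
    similar in spirit to OpenSlide's get_best_level_for_downsample()."""
    best = levels[0]
    for level in levels:  # levels are ordered fine -> coarse
        if level.downsample <= downsample:
            best = level
    return best


def uncompressed_rgb_bytes(level: PyramidLevel) -> int:
    """Size of a level if fully decoded to 8-bit RGB (3 bytes per pixel)."""
    return level.width * level.height * 3
```

Under these assumptions a 100,000 × 50,000 slide gets five levels, down to 390 × 195 pixels, and level 0 alone is 15 GB once decoded to RGB. That arithmetic is why viewers never load the whole slide.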
The number to internalise — a single slide can be five gigapixels. You don't open it in Photoshop. You don't annotate it in CVAT's standard image mode. You need a WSI-native viewer (QuPath is the workhorse, plus several commercial tools) that streams tiles from the pyramid as the annotator pans and zooms. Anyone telling you they'll annotate pathology slides with a generic image labeller is signalling that they haven't actually done this work.
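"Streams tiles from the pyramid" boils down to one calculation: which tiles at the current level intersect the current viewport. A minimal sketch, assuming square 256-pixel tiles and a viewport given in level-0 coordinates; the function name is hypothetical and edge clamping is omitted.

```python
import math
from typing import List, Tuple


def tiles_for_viewport(x0: float, y0: float, view_w: float, view_h: float,
                       downsample: float, tile_size: int = 256) -> List[Tuple[int, int]]:
    """Return (col, row) indices of the tiles a viewer must fetch.

    The viewport (x0, y0, view_w, view_h) is in level-0 pixels and is mapped
    into the chosen level by dividing by that level's downsample factor.
    Clamping to the slide edge is left out for brevity (assumption: the
    viewport lies inside the slide).
    """
    lx0, ly0 = x0 / downsample, y0 / downsample
    lx1, ly1 = (x0 + view_w) / downsample, (y0 + view_h) / downsample
    first_col, first_row = int(lx0 // tile_size), int(ly0 // tile_size)
    last_col = math.ceil(lx1 / tile_size)   # exclusive bound
    last_row = math.ceil(ly1 / tile_size)   # exclusive bound
    return [(c, r)
            for r in range(first_row, last_row)
            for c in range(first_col, last_col)]
```

Showing an entire 100,000 × 50,000 slide from level 0 would need 391 × 196 = 76,636 tiles; from a level with a 16x downsample it needs 25 × 13 = 325. A generic image labeller has no notion of this trade-off.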
The Gigapixel Problem (And How To Actually Handle It)
Three challenges show up on every WSI project:
- Reviewer time. A pathologist reviewing a single slide for AI ground truth can easily take 45–90 minutes. Multiply by the dataset size and you've found your real budget driver.
- Tiling and coordinates. The slide gets broken into patches for the model, but every patch has to know its absolute slide coordinates so tumour-boundary work survives the patching. Annotation has to happen on the full WSI; tiling is downstream.
- Stain variation. H&E from a Sydney lab and H&E from a Saudi lab can look as if they came from different staining protocols, because in practice they often did. The dataset needs to span that variation deliberately, not accidentally.
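The tiling point is easiest to see in code: annotations live as polygons in absolute level-0 coordinates, and tiles inherit labels from them downstream. A minimal sketch, assuming non-overlapping 512-pixel tiles, dropped edge remainders, and a "tile centre inside a tumour polygon" labelling rule; all of these are illustrative choices (many pipelines label by the fraction of tile area inside the mask instead), and the function names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Even-odd ray-casting test; polygon vertices are level-0 pixels."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross the edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tile_grid(slide_w: int, slide_h: int, tile: int = 512) -> Iterator[Tuple[int, int]]:
    """Yield the absolute top-left (x, y) of each full, non-overlapping tile."""
    for y in range(0, slide_h - tile + 1, tile):
        for x in range(0, slide_w - tile + 1, tile):
            yield x, y


def label_tiles(slide_w: int, slide_h: int,
                tumour_polygons: List[List[Point]], tile: int = 512) -> List[Dict]:
    """Keep each tile's absolute coordinates and label it by its centre point."""
    records = []
    for x, y in tile_grid(slide_w, slide_h, tile):
        cx, cy = x + tile / 2, y + tile / 2
        is_tumour = any(point_in_polygon(cx, cy, p) for p in tumour_polygons)
        records.append({"x": x, "y": y, "size": tile, "tumour": is_tumour})
    return records
```

Because every record keeps its absolute `x`/`y`, tile predictions can be stitched back onto the slide and compared against the original polygons.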
The serious workflows handle this with WSI-native annotation tools, lab-aware sampling so multiple labs are represented, and a deliberate colour strategy: either colour normalisation before annotation (to standardise what the annotator sees) or colour augmentation during training (to teach the model to be invariant). Pick one strategy and stick with it.
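For the normalisation route, the simplest family of methods shifts and scales each colour channel so a slide's statistics match a chosen reference slide. The sketch below is a deliberately simplified, Reinhard-style mean/std transfer done directly in RGB on a list of pixels; Reinhard's method works in the lαβ colour space, and stain-aware methods such as Macenko separate the H and E stain vectors first. The function names and the single-reference-slide setup are assumptions for illustration.

```python
from statistics import fmean, pstdev
from typing import List, Tuple

Pixel = Tuple[int, int, int]          # 8-bit (r, g, b)
Stats = List[Tuple[float, float]]     # per-channel (mean, std)


def channel_stats(pixels: List[Pixel]) -> Stats:
    """Mean and population standard deviation of each RGB channel."""
    return [(fmean(p[c] for p in pixels), pstdev([p[c] for p in pixels]))
            for c in range(3)]


def match_stats(pixels: List[Pixel], target: Stats) -> List[Pixel]:
    """Shift/scale each channel so its mean and std match the reference slide.

    Simplification (assumption): operates in RGB, not lαβ or stain space.
    """
    source = channel_stats(pixels)
    out = []
    for p in pixels:
        new = []
        for c in range(3):
            s_mean, s_std = source[c]
            t_mean, t_std = target[c]
            scale = t_std / s_std if s_std > 0 else 1.0
            value = (p[c] - s_mean) * scale + t_mean
            new.append(min(255, max(0, round(value))))  # clip to 8-bit range
        out.append(tuple(new))
    return out
```

In use, `target` would come from `channel_stats` on tissue pixels of the reference slide, and the same transfer would be applied to every slide from the other labs.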
The Annotation Tasks That Actually Run on Pathology Projects
Real projects mix several of these on the same slides:
- Region of interest (ROI) selection. Pathologist or trained reviewer marks the tumour-bearing area of the slide so downstream tasks focus on what matters. Often the very first task.
- Tissue-vs-background segmentation. Separating actual tissue from glass and air. Cheap, fast, useful for normalising slide statistics.
- Tumour-region segmentation. Polygon masks around malignant regions. The bread and butter of cancer-detection AI.
- Cell-level instance segmentation. Especially nuclei — individual nucleus contours for grading and counting work. Mitosis annotation is a specialised flavour of this.
- IHC scoring. Allred score for hormone receptor work, H-score for membrane proteins, percentage positivity for Ki-67 and others. The realistic structure is trained annotators with pathologist adjudication.
- Disease-specific grading. Gleason patterns for prostate, Nottingham grade (the Elston-Ellis modification of Bloom-Richardson) for breast. Retinal scales such as ICDR belong to ophthalmology rather than pathology; see the ophthalmology guide for that adjacent reading.
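Several of these tasks end in a small, well-defined calculation, and encoding it once removes arithmetic drift between annotators. A minimal sketch of the standard formulas: H-score (1×%weak + 2×%moderate + 3×%strong, range 0–300), Allred (proportion score 0–5 plus intensity score 0–3), Nottingham grade (three components scored 1–3, totals 3–5/6–7/8–9 mapping to grades 1/2/3) and the ISUP Grade Group from Gleason patterns. Exact handling of boundary percentages in the Allred bins is an assumption here; follow your protocol document where it differs.

```python
def h_score(pct_weak: float, pct_moderate: float, pct_strong: float) -> float:
    """H-score = 1*%weak + 2*%moderate + 3*%strong, range 0-300."""
    if min(pct_weak, pct_moderate, pct_strong) < 0 or \
            pct_weak + pct_moderate + pct_strong > 100:
        raise ValueError("percentages must be non-negative and sum to <= 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


def allred_score(pct_positive: float, avg_intensity: int) -> int:
    """Allred = proportion score (0-5) + intensity score (0-3), range 0-8.
    Bin edges at exactly 1/10/33/66 % are an assumption of this sketch."""
    if avg_intensity not in (0, 1, 2, 3):
        raise ValueError("intensity is 0 (none) to 3 (strong)")
    if pct_positive <= 0:
        proportion = 0
    elif pct_positive < 1:
        proportion = 1
    elif pct_positive <= 10:
        proportion = 2
    elif pct_positive <= 100 / 3:
        proportion = 3
    elif pct_positive <= 200 / 3:
        proportion = 4
    else:
        proportion = 5
    return proportion + avg_intensity


def nottingham_grade(tubules: int, pleomorphism: int, mitoses: int) -> int:
    """Sum of three 1-3 component scores: 3-5 -> 1, 6-7 -> 2, 8-9 -> 3."""
    for s in (tubules, pleomorphism, mitoses):
        if s not in (1, 2, 3):
            raise ValueError("each component is scored 1, 2 or 3")
    total = tubules + pleomorphism + mitoses
    return 1 if total <= 5 else 2 if total <= 7 else 3


def gleason_grade_group(primary: int, secondary: int) -> int:
    """ISUP Grade Group; this sketch only accepts patterns 3-5."""
    if primary not in (3, 4, 5) or secondary not in (3, 4, 5):
        raise ValueError("this sketch accepts Gleason patterns 3-5 only")
    score = primary + secondary
    if score == 6:
        return 1
    if score == 7:
        return 2 if primary == 3 else 3  # 3+4 and 4+3 are different groups
    if score == 8:
        return 4
    return 5
```

Storing the component scores alongside the derived total also gives adjudicators something concrete to disagree about.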
Who Actually Has to Do the Annotation
The honest answer — it's a layered team, not a single annotator type. What works in practice:
- Pathologists at the top of the loop. They build the spec, build the gold-standard reference set, adjudicate borderline cases, and sign off on the QA process. Their time is the most expensive and you want it spent where it counts.
- Pathology-trained annotators in the middle. People with medical-science backgrounds (often biomedical-science graduates or histotechnicians) who have been trained on the project's spec. They do most of the volume.
- QA on top. Double-annotation, gold-set checks every batch, adjudication queue back to the pathologists. The general structure is the same as our annotation QA playbook — but with pathology-specific concordance metrics on top.
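The per-batch QA loop can be sketched in a few lines: score the primary annotator against seeded gold items, and route every double-annotated item where the two annotators disagree to the pathologist adjudication queue. This is a minimal sketch; the 0.9 pass threshold, the field names and the single-primary-annotator setup are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BatchReport:
    batch_id: str
    gold_agreement: float
    passed: bool
    adjudication_queue: List[str] = field(default_factory=list)


def check_batch(batch_id: str, primary: Dict[str, str], second: Dict[str, str],
                gold: Dict[str, str], min_gold_agreement: float = 0.9) -> BatchReport:
    """primary/second: item_id -> label from two independent annotators.
    gold: item_id -> pathologist reference label for seeded gold items.
    The 0.9 threshold is a hypothetical default, not a standard."""
    gold_items = [i for i in primary if i in gold]
    if not gold_items:
        raise ValueError("batch contains no gold items")
    hits = sum(primary[i] == gold[i] for i in gold_items)
    agreement = hits / len(gold_items)
    # Any double-annotated item the annotators disagree on goes to a pathologist.
    queue = sorted(i for i in primary if i in second and primary[i] != second[i])
    return BatchReport(batch_id, agreement, agreement >= min_gold_agreement, queue)
```

A failed batch would typically be returned for rework before its labels reach the training set.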
The wrong structure — generalist crowd annotators with no medical training and no pathologist adjudication — produces datasets that look fine and train models that fail at the regulator. We've been brought in to fix several of these. It's not cheap. See clinical expert annotation for the formal version of this layered team.
Quality: Why Inter-Pathologist Agreement Is The Real Metric
Here's the bit that surprises a lot of ML teams new to pathology — pathologists genuinely disagree with each other. Inter-pathologist agreement on borderline cases of dysplasia or Gleason 3 vs 4 is often weighted κ around 0.6–0.7. That's not a failure of the field; it's the nature of cell morphology at the boundaries. It also means a one-pathologist gold standard is misleading.
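Weighted κ is Cohen's kappa with partial credit for near-miss grades, which suits ordinal scales like Gleason patterns. A minimal, dependency-free sketch for two raters follows; in practice scikit-learn's `cohen_kappa_score(y1, y2, weights="quadratic")` does the same job. The function name and the quadratic default are our choices for illustration.

```python
from typing import Sequence


def weighted_kappa(rater_a: Sequence, rater_b: Sequence, categories: Sequence,
                   weighting: str = "quadratic") -> float:
    """Cohen's weighted kappa for two raters over ordered categories.

    kappa_w = 1 - sum(w * observed) / sum(w * expected), where the
    disagreement weight w grows with the distance between categories.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    # Marginals give the chance-expected joint distribution.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    power = 2 if weighting == "quadratic" else 1

    def w(i: int, j: int) -> float:
        return (abs(i - j) / (k - 1)) ** power  # 0 on the diagonal

    num = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if den == 0:
        raise ValueError("kappa undefined: no expected disagreement")
    return 1 - num / den
```

With two categories the weights are just 0 and 1, so the result equals ordinary unweighted kappa.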
Clinical-grade datasets use consensus gold standards — two or three pathologists, blinded, with a defined resolution rule when they disagree (majority vote, or escalation to a senior reviewer). The reported metric is concordance against this consensus, broken out per grade class. Anything reporting a single overall accuracy number against a single reviewer is doing pathology AI the cheap way, and the cheap way doesn't pass regulators.
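The consensus rule and the per-class report can be sketched together: a strict majority of blinded panel reads sets the label, anything without one escalates to a senior reviewer, and concordance is then broken out by consensus class rather than collapsed into one accuracy figure. The function names and the "strict majority, else senior decides" rule are hypothetical instances of the resolution rules described above.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def consensus_label(votes: List[str]) -> Optional[str]:
    """Strict majority across blinded pathologists; None means 'escalate'."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def build_consensus(panel_reads: Dict[str, List[str]],
                    senior_reads: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Resolve each item; items with no majority take the senior reviewer's read."""
    consensus, escalated = {}, []
    for item, votes in panel_reads.items():
        label = consensus_label(votes)
        if label is None:
            escalated.append(item)
            label = senior_reads[item]  # escalation rule (assumption)
        consensus[item] = label
    return consensus, escalated


def per_class_concordance(reads: Dict[str, str],
                          consensus: Dict[str, str]) -> Dict[str, float]:
    """Fraction of items matching consensus, broken out by consensus class."""
    hits, totals = defaultdict(int), defaultdict(int)
    for item, truth in consensus.items():
        totals[truth] += 1
        hits[truth] += reads.get(item) == truth
    return {cls: hits[cls] / totals[cls] for cls in totals}
```

The same `per_class_concordance` call works for an annotator's reads or a model's predictions, and the `escalated` list doubles as part of the adjudication trail.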
Regulatory Context You Can't Ignore
If your pathology AI is heading anywhere near clinical use, the regulator wants paperwork. Australian TGA, US FDA (510(k) or De Novo), and EU IVDR have overlapping but not identical bars. What they all expect to see from the annotation side: annotator credentials documented per annotation, a protocol versioned with clinical sign-off, inter-annotator agreement reported per class, an adjudication trail for every disagreement, and a per-slide provenance log. The annotation side has to support the regulatory submission; you cannot bolt it on afterwards. Build the documentation from day one or pay double later.
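"Build the documentation from day one" mostly means every label is born with its provenance attached. A minimal sketch of such a record: the slide is identified by a SHA-256 of its file, each annotation carries the annotator's credential and protocol version, and adjudications are appended rather than overwritten. The field names are hypothetical and this is not a regulator-defined schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Iterable, List


def now_utc() -> str:
    return datetime.now(timezone.utc).isoformat()


def fingerprint(chunks: Iterable[bytes]) -> str:
    """SHA-256 of a slide file streamed in chunks (multi-GB WSIs never sit in memory)."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


@dataclass
class AdjudicationEvent:
    reviewer_id: str
    decision: str
    reason: str
    timestamp: str


@dataclass
class AnnotationRecord:
    slide_sha256: str
    annotation_id: str
    annotator_id: str
    annotator_credential: str
    protocol_version: str
    label: str
    created_at: str
    adjudications: List[AdjudicationEvent] = field(default_factory=list)


def adjudicate(record: AnnotationRecord, reviewer_id: str, decision: str,
               reason: str) -> AnnotationRecord:
    """Append, never overwrite: the original label stays in the record."""
    record.adjudications.append(
        AdjudicationEvent(reviewer_id, decision, reason, now_utc()))
    return record


def final_label(record: AnnotationRecord) -> str:
    return record.adjudications[-1].decision if record.adjudications else record.label


def to_json_line(record: AnnotationRecord) -> str:
    """One JSON object per line, suitable for an append-only provenance log."""
    return json.dumps(asdict(record), sort_keys=True)
```

For a file on disk, `iter(lambda: f.read(1 << 20), b"")` on a file opened in binary mode yields 1 MB chunks to pass to `fingerprint`.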
Need a histopathology pilot?
Send 5–10 WSIs from your hardest tissue type — we'll deliver ROI selection or tumour-region segmentation done by trained annotators with pathologist adjudication, in 72 hours.
What It Actually Costs
Pathology annotation is genuinely one of the more expensive task types: credentialed-pathologist time is non-negotiable for clinical work, each WSI takes a long time to review, and consensus protocols multiply that by the number of reviewers on the panel. Pricing models vary by task: per-slide for ROI and grading, per-region for tumour segmentation, per-cell or per-mm² for nuclear work, per-image for IHC scoring. The only number that matters is the one a pilot on your slides produces. Anyone quoting a flat per-slide rate without seeing your tissue type, your stain protocol and your reviewer requirements is guessing.
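Before a pilot, a back-of-envelope estimate of pathologist hours at least shows how the panel size dominates. A minimal sketch, using the 45–90 minutes per slide quoted earlier; the adjudication rate and minutes per adjudication are placeholder assumptions to be replaced by numbers measured on your own slides.

```python
def review_hours(n_slides: int, minutes_per_slide: float, panel_size: int,
                 adjudication_rate: float = 0.0,
                 adjudication_minutes: float = 0.0) -> float:
    """Pathologist hours for a consensus protocol (hypothetical cost model).

    Every slide is read independently by each panel member; a fraction
    `adjudication_rate` of slides then needs an extra adjudication pass.
    """
    panel_minutes = n_slides * minutes_per_slide * panel_size
    adjudication = n_slides * adjudication_rate * adjudication_minutes
    return (panel_minutes + adjudication) / 60
```

For example, 500 slides read by a three-pathologist panel come to 1,125 hours at 45 minutes per slide and 2,250 hours at 90 minutes, before any adjudication is added.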
Related Reading
- Histopathology annotation service
- Pathology image annotation service
- Clinical expert annotation
- Ophthalmology AI annotation guide
- The annotation QA playbook
- Healthcare annotation (overview)
Neel Bennett
AI Annotation Specialist at AI Taggers
Neel has over 8 years of experience in AI training data and machine learning operations. He specializes in helping enterprises build high-quality datasets for computer vision and NLP applications across healthcare, automotive, and retail industries.