What is histopathology image annotation?

Histopathology image annotation is the expert labelling of microscope slide images (biopsies, surgical specimens, and cytology samples) so AI models can detect, classify and grade disease. The unit of work is usually a whole-slide image (WSI), often gigapixels in size, and the annotation has to clear a clinical-grade bar because the models trained on it can end up influencing diagnosis. Untrained generalist annotation does not work here.

What is a whole-slide image (WSI)?

A whole-slide image is a high-resolution digital scan of a microscope slide — typically 50,000–100,000 pixels per side at 40x magnification, billions of pixels in total. WSIs are stored in pyramidal multi-resolution formats (Aperio .svs, Hamamatsu .ndpi, Philips .tiff, DICOM-WSI) so a viewer can zoom from full-slide thumbnail down to single-cell detail without re-rendering. Every annotation task in digital pathology is built around the WSI.

Why is gigapixel imagery hard to annotate?

Three reasons. Scale: a single slide can take a pathologist an hour to review at the resolution AI needs. Tiling: the slide has to be broken into manageable patches, but each patch needs to know its position in the full slide for tumour-boundary work. Stain variation: H&E from different labs looks dramatically different, and the model has to learn to see through that, which means the annotated set has to cover that variation too. None of this is solved by a generic image labelling tool; you need a WSI-native viewer.

Do you need pathologists to annotate histopathology images?

For region-of-interest selection and basic tissue presence, trained generalist annotators working from a pathologist-built reference set can do high-quality work. For tumour grading, mitosis counting, IHC scoring, dysplasia classification and any clinical-grade task, you need credentialed pathologists — or you need pathologist adjudication of every borderline case. Skipping this is the most common reason pathology AI datasets fail to gain regulatory traction.

What annotation tasks are used in digital pathology AI?

Common tasks: region-of-interest (ROI) selection on the WSI; tissue-vs-background segmentation; tumour-region segmentation; cell-level instance segmentation (especially nuclei); mitosis counting and grading; IHC scoring (Allred, H-score, percentage positivity); and disease-specific grading scales (Gleason for prostate, Nottingham for breast, etc). Real projects almost always combine several of these on the same slide.

What file formats are used for WSI annotation?

On the slide side: Aperio .svs is the de-facto standard, Hamamatsu .ndpi and Philips .tiff are also common, and DICOM-WSI is increasingly used for regulatory-aligned pipelines. On the annotation side: GeoJSON or QuPath project files for polygon work, plus per-slide CSV / JSON for IHC scores and grading. Most clinical-grade work passes through QuPath (annotation) or HistoQC (slide quality control) at some stage, even when the production annotation happens elsewhere.
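To make the annotation side concrete, here is a minimal sketch of one tumour polygon stored as a GeoJSON Feature in level-0 slide pixel coordinates. It is loosely modelled on the kind of GeoJSON QuPath can export; the property names (`objectType`, `classification`) and the `slide_id` field are assumptions for illustration, so check them against your own export before relying on them.

```python
import json

def polygon_feature(points, label, slide_id):
    """Build a GeoJSON Feature for one polygon in level-0 pixel coordinates."""
    ring = [list(p) for p in points]
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # GeoJSON linear rings must be closed
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {
            "objectType": "annotation",          # assumed, QuPath-style
            "classification": {"name": label},   # assumed, QuPath-style
            "slide_id": slide_id,                # hypothetical field
        },
    }

def polygon_area_px(feature):
    """Shoelace area of the outer ring, in square pixels."""
    ring = feature["geometry"]["coordinates"][0]
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2

feature = polygon_feature(
    [(1000, 2000), (1400, 2000), (1400, 2300), (1000, 2300)],
    label="Tumor",
    slide_id="slide_001",
)
collection = {"type": "FeatureCollection", "features": [feature]}
text = json.dumps(collection)
```

Keeping coordinates at level 0 means the same file stays valid whatever magnification the annotator happened to be viewing.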

How is histopathology annotation quality measured?

Segmentation tasks (especially nuclei and tumour-region work) are scored with pixel-level IoU or Dice, cell detection with F1, and grading tasks with Cohen's kappa or weighted kappa against a consensus reference. Critically, the gold standard is not a single pathologist; it is a small panel reaching consensus, because pathologists genuinely disagree on borderline cases and a one-reviewer gold standard underestimates real-world variance.
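For readers who want the segmentation and detection metrics spelled out, a small sketch follows. It treats each mask as a set of positive pixel coordinates, which is a simplification chosen to keep the example dependency-free; production code would normally use array masks.

```python
def dice_and_iou(pred, truth):
    """pred and truth are sets of (x, y) pixels labelled positive."""
    overlap = len(pred & truth)
    union = len(pred | truth)
    total = len(pred) + len(truth)
    dice = 2 * overlap / total if total else 1.0
    iou = overlap / union if union else 1.0
    return dice, iou

def detection_f1(true_pos, false_pos, false_neg):
    """F1 for cell detection from matched and unmatched detections."""
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 1.0

# A 4 x 4 reference mask and a prediction shifted one pixel to the right.
truth = {(x, y) for x in range(0, 4) for y in range(0, 4)}
pred = {(x, y) for x in range(1, 5) for y in range(0, 4)}
dice, iou = dice_and_iou(pred, truth)
f1 = detection_f1(true_pos=8, false_pos=2, false_neg=2)
```

Dice is always at least as high as IoU on the same masks, so say which one you are reporting.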

How much does histopathology annotation cost?

It is one of the higher-cost annotation types because credentialed-pathologist time is genuinely expensive, slides are slow to review, and consensus protocols are mandatory for clinical-grade work. Pricing is usually per slide, per region, or per cell depending on the task. The honest scoping move is a pilot on 5–10 representative slides from your hardest tissue type and your messiest stain protocol — generic per-slide quotes mean very little until that pilot has run.

Histopathology AI Annotation: Whole-Slide Imaging, Biopsy & Gigapixel Workflows (2026)

Digital pathology is one of the most exciting corners of medical AI and one of the easiest to do badly. The slides are enormous — billions of pixels — the diseases live in tiny regions, the stain colour changes lab to lab, and the people qualified to call it are pathologists, who don't come cheap. Get any of that wrong and the model trains on something, but it doesn't train on the disease.

This guide is what we'd hand a team starting a histopathology or biopsy AI project — written from a few years of running these workflows for diagnostic and research clients. It covers what makes WSI different, the tasks you'll actually annotate, who has to do it, what it costs, and the regulatory bar your dataset has to clear if you ever want to ship in a clinical setting.

Whole-Slide Imaging: Why It's Not Just a Big Photo

A whole-slide image is a scan of a microscope slide at clinical resolution: typically a 40x objective, often around 100,000 × 50,000 pixels per slide. It is stored in a pyramidal multi-resolution format (.svs, .ndpi, .tiff, increasingly DICOM-WSI) so a viewer can show you the full slide at once or zoom down to nuclei without re-rendering.

The number to internalise — a single slide can be five gigapixels. You don't open it in Photoshop. You don't annotate it in CVAT's standard image mode. You need a WSI-native viewer (QuPath is the workhorse, plus several commercial tools) that streams tiles from the pyramid as the annotator pans and zooms. Anyone telling you they'll annotate pathology slides with a generic image labeller is signalling that they haven't actually done this work.
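The arithmetic behind that number is worth running once. The sketch below takes the 100,000 × 50,000 example, works out the pixel count and the uncompressed RGB size, and lists the levels of a hypothetical pyramid. The downsample factor of 2 and the 1,024-pixel stopping point are assumptions; real scanner formats choose their own level spacing.

```python
def pyramid_levels(width, height, downsample=2, min_side=1024):
    """List (level, width, height) until the longer side fits in min_side."""
    levels = []
    level, w, h = 0, width, height
    while True:
        levels.append((level, w, h))
        if max(w, h) <= min_side:
            break
        w, h = max(1, w // downsample), max(1, h // downsample)
        level += 1
    return levels

width, height = 100_000, 50_000
pixels = width * height           # total pixels at full resolution
raw_rgb_gb = pixels * 3 / 1e9     # 3 bytes per pixel, uncompressed
levels = pyramid_levels(width, height)

for level, w, h in levels:
    print(f"level {level}: {w} x {h}")
```

Five billion pixels is roughly 15 GB of raw RGB, which is why viewers stream tiles from the appropriate level instead of loading the slide.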

The Gigapixel Problem (And How To Actually Handle It)

Three challenges show up on every WSI project:

Reviewer time. A pathologist reviewing a single slide for AI ground truth can easily take 45–90 minutes. Multiply by the dataset size and you've found your real budget driver.
Tiling and coordinates. The slide gets broken into patches for the model, but every patch has to know its absolute slide coordinates so tumour-boundary work survives the patching. Annotation has to happen on the full WSI; tiling is downstream.
Stain variation. H&E from a Sydney lab and H&E from a Saudi lab can look like different staining protocols. They are. The dataset needs to span that variation deliberately, not accidentally.
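The tiling-and-coordinates point can be sketched in a few lines. This is a minimal illustration, assuming non-overlapping 512-pixel tiles in level-0 coordinates; the function names are hypothetical and not taken from any particular library.

```python
def tile_grid(slide_w, slide_h, tile=512, stride=512):
    """Yield (x, y, w, h) tiles in level-0 coordinates, clipped at the edges."""
    for y in range(0, slide_h, stride):
        for x in range(0, slide_w, stride):
            yield (x, y, min(tile, slide_w - x), min(tile, slide_h - y))

def to_slide_coords(tile_origin, local_point):
    """Map a point inside a tile back to absolute slide coordinates."""
    ox, oy = tile_origin
    lx, ly = local_point
    return (ox + lx, oy + ly)

def tiles_touching(bbox, tiles):
    """Return the tiles that intersect an annotation box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bbox
    return [
        t for t in tiles
        if t[0] < x1 and t[0] + t[2] > x0 and t[1] < y1 and t[1] + t[3] > y0
    ]

tiles = list(tile_grid(2000, 1100))
point = to_slide_coords((1536, 1024), (10, 20))
hits = tiles_touching((500, 500, 600, 600), tiles)
```

A small annotation sitting on a tile corner touches four tiles here, which is exactly the case where per-tile annotation without absolute coordinates loses the boundary.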

The serious workflows handle this with WSI-native annotation tools, lab-aware sampling so multiple labs are represented, and colour-normalisation work either before annotation (to standardise what the annotator sees) or as a training augmentation (to teach the model to be invariant). Pick one strategy and stick with it.
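Lab-aware sampling is the easiest of these to show in code. The sketch below draws up to a fixed number of slides from each contributing lab so that no single lab dominates the annotation set; the lab names, the slide IDs and the `per_lab` quota are all hypothetical.

```python
import random
from collections import defaultdict

def sample_per_lab(slides, per_lab, seed=0):
    """slides is a list of (slide_id, lab); pick up to per_lab from each lab."""
    by_lab = defaultdict(list)
    for slide_id, lab in slides:
        by_lab[lab].append(slide_id)
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    picked = {}
    for lab in sorted(by_lab):
        ids = sorted(by_lab[lab])
        picked[lab] = rng.sample(ids, min(per_lab, len(ids)))
    return picked

slides = (
    [(f"a{i}", "lab_a") for i in range(50)]
    + [(f"b{i}", "lab_b") for i in range(5)]
    + [(f"c{i}", "lab_c") for i in range(2)]
)
picked = sample_per_lab(slides, per_lab=3)
```

A lab with fewer slides than the quota contributes everything it has, so it is worth logging the per-lab counts alongside the dataset.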

The Annotation Tasks That Actually Run on Pathology Projects

Real projects mix several of these on the same slides:

Region of interest (ROI) selection. Pathologist or trained reviewer marks the tumour-bearing area of the slide so downstream tasks focus on what matters. Often the very first task.
Tissue-vs-background segmentation. Separating actual tissue from glass and air. Cheap, fast, useful for normalising slide statistics.
Tumour-region segmentation. Polygon masks around malignant regions. The bread and butter of cancer-detection AI.
Cell-level instance segmentation. Especially nuclei — individual nucleus contours for grading and counting work. Mitosis annotation is a specialised flavour of this.
IHC scoring. Allred score for hormone receptor work, H-score where staining intensity has to be weighted, percentage positivity for Ki-67 and others. Trained annotators with pathologist adjudication are the realistic structure.
Disease-specific grading. Gleason patterns for prostate and Nottingham grade (the modified Bloom-Richardson system) for breast. Grading scales from adjacent fields, such as the ICDR scale for diabetic retinopathy, are covered in the ophthalmology guide.
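Two of the IHC scores above reduce to simple formulas, sketched here. The H-score is the usual intensity-weighted sum (1 × % weak + 2 × % moderate + 3 × % strong, range 0 to 300); the input percentages and cell counts are made-up illustration values, and your protocol may define the intensity bins differently.

```python
def h_score(pct_weak, pct_moderate, pct_strong):
    """Intensity-weighted H-score on a 0-300 scale."""
    parts = (pct_weak, pct_moderate, pct_strong)
    if min(parts) < 0 or sum(parts) > 100:
        raise ValueError("percentages must be non-negative and sum to <= 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong

def percent_positive(positive_cells, total_cells):
    """Percentage positivity, for example for a Ki-67 count."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return 100 * positive_cells / total_cells

score = h_score(pct_weak=20, pct_moderate=30, pct_strong=10)
ki67 = percent_positive(positive_cells=150, total_cells=1000)
```

Storing the three intensity percentages, not just the final score, lets an adjudicator see where two readers actually diverged.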

Who Actually Has to Do the Annotation

The honest answer — it's a layered team, not a single annotator type. What works in practice:

Pathologists at the top of the loop. They build the spec, build the gold-standard reference set, adjudicate borderline cases, and sign off on the QA process. Their time is the most expensive and you want it spent where it counts.
Pathology-trained annotators in the middle. People with medical-science backgrounds (often bio-medicine grads or histotechnicians) who have been trained on the project's spec. They do most of the volume.
QA on top. Double-annotation, gold-set checks every batch, adjudication queue back to the pathologists. The general structure is the same as our annotation QA playbook — but with pathology-specific concordance metrics on top.
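A gold-set check per batch might look something like the sketch below. It compares an annotator's labels against the pathologist-built reference on the slides they share and returns the disagreements for the adjudication queue. The 0.9 pass threshold and the label names are assumptions for illustration, not a recommended standard.

```python
def review_batch(annotator_labels, gold_labels, min_agreement=0.9):
    """Return (agreement, passed, adjudication_queue) for one batch."""
    shared = sorted(set(annotator_labels) & set(gold_labels))
    if not shared:
        raise ValueError("batch contains no gold-set slides")
    queue = [s for s in shared if annotator_labels[s] != gold_labels[s]]
    agreement = (len(shared) - len(queue)) / len(shared)
    return agreement, agreement >= min_agreement, queue

gold = {"s1": "tumour", "s2": "benign", "s3": "tumour", "s4": "benign"}
batch = {"s1": "tumour", "s2": "benign", "s3": "benign", "s4": "benign",
         "s5": "tumour"}  # s5 is not in the gold set, so it is not scored
agreement, passed, queue = review_batch(batch, gold)
```

Raw agreement is only the gate; the concordance metrics reported to a client or regulator would be the chance-corrected ones.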

The wrong structure — generalist crowd annotators with no medical training and no pathologist adjudication — produces datasets that look fine and train models that fail at the regulator. We've been brought in to fix several of these. It's not cheap. See clinical expert annotation for the formal version of this layered team.

Quality: Why Inter-Pathologist Agreement Is The Real Metric

Here's the bit that surprises a lot of ML teams new to pathology: pathologists genuinely disagree with each other. Inter-pathologist agreement on borderline cases of dysplasia or Gleason 3 vs 4 often sits at a weighted κ of around 0.6–0.7. That's not a failure of the field; it's the nature of cell morphology at the boundaries. It also means a one-pathologist gold standard is misleading.

Clinical-grade datasets use consensus gold standards — two or three pathologists, blinded, with a defined resolution rule when they disagree (majority vote, or escalation to a senior reviewer). The reported metric is concordance against this consensus, broken out per grade class. Anything reporting a single overall accuracy number against a single reviewer is doing pathology AI the cheap way, and the cheap way doesn't pass regulators.
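A consensus rule and per-class concordance can be sketched as follows. This assumes a strict-majority vote with escalation when no majority exists; the grade labels, slide IDs and model calls are invented for illustration.

```python
from collections import Counter, defaultdict

def consensus(reads):
    """Return the strict-majority grade, or None to escalate to a senior reviewer."""
    grade, votes = Counter(reads).most_common(1)[0]
    return grade if votes > len(reads) / 2 else None

def concordance_by_class(model_calls, consensus_calls):
    """Fraction of slides where the model matches consensus, per grade."""
    hits, totals = defaultdict(int), defaultdict(int)
    for slide_id, truth in consensus_calls.items():
        if truth is None or slide_id not in model_calls:
            continue  # escalated or unscored slides are left out
        totals[truth] += 1
        hits[truth] += int(model_calls[slide_id] == truth)
    return {grade: hits[grade] / totals[grade] for grade in totals}

panel = {
    "s1": ["G3", "G3", "G4"],
    "s2": ["G4", "G4", "G4"],
    "s3": ["G3", "G4", "G5"],   # no majority, so this one escalates
    "s4": ["G3", "G3", "G3"],
}
truth = {slide: consensus(reads) for slide, reads in panel.items()}
escalate = [slide for slide, grade in truth.items() if grade is None]
model = {"s1": "G3", "s2": "G3", "s3": "G4", "s4": "G3"}
per_class = concordance_by_class(model, truth)
```

In this toy data a single overall accuracy would hide that the model never matches consensus on G4, which is the reason for the per-class breakdown.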

Regulatory Context You Can't Ignore

If your pathology AI is heading anywhere near clinical use, the regulator wants paperwork. Australian TGA, US FDA (510(k) or De Novo), and EU IVDR have overlapping but not identical bars. What they all want — annotator credentials documented per annotation, protocol versioned with clinical sign-off, inter-annotator agreement reported per class, adjudication trail for every disagreement, per-slide provenance log. The annotation side has to support the regulatory submission; you cannot bolt it on afterwards. Build the documentation from day one or pay double later.
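One way to build that documentation from day one is to attach a provenance record to every annotation. The sketch below is an assumption about what such a record could contain, based on the list above; the field names and values are hypothetical and no regulator prescribes this schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass(frozen=True)
class AnnotationRecord:
    slide_id: str
    annotation_id: str
    annotator_id: str
    annotator_credential: str      # documented per annotation
    protocol_version: str          # version that carries clinical sign-off
    label: str
    created_utc: str
    adjudicated_by: Optional[str] = None
    adjudication_reason: Optional[str] = None

REQUIRED = ("slide_id", "annotation_id", "annotator_id",
            "annotator_credential", "protocol_version", "label", "created_utc")

def missing_fields(record):
    """Names of required fields that are empty, for a pre-submission check."""
    data = asdict(record)
    return [name for name in REQUIRED if not data[name]]

record = AnnotationRecord(
    slide_id="slide_001",
    annotation_id="ann_0001",
    annotator_id="reviewer_07",
    annotator_credential="board-certified pathologist",
    protocol_version="v1.2",
    label="tumour",
    created_utc="2026-01-15T09:30:00Z",
)
log_line = json.dumps(asdict(record), sort_keys=True)  # one line per annotation
```

Writing one immutable log line per annotation makes the adjudication trail something you can query later instead of reconstructing it from email.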

Need a histopathology pilot?

Send 5–10 WSIs from your hardest tissue type — we'll deliver ROI selection or tumour-region segmentation done by trained annotators with pathologist adjudication, in 72 hours.

What It Actually Costs

Pathology annotation is genuinely one of the more expensive task types: credentialed-pathologist time is non-negotiable for clinical work, each WSI takes a long time to review, and consensus protocols multiply that by the number of reviewers in the panel. Pricing models vary by task: per-slide for ROI and grading, per-region for tumour segmentation, per-cell or per-mm² for nuclear work, per-image for IHC scoring. The only number that matters is the one a pilot on your slides produces. Anyone quoting a flat per-slide rate without seeing your tissue type, your stain protocol and your reviewer requirements is guessing.
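A back-of-envelope estimate of reviewer time shows why the panel size matters. The sketch uses the 45–90 minutes per slide quoted earlier in this guide; the 500-slide dataset and three-reviewer panel are hypothetical, and it assumes every panel member reviews every slide in full.

```python
def reviewer_hours(n_slides, minutes_per_slide, panel_size):
    """Total pathologist hours if each panel member reviews every slide."""
    return n_slides * minutes_per_slide * panel_size / 60

low = reviewer_hours(n_slides=500, minutes_per_slide=45, panel_size=3)
high = reviewer_hours(n_slides=500, minutes_per_slide=90, panel_size=3)
print(f"{low:.0f} to {high:.0f} pathologist hours")
```

Swap in the per-slide time your own pilot measures; that is the figure this estimate is most sensitive to.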
