Quick answer
Polygon annotation is worth the 3–8× cost premium over bounding boxes when the object being labelled is highly non-rectangular and the background pixels that a bounding box would capture would confuse the model. The practical threshold: if a tightly drawn bounding box around your target object would be more than 30% background, switching to polygon annotation typically produces a measurable improvement in model precision. Below 30%, bounding boxes are sufficient and the cost premium is unlikely to be recovered in performance gains.
What Polygon Annotation Produces and What It Costs
Polygon annotation produces a closed contour — a sequence of (x, y) vertex coordinates that trace the visible boundary of an object — rather than the rectangular enclosure that a bounding box provides. The output polygon follows the actual silhouette of the object: the irregular edge of a surface defect, the angled profile of a vehicle, the curved boundary of a piece of fruit, the non-orthogonal contour of a logo in a scene.
The cost premium is real and consistent. According to a 2024 dataset pricing study by Sama (a major annotation vendor), polygon annotation at a 15-vertex mean per object costs approximately 4.2× as much as bounding box annotation on the same image set. At 20+ vertices, the multiple rises to 6–8×. For teams building computer vision models with datasets in the hundreds of thousands of images, that cost differential is a genuine budget decision, not a footnote.
Production cost benchmarks for Australian annotation projects in 2026: bounding box annotation costs AUD $0.04–$0.12 per object depending on object density and scene complexity. Polygon annotation at 12–20 vertices per object costs AUD $0.22–$0.55 per object. Complex polygons at 30–50 vertices cost AUD $0.55–$1.20 per object. The format question matters too: COCO JSON with polygon coordinates is the production standard; GeoJSON is used for geospatial polygons; Pascal VOC XML with polygon extensions is legacy but still found in older pipelines.
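As a rough illustration of the COCO JSON format mentioned above, the sketch below builds one annotation record from a list of polygon vertices. The field names (`segmentation`, `bbox`, `area`, `iscrowd`) follow the COCO object-detection format. The helper name `to_coco_annotation`, the ids, and the sample vertices are hypothetical.

```python
import json

def polygon_area(vertices):
    """Shoelace formula: absolute area of a simple (non-self-intersecting) polygon."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def to_coco_annotation(ann_id, image_id, category_id, vertices):
    """Build one COCO-style annotation dict from [(x, y), ...] pixel vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    # COCO stores each polygon as a flat list [x1, y1, x2, y2, ...].
    flat = [coord for point in vertices for coord in point]
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": [flat],
        # COCO bbox convention is [x_min, y_min, width, height].
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "area": polygon_area(vertices),
        "iscrowd": 0,
    }

# Hypothetical 6-vertex bruise outline, for illustration only.
bruise = [(120, 80), (160, 70), (190, 95), (180, 140), (140, 150), (110, 120)]
annotation = to_coco_annotation(ann_id=1, image_id=42, category_id=3, vertices=bruise)
print(json.dumps(annotation, indent=2))
```

Storing the bounding box alongside the polygon means the same file can feed a box-only detector, which keeps box-versus-polygon comparisons on identical images straightforward.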
The 30% Background Rule: A Practical Threshold
The most practical decision heuristic for choosing between bounding boxes and polygons is the background pixel fraction: what percentage of the pixels inside a tightly drawn bounding box around your target object are not part of the object?
When that fraction exceeds 30%, background pixels become a confounding signal. Feature extractors, whether convolutional or transformer-based, learn from every pixel inside the labelled region. A box around an irregular defect that is 45% background therefore teaches the model features of the non-defect surface alongside the defect itself. A polygon that follows the actual defect boundary removes that noise.
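The background pixel fraction can be estimated from a handful of sample polygons before committing to a label type. The sketch below is a minimal version, assuming each polygon is a simple one given as pixel-coordinate vertices. It compares the polygon's area with the area of its tight axis-aligned bounding box. The function names and the sample shapes are illustrative, not part of any particular tool.

```python
def polygon_area(vertices):
    """Shoelace formula: absolute area of a simple polygon."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def background_fraction(vertices):
    """Share of the tight bounding box that lies outside the polygon."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    box_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    if box_area == 0:
        raise ValueError("degenerate polygon: bounding box has zero area")
    return 1.0 - polygon_area(vertices) / box_area

def recommend_polygon(vertices, threshold=0.30):
    """Apply the 30% heuristic from the text; the threshold is adjustable."""
    return background_fraction(vertices) > threshold

# Hypothetical shapes: a thin diagonal crack and a rectangular package.
crack = [(0, 0), (10, 0), (100, 90), (100, 100), (90, 100), (0, 10)]
package = [(0, 0), (100, 0), (100, 50), (0, 50)]

for name, shape in (("crack", crack), ("package", package)):
    print(name, round(background_fraction(shape), 2), recommend_polygon(shape))
```

Running this on a few dozen representative objects per class gives a per-class fraction to compare against the threshold, rather than a single project-wide guess.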
Objects that consistently exceed the 30% threshold and benefit from polygon annotation: irregularly shaped surface defects (delamination, cracks, blemishes), garments and soft goods on human models, organic shapes (fruit, vegetables, leaves), vehicles at oblique angles, logos and text in uncontrolled scenes, and manufacturing components with non-rectangular silhouettes. Objects that stay well below the threshold: upright pedestrians, frontal vehicle views, rectangular packaged goods — for these, bounding boxes are sufficient.
A 2023 analysis by the Computer Vision Foundation found that switching from bounding boxes to polygon annotations produced mean mAP@0.75 improvements of 12–18 percentage points for irregular-shape object classes and 2–4 percentage points for near-rectangular object classes — confirming that shape regularity is the dominant predictor of whether polygon annotation is cost-effective.
Where Polygon Annotation Wins: Five Domains
Surface defect detection in manufacturing and food processing
Defects such as cracks, delamination, blemishes, and contamination marks are rarely rectangular. A bounding box drawn around an elongated surface crack includes large areas of clean surface on both sides. The model learns to treat 'crack-adjacent clean surface' as a positive feature, degrading precision at operational thresholds. Polygon annotation traces the defect boundary and removes that confounding signal.
Produce grading and agricultural quality inspection
Fruit blemishes, bruising, and skin breaks are irregular in shape and often partially occluded by the fruit's curvature. Bounding boxes on curved fruit surfaces can be 40–60% background. Polygon annotation at 10–16 vertices per defect region provides clean training signal for the defect boundary without the confounding background.
Retail and fashion: product segmentation for virtual try-on
Garment segmentation for virtual try-on requires tracing the boundary between the garment and the background, the model, and other garments. Bounding boxes are useless for this task — the model needs to know exactly which pixels are the garment. Polygon annotation at 20–40 vertices per garment piece is the cost-effective alternative to full semantic segmentation for catalogue-scale product imaging.
Logo and brand detection in uncontrolled imagery
Logos in sports footage, retail shelving, and user-generated content appear at arbitrary rotations and scales. A bounding box aligned to the image axes around a rotated logo includes large off-logo regions. Polygon annotation with vertices at each logo corner eliminates background noise and markedly improves detection recall at tight IoU thresholds.
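The effect of rotation on an axis-aligned box can be worked out geometrically. The sketch below assumes the logo is a flat rectangle with no perspective distortion and computes how much of the enclosing axis-aligned box is background at a given rotation angle. The 200×60 pixel logo size is a hypothetical example.

```python
import math

def aabb_background_fraction(width, height, angle_deg):
    """Background share of the axis-aligned box around a rotated rectangle."""
    theta = math.radians(angle_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    # Extents of the rotated rectangle projected onto the image axes.
    box_w = width * c + height * s
    box_h = width * s + height * c
    return 1.0 - (width * height) / (box_w * box_h)

# Hypothetical 200x60 px logo at several in-plane rotations.
for angle in (0, 10, 30, 45):
    fraction = aabb_background_fraction(200, 60, angle)
    print(f"{angle:>2} degrees: {fraction:.1%} background")
```

For an elongated rectangle like this one, even a modest rotation pushes the box past the 30% threshold, which is why four corner vertices are usually enough to recover most of the benefit.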
Infrastructure and geospatial: building footprints and irregular features
Building footprint extraction from aerial or satellite imagery requires polygon contours. Buildings are rarely rectangular, and the gap between a polygon footprint and a bounding box footprint is a significant area of ground incorrectly labelled as rooftop. For geospatial AI pipelines, polygon annotation in GeoJSON is the standard format for this class of task.
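For reference, a building footprint in GeoJSON might be assembled as in the sketch below. It follows the GeoJSON specification (RFC 7946): coordinates are [longitude, latitude], a polygon ring is closed by repeating its first position, and exterior rings run counterclockwise. The helper name, the property keys, and the coordinates are hypothetical.

```python
import json

def signed_area(ring):
    """Signed shoelace area of a closed ring; positive means counterclockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0

def footprint_feature(vertices, properties):
    """Wrap [(lon, lat), ...] vertices as a GeoJSON Feature with Polygon geometry."""
    ring = [list(v) for v in vertices]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # close the ring
    if signed_area(ring) < 0:
        ring.reverse()  # make the exterior ring counterclockwise
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": properties,
    }

# Hypothetical L-shaped footprint, traced clockwise by the annotator.
l_shape = [
    (151.2000, -33.8600), (151.2000, -33.8596), (151.2003, -33.8596),
    (151.2003, -33.8598), (151.2006, -33.8598), (151.2006, -33.8600),
]
feature = footprint_feature(l_shape, {"class": "building", "annotator": "A01"})
print(json.dumps(feature, indent=2))
```

Normalising ring direction at export time means annotators can trace in either direction without affecting downstream GIS tools that expect the specification's winding order.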
Vertex Count: The Specification That Determines Both Cost and Quality
The single most important annotation specification for polygon projects, and the one most frequently omitted from annotation guidelines, is vertex count per object class. Without a specified vertex count, annotators resolve the ambiguity differently: some trace coarse 6-vertex approximations (producing bounding-box-quality training data at polygon cost), while others over-specify with 80+ vertices on simple shapes (doubling cost without improving model performance).
Vertex budgets should be specified per class, not as a single project-wide constraint. Appropriate ranges: simple convex shapes like round fruit — 8–14 vertices; vehicle side profiles — 14–22 vertices; irregular surface defects — 16–30 vertices; building footprints with setbacks — 12–24 vertices. The guideline should include annotated examples showing acceptable and unacceptable polygon quality at the target vertex count.
Underspecified polygons (4–6 vertices on a complex shape) produce almost identical training signal to bounding boxes — the polygon simply describes a rough rectangular or hexagonal hull around the object. You pay polygon prices for bounding-box-quality annotations. Overspecified polygons (60+ vertices on a fruit) cost 3–4× more than necessary without improving the downstream model's ability to distinguish the object from its background.
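A per-class vertex budget is easy to enforce automatically. The sketch below flags polygons that fall outside the ranges suggested above. The class keys and the function names are hypothetical, and a real pipeline would read the budgets from the project guideline rather than hard-coding them.

```python
import math

# Ranges taken from the guideline text above; the class keys are illustrative.
VERTEX_BUDGETS = {
    "round_fruit": (8, 14),
    "vehicle_side_profile": (14, 22),
    "surface_defect": (16, 30),
    "building_footprint": (12, 24),
}

def check_vertex_budget(class_name, vertices):
    """Return 'under', 'ok', or 'over' for one polygon against its class budget."""
    low, high = VERTEX_BUDGETS[class_name]
    count = len(vertices)
    if count < low:
        return "under"
    if count > high:
        return "over"
    return "ok"

def regular_polygon(n, radius=50.0, cx=100.0, cy=100.0):
    """Generate an n-vertex test polygon (stand-in for real annotations)."""
    return [
        (cx + radius * math.cos(2 * math.pi * i / n),
         cy + radius * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]

print(check_vertex_budget("round_fruit", regular_polygon(12)))
print(check_vertex_budget("surface_defect", regular_polygon(6)))
print(check_vertex_budget("round_fruit", regular_polygon(60)))
```

A count check only catches gross under- and over-specification. It says nothing about whether the vertices sit on the boundary, so it complements IoU-based QA rather than replacing it.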
For teams comparing polygon annotation with full semantic segmentation, see our post on semantic segmentation annotation and when you need it — specifically the cost-per-image comparison section.
Need precise polygon annotation for your computer vision dataset?
AI Taggers provides production-scale polygon annotation services with specified vertex budgets per class, inter-annotator IoU QA, COCO JSON / GeoJSON delivery, and AI-assisted pre-labelling for faster turnaround. Projects from 1,000 to 1,000,000+ objects.
Discuss your polygon annotation project
Case Study: Produce Grading AI, from 68% to 86% mAP with Polygon Annotation
In late 2024, an Australian packing house operating automated fresh produce grading lines was experiencing unacceptably high false-grading rates on a machine vision system trained to detect surface defects — bruising, skin breaks, and russeting — on stone fruit. The system had been trained on bounding box annotations and was producing a 14.2% false grading rate: premium fruit was being downgraded, and genuinely defective fruit was occasionally passing to premium packs.
Baseline model performance before annotation rebuild:
- mAP@0.5 across defect classes: 68.3%
- mAP@0.75 (stricter threshold, better diagnostic for shape quality): 41.2%
- False grading rate at operational threshold: 14.2%
- Precision on surface bruising class (largest defect category): 61.8%
- Recall on skin break class (safety-critical): 73.4%
Root cause analysis identified two problems with the original bounding box annotations. First, defects on curved fruit surfaces produced boxes that were 40–55% background: the curvature of the fruit meant a rectangular box always captured significant clean-skin area on either side of the defect. Second, some annotators were drawing minimal 4-vertex boxes while others were drawing tighter 6-vertex outlines, so inconsistent label quality was adding inter-annotator noise on top of the shape problem.
The annotation rebuild covered 22,400 images over seven weeks:
Phase 1 — Guideline rewrite with class-specific vertex budgets (week 1)
Defect classes were divided into four types: bruising (12–16 vertices, large irregular regions), skin break (8–12 vertices, smaller concentrated damage), russeting (16–22 vertices, diffuse surface texture change), and contamination mark (8–12 vertices, compact irregular patches). Visual examples at each vertex count were included in the guideline with pass/fail comparisons. A calibration set of 600 images was annotated by all annotators independently before production began.
Phase 2 — Polygon annotation production (weeks 2–6)
Five annotators trained on fruit defect annotation produced polygon annotations across the 22,400-image dataset. QA sampling ran at 20% of annotations — higher than a standard project — because defect boundary precision was directly linked to grading accuracy. Three annotators required targeted retraining on the russeting class after QA identified systematic polygon under-specification (8-vertex polygons on diffuse texture regions that required 18+).
Phase 3 — Model retrain and evaluation (week 7)
The same model architecture (YOLOv9 fine-tuned for defect detection) was retrained on the polygon-annotated dataset using identical training parameters. Evaluation was run on a held-out test set of 2,400 images annotated under the same polygon protocol by a separate QA annotator.
Results after polygon rebuild and model retrain:
- mAP@0.5: 68.3% → 86.1% — a 17.8 percentage-point improvement
- mAP@0.75: 41.2% → 69.4% — a 28.2 percentage-point improvement (the larger gain reflects tighter boundary quality)
- False grading rate: 14.2% → 3.8% — a 73% reduction at operational threshold
- Precision on bruising class: 61.8% → 84.3%
- Recall on skin break class: 73.4% → 91.7%
The annotation rebuild cost AUD $48,000 (22,400 images at an average of AUD $0.34 per polygon across defect densities), compared to the original bounding box annotation cost of AUD $11,200. The 4.3× cost increase cut the false grading rate from 14.2% to 3.8%. At the packing house's production volume of approximately 180,000 pieces of fruit per day, the 10.4-percentage-point reduction in false grading eliminated misgrading events worth an estimated AUD $320,000 annually. The annotation premium paid back in approximately seven weeks of operation.
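The case study's headline figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above. The two payback calculations are alternative readings, one against the premium over the original boxes and one against the full rebuild cost, and they bracket the roughly seven-week figure. The polygons-per-image value is implied by the quoted totals rather than stated in the case study.

```python
# Figures quoted in the case study above.
rebuild_cost_aud = 48_000
original_box_cost_aud = 11_200
annual_saving_aud = 320_000
avg_cost_per_polygon_aud = 0.34
images = 22_400
false_rate_before, false_rate_after = 14.2, 3.8

cost_multiple = rebuild_cost_aud / original_box_cost_aud
premium_aud = rebuild_cost_aud - original_box_cost_aud
weekly_saving_aud = annual_saving_aud / 52

# Two readings of "payback": against the premium only, or the full rebuild.
payback_weeks_premium = premium_aud / weekly_saving_aud
payback_weeks_total = rebuild_cost_aud / weekly_saving_aud

relative_reduction = (false_rate_before - false_rate_after) / false_rate_before

# Implied by the totals, not stated in the case study.
polygons_per_image = rebuild_cost_aud / avg_cost_per_polygon_aud / images

print(f"cost multiple:            {cost_multiple:.1f}x")
print(f"premium:                  AUD ${premium_aud:,}")
print(f"payback (premium only):   {payback_weeks_premium:.1f} weeks")
print(f"payback (full rebuild):   {payback_weeks_total:.1f} weeks")
print(f"relative false-rate drop: {relative_reduction:.0%}")
print(f"implied polygons/image:   {polygons_per_image:.1f}")
```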
For teams evaluating whether this kind of precision upgrade applies to their dataset, see our comparison of bounding box annotation costs and throughput for a cost baseline, and our instance segmentation annotation case study for the next level up in precision.
The Annotation Workflow: Polygon QA at Scale
Polygon annotation quality is measured primarily by Intersection over Union (IoU) between the annotator's polygon and the gold-standard reference annotation. IoU thresholds for production polygon annotation: minimum acceptable IoU of 0.75 for standard detection tasks; 0.85+ for tasks where boundary precision drives model accuracy (defect detection, medical imaging, garment segmentation). Per-class IoU reporting is essential — aggregate IoU across all classes masks systematic underperformance on difficult classes.
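One dependency-free way to compute polygon IoU is to rasterise both polygons onto the pixel grid and compare the resulting masks. The sketch below does this with an even-odd ray-casting test at pixel centres. It is an approximation at pixel resolution, and production pipelines often use a geometry library instead. The function names and the two sample squares are hypothetical.

```python
def point_in_polygon(x, y, vertices):
    """Even-odd ray casting: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(vertices, width, height):
    """Set of (col, row) pixels whose centres fall inside the polygon."""
    return {
        (col, row)
        for row in range(height)
        for col in range(width)
        if point_in_polygon(col + 0.5, row + 0.5, vertices)
    }

def polygon_iou(poly_a, poly_b, width, height):
    """Mask IoU of two polygons on a width x height pixel grid."""
    mask_a = rasterize(poly_a, width, height)
    mask_b = rasterize(poly_b, width, height)
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0

# Hypothetical gold-standard polygon and an annotator polygon shifted 5 px right.
reference = [(10, 10), (50, 10), (50, 50), (10, 50)]
annotator = [(15, 10), (55, 10), (55, 50), (15, 50)]

iou = polygon_iou(reference, annotator, width=64, height=64)
print(f"IoU = {iou:.3f}")
print("passes 0.75 threshold:", iou >= 0.75)
print("passes 0.85 threshold:", iou >= 0.85)
```

A 5-pixel shift on a 40-pixel object is enough to clear the 0.75 bar but fail the 0.85 bar, which illustrates how much tighter the stricter threshold is for small objects.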
Three failure modes account for most polygon annotation quality problems. First: annotators drawing minimal-vertex polygons to meet throughput targets — a 4-vertex polygon on an irregular defect is bounding-box quality at polygon cost. Vertex count checking in the QA pipeline catches this systematically. Second: annotation of occluded regions — annotators must decide whether to annotate the visible boundary, the estimated full boundary, or both (with an occlusion flag). Write the rule explicitly; unwritten convention produces inconsistent results. Third: boundary rounding at high-curvature points — annotators under pressure skip vertices at complex curves, leaving the polygon 'lagging' the true boundary by 10–20 pixels at inflection points.
AI-assisted pre-labelling with SAM 2 or similar foundation models produces usable polygon starting points for simple, well-lit objects. Pre-labelling accuracy on complex defect boundaries and occluded objects typically falls below the 0.75 IoU threshold — for these classes, human annotation without pre-labelling is faster than correcting inaccurate pre-labels. Measure pre-labelling IoU per class on a 500-image pilot before applying it to the full dataset.
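The pilot measurement can be summarised per class with a short script. The sketch below assumes the pilot yields one (class, IoU) pair per object, comparing each pre-label against the human reference, and uses the class mean as the decision statistic. That choice, the function name, and the sample values are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

def prelabel_decisions(pilot_results, threshold=0.75):
    """Group pilot IoU scores by class and decide where pre-labels are worth using."""
    by_class = defaultdict(list)
    for class_name, iou in pilot_results:
        by_class[class_name].append(iou)
    decisions = {}
    for class_name, scores in by_class.items():
        class_mean = mean(scores)
        decisions[class_name] = {
            "mean_iou": class_mean,
            "objects": len(scores),
            "use_prelabels": class_mean >= threshold,
        }
    return decisions

# Hypothetical pilot scores; a real pilot would cover around 500 images.
pilot = [
    ("whole_fruit", 0.91), ("whole_fruit", 0.88),
    ("russeting", 0.58), ("russeting", 0.66),
]

for class_name, result in prelabel_decisions(pilot).items():
    print(class_name, round(result["mean_iou"], 3), result["use_prelabels"])
```

Classes that fall below the threshold go to annotators without pre-labels, while the rest can use the model output as a starting point.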
For teams considering whether to step up to full pixel-level labelling, our semantic segmentation service covers the full pipeline from taxonomy design to mIoU-validated delivery. For standard object detection tasks that don't require polygon precision, our bounding box annotation service provides volume throughput at lower per-object cost.
Decision Framework: Box, Polygon, or Segmentation?
The label type decision should be based on object shape regularity, downstream model task, and annotation budget — not on default workflow assumptions. The three-way decision framework:
Use bounding boxes when:
- Object shape is approximately rectangular.
- Background pixel fraction within the box is <30%.
- The model task is detection and rough localisation (counting, presence/absence, approximate position).
- You have >100,000 images and per-image cost is the binding constraint.
Use polygon annotation when:
- Object shape is highly irregular.
- Background pixel fraction would exceed 30% in a bounding box.
- The model task requires shape awareness (grading, quality inspection, garment segmentation, logo detection at rotation).
- Pixel-exact boundaries are not required (i.e., 5–15 pixel boundary errors are acceptable for the downstream task).
Use semantic segmentation when:
- Exact pixel-level boundaries matter (organ outlining for radiotherapy, driveable surface for AV, crop-versus-weed maps for robotic weeding).
- You need to reason about scene areas, not individual object instances.
- Boundary error of even 5 pixels materially affects model output or safety.
Expect 3–5× the cost of polygon annotation per image.
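The three-way framework above can be condensed into a small helper for first-pass triage. The sketch below is a simplification: it encodes only the background-fraction and boundary-precision criteria, treats a fraction of exactly 30% as a bounding-box case (the text leaves that boundary unspecified), and reuses the per-object AUD ranges quoted earlier in this article. The function and parameter names are hypothetical.

```python
# Per-object AUD ranges quoted earlier in this article.
COST_PER_OBJECT_AUD = {
    "bounding box": (0.04, 0.12),
    "polygon": (0.22, 0.55),  # 12-20 vertices per object
}

def choose_label_type(background_fraction, pixel_exact_required=False,
                      scene_level_areas=False):
    """First-pass label type from the criteria in the decision framework."""
    if pixel_exact_required or scene_level_areas:
        return "semantic segmentation"
    if background_fraction > 0.30:
        return "polygon"
    return "bounding box"

def annotation_cost_range(label_type, n_objects):
    """(low, high) AUD estimate, or None where no per-object rate is quoted."""
    rates = COST_PER_OBJECT_AUD.get(label_type)
    if rates is None:
        return None
    low, high = rates
    return (low * n_objects, high * n_objects)

# Hypothetical classes with measured background fractions.
for name, fraction in (("surface_crack", 0.45), ("boxed_goods", 0.08)):
    label_type = choose_label_type(fraction)
    print(name, label_type, annotation_cost_range(label_type, 100_000))
```

Budget and dataset size still need a human judgment on top of this, since the framework treats per-image cost as a constraint and does not give a formula for it.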
Get Accurate Polygon Annotation for Your Computer Vision Project
Tell us your object classes, image volume, vertex requirements, and target IoU — we'll scope a calibrated polygon annotation project within 48 hours.
Neel Bennett
AI Annotation Specialist at AI Taggers
Neel has over 8 years of experience in AI training data and machine learning operations. He specializes in helping enterprises build high-quality datasets for computer vision and NLP applications across healthcare, automotive, and retail industries.