When should I use polygon annotation instead of bounding boxes?

Use polygon annotation when the background area captured inside a bounding box matters to your model. Bounding boxes include background pixels around irregular objects, which can confuse feature extractors, particularly in densely packed scenes. Polygon annotation is worth the 3–5x cost premium for object counting on densely packed retail shelves, clothing segmentation for virtual try-on, and any task where overlapping bounding boxes would produce an ambiguous training signal. For standard detection tasks (vehicles, pedestrians, parcels), bounding boxes are sufficient.

Bounding Box Annotation Cost and Throughput Guide (2026)

Quick answer

Bounding box annotation is the process of drawing rectangular labels around objects in images for object detection training. Cost ranges from AUD $0.05 per box for simple, well-separated single-class objects to AUD $0.80 per box for dense, occluded scenes with strict QA requirements. Throughput typically ranges from about 300 images per annotator per day for complex scenes to 1,200 images per day for simple scenes with model-assisted pre-labelling; specialist work such as aerial or medical imagery runs slower. The single biggest cost driver is object density per image, not the annotation platform or the number of label classes.

What Drives Bounding Box Annotation Costs: The Five Variables

Bounding box annotation pricing is quoted in two ways: per image or per box. Per-image pricing creates opaque comparisons across datasets with different object densities; per-box pricing is more transparent for planning. The five factors that move price up or down are:

1. Object density per image

A retail product image with 2 clearly separated items takes 45–60 seconds to annotate. A dense street scene with 20 pedestrians, 8 vehicles, and 15 traffic elements takes 8–12 minutes. Density is the dominant cost variable — more than label complexity, QA requirements, or platform choice.

2. Degree of occlusion and truncation

Partially obscured objects require annotator judgement about box extent. Applying clear occlusion-handling guidelines adds 20–30% to annotation time on affected images. Without such guidelines, annotators make inconsistent decisions, creating systematic label errors that are expensive to find and fix later.

3. Label taxonomy size and attribute richness

Drawing a box takes 10–15 seconds. Selecting from a flat list of 5 classes adds 3–5 seconds. Filling in 4–6 structured attributes (orientation, truncation level, confidence) adds 15–25 seconds per box. Rich taxonomies multiply annotation time faster than most project managers expect.
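The per-box timings above can be combined into a rough labelling-time estimate. The sketch below is illustrative only: the function names are hypothetical, and it assumes the three components simply add up, ignoring per-image overhead such as loading and scanning the scene.

```python
# Per-box time components in seconds, using the ranges quoted in the text.
DRAW_BOX = (10, 15)
SELECT_CLASS = (3, 5)        # flat list of 5 classes
FILL_ATTRIBUTES = (15, 25)   # 4-6 structured attributes


def seconds_per_box(with_attributes: bool) -> tuple[int, int]:
    """Return (low, high) seconds per box, assuming the components are additive."""
    parts = [DRAW_BOX, SELECT_CLASS]
    if with_attributes:
        parts.append(FILL_ATTRIBUTES)
    return (sum(p[0] for p in parts), sum(p[1] for p in parts))


def minutes_per_image(boxes_per_image: int, with_attributes: bool) -> tuple[float, float]:
    """Scale the per-box range by object density (per-image overhead is ignored)."""
    low, high = seconds_per_box(with_attributes)
    return (boxes_per_image * low / 60, boxes_per_image * high / 60)


if __name__ == "__main__":
    print(seconds_per_box(with_attributes=False))  # (13, 20)
    print(seconds_per_box(with_attributes=True))   # (28, 45)
    print(minutes_per_image(10, with_attributes=True))
```

Under these assumptions, adding structured attributes roughly doubles the time per box, which is the multiplication effect the paragraph describes.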

4. QA intensity

Gold standard injection, double-blind second review, and inter-annotator agreement (IAA) auditing each add cost on top of base annotation. A 5% gold-set injection rate combined with 10% second-review sampling adds roughly 15% to total project cost, but it prevents the 25–40% rework rates common in projects that skip QA controls.
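To make the QA trade-off concrete, here is a minimal sketch comparing total cost with and without QA controls. It rests on assumptions not stated in the text: rework is priced at the same rate as first-pass annotation, the base cost of AUD $100,000 is a made-up figure, and the 5% residual rework rate for the QA-controlled project is an illustrative input.

```python
def total_cost(base_cost: float, qa_overhead: float, rework_rate: float) -> float:
    """Base annotation cost plus QA overhead plus rework.

    Assumes rework is billed at the same rate as first-pass annotation,
    so both overheads are simple fractions of the base cost.
    """
    return base_cost * (1 + qa_overhead + rework_rate)


if __name__ == "__main__":
    base = 100_000.0  # hypothetical base annotation cost in AUD

    # With QA: roughly 15% overhead, plus an assumed 5% residual rework.
    with_qa = total_cost(base, qa_overhead=0.15, rework_rate=0.05)

    # Without QA: no overhead, but 25-40% rework.
    without_qa_low = total_cost(base, qa_overhead=0.0, rework_rate=0.25)
    without_qa_high = total_cost(base, qa_overhead=0.0, rework_rate=0.40)

    print(round(with_qa), round(without_qa_low), round(without_qa_high))
```

The comparison is sensitive to how rework is priced; if rework costs more than first-pass labelling, the gap widens in favour of QA.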

5. Model-assisted pre-labelling availability

If you have an existing detection model with 60%+ precision on your target classes, human annotators can review and correct pre-labels rather than drawing from scratch. This reduces annotation time by 35–55% for simpler scenes. For novel visual categories, pre-labelling helps less than vendors typically claim.

Realistic Pricing by Scene Type (2026 AUD Rates)

Industry data from annotation projects completed in 2025–2026 shows the following realistic per-box ranges in Australian dollars. These reflect production-quality annotation with QA controls — not crowdsourced spot rates that exclude rework cost:

Scene Type	Cost/Box (AUD)	Images/Day (per annotator)
Retail products, 1–3 objects, 1–2 classes	$0.05–$0.12	900–1,200
Indoor/warehouse, 4–8 objects, 4–6 classes	$0.12–$0.25	500–750
Urban street scene, 15–25 objects, 8–12 classes	$0.20–$0.45	280–420
Aerial / satellite imagery, variable density	$0.30–$0.60	150–250
Medical imaging, radiologist review required	$0.50–$0.80+	80–130

According to a 2025 industry survey by Surge AI, 61% of computer vision datasets use bounding boxes as their primary label type — making it the most common annotation task in production ML. Yet cost benchmarks for bounding box annotation remain poorly documented, which is why per-project pricing surprises are so frequent.

Throughput Planning: From Single Annotator to Team Scale

Throughput planning requires more than multiplying images per annotator per day by headcount. Real annotation teams lose 15–25% of theoretical capacity to calibration sessions, edge case reviews, guideline revisions, and QA feedback loops. For a 100,000-image project:

4 annotators, moderate complexity: 8–10 weeks (including 15% buffer for QA and rework)
10 annotators, moderate complexity: 3–4 weeks
10 annotators + model-assisted pre-labelling: 2–2.5 weeks
4 annotators, dense outdoor scenes: 14–18 weeks
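The estimates above can be approximated with a simple capacity formula. This is a sketch under stated assumptions, not a planning tool: the function name is hypothetical, a 5-day working week is assumed, 625 images/day is taken as the midpoint of the moderate-complexity range, and 20% is taken as the midpoint of the 15–25% capacity loss.

```python
def project_weeks(
    total_images: int,
    annotators: int,
    images_per_annotator_day: float,
    capacity_loss: float = 0.20,
    working_days_per_week: int = 5,
) -> float:
    """Estimate calendar weeks for a project.

    capacity_loss is the share of theoretical throughput lost to calibration,
    edge-case reviews, guideline revisions, and QA feedback loops.
    """
    if not 0 <= capacity_loss < 1:
        raise ValueError("capacity_loss must be in [0, 1)")
    effective_daily = annotators * images_per_annotator_day * (1 - capacity_loss)
    working_days = total_images / effective_daily
    return working_days / working_days_per_week


if __name__ == "__main__":
    for team_size in (4, 10):
        weeks = project_weeks(100_000, team_size, images_per_annotator_day=625)
        print(team_size, "annotators:", round(weeks, 1), "weeks")
```

With these inputs, 4 annotators come out at roughly 10 weeks and 10 annotators at roughly 4 weeks, in line with the upper ends of the ranges listed above.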

Teams that skip the IAA calibration phase — where annotators work in parallel on the same 200-image test set and discuss disagreements — typically see 20–30% higher rework rates because systematic misinterpretations of edge cases are not caught until late in the project. The calibration session adds 2–3 days at the start but saves weeks at the end.

Case Study: 2.3 Million Boxes in 45 Days for an Australian Retail AI Team

In late 2025, an Australian fashion and homewares retailer needed a product detection dataset for a mobile visual search feature. The scope: 890,000 product images from an existing catalogue, spanning 47 product categories. The annotation task was a single primary bounding box per image, with a secondary bounding box for detail regions (labels, fastenings) on apparel items.

At an average of 2.6 boxes per image, the total label count was approximately 2.3 million bounding boxes. The project had a hard 45-day deadline ahead of a peak season app release.

The annotation approach:

Days 1–3: Taxonomy review, 500-image calibration set, IAA baseline (Cohen's kappa 0.87 on IoU ≥ 0.70)
Days 4–10: Phase 1 — 120,000 images annotated fully manually to train internal detection model
Days 11–45: Phase 2 — model-assisted pre-labelling at 73% precision, human review and correction
Ongoing: 5% gold-set injection, 10% second-pass QA review
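The case study reports Cohen's kappa "on IoU ≥ 0.70" without describing the exact procedure. One plausible reading, assumed here, is that boxes from two annotators are first matched at IoU ≥ 0.70 and kappa is then computed on the class labels of the matched pairs. The sketch below shows only the kappa step; the function name and sample labels are illustrative.

```python
from collections import Counter


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the agreement expected by chance from each annotator's label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical class labels for four matched box pairs.
    annotator_a = ["shirt", "shirt", "shoe", "shoe"]
    annotator_b = ["shirt", "shirt", "shoe", "shirt"]
    print(cohen_kappa(annotator_a, annotator_b))  # 0.5
```

Kappa corrects raw agreement for chance, which matters when a few classes dominate the dataset and high raw agreement would otherwise be easy to reach.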

Phase 2 throughput with model-assisted pre-labelling: 51,000 images per day across a 14-annotator team (3,640 images/annotator/day, roughly 3x the from-scratch baseline for this product category). Final dataset metrics:

IAA (final): Cohen's kappa 0.89, average IoU 0.83
Error rate on gold-set checks: 1.4% (below the 3% threshold)
Cost: AUD $0.12/box for apparel and homewares, AUD $0.22/box for accessories with detail-region labels
Total project cost: approximately AUD $312,000 including QA and platform

The downstream visual search model achieved 91.3% top-5 retrieval accuracy on the held-out test set, compared with 73.8% for an earlier model trained on a smaller, manually annotated set with no IAA controls. The project was delivered 2 days ahead of the 45-day deadline. The previous attempt with a crowdsourcing platform had produced a dataset with an 18% error rate that required near-complete rework.

When Bounding Boxes Are the Right Label Type — and When They Are Not

Bounding boxes are the correct label type when your model needs to localise objects for detection or tracking and the rectangular approximation does not introduce meaningful training noise. This covers the majority of object detection use cases: vehicle detection, pedestrian tracking, product localisation, and document element extraction.

Bounding boxes become the wrong choice when:

Objects are highly non-rectangular (trees, medical lesions, coastlines) and the background pixels inside the box confuse the feature extractor
Objects are densely packed and overlapping bounding boxes create ambiguous positive regions (retail shelves, cell microscopy)
The task requires pixel-level precision for segmentation, instance counting, or edge-detection training

For these cases, polygon annotation or semantic segmentation is worth the 3–5x cost premium. Read our post on product tagging and visual search annotation for a worked example of when to upgrade label types in a retail context.

Quality Controls That Prevent Expensive Rework

The most common bounding box annotation failure mode is not inaccurate boxes — it is inconsistent boxes. Two annotators working to the same guidelines will disagree on where to place the box edge on a partially occluded object, whether a truncated object counts as annotatable, and how to handle objects smaller than the minimum pixel threshold. Without QA controls to catch these disagreements early, they compound across hundreds of thousands of images.

The minimum viable QA stack for a bounding box project:

Calibration set: 200–500 images labelled in parallel by all annotators before production starts. Compute IoU agreement and discuss disagreements before scale-up.
Gold standard injection: 3–5% of production tasks are pre-labelled by an expert and mixed into the annotation queue. Annotators who score below 0.75 IoU on gold tasks are flagged for retraining.
Second-pass sampling: 10% of completed tasks reviewed by a QA lead using a consistent checklist (box tightness, truncation handling, attribute completeness).
Error taxonomy: Track errors by type — missed objects, incorrect class, loose box, tight box, wrong attribute — to identify guideline gaps rather than individual performance issues.
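The gold standard check in the list above can be sketched as follows. This is a minimal illustration, assuming boxes are stored as (x_min, y_min, x_max, y_max), one box per gold task, and that an annotator's score is the mean IoU across their gold tasks; the text does not specify how scores are aggregated, and the function names are hypothetical.

```python
from statistics import mean

Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def flag_for_retraining(
    gold_results: dict[str, list[tuple[Box, Box]]],
    threshold: float = 0.75,
) -> list[str]:
    """Return annotators whose mean IoU on gold tasks falls below the threshold.

    gold_results maps an annotator ID to (submitted_box, gold_box) pairs.
    """
    flagged = []
    for annotator, pairs in gold_results.items():
        if pairs and mean(iou(sub, gold) for sub, gold in pairs) < threshold:
            flagged.append(annotator)
    return sorted(flagged)


if __name__ == "__main__":
    results = {
        "annotator_1": [((0, 0, 10, 10), (0, 0, 10, 10))],  # exact match
        "annotator_2": [((0, 0, 10, 10), (5, 0, 15, 10))],  # half-shifted box
    }
    print(flag_for_retraining(results))  # ['annotator_2']
```

A production version would also need to match multiple boxes per image and count missed objects, which this sketch leaves out.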

Projects that use these controls see rework rates of 2–5%. Projects without them average 15–30% rework — turning a seemingly cheap per-box rate into a significantly more expensive total project cost. See our guide on annotation QA and relabeling for a detailed breakdown of how systematic quality failures are diagnosed and corrected.

How to Get an Accurate Quote for Your Bounding Box Project

When requesting a quote from an annotation services provider, provide the following information to get a number you can rely on:

Sample images: 50–200 representative images covering your range of scene complexity and edge cases
Annotation schema: Label classes, attributes, and rules for occlusion/truncation handling
Volume and deadline: Total image count and target delivery date
Quality requirements: IoU threshold, IAA target, acceptable error rate
Existing model: Whether you have a detection model available for pre-labelling

Vendors who quote without seeing sample images are guessing. Vendors who provide per-image pricing without clarifying object density assumptions are quoting the easy cases. A reliable quote requires a 24-hour sample annotation pilot on your actual data — which any serious annotation partner should offer at no cost. For related annotation types at scale, see also our 3D cuboid annotation guide for AV perception projects where bounding boxes extend into 3D space.
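To see why a per-image quote depends on its density assumption, this sketch converts between per-image and per-box pricing. The function names and the AUD $1.00 quote are hypothetical and used only for illustration.

```python
def implied_per_box_rate(image_price: float, boxes_per_image: float) -> float:
    """Per-box rate implied by a per-image price at a given object density."""
    if boxes_per_image <= 0:
        raise ValueError("boxes_per_image must be positive")
    return image_price / boxes_per_image


def implied_per_image_price(box_rate: float, boxes_per_image: float) -> float:
    """Per-image price implied by a per-box rate at a given object density."""
    if boxes_per_image <= 0:
        raise ValueError("boxes_per_image must be positive")
    return box_rate * boxes_per_image


if __name__ == "__main__":
    quote = 1.00  # hypothetical per-image quote in AUD

    # The vendor assumed sparse images; the real dataset is much denser.
    assumed = implied_per_box_rate(quote, boxes_per_image=5)
    actual = implied_per_box_rate(quote, boxes_per_image=20)
    print(round(assumed, 2), round(actual, 2))
```

In this example the same per-image quote implies a per-box rate four times lower on the denser dataset, so the vendor would likely requote once real images arrive.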

Frequently Asked Questions

How much does bounding box annotation cost per image?

Cost per image depends on object density and complexity. Simple retail product images (1–3 objects) cost AUD $0.08–$0.20 per image. Dense urban scenes (15–25 objects, 8–12 classes) cost AUD $0.60–$1.50 per image. Medical imaging with specialist review runs AUD $1.50–$4.00 per image.

How fast can annotators label images with bounding boxes?

Simple scenes: 900–1,200 images per annotator per day. Moderate scenes: 500–750/day. Dense outdoor or aerial scenes: 200–350/day. Model-assisted pre-labelling roughly doubles throughput for simpler categories once a pre-trained detection model is available.

Does model-assisted annotation reduce bounding box costs?

Yes — but only after you have 5,000–15,000 manually annotated examples to train a useful pre-labelling model. For novel visual categories, human annotation from scratch is often faster than correcting poor pre-labels. Once a model reaches 60%+ precision, model-assist reduces annotation time by 35–55%.

When should I upgrade from bounding boxes to polygon annotation?

Upgrade to polygons when object shape is highly non-rectangular and background pixels within bounding boxes confuse your model, or when objects are densely packed and overlapping boxes create ambiguous training signal. For standard detection tasks — vehicles, pedestrians, packaged goods — bounding boxes are sufficient.

What IoU threshold should I use for bounding box quality control?

Production annotation QA typically targets IoU ≥ 0.70 for inter-annotator agreement. Safety-critical applications (medical, AV perception) should target 0.75–0.85. Use IoU as a consensus metric between at least two annotators on a 5–10% sample, not just as a post-hoc evaluation metric.

How many annotators do I need for 100,000 images?

For moderate complexity (5–10 boxes/image, 4–6 classes): 4 annotators take 8–10 weeks; 10 annotators take 3–4 weeks. Add model-assisted pre-labelling to reduce this by 30–40%. Always add a 15–20% buffer for calibration, guidelines revision, and QA rework.

Neel Bennett

AI Annotation Specialist at AI Taggers

Neel has over 8 years of experience in AI training data and machine learning operations. He specializes in helping enterprises build high-quality datasets for computer vision and NLP applications across healthcare, automotive, and retail industries.
