Quick answer
Bounding box annotation is the process of drawing rectangular labels around objects in images for object detection training. Cost ranges from AUD $0.05 per box for simple, well-separated single-class objects to AUD $0.80 per box for dense, occluded scenes with strict QA requirements. Throughput ranges from about 300 images per annotator per day for complex street scenes to 1,200 images per day for simple scenes with model-assisted pre-labelling; specialist work such as aerial or medical imagery runs slower. The single biggest cost driver is object density per image, not the annotation platform or the number of label classes.
What Drives Bounding Box Annotation Costs: The Five Variables
Bounding box annotation pricing is quoted in two ways: per image or per box. Per-image pricing creates opaque comparisons across datasets with different object densities; per-box pricing is more transparent for planning. The five factors that move price up or down are:
1. Object density per image
A retail product image with 2 clearly separated items takes 45–60 seconds to annotate. A dense street scene with 20 pedestrians, 8 vehicles, and 15 traffic elements takes 8–12 minutes. Density is the dominant cost variable — more than label complexity, QA requirements, or platform choice.
2. Degree of occlusion and truncation
Partially obscured objects require annotator judgement about box extent. Applying clear occlusion-handling guidelines adds 20–30% to annotation time on affected images. Without such guidelines, annotators make inconsistent decisions, creating systematic label errors that are expensive to find and fix later.
3. Label taxonomy size and attribute richness
Drawing a box takes 10–15 seconds. Selecting from a flat list of 5 classes adds 3–5 seconds. Filling in 4–6 structured attributes (orientation, truncation level, confidence) adds 15–25 seconds per box. Rich taxonomies multiply annotation time faster than most project managers expect.
4. QA intensity
Gold standard injection, double-blind second review, and IAA auditing each add cost on top of base annotation. A 5% gold-set injection rate and 10% second-review sampling together add roughly 15% to total project cost, but they prevent the 25–40% rework rates common in projects that skip QA controls.
5. Model-assisted pre-labelling availability
If you have an existing detection model with 60%+ precision on your target classes, human annotators can review and correct pre-labels rather than drawing from scratch. This reduces annotation time by 35–55% for simpler scenes. For novel visual categories, pre-labelling helps less than vendors typically claim.
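As a rough way to see how the per-box timings above stack up, the sketch below adds the quoted components and applies the pre-labelling saving. It assumes the components are simply additive and ignores scanning time and occlusion overhead; the function and constant names are illustrative and not taken from any annotation tool.

```python
# Per-box time components in seconds, as (low, high) ranges quoted in the text.
DRAW_S = (10, 15)
CLASS_SELECT_S = (3, 5)
ATTRIBUTES_S = (15, 25)
# Fraction of annotation time saved by pre-labelling on simpler scenes.
PRELABEL_SAVING = (0.35, 0.55)


def seconds_per_box(with_attributes):
    """Return (low, high) seconds per box, assuming components are additive."""
    parts = [DRAW_S, CLASS_SELECT_S]
    if with_attributes:
        parts.append(ATTRIBUTES_S)
    return (sum(p[0] for p in parts), sum(p[1] for p in parts))


def minutes_per_image(boxes, with_attributes, prelabelled=False):
    """Return (low, high) minutes per image for a given box count."""
    low, high = seconds_per_box(with_attributes)
    if prelabelled:
        # Best case pairs the fastest time with the largest saving,
        # worst case pairs the slowest time with the smallest saving.
        low *= 1 - PRELABEL_SAVING[1]
        high *= 1 - PRELABEL_SAVING[0]
    return (boxes * low / 60, boxes * high / 60)


print(seconds_per_box(with_attributes=False))
print(seconds_per_box(with_attributes=True))
print(minutes_per_image(20, with_attributes=True))
```

Under these assumptions, a box with attributes takes about twice as long as a box with a class label only, which is the multiplication effect described under taxonomy size.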
Realistic Pricing by Scene Type (2026 AUD Rates)
Industry data from annotation projects completed in 2025–2026 shows the following realistic per-box ranges in Australian dollars. These reflect production-quality annotation with QA controls — not crowdsourced spot rates that exclude rework cost:
| Scene Type | Cost/Box (AUD) | Images/Day (per annotator) |
|---|---|---|
| Retail products, 1–3 objects, 1–2 classes | $0.05–$0.12 | 900–1,200 |
| Indoor/warehouse, 4–8 objects, 4–6 classes | $0.12–$0.25 | 500–750 |
| Urban street scene, 15–25 objects, 8–12 classes | $0.20–$0.45 | 280–420 |
| Aerial / satellite imagery, variable density | $0.30–$0.60 | 150–250 |
| Medical imaging, radiologist review required | $0.50–$0.80+ | 80–130 |
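To turn the table into a budget range, or to compare a per-image quote against it, a small calculation like the following may help. It is a sketch only: the scene keys are shorthand invented here, the medical upper bound is open-ended in the table ("$0.80+") and is capped at $0.80 for illustration, and the example volumes are hypothetical.

```python
# Per-box rates in AUD as (low, high), copied from the table above.
# Keys are shorthand labels for the table rows (an assumption of this sketch).
RATES_AUD = {
    "retail": (0.05, 0.12),
    "warehouse": (0.12, 0.25),
    "street": (0.20, 0.45),
    "aerial": (0.30, 0.60),
    "medical": (0.50, 0.80),  # table says "$0.80+", so the true ceiling is higher
}


def project_cost(images, boxes_per_image, scene):
    """Return the (low, high) project cost in AUD for a scene type."""
    low, high = RATES_AUD[scene]
    boxes = images * boxes_per_image
    return (boxes * low, boxes * high)


def implied_per_box(per_image_price, boxes_per_image):
    """Convert a per-image quote into the per-box rate it implies."""
    return per_image_price / boxes_per_image


# Hypothetical example: 100,000 warehouse images averaging 6 boxes each.
low, high = project_cost(100_000, 6, "warehouse")
print(f"AUD {low:,.0f} to {high:,.0f}")

# A hypothetical AUD $1.50 per-image quote at 6 boxes per image.
print(f"Implied per-box rate: AUD {implied_per_box(1.50, 6):.2f}")
```

Converting a per-image quote to its implied per-box rate makes the vendor's object density assumption explicit, which is where per-image comparisons usually go wrong.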
According to a 2025 industry survey by Surge AI, 61% of computer vision datasets use bounding boxes as their primary label type — making it the most common annotation task in production ML. Yet cost benchmarks for bounding box annotation remain poorly documented, which is why per-project pricing surprises are so frequent.
Throughput Planning: From Single Annotator to Team Scale
Throughput planning requires more than multiplying images per annotator per day by headcount. Real annotation teams lose 15–25% of theoretical capacity to calibration sessions, edge case reviews, guideline revisions, and QA feedback loops. For a 100,000-image project:
- 4 annotators, moderate complexity: 8–10 weeks (including 15% buffer for QA and rework)
- 10 annotators, moderate complexity: 3–4 weeks
- 10 annotators + model-assisted pre-labelling: 2–2.5 weeks
- 4 annotators, dense outdoor scenes: 14–18 weeks
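The estimates above can be approximated with a simple capacity calculation. The sketch below assumes a 5-day working week and a flat capacity loss inside the 15–25% range; both are assumptions of this example, and the rework buffer is not modelled separately.

```python
def project_weeks(images, annotators, images_per_day,
                  capacity_loss=0.20, days_per_week=5):
    """Estimate calendar weeks for a project.

    images_per_day is the theoretical rate per annotator;
    capacity_loss is the share lost to calibration, edge case
    reviews, guideline revisions, and QA feedback loops.
    """
    effective_per_day = annotators * images_per_day * (1 - capacity_loss)
    return images / effective_per_day / days_per_week


# Moderate complexity, using 625 images/day as a midpoint of 500-750.
print(round(project_weeks(100_000, 4, 625), 1))
print(round(project_weeks(100_000, 10, 625), 1))
```

With these inputs the results land at the upper end of the ranges listed above; a faster per-annotator rate or a lower capacity loss moves them toward the lower end.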
Teams that skip the IAA calibration phase — where annotators work in parallel on the same 200-image test set and discuss disagreements — typically see 20–30% higher rework rates because systematic misinterpretations of edge cases are not caught until late in the project. The calibration session adds 2–3 days at the start but saves weeks at the end.
Case Study: 2.3 Million Boxes in 45 Days for an Australian Retail AI Team
In late 2025, an Australian fashion and homewares retailer needed a product detection dataset for a mobile visual search feature. The scope: 890,000 product images from an existing catalogue, spanning 47 product categories. The annotation task was one primary bounding box per image, plus secondary bounding boxes for detail regions (labels, fastenings) on apparel items.
At an average of 2.6 boxes per image, the total label count was approximately 2.3 million bounding boxes. The project had a hard 45-day deadline ahead of a peak season app release.
The annotation approach:
- Days 1–3: Taxonomy review, 500-image calibration set, IAA baseline (Cohen's kappa 0.87 on IoU ≥ 0.70)
- Days 4–10: Phase 1 — 120,000 images annotated fully manually to train internal detection model
- Days 11–45: Phase 2 — model-assisted pre-labelling at 73% precision, human review and correction
- Ongoing: 5% gold-set injection, 10% second-pass QA review
Phase 2 throughput with model-assisted pre-labelling: 51,000 images per day across a 14-annotator team (3,640 images/annotator/day, roughly 3x the from-scratch baseline for this product category). Final dataset metrics:
- IAA (final): Cohen's kappa 0.89, average IoU 0.83
- Error rate on gold-set checks: 1.4% (below the 3% threshold)
- Cost: AUD $0.12/box for apparel and homewares, AUD $0.22/box for accessories with detail-region labels
- Total project cost: approximately AUD $312,000 including QA and platform
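The case study does not say how its kappa figures were calculated. One plausible reading, offered here as an assumption, is that boxes from two annotators are first paired when they overlap at IoU ≥ 0.70, and Cohen's kappa is then computed on the class labels of the paired boxes. A minimal sketch of the kappa step, with hypothetical labels:

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' labels on the same items.

    Assumes labels_a[i] and labels_b[i] refer to the same matched box
    (for example, boxes already paired at IoU >= 0.70).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four matched boxes.
annotator_1 = ["shirt", "shirt", "shoe", "shoe"]
annotator_2 = ["shirt", "shirt", "shoe", "shirt"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Kappa corrects raw agreement for agreement expected by chance, which is why it is reported alongside average IoU and not in place of it.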
The downstream visual search model achieved 91.3% top-5 retrieval accuracy on the held-out test set, compared with 73.8% for an earlier model trained on a smaller, manually annotated set with no IAA controls. The project was delivered 2 days ahead of the 45-day deadline. A previous attempt with a crowdsourcing platform had produced a dataset with an 18% error rate that required near-complete rework.
Need high-volume bounding box annotation for a tight deadline?
AI Taggers provides scalable bounding box annotation with model-assisted pre-labelling, IAA-controlled QA, and transparent per-box pricing. We quote within 24 hours.
When Bounding Boxes Are the Right Label Type, and When They Are Not
Bounding boxes are the correct label type when your model needs to localise objects for detection or tracking and the rectangular approximation does not introduce meaningful training noise. This covers the majority of object detection use cases: vehicle detection, pedestrian tracking, product localisation, and document element extraction.
Bounding boxes become the wrong choice when:
- Objects are highly non-rectangular (trees, medical lesions, coastlines) and the background pixels inside the box confuse the feature extractor
- Objects are densely packed and overlapping bounding boxes create ambiguous positive regions (retail shelves, cell microscopy)
- The task requires pixel-level precision for segmentation, instance counting, or edge-detection training
For these cases, polygon annotation or semantic segmentation is worth the 3–5x cost premium. Read our post on product tagging and visual search annotation for a worked example of when to upgrade label types in a retail context.
Quality Controls That Prevent Expensive Rework
The most common bounding box annotation failure mode is not inaccurate boxes — it is inconsistent boxes. Two annotators working to the same guidelines will disagree on where to place the box edge on a partially occluded object, whether a truncated object counts as annotatable, and how to handle objects smaller than the minimum pixel threshold. Without QA controls to catch these disagreements early, they compound across hundreds of thousands of images.
The minimum viable QA stack for a bounding box project:
- Calibration set: 200–500 images labelled in parallel by all annotators before production starts. Compute IoU agreement and discuss disagreements before scale-up.
- Gold standard injection: 3–5% of production tasks are pre-labelled by an expert and mixed into the annotation queue. Annotators who score below 0.75 IoU on gold tasks are flagged for retraining.
- Second-pass sampling: 10% of completed tasks reviewed by a QA lead using a consistent checklist (box tightness, truncation handling, attribute completeness).
- Error taxonomy: Track errors by type — missed objects, incorrect class, loose box, tight box, wrong attribute — to identify guideline gaps rather than individual performance issues.
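The gold standard check in this list can be sketched in a few lines. The example below computes IoU for axis-aligned boxes given as (x_min, y_min, x_max, y_max) and flags annotators whose mean IoU on gold tasks falls below 0.75. Averaging per annotator is an assumption of this sketch, since the text does not specify how gold scores are aggregated, and the annotator IDs are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union for boxes as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def flag_for_retraining(gold_results, threshold=0.75):
    """Return annotators whose mean IoU on gold tasks is below threshold.

    gold_results maps an annotator ID to a list of
    (annotator_box, gold_box) pairs.
    """
    flagged = []
    for annotator, pairs in gold_results.items():
        mean_iou = sum(iou(a, g) for a, g in pairs) / len(pairs)
        if mean_iou < threshold:
            flagged.append(annotator)
    return sorted(flagged)


# Hypothetical gold-task results for two annotators.
results = {
    "ann_a": [((0, 0, 10, 10), (0, 0, 10, 10))],
    "ann_b": [((0, 0, 10, 10), (5, 0, 15, 10))],
}
print(flag_for_retraining(results))  # ['ann_b']
```

The same `iou` function can be reused on the calibration set to measure pairwise agreement between annotators before production starts.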
Projects that use these controls see rework rates of 2–5%. Projects without them average 15–30% rework — turning a seemingly cheap per-box rate into a significantly more expensive total project cost. See our guide on annotation QA and relabeling for a detailed breakdown of how systematic quality failures are diagnosed and corrected.
How to Get an Accurate Quote for Your Bounding Box Project
When requesting a quote from an annotation services provider, provide the following information to get a number you can rely on:
- Sample images: 50–200 representative images covering your range of scene complexity and edge cases
- Annotation schema: Label classes, attributes, and rules for occlusion/truncation handling
- Volume and deadline: Total image count and target delivery date
- Quality requirements: IoU threshold, IAA target, acceptable error rate
- Existing model: Whether you have a detection model available for pre-labelling
Vendors who quote without seeing sample images are guessing. Vendors who provide per-image pricing without clarifying object density assumptions are quoting the easy cases. A reliable quote requires a 24-hour sample annotation pilot on your actual data — which any serious annotation partner should offer at no cost. For related annotation types at scale, see also our 3D cuboid annotation guide for AV perception projects where bounding boxes extend into 3D space.
Get a Quote for Your Bounding Box Annotation Project
Send us 50 sample images and your annotation schema — we'll return an annotated sample and a fixed per-box price within 24 hours.
Neel Bennett
AI Annotation Specialist at AI Taggers
Neel has over 8 years of experience in AI training data and machine learning operations. He specializes in helping enterprises build high-quality datasets for computer vision and NLP applications across healthcare, automotive, and retail industries.