E-commerce AI is no longer a "nice-to-have" layer. It is the difference between Amazon's 2024 recommendation revenue ($30B+) and a Shopify store that converts at 1.2%. The gap is rarely the model architecture; it is the training data behind it. And in e-commerce AI, training data means annotation.
This guide walks through the annotation categories that drive real e-commerce AI value. We cover product image tagging, attribute extraction, review intelligence, search query understanding, and visual search, plus what changes when you're building for MENA marketplaces such as Noon, Salla, Zid, and Jumia.
Product Image Tagging: The Foundation
Product imagery drives discovery. Every product image needs structured tags so the search index and recommendation system can find it. The standard tag set:
- Category hierarchy (women's > clothing > tops > t-shirts)
- Primary attributes (colour, material, pattern, occasion)
- Secondary attributes (fit, sleeve length, neckline, season)
- Quality flags (model worn vs flat lay, background type, image quality issues)
- Brand and sub-brand identification
- Compliance flags (restricted content, age-gating)
The deepest competitive moat in catalog AI is the attribute taxonomy itself. Generic taxonomies underperform: fashion needs different attributes from electronics, which in turn needs different attributes from home goods. Spend the time to design your taxonomy before scaling annotation. Redoing 500K product tags because the taxonomy was wrong is the most expensive mistake in catalog AI.
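One way to make a per-vertical taxonomy enforceable is to validate every tag against it before the tag reaches the search index. The sketch below is illustrative only: the `TAXONOMY` contents, the `ImageTag` fields, and the `validate_tag` helper are assumptions made for this example, not a prescribed schema.

```python
from dataclasses import dataclass, field

# Hypothetical taxonomy: vertical -> attribute name -> allowed values.
# Real taxonomies are far larger; these entries are placeholders.
TAXONOMY = {
    "fashion": {
        "colour": {"black", "white", "blue", "red"},
        "sleeve_length": {"sleeveless", "short", "long"},
        "neckline": {"crew", "v-neck", "collared"},
    },
    "electronics": {
        "colour": {"black", "white", "silver"},
        "connectivity": {"wifi", "bluetooth", "usb-c"},
    },
}


@dataclass
class ImageTag:
    product_id: str
    vertical: str
    category_path: list[str]
    attributes: dict[str, str] = field(default_factory=dict)


def validate_tag(tag: ImageTag) -> list[str]:
    """Return a list of problems; an empty list means the tag fits the taxonomy."""
    allowed = TAXONOMY.get(tag.vertical)
    if allowed is None:
        return [f"unknown vertical: {tag.vertical}"]
    problems = []
    for name, value in tag.attributes.items():
        if name not in allowed:
            problems.append(f"attribute '{name}' not defined for {tag.vertical}")
        elif value not in allowed[name]:
            problems.append(f"value '{value}' not allowed for '{name}'")
    return problems
```

Running a check like this at annotation time surfaces taxonomy gaps while the batch is still small, which is cheaper than discovering them after hundreds of thousands of tags.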
Product Attribute Extraction From Text
Most catalogs have rich textual descriptions that contain attribute information not present in images (sleeve material, country of origin, washing instructions). Attribute extraction is the NLP equivalent of image tagging.
Best practice: annotate text-extracted attributes with the same taxonomy as image attributes, then reconcile downstream. Conflicts (text says "blue", image looks teal) often reveal data quality issues worth catching. For multilingual catalogs, attribute extraction has to happen per language — Arabic descriptions of the same product surface different attribute mentions than English ones.
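The reconciliation step can be sketched as a merge that separates agreed attributes from conflicting ones. This is a minimal illustration that assumes both sources already use the same attribute names and plain string values; the `reconcile` function and its return shape are hypothetical.

```python
def reconcile(text_attrs: dict[str, str], image_attrs: dict[str, str]):
    """Merge text- and image-derived attributes.

    Returns (merged, conflicts). An attribute present in only one source
    is accepted as-is; an attribute present in both with different values
    (ignoring case and surrounding whitespace) goes to conflicts for review.
    """
    merged = {}
    conflicts = {}
    for name in sorted(set(text_attrs) | set(image_attrs)):
        text_value = text_attrs.get(name)
        image_value = image_attrs.get(name)
        both_present = text_value is not None and image_value is not None
        if both_present and text_value.strip().lower() != image_value.strip().lower():
            conflicts[name] = {"text": text_value, "image": image_value}
        else:
            merged[name] = text_value if text_value is not None else image_value
    return merged, conflicts


merged, conflicts = reconcile(
    {"colour": "blue", "material": "cotton"},
    {"colour": "teal", "pattern": "striped"},
)
# The colour disagreement lands in conflicts rather than being silently resolved.
```

Routing conflicts to a human queue instead of picking a winner automatically keeps the data quality signal described above.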
Review Sentiment & Aspect-Based Analysis
Product reviews are a goldmine for AI applications: review summarisation, aspect-based sentiment, fake review detection, and feedback loops to merchandising teams. The annotation work:
- Overall sentiment (positive / negative / neutral / mixed)
- Aspect-based sentiment (quality: positive, sizing: negative, value: positive)
- Authenticity signals (genuine vs paid vs fake)
- Actionable feedback flags (defect report, sizing issue, shipping complaint)
- Sarcasm and irony detection (especially important in Arabic reviews, particularly Egyptian-dialect ones, where sarcastic praise is common)
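The labels above could be captured in a single record per review and rolled up for merchandising teams. The following sketch assumes a hypothetical `ReviewAnnotation` structure and an `aspect_summary` helper; the field names and the choice to exclude non-genuine reviews from the roll-up are illustrative assumptions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class ReviewAnnotation:
    review_id: str
    overall: str                      # positive / negative / neutral / mixed
    aspects: dict[str, str] = field(default_factory=dict)  # aspect -> polarity
    authenticity: str = "genuine"     # genuine / paid / fake
    flags: list[str] = field(default_factory=list)         # actionable feedback
    sarcastic: bool = False


def aspect_summary(annotations: list[ReviewAnnotation]) -> dict[str, dict[str, int]]:
    """Count polarity labels per aspect, using only reviews marked genuine."""
    summary = defaultdict(Counter)
    for ann in annotations:
        if ann.authenticity != "genuine":
            continue
        for aspect, polarity in ann.aspects.items():
            summary[aspect][polarity] += 1
    return {aspect: dict(counts) for aspect, counts in summary.items()}
```

Keeping sarcasm as its own field, rather than folding it into the overall label, lets a model learn that the surface wording and the intended sentiment can differ.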
Search Query Understanding
User search queries are typically short, ambiguous, multilingual, and full of typos. Annotation makes sense of them.
- Intent classification (browse, specific product, comparison, problem-solving)
- Entity extraction (brand, category, attribute mention)
- Language and dialect detection in code-switched queries
- Query rewriting annotation for typo correction
- Personalisation signals (price-sensitive query vs premium query)
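A query annotation typically combines an intent label, entity spans given as character offsets, and an optional rewrite. The sketch below shows one possible shape along with a basic consistency check; `QueryAnnotation`, `EntitySpan`, `span_errors`, and the label sets are assumptions for illustration, not a standard format.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical label sets mirroring the categories listed above.
INTENTS = {"browse", "specific_product", "comparison", "problem_solving"}
ENTITY_LABELS = {"brand", "category", "attribute"}


@dataclass
class EntitySpan:
    start: int  # inclusive character offset into the raw query
    end: int    # exclusive character offset
    label: str


@dataclass
class QueryAnnotation:
    query: str
    intent: str
    entities: list[EntitySpan] = field(default_factory=list)
    rewrite: Optional[str] = None  # typo-corrected form, if any


def span_errors(ann: QueryAnnotation) -> list[str]:
    """Check intent, labels, bounds, and overlaps; return a list of problems."""
    errors = []
    if ann.intent not in INTENTS:
        errors.append(f"unknown intent: {ann.intent}")
    previous_end = 0
    for span in sorted(ann.entities, key=lambda s: s.start):
        if span.label not in ENTITY_LABELS:
            errors.append(f"unknown label: {span.label}")
        if not 0 <= span.start < span.end <= len(ann.query):
            errors.append(f"span out of bounds: {span.start}-{span.end}")
            continue
        if span.start < previous_end:
            errors.append(f"span overlaps previous: {span.start}-{span.end}")
        previous_end = max(previous_end, span.end)
    return errors


def entity_texts(ann: QueryAnnotation) -> list[tuple[str, str]]:
    """Return (surface text, label) pairs in the order they were annotated."""
    return [(ann.query[s.start:s.end], s.label) for s in ann.entities]
```

Spans point at the raw query, typos included, so the original text and the rewrite stay separately recoverable.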
Visual Search & Lifestyle Imagery
Visual search lets customers upload a photo and find similar products. The annotation work splits into two:
- Catalog-side: attribute tagging (covered above) plus visual feature extraction
- Lifestyle-side: bounding box around each product in a scene, plus attribute extraction per product. Often combined with influencer-content moderation labeling.
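For lifestyle-side bounding boxes, a common quality check is intersection over union (IoU) between two annotators' boxes for the same product. The source does not name a QA metric, so treat this as one reasonable option; the `Box` type and `iou` function are hypothetical names for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str = ""

    def area(self) -> float:
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, in the range [0, 1]."""
    inter_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    inter_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    intersection = inter_w * inter_h
    union = a.area() + b.area() - intersection
    return intersection / union if union > 0 else 0.0
```

Pairs that score low can be sent for adjudication; where to set the cut-off depends on how tightly the downstream visual search model needs products cropped.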
What Changes for MENA E-commerce
MENA marketplaces have distinct annotation requirements that generic English vendors miss:
- Arabic attribute vocabulary: terms like "abaya cut" or "thawb collar" don't exist in English fashion taxonomies but matter to Saudi customers.
- Dialect-aware review sentiment: Khaleeji praise reads differently from Egyptian praise, and a model trained on a single dialect's corpus underperforms on the others.
- Arabic-English code-switched search queries: "iPhone case أحمر" (red iPhone case) is a single query, not two.
- RTL-native cataloguing: descriptions, attribute lists, and search highlight rendering must work in RTL contexts without breaking layout.
- Cultural appropriateness flags: content moderation taxonomies need categories that reflect MENA-specific norms.
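A cheap first pass for code-switched queries is to tag each token by script before any dialect-level annotation. The sketch below uses Python's standard `unicodedata` module and Unicode character names to do that; `token_script`, `tag_query`, and `is_code_switched` are hypothetical helpers. Script detection is not dialect detection: it separates Arabic from Latin text but cannot tell Khaleeji from Egyptian.

```python
import unicodedata


def token_script(token: str) -> str:
    """Classify a token as 'arabic', 'latin', 'other', 'mixed', or 'none'.

    Only alphabetic characters are considered, so digits and punctuation
    do not affect the result.
    """
    scripts = set()
    for ch in token:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("ARABIC"):
            scripts.add("arabic")
        elif name.startswith("LATIN"):
            scripts.add("latin")
        else:
            scripts.add("other")
    if not scripts:
        return "none"
    if len(scripts) == 1:
        return scripts.pop()
    return "mixed"


def tag_query(query: str) -> list[tuple[str, str]]:
    """Split on whitespace and attach a script tag to each token."""
    return [(token, token_script(token)) for token in query.split()]


def is_code_switched(query: str) -> bool:
    """True when the query contains more than one script."""
    tags = {script for _, script in tag_query(query)} - {"none"}
    return "mixed" in tags or len(tags) > 1


print(tag_query("iPhone case أحمر"))
```

The query stays a single unit throughout; the script tags are metadata on its tokens, which matches the point that a code-switched query is one query and not two.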
E-commerce annotation for your catalog
Free 25-50 product pilot in 48 hours. Image tagging, attribute extraction, review sentiment. Multilingual including Arabic, Khaleeji and Egyptian.
Related Reading
- Retail & E-commerce industry page
- Arabic text annotation for MENA e-commerce
- Khaleeji vs MSA: dialect strategy for product search
- Image annotation services
Neel Bennett
AI Annotation Specialist at AI Taggers
Neel has over 8 years of experience in AI training data and machine learning operations. He specializes in helping enterprises build high-quality datasets for computer vision and NLP applications across healthcare, automotive, and retail industries.