FDA 21 CFR Part 11 for Annotation: What Your Provenance Logs Need to Include

21 CFR Part 11, the FDA's Electronic Records; Electronic Signatures rule, has been in force since 1997, but its application to AI/ML training data is recent enough that many medical AI teams have not internalised it. The rule governs electronic records that are created, modified, maintained, archived, retrieved, or transmitted under any records requirement in FDA regulations, as well as electronic records submitted to FDA. For Software as a Medical Device (SaMD) submissions, the annotation dataset that trained and validated the algorithm is a design record under 21 CFR Part 820, the Quality System Regulation (QSR). That makes Part 11 apply to how the annotation data was created, modified, approved, and stored, not just to the final software product.

The implications run through the entire annotation workflow: platform selection, annotator credentialing, QA protocols, adjudication sign-off, version control, and data retention. Teams that start building these records after the annotation project is complete — when FDA review is approaching — almost always end up with incomplete provenance that fails the inspection standard. The correct approach is to instrument the annotation workflow for Part 11 compliance before the first label is created.

Why Part 11 Applies to Annotation Data — and Where Teams Get Confused

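The confusion arises from a category error: teams treat Part 11 as a software validation standard (it is partly that) and miss its record integrity dimension. After the general provisions in Subpart A, the rule has two operative subparts. Subpart B covers electronic records, including system validation, audit trails, operational system checks, authority checks, and access controls (§11.10), plus signature manifestations (§11.50) and the link between a signature and its record (§11.70). Subpart C covers electronic signatures themselves: each signature must be unique to one individual (§11.100), non-biometric signatures require at least two distinct identification components such as an ID and a password (§11.200), and identification codes and passwords must be controlled (§11.300).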

For annotation workflows, the records in scope are the annotated dataset itself plus all records that document how that dataset was produced. This includes: per-annotation event logs showing who created or modified each label and when; quality review records showing which annotations were reviewed, by whom, and with what outcome; adjudication records where multiple annotators disagreed; annotator credential and training records; gold standard validation results; and the annotation guidelines version that was in force at each stage of the project.

The reason teams are surprised is that annotation platforms are not usually sold as regulated systems. Labelbox, Scale AI's platform, and Amazon SageMaker Ground Truth were designed for commercial ML workflows. They generate logs, but those logs are not structured as Part 11-compliant electronic records unless the implementation is specifically engineered for that purpose. Clinical expert annotation programmes designed for regulatory submission require a fundamentally different platform and workflow architecture than research-grade labelling runs.

The Five Core Requirements — Applied to Annotation

Part 11's technical requirements can be grouped into five functional categories. Here is what each one means in an annotation context.

1. Audit Trail: Event-Level, Immutable, and Timestamped

Part 11 §11.10(e) requires secure, computer-generated, time-stamped audit trails that independently record the date and time of operator entries and actions that create, modify, or delete electronic records, and it states that record changes must not obscure previously recorded information. For annotation, this means every label creation, every modification to an existing label, every deletion, every QA flag, every reviewer decision, and every adjudication action must be logged at the individual event level.

Common failures in annotation audit trails:

Batch-level logging. The system records that annotator A completed 200 images on a given date, but does not record which specific annotation on which specific image was created or modified at which specific time. This does not satisfy Part 11 — individual record attribution is required.
Mutable logs. The audit trail is stored in the same database as the annotation data, with the same write permissions, so an administrator can alter or delete log entries. §11.10(e) requires audit trails that are secure and that record events independently; in practice this means storing them separately from the records they document, with write-protection that ordinary system users cannot bypass.
Missing previous values on modification. When an annotator changes a bounding box coordinate or reclassifies a lesion, the audit trail records the new value but not the value that was overwritten. Because §11.10(e) states that record changes must not obscure previously recorded information, the audit trail needs to retain the previous value so that the complete change history can be reconstructed.
Session-level rather than action-level timestamps. The system logs when a user logged in and out but does not attribute individual annotation actions to specific times within that session. Part 11 does not specify a timestamp resolution, but each action needs its own timestamp, and second-level (or finer) resolution is the usual practice for keeping the sequence of events unambiguous.
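To make these failure modes concrete, here is a minimal Python sketch of an event-level audit record, assuming a hypothetical schema (`AuditEvent`, `AuditLog`, and the field names are illustrative, not any platform's API). Each entry carries the user, action, record ID, previous and new value, and a per-action UTC timestamp. Each entry also stores the hash of the entry before it. Hash chaining is a common tamper-evidence technique, not something Part 11 prescribes, and it only detects edits: the log still needs write-protected storage separate from the annotation database.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from datetime import datetime, timezone
from typing import Any, Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first event


@dataclass(frozen=True)
class AuditEvent:
    """One action on one annotation. Field names are hypothetical."""
    user_id: str
    action: str                    # e.g. "create", "modify", "delete", "qa_flag", "adjudicate"
    record_id: str                 # the specific annotation acted on
    new_value: Optional[Any]
    previous_value: Optional[Any]  # value overwritten by a modification, else None
    timestamp_utc: str             # ISO 8601, second resolution
    prev_hash: str                 # event_hash of the preceding entry
    event_hash: str = ""


def _digest(event: AuditEvent) -> str:
    """SHA-256 over every field except event_hash itself."""
    body = asdict(event)
    body.pop("event_hash")
    return hashlib.sha256(json.dumps(body, sort_keys=True, default=str).encode()).hexdigest()


class AuditLog:
    """Append-only, hash-chained log: tamper-evident, not tamper-proof."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    @property
    def events(self) -> tuple[AuditEvent, ...]:
        return tuple(self._events)  # read-only view for callers

    def append(self, user_id: str, action: str, record_id: str,
               new_value: Any = None, previous_value: Any = None,
               now: Optional[datetime] = None) -> AuditEvent:
        ts = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
        prev_hash = self._events[-1].event_hash if self._events else GENESIS
        event = AuditEvent(user_id, action, record_id, new_value, previous_value, ts, prev_hash)
        event = replace(event, event_hash=_digest(event))
        self._events.append(event)
        return event

    def verify(self) -> bool:
        """Recompute the chain. An edited, reordered, or removed entry breaks it,
        except removal of the final entry, which needs an externally anchored hash to detect."""
        expected_prev = GENESIS
        for event in self._events:
            if event.prev_hash != expected_prev or event.event_hash != _digest(event):
                return False
            expected_prev = event.event_hash
        return True
```

In use, `verify()` would run as part of routine audit trail review. Because dropping the last entry still leaves a valid chain, a real deployment would also record the latest hash somewhere the annotation platform cannot write.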

Vendors such as MD.ai (radiology), Flywheel (multi-modal medical imaging), and Proscia Concentriq (digital pathology) market their platforms as generating Part 11-oriented audit trails natively and offer validation documentation that maps the audit trail architecture to Part 11 requirements. No platform is Part 11-compliant out of the box, because compliance depends on how the system is configured and used, but purpose-built platforms remove most of the audit trail engineering work that open-source tooling requires.

2. Electronic Signatures: Linking Identity to Specific Records

Part 11 §11.70 requires that electronic signatures be linked to their respective electronic records so that they cannot be excised, copied, or otherwise transferred to falsify an electronic record by ordinary means, and §11.50 requires each signed record to show the signer's printed name, the date and time of signing, and the meaning of the signature (such as review or approval). For annotation workflows, this means that a board-certified radiologist's sign-off on a CT annotation batch cannot be captured as a PDF signature or a checkbox in an email thread. The signature must be programmatically linked to the specific annotation version it approves, stored with the annotation record, and non-transferable.

The practical implication: adjudication sign-offs must occur within the annotation platform, not via an external approval process. If your radiologists review annotation exports in a DICOM viewer and approve via email, those approvals are not Part 11-compliant electronic signatures regardless of how they are later documented. The approved annotation version and the approval record must be co-located in the platform's record system.

For programmes using radiology annotation or histopathology annotation, this requirement drives platform selection before annotators are onboarded. Retrofitting electronic signature infrastructure into a completed annotation project is technically feasible but the results typically fail to satisfy Part 11's non-transferability requirement — the signatures can be recreated but cannot be demonstrated to have been contemporaneous with the annotation events they purport to approve.
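A sketch of how a sign-off can be bound to one annotation version, assuming hypothetical record fields (`id`, `version`) and helper names. The signature record carries the §11.50 manifestation elements (printed name, date and time, meaning) plus a SHA-256 digest of the exact annotation content it approves, so presenting it against an edited or different version fails the check. It deliberately omits the §11.200 authentication step at signing and the storage protections the platform would have to provide; a content hash alone does not prove who signed.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


def record_digest(annotation: dict) -> str:
    """SHA-256 over a canonical JSON form of one annotation version."""
    canonical = json.dumps(annotation, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SignatureRecord:
    # §11.50 manifestation elements: printed name, date/time, meaning.
    signer_id: str
    printed_name: str
    meaning: str          # e.g. "approval" or "review"
    signed_at: str        # ISO 8601, UTC
    # Binding to one specific record version (hypothetical field names).
    record_id: str
    record_version: int
    record_sha256: str


def sign(annotation: dict, signer_id: str, printed_name: str, meaning: str) -> SignatureRecord:
    """Create a signature record; assumes the caller has already authenticated the signer."""
    return SignatureRecord(
        signer_id=signer_id,
        printed_name=printed_name,
        meaning=meaning,
        signed_at=datetime.now(timezone.utc).isoformat(timespec="seconds"),
        record_id=annotation["id"],
        record_version=annotation["version"],
        record_sha256=record_digest(annotation),
    )


def signature_applies_to(sig: SignatureRecord, annotation: dict) -> bool:
    """False if the signature is presented against another record, version, or altered content."""
    return (
        sig.record_id == annotation["id"]
        and sig.record_version == annotation["version"]
        and sig.record_sha256 == record_digest(annotation)
    )
```

Canonical JSON (sorted keys, fixed separators) keeps the digest stable regardless of key order, so only a genuine content change invalidates the signature.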

3. Access Controls: System-Enforced Role Separation

Part 11 §11.10(d) requires system access to be limited to authorised individuals. For annotation, this requires role-based access control enforced by the platform — not by convention. The roles that must be system-separated are: annotator (can create and modify labels within assigned cases), reviewer/QA (can flag, reject, and send back but not unilaterally modify annotations), adjudicator (can resolve conflicts and create gold standard records), and administrator (can manage user accounts but should not have access to modify annotation data).

Teams regularly implement a single annotator role that has full write access to annotation records including historical versions, and a “QA role” that is just a different login with the same access permissions. This does not satisfy Part 11. The access control requirement is functional, not nominal — the system must enforce that a reviewer cannot overwrite an annotator's original record, and that an administrator cannot alter annotation data without an audit trail event being generated.

Annotator access to specific cases must also be scoped to the assigned workload. Annotators should not be able to browse unadjudicated annotations from other annotators. This prevents annotation contamination and supports Part 11's authority check requirement under §11.10(g), which requires that only authorised individuals can use the system, electronically sign a record, alter a record, or perform the operation at hand.
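The role separation described in this section can be expressed as a permission matrix that the system checks on every request. This is a hedged sketch: the four roles follow the text, while the action names, the `assignments` mapping, and `authorize` are hypothetical. It also enforces workload scoping and a rule that nobody signs off their own annotation. In a real platform every denial would also be written to the audit trail.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    ANNOTATOR = "annotator"
    REVIEWER = "reviewer"
    ADJUDICATOR = "adjudicator"
    ADMIN = "administrator"


# Hypothetical permission matrix mirroring the roles described above.
PERMISSIONS: dict[Role, frozenset[str]] = {
    Role.ANNOTATOR: frozenset({"create_label", "modify_own_label"}),
    Role.REVIEWER: frozenset({"flag", "reject", "send_back"}),      # no direct edits
    Role.ADJUDICATOR: frozenset({"resolve_conflict", "create_gold_standard", "sign"}),
    Role.ADMIN: frozenset({"manage_users"}),                        # no annotation data access
}


class PermissionDenied(Exception):
    pass


def authorize(role: Role, action: str, user_id: str, case_id: Optional[str],
              assignments: dict[str, set[str]], author_id: Optional[str] = None) -> None:
    """Raise PermissionDenied unless this user may perform this action on this case."""
    if action not in PERMISSIONS[role]:
        raise PermissionDenied(f"{role.value} may not {action}")
    # Every non-admin action is scoped to the user's assigned cases.
    if role is not Role.ADMIN and case_id not in assignments.get(user_id, set()):
        raise PermissionDenied(f"{user_id} is not assigned to case {case_id}")
    if action == "sign" and author_id == user_id:
        raise PermissionDenied("users may not approve their own annotations")
```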

4. System Validation: IQ, OQ, PQ Documentation

Part 11 §11.10(a) requires that systems be validated to ensure accuracy, reliability, consistent intended performance, and the ability to discern invalid or altered records. A common way to document system validation for FDA submissions is the IQ/OQ/PQ framework:

Installation Qualification (IQ). Documents that the annotation platform was installed correctly in the environment where it will be used for the submission programme. Covers hardware specifications, software version control, network environment, and confirmation that the installed system matches the validated configuration.
Operational Qualification (OQ). Demonstrates that the system operates according to its specifications under expected conditions. For an annotation platform, this means executing test scripts that verify: audit trail capture is complete and non-alterable, electronic signature workflows function as designed, role-based access controls prevent unauthorised actions, and the system produces consistent annotation exports.
Performance Qualification (PQ). Demonstrates that the system performs reliably in the actual production annotation environment over time. PQ testing runs against representative medical data samples and confirms that the annotation workflow produces complete, attributable provenance records under real operating conditions.
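OQ test scripts are normally controlled documents with pre-approved expected results, but an automated harness can generate part of the executed evidence. A minimal sketch, assuming a toy append-only store standing in for the platform under test (`ToyAuditStore`, `run_oq`, and the test IDs are hypothetical): each check returns pass/fail plus an observed result, and an unexpected exception is recorded as a failure rather than aborting the run.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


class ToyAuditStore:
    """Stand-in for the platform under test: append-only by design."""

    def __init__(self) -> None:
        self._rows: list[tuple] = []

    def append(self, row: tuple) -> None:
        self._rows.append(row)

    def rows(self) -> tuple:
        return tuple(self._rows)

    def update(self, index: int, row: tuple) -> None:
        raise PermissionError("audit trail is append-only")


@dataclass
class OQResult:
    test_id: str
    requirement: str
    passed: bool
    executed_at: str
    observed: str


def run_oq(tests: list[tuple[str, str, Callable[[], tuple[bool, str]]]]) -> list[OQResult]:
    """Run each check and capture pass/fail, timestamp, and observed result as evidence."""
    results = []
    for test_id, requirement, check in tests:
        try:
            passed, observed = check()
        except Exception as exc:  # an unexpected error is a failed test, not a crash
            passed, observed = False, f"error: {exc!r}"
        executed_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
        results.append(OQResult(test_id, requirement, passed, executed_at, observed))
    return results


def check_audit_capture() -> tuple[bool, str]:
    store = ToyAuditStore()
    store.append(("u1", "create", "ann-1"))
    return len(store.rows()) == 1, f"{len(store.rows())} row(s) captured"


def check_audit_not_alterable() -> tuple[bool, str]:
    store = ToyAuditStore()
    store.append(("u1", "create", "ann-1"))
    try:
        store.update(0, ("u1", "delete", "ann-1"))
    except PermissionError as exc:
        return True, f"update rejected: {exc}"
    return False, "update was accepted"


results = run_oq([
    ("OQ-001", "§11.10(e) audit capture", check_audit_capture),
    ("OQ-002", "§11.10(e) audit trail cannot be altered", check_audit_not_alterable),
])
```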

Vendors of commercial medical imaging platforms, such as Flywheel, MD.ai, and Leica Biosystems (Aperio eSlide Manager), offer validation documentation, such as IQ/OQ packages, as part of their regulated-market offerings. These packages can significantly reduce the validation burden, but FDA does not certify or pre-approve them: the sponsor remains responsible for validating the system for its intended use and for showing how the vendor evidence supports that validation.

Teams using open-source platforms (Label Studio, CVAT, OHIF) must generate their own IQ/OQ/PQ documentation from scratch — a process that typically requires 40–120 hours of validation engineering depending on the platform and deployment configuration. The engineering cost alone usually exceeds the annual licence cost of a validated commercial platform.

5. Backup, Recovery, and Record Retention

Part 11 §11.10(b) requires the ability to generate accurate and complete copies of records in both human-readable and electronic form, and §11.10(c) requires records to be protected so that they can be accurately and readily retrieved throughout the retention period. For annotation provenance records, this means defining a recovery point objective (RPO) and a recovery time objective (RTO) before the programme starts, and testing the backup recovery procedure, not just executing it.
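Testing a restore, rather than only running backups, can be partly automated by comparing checksums. A sketch under stated assumptions: the provenance package is a directory tree, `manifest` and `compare_restore` are hypothetical helpers, and the RPO check simply asks whether the last good backup is recent enough.

```python
import hashlib
from datetime import datetime, timedelta
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    out = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            out[p.relative_to(root).as_posix()] = hashlib.sha256(p.read_bytes()).hexdigest()
    return out


def compare_restore(source: Path, restored: Path) -> dict[str, list[str]]:
    """Report files missing from, altered in, or unexpectedly present in the restore."""
    src, dst = manifest(source), manifest(restored)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "altered": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
        "unexpected": sorted(dst.keys() - src.keys()),
    }


def meets_rpo(last_good_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """True if restoring the last good backup now would lose no more than `rpo` of data."""
    return now - last_good_backup <= rpo
```

A restore test passes only when all three lists are empty; the manifest itself is worth keeping with the test record as evidence.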

Under 21 CFR 820.180(b) of the QSR, records required by Part 820, including design records such as annotation datasets and their provenance documentation, must be retained for a period equivalent to the design and expected life of the device, and in no case less than two years from the date of release for commercial distribution. For an actively marketed SaMD product, that can mean keeping provenance records accessible for many years. (Part 820 was amended as the Quality Management System Regulation, effective February 2, 2026, which incorporates ISO 13485:2016 by reference; clause 4.2.5 of that standard sets a comparable floor of the device lifetime and not less than two years from release.) Annotation vendors should provide clients with a complete, self-contained provenance export at project close, including audit trail logs, signature records, annotator credentials, and guidelines version history, stored in a format and location outside the annotation platform, because platform continuity cannot be guaranteed over a multi-year retention period.
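The retention floor reduces to simple date arithmetic. This sketch assumes the rule in the original QSR, 21 CFR 820.180(b) (the design and expected life of the device, but never less than two years from release for commercial distribution), and further assumes the expected life is counted from the same release date; the helper names are hypothetical.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; 29 February rolls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(release: date, expected_life_years: int) -> date:
    """Longer of the device's expected life and the two-year floor, both counted from release."""
    return add_years(release, max(expected_life_years, 2))
```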

The export format matters. Provenance records exported as flat CSV files or platform-specific JSON without schema documentation are difficult to interpret years later. Structured exports with a clear schema, human-readable annotation event descriptions, and DICOM or HL7 FHIR compatibility (for health data) are significantly easier to present in an FDA audit or post-market surveillance review.
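A sketch of a schema-documented export, assuming events arrive as dictionaries with the hypothetical field names below. The package pairs `events.jsonl` with a versioned `schema.json` and adds a generated plain-language description to each event; it validates every event before writing anything, so a gap in the data cannot produce a half-written package.

```python
import json
from pathlib import Path

# Hypothetical export schema; its version travels with every export package.
SCHEMA = {
    "schema_version": "1.0",
    "fields": {
        "event_id": "Unique, sequential identifier of the audit event",
        "timestamp_utc": "ISO 8601 time of the action, in UTC",
        "user_id": "Unique identifier of the person who acted",
        "action": "One of create, modify, delete, qa_flag, adjudicate",
        "record_id": "Identifier of the annotation the action applied to",
        "previous_value": "Value before a modification; null otherwise",
        "new_value": "Value after the action; null for deletions",
        "description": "Human-readable summary generated at export time",
    },
}


def describe(e: dict) -> str:
    """Plain-language sentence for one event, readable without the platform."""
    if e["action"] == "modify":
        return (f"{e['user_id']} changed {e['record_id']} from {e['previous_value']!r} "
                f"to {e['new_value']!r} at {e['timestamp_utc']}")
    return f"{e['user_id']} performed '{e['action']}' on {e['record_id']} at {e['timestamp_utc']}"


def write_export(events: list[dict], out_dir: Path) -> None:
    """Validate all events, then write schema.json and one JSON object per line to events.jsonl."""
    required = set(SCHEMA["fields"]) - {"description"}
    for e in events:
        missing = required - set(e)
        if missing:
            raise ValueError(f"event {e.get('event_id')} is missing {sorted(missing)}")
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "schema.json").write_text(json.dumps(SCHEMA, indent=2), encoding="utf-8")
    with (out_dir / "events.jsonl").open("w", encoding="utf-8") as f:
        for e in events:
            f.write(json.dumps({**e, "description": describe(e)}, sort_keys=True) + "\n")
```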

How Part 11 Requirements Vary Across Medical Imaging Modalities

The five core requirements apply uniformly, but their practical implementation differs by modality because the annotation task complexity, the volume of records generated, and the expert review structure all vary.

Radiology (CT, MRI, X-ray): Multi-slice CT and multi-sequence MRI studies mean a single patient case can generate hundreds of individual annotation events across slice-level segmentation, landmark placement, and classification labels. Audit trail volume is high, and the audit trail storage architecture needs to handle large event counts without degrading platform performance. Per-study sign-off by board-certified radiologists is the standard adjudication model, and the signature must link to the specific study version, not just the batch. See our radiology AI annotation guide for modality-specific workflow design.

Digital pathology (WSI): Whole-slide images can run to billions of pixels. Tile-level annotation creates massive audit trail volumes, and per-slide sign-off by credentialed pathologists is the expected adjudication standard. Because a pathologist may view and annotate a slide across multiple sessions on a WSI platform, session resumption must be captured in the audit trail without creating ambiguity about which annotations belong to which review event.

Clinical document annotation (NLP): Clinical document annotation for NLP models — diagnosis coding, clinical NER, de-identification — generates annotation events at the entity or sentence level, often with much higher volume per document than imaging tasks. Part 11 audit trails for NLP annotation must capture the specific text span that was annotated, not just the document ID, so that the provenance record is interpretable without re-accessing the original clinical record.
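For NLP work, a hypothetical audit entry might store character offsets and the verbatim span text together, so the entry remains interpretable on its own. The sketch below assumes half-open `[start, end)` offsets and hypothetical names. Copying span text into the log also copies any PHI it contains, so the audit store needs the same HIPAA controls as the source notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanAnnotationEvent:
    """Hypothetical audit entry for one entity-level NLP annotation."""
    user_id: str
    action: str
    document_id: str
    start: int           # character offset, inclusive
    end: int             # character offset, exclusive
    span_text: str       # copied verbatim at annotation time
    label: str
    timestamp_utc: str


def make_span_event(user_id: str, action: str, document_id: str, text: str,
                    start: int, end: int, label: str, timestamp_utc: str) -> SpanAnnotationEvent:
    """Build an entry, rejecting offsets that do not fall inside the document text."""
    if not (0 <= start < end <= len(text)):
        raise ValueError(f"span [{start}, {end}) is out of range for a {len(text)}-character text")
    return SpanAnnotationEvent(user_id, action, document_id, start, end,
                               text[start:end], label, timestamp_utc)
```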

Ophthalmology (fundus, OCT): Retinal grading scales like the ICDR (International Clinical Diabetic Retinopathy scale) involve structured categorical labels. The audit trail must capture the specific grading scale version that was in force at the time of annotation, because grading criteria do evolve and the regulatory record must demonstrate that the dataset was labelled under a defined, version-controlled protocol.

The SaMD Context: Part 11 Within the FDA's AI/ML Action Plan

In January 2021, the FDA published its AI/ML-Based Software as a Medical Device Action Plan, which carried forward the Predetermined Change Control Plan (PCCP) concept first proposed in FDA's 2019 discussion paper on modifications to AI/ML-based SaMD. A PCCP allows SaMD developers to specify in advance the types of algorithm updates they may make after marketing authorisation, including model retraining, without a new marketing submission for each update, provided the changes fall within the authorised plan.

PCCPs dramatically increase the importance of annotation provenance. If a team plans to retrain their cleared algorithm on new data and invoke a PCCP, they must demonstrate that the new training data was produced under a quality system equivalent to the original. That means the annotation provenance for the original training dataset must be sufficiently complete to serve as a quality baseline — and the provenance for subsequent retraining datasets must demonstrate consistency with that baseline. Incomplete provenance for the original training run undermines the PCCP's credibility even if the retraining data is perfectly documented.

The FDA's 2022 final guidance on Clinical Decision Support Software (which clarified which CDS functions are excluded from the device definition) and its 2023 draft guidance on predetermined change control plans for AI/ML-enabled device software functions both reinforce the expectation that training data provenance will be a standard component of device submissions. Teams building medical AI in 2026 should treat Part 11-compliant annotation documentation as a baseline submission requirement, not an edge case for high-risk devices only. The histopathology whole-slide annotation guide covers how these documentation standards apply specifically to WSI programmes; the build vs buy annotation framework addresses how compliance infrastructure affects the true cost of in-house vs outsourced annotation.

The Provenance Records Checklist: What to Verify Before Annotation Starts

Use this checklist before committing to an annotation platform and workflow for a regulatory submission programme. Gaps discovered after annotation is complete are expensive to remediate and sometimes impossible to recover.

Audit trail granularity. Does the platform generate event-level audit trail entries for every annotation creation, modification, and deletion — including the previous value when a label is changed? Can the audit trail be exported independently of the annotation data, in a non-alterable format?
Electronic signature implementation. Are adjudication sign-offs captured within the platform as programmatic signatures linked to specific annotation records and versions? Is there a defined user identity linked to each signature, stored with the record?
Role-based access enforcement. Are annotator, reviewer, and adjudicator roles enforced by the system — meaning a reviewer cannot overwrite annotation data and an annotator cannot approve their own work — rather than relying on procedural convention?
IQ/OQ/PQ documentation. Does the platform vendor provide existing IQ/OQ/PQ validation documentation for the regulated-market version of the platform? If not, has the custom validation effort been scoped, resourced, and scheduled before the annotation project starts?
Annotator credential records. Are annotator qualifications (board certification, specialty training, annotation calibration results) recorded in the system and linked to the annotation records they produced? For medical annotation, the reviewer's board certification details must be retrievable for the specific cases they adjudicated.
Guidelines version control. Is the annotation guidelines document version-controlled within the platform, with each annotation record linked to the specific guidelines version that was active at the time it was created? Changes to guidelines mid-project must be captured in the audit trail.
Gold standard traceability. Are gold standard test cases and their outcomes stored as platform records, not as external spreadsheets, with full audit trail coverage? Inter-annotator agreement (IAA) results per annotator per time period must be exportable as part of the provenance record.
Backup and export completeness. Has a test restoration from backup been performed, confirming that annotation data, audit trail, signature records, and credential records are all fully recoverable? Is a complete provenance export package defined and tested before the project ends?
Data localisation and transfer controls. For programmes involving Protected Health Information (PHI), are HIPAA-compliant data handling controls in place alongside Part 11 requirements? PHI de-identification or encryption standards must be documented as part of the system validation.

Annotation vendors who cannot confirm each of these items before project kick-off are not equipped to deliver Part 11-compliant documentation for an FDA submission. The checklist is also a useful audit tool for reviewing existing annotation programmes where regulatory submission was not the original intent but has since become relevant. Our pricing page covers how compliance-grade annotation programmes are structured and what the cost differential is versus standard commercial annotation.

Common Failure Modes in FDA Review

FDA reviewers and inspectors who examine SaMD submissions encounter a predictable set of annotation provenance failures. Understanding these failure modes helps teams prioritise where to invest compliance effort.

Reconstructed audit trails. Teams that did not instrument their annotation platform for real-time audit trail capture attempt to reconstruct event records from platform logs, database exports, and annotator self-reports after the project ends. Reconstructed audit trails are distinguishable from contemporaneous ones — they typically lack action-level granularity, have timestamp clustering patterns inconsistent with real annotation workflows, and cannot demonstrate non-alterability. Reviewers flag these during inspection.

Annotator credential gaps. The submission states that annotations were reviewed by “board-certified radiologists,” but the supporting documentation does not link specific case adjudications to specific named clinicians with retrievable credential records. FDA expects the provenance to demonstrate not just that qualified reviewers were involved, but which qualified reviewer approved which specific records.
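Linking each adjudication to a credential record valid on the adjudication date is a straightforward join. A sketch, assuming hypothetical `Credential` fields and adjudication records shaped as `{"case_id", "adjudicator_id", "date"}`; any case it returns has no retrievable, in-date credential for the clinician who signed it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Credential:
    """Hypothetical credential record for one clinician."""
    clinician_id: str
    name: str
    board: str             # certifying board, free text in this sketch
    certified_from: date
    certified_until: date


def credential_gaps(adjudications: list[dict], credentials: dict[str, Credential]) -> list[str]:
    """Return case IDs whose adjudicator has no credential valid on the adjudication date."""
    gaps = []
    for a in adjudications:
        cred = credentials.get(a["adjudicator_id"])
        if cred is None or not (cred.certified_from <= a["date"] <= cred.certified_until):
            gaps.append(a["case_id"])
    return gaps
```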

Guidelines version discontinuity. The annotation guidelines were revised mid-project — a necessary and expected occurrence in any real annotation programme — but the revision date and the cases annotated before and after the revision are not distinguished in the dataset records. The FDA's concern is that a guidelines change may have introduced a systematic label distribution shift that the team did not assess and that is now unexplained in the training data.
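Distinguishing cases annotated before and after a guidelines revision starts with knowing which version was in force at each annotation's timestamp. A sketch, assuming a hypothetical list of `(effective_from, version)` revisions sorted by date: it looks up the version with `bisect` and compares label shares per version. A visible shift is a prompt for assessment (for example with a chi-squared test), not proof of a problem.

```python
from bisect import bisect_right
from collections import Counter
from datetime import datetime


def version_in_force(revisions: list[tuple[datetime, str]], at: datetime) -> str:
    """revisions: (effective_from, version) sorted by date. Returns the version active at `at`."""
    idx = bisect_right([effective for effective, _ in revisions], at) - 1
    if idx < 0:
        raise ValueError(f"no guidelines version in force at {at.isoformat()}")
    return revisions[idx][1]


def label_shares_by_version(annotations: list[tuple[datetime, str]],
                            revisions: list[tuple[datetime, str]]) -> dict[str, dict[str, float]]:
    """Share of each label among annotations made under each guidelines version."""
    counts: dict[str, Counter] = {}
    for ts, label in annotations:
        counts.setdefault(version_in_force(revisions, ts), Counter())[label] += 1
    shares = {}
    for version, c in counts.items():
        total = sum(c.values())
        shares[version] = {label: n / total for label, n in c.items()}
    return shares
```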

IAA records at batch level only. Inter-annotator agreement was calculated at the end of the project across the full dataset and reported as a single kappa value. Part 11 governs how such records are kept rather than which metrics are computed, but Part 820's design control and design history file requirements favour evidence that quality was monitored during production, not just calculated post hoc. Per-annotator, per-time-period IAA records that show calibration was maintained throughout the annotation programme are significantly stronger than a final summary statistic.
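Per-annotator, per-period monitoring can be computed from the same records. This sketch uses Cohen's kappa, (p_o - p_e) / (1 - p_e), between each annotator's labels and the adjudicated reference label; scoring against the adjudicated label rather than pairwise between annotators is an assumption made here, and for more than two raters per item Fleiss' kappa or Krippendorff's alpha is the usual alternative. The row layout and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items; NaN when undefined."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_expected == 1.0:
        return math.nan  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


def kappa_by_annotator_period(rows: list[tuple[str, str, str, str]]) -> dict[tuple[str, str], float]:
    """rows: (annotator_id, period, annotator_label, reference_label)."""
    grouped: dict[tuple[str, str], tuple[list[str], list[str]]] = defaultdict(lambda: ([], []))
    for annotator, period, label, reference in rows:
        grouped[(annotator, period)][0].append(label)
        grouped[(annotator, period)][1].append(reference)
    return {key: cohens_kappa(a, b) for key, (a, b) in sorted(grouped.items())}
```

Small per-period samples make kappa unstable, so a monitoring report would normally show the item count alongside each value.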

Building a medical AI dataset for FDA submission?

We design annotation programmes for regulatory-track medical AI — Part 11-compliant audit trails, board-certified adjudication, IQ/OQ/PQ documentation, and complete provenance exports ready for design history file inclusion. Pilots scoped within 48 hours.

Frequently Asked Questions

Does 21 CFR Part 11 apply to all medical AI annotation or only FDA submissions?

21 CFR Part 11 applies to electronic records required under FDA regulations — it is not a general medical data standard. For SaMD pursuing FDA clearance or approval, the annotation dataset supporting the algorithm is a design record under 21 CFR Part 820, making Part 11 applicable. Teams building research-grade medical AI without FDA submission intent are not directly obligated, but adopting Part 11-aligned practices avoids costly retroactive documentation if the project later pivots toward regulatory submission.

What is the minimum audit trail format FDA expects for annotation provenance?

FDA does not prescribe a specific file format, but audit trails must be secure, computer-generated, and time-stamped, and must record operator actions independently, which in practice means storing them apart from the annotation data. Each entry should capture: a unique user identifier, a timestamp for the individual action (Part 11 sets no resolution, but second-level granularity is common practice), the specific action (creation, modification, deletion, adjudication), the record identifier linking the action to a specific annotation, and, for modifications, the previous value, since record changes must not obscure previously recorded information. Batch-level or daily sign-off logs do not satisfy Part 11: individual event-level attribution is required.

Can we use Label Studio or CVAT for FDA submissions?

General-purpose open-source platforms were not designed for Part 11 compliance and lack native IQ/OQ/PQ validation documentation. Teams have used them for SaMD submissions by building compliant audit infrastructure around them, but the engineering cost typically exceeds the licence cost of a purpose-built medical platform such as MD.ai, Flywheel, or Proscia Concentriq, whose vendors offer validation documentation and in-platform electronic signature workflows. In either case, the sponsor remains responsible for validating the system for its intended use.

What is the difference between 21 CFR Part 11 and 21 CFR Part 820 for annotation teams?

21 CFR Part 820 (Quality System Regulation) defines what records you must keep — annotation datasets are design records under QSR. 21 CFR Part 11 defines how those records must be maintained electronically: audit trail requirements, electronic signature standards, access control architecture, and system validation requirements. Both apply simultaneously to annotation programmes supporting SaMD submissions — Part 820 sets the documentation obligation, Part 11 sets the technical standard for electronic records meeting that obligation.

How long must annotation provenance records be retained for FDA submissions?

Under 21 CFR 820.180(b), records required by Part 820, including design records, must be retained for a period equivalent to the design and expected life of the device, and in no case less than two years from the date of release for commercial distribution. For active commercial SaMD, this can mean a retention window of many years. Annotation vendors should deliver a complete provenance export at project close in a format that remains interpretable without access to the original annotation platform, as platform continuity cannot be guaranteed over a multi-year period.

Does the Australian TGA require equivalent annotation provenance for SaMD submissions?

Broadly, yes. TGA's SaMD guidance draws on the IMDRF SaMD framework (including IMDRF/SaMD N23 on quality management and N41 on clinical evaluation), and TGA expects quality management and traceability documentation that is functionally comparable to FDA expectations. For dual FDA/TGA submissions, designing annotation documentation to satisfy the FDA Part 11 standard first should cover most of what TGA expects, although TGA-specific requirements still need to be checked. For TGA-only submissions, the same provenance checklist applies: audit trails, annotator credentials, adjudication records, and version-controlled guidelines documentation.

Neel Bennett

AI Annotation Specialist at AI Taggers

Neel has over 8 years of experience in AI training data and machine learning operations. He specializes in helping enterprises build high-quality datasets for computer vision and NLP applications across healthcare, automotive, and retail industries.
