EU AI Act Article 15 — Accuracy, robustness, and cybersecurity
Article 15 of the EU AI Act requires high-risk AI systems to achieve appropriate accuracy, robustness, and cybersecurity, and to perform consistently in those respects throughout the lifecycle. Accuracy metrics must be declared in the instructions for use; robustness must extend to errors, faults, and inconsistencies including adversarial inputs. Enforceable from 2 August 2026.
Source: Regulation (EU) 2024/1689 (EU AI Act), CELEX:32024R1689.
What Article 15 actually says
1. High-risk AI systems shall be designed and developed in such a way that they achieve an appropriate level of accuracy, robustness, and cybersecurity, and that they perform consistently in those respects throughout their lifecycle.

3. The levels of accuracy and the relevant accuracy metrics of high-risk AI systems shall be declared in the accompanying instructions of use.

4. High-risk AI systems shall be as resilient as possible regarding errors, faults or inconsistencies that may occur within the system or the environment in which the system operates, in particular due to their interaction with natural persons or other systems. Technical and organisational measures shall be taken in this regard. The robustness of high-risk AI systems may be achieved through technical redundancy solutions, which may include backup or fail-safe plans.
Paragraphs: 15(1) · 15(3) · 15(4)
Application date
2026-08-02
Status: UPCOMING
Penalty band
Up to €15M or 3% of global annual turnover
Sanction route: Article 99(4)
Article 15 becomes enforceable on 2 August 2026 for high-risk AI systems under Annex III. Providers must declare accuracy metrics in the instructions for use and demonstrate consistent performance across the lifecycle; non-compliance via the Article 16 provider-obligation chain is sanctionable up to €15M or 3% of global annual turnover under Article 99(4). For RAG-based high-risk systems, "appropriate accuracy" is not a self-asserted figure — it is a metric declared on the label and defensible against post-market evidence.
Practical compliance with RAG Benchmarking
RAG Benchmarking is a framework-agnostic evaluation harness for RAG and agentic AI systems. It covers Article 15's accuracy and robustness requirements through reproducible benchmarks — faithfulness, answer relevancy, retrieval precision, four agentic metrics — with versioned eval sets and lifecycle drift monitoring. Article 15 also requires cybersecurity (prompt injection resistance, jailbreak defence, model integrity); pair RAG-Bench with a runtime AI security control such as AgentShield for full Article 15 coverage.
- 15(1): Reproducible accuracy benchmarks for RAG pipelines (retrieval recall, answer faithfulness, citation precision) with versioned eval sets
- 15(3): Generates the accuracy-metrics block for the Article 13 instructions for use, with confidence intervals and eval-set provenance
- 15(4): Robustness suite: input perturbations, noisy-context, adversarial-passage, and OOD query stress tests with pass/fail thresholds
- 15(4): Lifecycle drift monitoring — replays the declared eval set against the live system on a schedule and alerts on metric regression
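The robustness suite's pass/fail idea can be sketched in a few lines. This is a minimal illustration, not RAG-Bench's actual API: `perturb_query` and `robustness_pass` are hypothetical names, and the 0.05 drop budget is an assumed threshold you would set yourself.

```python
import random

def perturb_query(query: str, rate: float = 0.1, seed: int = 0) -> str:
    """Character-level perturbation: randomly swap adjacent characters (illustrative)."""
    rng = random.Random(seed)
    chars = list(query)
    for i in range(len(chars) - 1):
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def robustness_pass(metric_clean: float, metric_perturbed: float,
                    max_drop: float = 0.05) -> bool:
    """Pass/fail threshold: the perturbed run may not drop more than max_drop."""
    return (metric_clean - metric_perturbed) <= max_drop

# A 0.03 faithfulness drop under perturbation stays within the assumed 0.05 budget
print(robustness_pass(0.95, 0.92))  # True
print(robustness_pass(0.95, 0.88))  # False
```

The same pass/fail shape applies to the noisy-context and adversarial-passage tests: run the declared metric on a clean and a stressed variant of the eval set, then compare the gap to a declared budget.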
Frequently asked questions
Direct answers to common questions about Article 15 and how RAG Benchmarking addresses it. Regulatory citations reference EUR-Lex CELEX:32024R1689.
- What does EU AI Act Article 15 require?
- High-risk AI systems must achieve appropriate accuracy, robustness, and cybersecurity throughout their lifecycle. Accuracy metrics must be declared in the instructions for use (Article 15(3)), and the system must be resilient to errors, faults, and inconsistencies (Article 15(4)). Source: Regulation (EU) 2024/1689 Article 15(1), 15(3), 15(4).
- When does Article 15 become enforceable?
- Article 15 obligations for high-risk AI systems become enforceable on 2 August 2026, per Article 113. Source: Regulation (EU) 2024/1689 Article 113.
- Does RAG-Bench cover the cybersecurity leg of Article 15?
- No. RAG-Bench covers accuracy and robustness — measuring faithfulness, retrieval precision, agentic metrics, and adversarial-passage robustness. The cybersecurity leg (prompt injection resistance, jailbreak defence, model integrity) requires a runtime AI security control such as AgentShield. Pair the two for full Article 15 coverage.
- What metrics does RAG-Bench measure?
- Classic metrics (faithfulness, answer relevancy, context precision/recall), retrieval metrics (Precision@K, Recall@K, MRR, NDCG), and four agentic metrics (agent faithfulness, tool-call accuracy, source attribution, retrieval necessity).
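The retrieval metrics above have standard textbook definitions. A minimal sketch (not RAG-Bench's implementation) with binary relevance judgments:

```python
import math

def precision_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Fraction of all relevant documents found in the top k."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def mrr(retrieved: list, relevant: set) -> float:
    """Reciprocal rank of the first relevant document."""
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved: list, relevant: set, k: int) -> float:
    """DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(r + 1)
              for r, d in enumerate(retrieved[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(r + 1)
                for r in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

retrieved = ["d3", "d1", "d7", "d2"]   # ranked retrieval output
relevant = {"d1", "d2"}                # gold relevance judgments
print(precision_at_k(retrieved, relevant, 4))  # 0.5
print(recall_at_k(retrieved, relevant, 4))     # 1.0
print(mrr(retrieved, relevant))                # 0.5 (first hit at rank 2)
```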
- Is RAG-Bench framework-agnostic?
- Yes. RAG-Bench works with LangChain, LlamaIndex, or any custom RAG system that returns a sample with `question`, `contexts`, and `answer` fields. SDK adapters for LangChain and LlamaIndex are included; custom integrations use the JSONL schema directly.
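A custom integration amounts to emitting one JSON object per line with the three documented fields. The sketch below assumes only the `question`, `contexts`, and `answer` field names stated above; any further fields in the real schema are not shown here.

```python
import json

# One eval record per JSONL line. Only question/contexts/answer are taken
# from the documented schema; the content values are illustrative.
sample = {
    "question": "When does Article 15 become enforceable?",
    "contexts": ["Article 113 sets the application date of 2 August 2026 ..."],
    "answer": "2 August 2026.",
}

line = json.dumps(sample)      # serialise to a single JSONL line
restored = json.loads(line)    # round-trip, as a loader would do
required = {"question", "contexts", "answer"}
assert required <= restored.keys(), "sample is missing required fields"
print(sorted(restored.keys()))  # ['answer', 'contexts', 'question']
```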
- What is the measured faithfulness on the golden dataset?
- 0.958 on the published 50-sample golden dataset (rated "Excellent"), with 0.810 answer relevancy ("Good"). These are the actual numbers from the v1.0.0-rc1 release benchmark — not aspirational targets.
- Can I bring my own evaluation dataset?
- Yes. RAG-Bench accepts custom datasets in JSONL format with the expected schema. The bundled golden dataset is English-only; multilingual evaluation is not supported in v1.0.
- Is RAG-Bench free?
- Yes. Apache 2.0 licensed. The harness itself runs locally; LLM-as-judge metrics depend on whichever judge model you configure (which may have its own usage cost).
- What is the penalty for Article 15 non-compliance?
- Up to €15M or 3% of global annual turnover, whichever is higher, under Article 99(4). The provider-obligation chain via Article 16 routes Article 15 failures through this penalty band.
- How does drift monitoring work?
- You declare an evaluation set version and a metric threshold. RAG-Bench replays the eval set against the live system on a schedule and alerts on metric regression — supporting the lifecycle-consistent-performance requirement of Article 15(1).
When the findings land on a governance desk
Tools surface problems. Programmes solve them.
Declared accuracy on the instructions for use is a number that has to survive post-market scrutiny. AskAjay's A7 framework structures the readiness review and pairs with RAG-Bench evidence packs when the audit conversation begins.
Framework: A7 (Agentic AI Readiness Framework) — at AskAjay.ai, the advisory arm of AI Exponent LLC.
Explore the A7 (Agentic AI Readiness Framework) →