RAG Benchmarking

Article 15 · Flagship · v1.0.0-rc1

Plug in any RAG system — LangChain, LlamaIndex, or custom — and benchmark it against classic and agentic-era metrics. Faithfulness, answer relevancy, retrieval precision, and four agentic metrics for multi-step agents. Measured faithfulness of 0.958 on the 50-sample golden dataset.

Quick Start

```bash
pip install rag-benchmarking
```

```python
from app.sdk.client import RagEval

client = RagEval(api_url="http://localhost:5001", api_key="your-key")

# Works with LangChain
result = my_chain.invoke({"query": "What is RAG?"})
sample = RagEval.from_langchain(result)

# Or any dict with question / contexts / answer
sample = {
    "question": "What is RAG?",
    "contexts": ["RAG stands for Retrieval-Augmented Generation."],
    "answer": "RAG combines retrieval with LLM generation.",
}

report = client.evaluate([sample], metrics=["faithfulness", "answer_relevancy"])
print(report["metrics"])
# {"faithfulness": 0.95, "answer_relevancy": 0.81}
```
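For batch evaluation, the same sample shape can be stored as JSONL, one JSON object per line. A minimal sketch, assuming only the `question` / `contexts` / `answer` fields shown above (the file name and round-trip check are illustrative, not part of the SDK):

```python
import json

# Samples use the question / contexts / answer fields from the Quick Start dict
samples = [
    {
        "question": "What is RAG?",
        "contexts": ["RAG stands for Retrieval-Augmented Generation."],
        "answer": "RAG combines retrieval with LLM generation.",
    },
]

# JSONL: one JSON object per line
with open("my_eval_set.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps(s) + "\n")

# Round-trip check: each line parses back into the original dict
with open("my_eval_set.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(loaded[0]["question"])  # What is RAG?
```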

Benchmark Results

Measured on the 50-sample golden dataset using gemini-2.5-flash as judge at temperature=0.0.

  • faithfulness: 96% (Excellent)
  • answer_relevancy: 81% (Good)

Features

  • Framework-agnostic — works with LangChain, LlamaIndex, or any custom RAG system
  • Classic metrics: faithfulness, answer relevancy, context precision/recall
  • Retrieval metrics: Precision@K, Recall@K, MRR, NDCG
  • Agentic metrics: agent faithfulness, tool call accuracy, source attribution, retrieval necessity
  • REST API + Python SDK with LangChain and LlamaIndex adapters
  • Run history with comparison across configurations
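The retrieval metrics listed above have standard information-retrieval definitions. A minimal sketch of how they are computed for a binary-relevance ranking (textbook formulas, not RAG-Bench internals):

```python
import math

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents found in the top k."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant document (0 if none retrieved)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking over DCG of the ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1)
              if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0

retrieved = ["d3", "d1", "d7", "d2"]   # ranked retriever output
relevant = {"d1", "d2"}                # gold relevant documents
print(precision_at_k(retrieved, relevant, 4))  # 0.5
print(mrr(retrieved, relevant))                # 0.5
```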

Regulatory Foundation

Article 15 · Accuracy, robustness and cybersecurity · Application date 2026-08-02 · Upcoming

Read the full pillar: EU AI Act Article 15 explainer →

What the regulation requires

1. High-risk AI systems shall be designed and developed in such a way that they achieve an appropriate level of accuracy, robustness, and cybersecurity, and that they perform consistently in those respects throughout their lifecycle.

3. The levels of accuracy and the relevant accuracy metrics of high-risk AI systems shall be declared in the accompanying instructions of use.

4. High-risk AI systems shall be as resilient as possible regarding errors, faults or inconsistencies that may occur within the system or the environment in which the system operates, in particular due to their interaction with natural persons or other systems. Technical and organisational measures shall be taken in this regard. The robustness of high-risk AI systems may be achieved through technical redundancy solutions, which may include backup or fail-safe plans.

15(1) · 15(3) · 15(4)

What you face if you don't comply

Article 15 becomes enforceable on 2 August 2026 for high-risk AI systems under Annex III. Providers must declare accuracy metrics in the instructions for use and demonstrate consistent performance across the lifecycle; non-compliance via the Article 16 provider-obligation chain is sanctionable up to €15M or 3% of global annual turnover under Article 99(4). For RAG-based high-risk systems, "appropriate accuracy" is not a self-asserted figure — it is a metric declared on the label and defensible against post-market evidence.

Up to €15M or 3% of global annual turnover
Article 99(4) · Penalties

How RAG Benchmarking addresses this

  • 15(1)Reproducible accuracy benchmarks for RAG pipelines (retrieval recall, answer faithfulness, citation precision) with versioned eval sets
  • 15(3)Generates the accuracy-metrics block for the Article 13 instructions for use, with confidence intervals and eval-set provenance
  • 15(4)Robustness suite: input perturbations, noisy-context, adversarial-passage, and OOD query stress tests with pass/fail thresholds
  • 15(4)Lifecycle drift monitoring — replays the declared eval set against the live system on a schedule and alerts on metric regression
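The drift-monitoring step above reduces to a threshold comparison between declared metrics and a fresh replay of the eval set. A minimal sketch, assuming hypothetical names (`DriftCheck`, `regressions`) that are not the tool's actual API:

```python
from dataclasses import dataclass

@dataclass
class DriftCheck:
    """Compare a fresh metric run against declared thresholds (illustrative)."""
    thresholds: dict  # metric name -> minimum acceptable score

    def regressions(self, latest: dict) -> dict:
        """Return the metrics that fell below their declared threshold."""
        return {m: score for m, score in latest.items()
                if m in self.thresholds and score < self.thresholds[m]}

# Thresholds declared in the instructions-for-use block (Article 15(3))
check = DriftCheck(thresholds={"faithfulness": 0.95, "answer_relevancy": 0.75})

# A scheduled replay of the declared eval set produced these live scores
latest = {"faithfulness": 0.91, "answer_relevancy": 0.80}

failing = check.regressions(latest)
if failing:
    print(f"ALERT: metric regression detected: {failing}")
```

In practice the replay would run on a schedule against the live system; the alert path (pager, dashboard, ticket) is deployment-specific.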

Source: eur-lex.europa.eu/…/CELEX:32024R1689

Frequently asked questions

Direct answers to common questions about RAG Benchmarking and Article 15. Regulatory citations reference EUR-Lex CELEX:32024R1689.

What does EU AI Act Article 15 require?
High-risk AI systems must achieve appropriate accuracy, robustness, and cybersecurity throughout their lifecycle. Accuracy metrics must be declared in the instructions for use (Article 15(3)), and the system must be resilient to errors, faults, and inconsistencies (Article 15(4)). Source: Regulation (EU) 2024/1689 Article 15(1), 15(3), 15(4).
When does Article 15 become enforceable?
Article 15 obligations for high-risk AI systems become enforceable on 2 August 2026, per Article 113. Source: Regulation (EU) 2024/1689 Article 113.
Does RAG-Bench cover the cybersecurity leg of Article 15?
No. RAG-Bench covers accuracy and robustness — measuring faithfulness, retrieval precision, agentic metrics, and adversarial-passage robustness. The cybersecurity leg (prompt injection resistance, jailbreak defence, model integrity) requires a runtime AI security control such as AgentShield. Pair the two for full Article 15 coverage.
What metrics does RAG-Bench measure?
Classic metrics (faithfulness, answer relevancy, context precision/recall), retrieval metrics (Precision@K, Recall@K, MRR, NDCG), and four agentic metrics (agent faithfulness, tool-call accuracy, source attribution, retrieval necessity).
Is RAG-Bench framework-agnostic?
Yes. RAG-Bench works with LangChain, LlamaIndex, or any custom RAG system that returns a sample with `question`, `contexts`, and `answer` fields. SDK adapters for LangChain and LlamaIndex are included; custom integrations use the JSONL schema directly.
What is the measured faithfulness on the golden dataset?
0.958 on the published 50-sample golden dataset (rated "Excellent"), with 0.810 answer relevancy ("Good"). These are the actual numbers from the v1.0.0-rc1 release benchmark — not aspirational targets.
Can I bring my own evaluation dataset?
Yes. RAG-Bench accepts custom datasets in JSONL format with the expected schema. The bundled golden dataset is English-only; multilingual evaluation is not supported in v1.0.
Is RAG-Bench free?
Yes. Apache 2.0 licensed. The harness itself runs locally; LLM-as-judge metrics depend on whichever judge model you configure (which may have its own usage cost).
What is the penalty for Article 15 non-compliance?
Up to €15M or 3% of global annual turnover, whichever is higher, under Article 99(4). The provider-obligation chain via Article 16 routes Article 15 failures through this penalty band.
How does drift monitoring work?
You declare an evaluation set version and a metric threshold. RAG-Bench replays the eval set against the live system on a schedule and alerts on metric regression — supporting the lifecycle-consistent-performance requirement of Article 15(1).

Known Limitations

  • Benchmark datasets are English-only; no multilingual evaluation support.
  • Custom dataset integration requires manual formatting to the expected JSONL schema.
  • Accuracy metrics only — latency and throughput are not measured.
  • LLM-as-judge metrics depend on the configured judge model quality.
  • Rate limiting is in-memory and resets on server restart.

For the most current status, see GitHub issues.

Contributing

Contributions are welcome — Apache 2.0 licensed. See the contributing guide and open issues.

License

Licensed under the Apache License 2.0. Not legal advice. Not a notified body.

The Compound Moat

One tool is a start. The chain is the moat.

Each AiExponent tool produces structured evidence the next tool consumes. Browse the full toolchain — from Article 5 screening through Article 72 post-market monitoring.

See all tools →