Article 15v1.0.0-rc1Upcoming 2026-08-02

RAG Benchmarking

Plug in any RAG system — LangChain, LlamaIndex, or custom — and benchmark it against classic and agentic-era metrics. Faithfulness, answer relevancy, retrieval precision, and four agentic metrics for multi-step agents. Measured faithfulness of 0.958 on the 50-sample golden dataset.

Install in 30 seconds

bashpip install rag-benchmarking

RAG Benchmarking is Apache 2.0, free to use commercially, with zero telemetry. Source: Regulation (EU) 2024/1689 (Article 15).

Why RAG Benchmarking exists

Accuracy, robustness and cybersecurity

Article 15 becomes enforceable on 2 August 2026 for high-risk AI systems under Annex III. Providers must declare accuracy metrics in the instructions for use and demonstrate consistent performance across the lifecycle; non-compliance via the Article 16 provider-obligation chain is sanctionable up to €15M or 3% of global annual turnover under Article 99(4). For RAG-based high-risk systems, "appropriate accuracy" is not a self-asserted figure — it is a metric declared on the label and defensible against post-market evidence.

When the findings land on a governance desk

Tools surface problems. Programmes solve them.

RAG Benchmarking produces audit-ready evidence. The next step — programme design, board narrative, regulator engagement — is the surface AskAjay covers, the advisory arm of AI Exponent LLC.

Explore advisory at AskAjay.ai →