FairPlay's Model Risk Management Manual for Agentic Systems
Deploy Agents with Practical Guidelines, Not Guesswork
A companion to regulatory readiness for banks, fintechs, insurers, and the AI vendors who sell into them. Translate SR 11-7 into practical governance, validation, and monitoring for LLM-powered agents.

Trusted by leading banks, fintechs, and AI vendors.
Why This Handbook Matters
SR 11-7, Modernized for Agents
Testing, optimization, and monitoring tailored to dynamic, tool-enabled agentic systems.
Practical Tools for Exam-Readiness
Exam questions, test procedures, and documentation templates that make you regulator-ready.
Catch Risks Before They Catch You
Spot hallucinations, leakage, bias, and off-policy behavior before they escalate into incidents.
What's Inside
Financial services and insurance companies increasingly rely on AI agents, and regulatory agencies are intensifying their scrutiny of these systems.
Who It's For
Model risk & validation teams
Compliance, audit, and second line functions
Data science & AI platform teams
AI vendors selling into regulated institutions
Risk leaders evaluating third-party agents
Bring your Model Risk Management practices into the agentic present
FAQ — Model Risk Management for Agentic Systems
What is an "agentic system" in financial services?
An agentic system is an AI-powered agent, often built on large language models (LLMs) and retrieval-augmented generation (RAG) architectures, that can take dynamic, multi-step actions. In financial services, these systems frequently have tool-use capabilities—they can query databases, trigger transactions, or update records. This expands their operational scope but also heightens risks if not carefully governed. Unlike static models, agentic systems are semi-autonomous and non-deterministic, producing different outputs for the same input depending on context, interactions, or vendor updates.
How does SR 11‑7 apply to LLM- and RAG-powered agents?
The Federal Reserve’s SR 11-7 supervisory letter sets out three core principles for model risk management: governance, conceptual soundness, and ongoing monitoring. These remain foundational, but their application must be reinterpreted for agentic systems, which are dynamic, multi-step, and tool-enabled.
- Model definition expands beyond the foundation model to include fine-tuning datasets, RAG corpora, prompt templates, tool integrations, and plugins.
- Conceptual soundness requires anticipating variability and testing systems under ambiguous or adversarial conditions.
- Validation must go beyond accuracy to cover behavioral, security, and resilience testing.
Do we get finished documentation or frameworks to create it?
The manual provides structured frameworks to create an SR 11-7 Model Validation document for an AI agent that can survive examination by a federal financial regulator:
- Key MRM questions that risk teams should ask (e.g., about governance, inventory completeness, fairness).
- Sample tests that specify what to test, how to test it, and what evidence to collect.
- Testing approaches that institutions can adapt to their own systems.

This ensures institutions produce documentation and processes aligned with their own risk profiles, rather than relying on generic templates.
What evidence do regulators and bank reviewers expect for agentic systems?
The manual specifies evidence requirements for each control area, including:
- Governance drills: escalation logs, decision rationale, post-mortems.
- Inventory checks: annotated comparison reports, dependency diagrams, change tickets.
- Validation: hallucination rate metrics, injection prompt transcripts, incident reports.
- Monitoring: assertion lists, compliance scan results, accuracy trend graphs.
- Fairness tests: demographic parity analysis, consistency results, groundedness scores.
- Audit trail: version control logs, decision path reconstructions.
- Vendor oversight: SLA compliance records, penetration test results, exit plans.
How should we monitor LLM agents in production?
The manual stresses ongoing monitoring that combines human oversight with automated checks to counteract vigilance fatigue and automation bias:
- Assertion-based compliance scanning against predefined rules.
- Factual accuracy checks comparing outputs to source-of-truth databases.
- Latency and load monitoring to maintain reliability.
- Update regression testing before and after vendor model updates.
Monitoring results should feed into dashboards with escalation triggers, and monitoring rules should evolve alongside regulatory and product changes.
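To make the first of these checks concrete, here is a minimal sketch of assertion-based compliance scanning. The rule IDs, descriptions, and predicates are hypothetical examples, not rules prescribed by the manual; real deployments would maintain a much larger, versioned rule set.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class Assertion:
    """A predefined compliance rule applied to every agent output."""
    rule_id: str
    description: str
    check: Callable[[str], bool]  # True means the output passes the rule

# Illustrative rules only -- e.g., PII leakage and prohibited language checks.
RULES = [
    Assertion("NO-SSN", "Output must not leak Social Security numbers",
              lambda text: not re.search(r"\b\d{3}-\d{2}-\d{4}\b", text)),
    Assertion("NO-GUARANTEE", "Output must not promise guaranteed returns",
              lambda text: "guaranteed return" not in text.lower()),
]

def scan_output(text: str) -> list[str]:
    """Return the IDs of all assertions the output violates."""
    return [a.rule_id for a in RULES if not a.check(text)]

violations = scan_output("Your SSN 123-45-6789 qualifies for a guaranteed return.")
# violations -> ["NO-SSN", "NO-GUARANTEE"]
```

In practice, each violation would be logged with the full transcript and fed to the escalation dashboard described above.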
How is fairness tested for agentic systems?
The manual explains several options for ongoing fairness evaluations of AI agents, including:
- Demographic Parity Simulation: Comparing outputs across simulated profiles differing only by protected attributes.
- Consistency Testing Across Variants: Checking if re-phrased queries or demographic changes yield consistent results.
- RAG Groundedness Tests: Ensuring answers are tied to verifiable retrieved sources.
Evidence includes test cases, statistical analysis (e.g., adverse impact ratios), and remediation documentation.
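A demographic parity simulation can be summarized with an adverse impact ratio. The sketch below, with made-up outcome data, shows the computation and the common four-fifths screening heuristic; the threshold and data are illustrative assumptions, not figures from the manual.

```python
def selection_rate(outcomes: list[int]) -> float:
    """Share of simulated profiles receiving a favorable outcome (1 = approved)."""
    return sum(outcomes) / len(outcomes)

def adverse_impact_ratio(protected: list[int], reference: list[int]) -> float:
    """AIR = selection rate of protected group / selection rate of reference group.
    A common screening heuristic (the 'four-fifths rule') flags AIR < 0.80."""
    return selection_rate(protected) / selection_rate(reference)

# Simulated approval outcomes for paired profiles that differ only in a
# protected attribute -- this data is fabricated for illustration.
protected = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]  # 40% approved
reference = [1, 1, 1, 0, 1, 1, 1, 0, 1, 0]  # 70% approved

air = adverse_impact_ratio(protected, reference)  # ~0.571
flagged = air < 0.80  # True -> document findings and remediation steps
```

The test cases, computed ratios, and any remediation decisions become the evidence artifacts listed above.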
What is a champion–challenger fallback and why is it important?
A champion–challenger setup means running a parallel model (“challenger”) alongside the production model (“champion”). The challenger is actively monitored and can replace the champion immediately if performance falls below defined thresholds.
This approach ensures continuity and mitigates the temptation to leave a failing system online. The manual encourages fallback systems to be tested in advance and tied to objective disablement triggers (e.g., hallucination rate thresholds).
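The routing logic can be sketched as a small wrapper around the two models. The hallucination-rate trigger and threshold value here are assumed examples of the "objective disablement triggers" the manual describes; the model stand-ins are toys.

```python
from typing import Callable

class FallbackRouter:
    """Routes requests to the champion until a disablement trigger fires,
    then switches to the pre-tested challenger."""

    def __init__(self, champion: Callable[[str], str],
                 challenger: Callable[[str], str],
                 hallucination_threshold: float = 0.05):  # assumed example threshold
        self.champion = champion
        self.challenger = challenger
        self.threshold = hallucination_threshold
        self.active = champion

    def record_hallucination_rate(self, rate: float) -> None:
        # Objective disablement trigger: swap in the challenger when the
        # champion's monitored hallucination rate breaches the threshold.
        if self.active is self.champion and rate > self.threshold:
            self.active = self.challenger

    def answer(self, prompt: str) -> str:
        return self.active(prompt)

# Toy stand-ins for the production and fallback models.
champion = lambda p: f"champion: {p}"
challenger = lambda p: f"challenger: {p}"

router = FallbackRouter(champion, challenger)
router.record_hallucination_rate(0.02)  # below threshold: champion stays active
router.record_hallucination_rate(0.08)  # trigger fires: challenger takes over
```

Keeping the switch tied to a recorded, numeric trigger (rather than ad hoc judgment) is what makes the fallback auditable after the fact.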
How should banks manage third‑party and foundation‑model vendor risk?
The manual emphasizes that institutions remain responsible for vendor actions:
- Vendor assessment against SR 11-7 principles.
- SLAs for notice of updates, disclosure of documentation, and timely incident reporting.
- Independent guardrail testing, including penetration tests to probe bypass risks.
- Exit strategies to transition away from vendors that no longer meet requirements.
How can AI vendors use the manual in bank due diligence?
Vendors can use the manual to anticipate and satisfy bank reviewers’ expectations:
- Evidence preparation: inventories, validation reports, groundedness checks.
- Conceptual soundness: documenting architecture decisions and risk trade-offs.
- Auditability: maintaining version control and decision path reconstruction.
- Regulatory readiness: demonstrating governance and monitoring aligned with SR 11-7 and examiner expectations.








