FairPlay's Model Risk Management Manual for Agentic Systems

Deploy Agents with Practical Guidelines, Not Guesswork

A companion to regulatory readiness for banks, fintechs, insurers, and the AI vendors who sell into them. Translate SR 11-7 into practical governance, validation, and monitoring for LLM-powered agents.

Download the Free Handbook

Trusted by leading banks, fintechs, and AI vendors.

Chime · Figure · Octane · Happy Money

Why This Handbook Matters

SR 11-7, Modernized for Agents

Testing, optimization, and monitoring tailored for dynamic, tool-enabled agentic systems.

Practical Tools for Exam-Readiness

Exam questions, test procedures, and documentation templates that make you regulator-ready.

Catch Risks Before They Catch You

Spot hallucinations, leakage, bias, and off-policy behavior before they escalate into incidents.

What's Inside

Financial services and insurance companies increasingly rely on AI agents, and regulatory agencies are intensifying their scrutiny of these systems.

Guardrails That Govern

Guidance on lifecycle oversight, escalation protocols, and fallback strategies for agentic systems.

System Visibility

How to build a complete inventory of models, tools, RAG sources, and integrations—while detecting and governing "shadow AI."

Solid by Design

Frameworks for choosing the right architecture (LLM vs. LLM+RAG), setting boundaries, and designing prompts and retrieval strategies.

Prove it Before Production

Structured testing for hallucinations, data leakage, adversarial prompts, and off-policy behavior—with independent review.

Continuous Monitoring

Ongoing checks combining automated compliance assertions and factual accuracy testing with clear escalation triggers.

Fairness of Outcomes

Methods for evaluating demographic parity, consistency, and groundedness to ensure compliant, non-discriminatory results.

Vendor Oversight and Auditability

Independent testing of vendor guardrails, SLA requirements, and robust audit trails to keep systems explainable and regulator-ready.

Who It's For

Model risk & validation teams

Compliance, audit, and second line functions

Data science & AI platform teams

AI vendors selling into regulated institutions

Risk leaders evaluating third-party agents

Bring your model risk management practices into the agentic present

FAQ — Model Risk Management for Agentic Systems

  • What is an "agentic system" in financial services?

    An agentic system is an AI-powered agent, often built on large language models (LLMs) and retrieval-augmented generation (RAG) architectures, that can take dynamic, multi-step actions. In financial services, these systems frequently have tool-use capabilities—they can query databases, trigger transactions, or update records. This expands their operational scope but also heightens risks if not carefully governed. Unlike static models, agentic systems are semi-autonomous and non-deterministic, producing different outputs for the same input depending on context, interactions, or vendor updates.

  • How does SR 11‑7 apply to LLM- and RAG-powered agents?

    The Federal Reserve’s SR 11-7 supervisory letter sets out three core principles for model risk management: governance, conceptual soundness, and ongoing monitoring. These remain foundational, but their application must be reinterpreted for agentic systems, which are dynamic, multi-step, and tool-enabled.

    • Model definition expands beyond the foundation model to include fine-tuning datasets, RAG corpora, prompt templates, tool integrations, and plugins.
    • Conceptual soundness requires anticipating variability and testing systems under ambiguous or adversarial conditions.
    • Validation must go beyond accuracy to cover behavioral, security, and resilience testing.

  • Do we get finished documentation or frameworks to create it?

    The manual provides structured frameworks to create an SR 11-7 Model Validation document for an AI agent that can survive examination by a federal financial regulator:

    • Key MRM Questions that risk teams should ask (e.g., about governance, inventory completeness, fairness).
    • Sample Tests that specify what to test, how to test it, and what evidence to collect.
    • Testing approaches that institutions can adapt to their own systems.

    This ensures institutions produce documentation and processes aligned with their own risk profiles, rather than relying on generic templates.

  • What evidence do regulators and bank reviewers expect for agentic systems?

    The manual specifies evidence requirements for each control area, including:

    • Governance drills: escalation logs, decision rationale, post-mortems.
    • Inventory checks: annotated comparison reports, dependency diagrams, change tickets.
    • Validation: hallucination rate metrics, injection prompt transcripts, incident reports.
    • Monitoring: assertion lists, compliance scan results, accuracy trend graphs.
    • Fairness tests: demographic parity analysis, consistency results, groundedness scores.
    • Audit trail: version control logs, decision path reconstructions.
    • Vendor oversight: SLA compliance records, penetration test results, exit plans.

  • How should we monitor LLM agents in production?

    The manual stresses ongoing monitoring that combines human oversight with automated checks to counteract vigilance fatigue and automation bias:

    • Assertion-based compliance scanning against predefined rules.
    • Factual accuracy checks comparing outputs to source-of-truth databases.
    • Latency and load monitoring to maintain reliability.
    • Update regression testing before and after vendor model updates.

    Monitoring results should feed into dashboards with escalation triggers, and monitoring rules should evolve alongside regulatory and product changes.
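As a concrete illustration of the first bullet above, assertion-based compliance scanning can be as simple as running every agent response through a set of predefined rules. The rule names, patterns, and thresholds below are illustrative assumptions, not taken from the manual; a production scanner would draw its rules from your compliance policy.

```python
# Minimal sketch of assertion-based compliance scanning for agent outputs.
# Rule names and patterns are illustrative, not from the manual.
import re
from dataclasses import dataclass, field

@dataclass
class ComplianceRule:
    name: str
    pattern: str        # regex that must NOT appear in a compliant output
    severity: str = "high"

@dataclass
class ScanResult:
    output: str
    violations: list = field(default_factory=list)

    @property
    def compliant(self) -> bool:
        return not self.violations

def scan(output: str, rules: list) -> ScanResult:
    """Check one agent response against predefined compliance assertions."""
    result = ScanResult(output=output)
    for rule in rules:
        if re.search(rule.pattern, output, re.IGNORECASE):
            result.violations.append((rule.name, rule.severity))
    return result

# Example rules: no promises of guaranteed approval, no unmasked account numbers.
RULES = [
    ComplianceRule("no_guaranteed_approval", r"\bguaranteed approval\b"),
    ComplianceRule("no_full_account_number", r"\b\d{10,16}\b"),
]

result = scan("Your application is under review.", RULES)
print(result.compliant)  # True -> no escalation trigger fires
```

Any `ScanResult` with violations would be routed to the dashboard and, past a defined threshold, fire an escalation trigger.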

  • How is fairness tested for agentic systems?

    The manual explains several options for ongoing fairness evaluations of AI agents, including:

    • Demographic Parity Simulation: Comparing outputs across simulated profiles differing only by protected attributes.
    • Consistency Testing Across Variants: Checking if re-phrased queries or demographic changes yield consistent results.
    • RAG Groundedness Tests: Ensuring answers are tied to verifiable retrieved sources.

    Evidence includes test cases, statistical analysis (e.g., adverse impact ratios), and remediation documentation.
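A demographic parity simulation of the kind described above can be sketched as follows. The `query_agent` function is a hypothetical stand-in for your actual agent call, and the 0.8 adverse-impact-ratio flag is a commonly cited rule of thumb, not a regulatory requirement stated in the manual.

```python
# Hypothetical sketch of a demographic parity simulation: run matched pairs
# of profiles that differ only in a simulated protected attribute through
# the agent, then compare favorable-outcome rates via the adverse impact
# ratio (AIR). `query_agent` stands in for the real agent call.

def query_agent(profile: dict) -> str:
    # Placeholder decision logic for the sketch; a real test calls the agent.
    return "approve" if profile["income"] >= 50_000 else "deny"

def favorable_rate(profiles, favorable="approve"):
    """Share of profiles receiving the favorable outcome."""
    outcomes = [query_agent(p) for p in profiles]
    return sum(o == favorable for o in outcomes) / len(outcomes)

def adverse_impact_ratio(group_a, group_b):
    """AIR = lower rate / higher rate; values below 0.8 commonly flag review."""
    rate_a, rate_b = favorable_rate(group_a), favorable_rate(group_b)
    lo, hi = sorted([rate_a, rate_b])
    return lo / hi if hi else 1.0

# Matched profiles: identical except for the simulated protected attribute.
base = {"income": 60_000, "tenure_years": 3}
group_a = [dict(base, group="A")]
group_b = [dict(base, group="B")]

air = adverse_impact_ratio(group_a, group_b)
print(f"AIR = {air:.2f}")  # 1.00 here: non-protected features are identical
```

The test cases, the computed ratios, and any remediation taken when the ratio falls below the review threshold are exactly the evidence artifacts the manual asks institutions to retain.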

  • What is a champion–challenger fallback and why is it important?

    A champion–challenger setup means running a parallel model (“challenger”) alongside the production model (“champion”). The challenger is actively monitored and can replace the champion immediately if performance falls below defined thresholds.

    This approach ensures continuity and mitigates the temptation to leave a failing system online. The manual encourages fallback systems to be tested in advance and tied to objective disablement triggers (e.g., hallucination rate thresholds).
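The disablement trigger described above can be wired into a simple router. This is a minimal sketch under assumed names (`FallbackRouter`, the threshold and window values are illustrative); real deployments would tie the trigger to the institution's monitoring pipeline.

```python
# Minimal sketch of a champion-challenger fallback with an objective
# disablement trigger (rolling hallucination-rate threshold).
from collections import deque

class FallbackRouter:
    def __init__(self, champion, challenger, threshold=0.05, window=100):
        self.champion, self.challenger = champion, challenger
        self.threshold = threshold          # max tolerated hallucination rate
        self.recent = deque(maxlen=window)  # rolling hallucination flags
        self.active = "champion"

    def record(self, hallucinated: bool):
        """Feed monitoring verdicts; swap to challenger past the threshold."""
        self.recent.append(hallucinated)
        rate = sum(self.recent) / len(self.recent)
        if self.active == "champion" and rate > self.threshold:
            self.active = "challenger"      # objective trigger, no discretion

    def answer(self, prompt: str) -> str:
        model = self.champion if self.active == "champion" else self.challenger
        return model(prompt)

router = FallbackRouter(lambda p: "champion:" + p,
                        lambda p: "challenger:" + p,
                        threshold=0.10, window=10)
for flag in [False, False, True, True]:     # 50% rate exceeds 10% threshold
    router.record(flag)
print(router.active)  # challenger
```

Because the switch is driven by a pre-agreed metric rather than judgment in the moment, the temptation to leave a failing champion online never arises.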

  • How should banks manage third‑party and foundation‑model vendor risk?

    The manual emphasizes that institutions remain responsible for vendor actions:

    • Vendor assessment against SR 11-7 principles.
    • SLAs for notice of updates, disclosure of documentation, and timely incident reporting.
    • Independent guardrail testing, including penetration tests to probe bypass risks.
    • Exit strategies to transition away from vendors that no longer meet requirements.

  • How can AI vendors use the manual in bank due diligence?

    Vendors can use the manual to anticipate and satisfy bank reviewers’ expectations:

    • Evidence preparation: inventories, validation reports, groundedness checks.
    • Conceptual soundness: documenting architecture decisions and risk trade-offs.
    • Auditability: maintaining version control and decision path reconstruction.
    • Regulatory readiness: demonstrating governance and monitoring aligned with SR 11-7 and examiner expectations.

Ready to Operationalize Agentic MRM?