ML Manager Workflow

Guide for ML managers, team leads, and AI governance professionals overseeing model development teams and establishing organization-wide compliance policies.

Overview

This guide is for managers and team leads who need to:

  • Establish model governance policies
  • Configure compliance gates for teams
  • Track audit status across model portfolio
  • Train teams on compliance requirements
  • Report to executives and regulators
  • Evaluate vendor tools and processes

Not a manager? For hands-on implementation, see ML Engineer Workflow. For compliance review, see Compliance Officer Workflow.

Key Capabilities

Policy-as-Code Governance

Centralized compliance configuration:

  • Define organization-wide fairness thresholds
  • Specify required metrics for all models
  • Enforce determinism and reproducibility
  • Version-control policy changes

Portfolio-Level Tracking

Monitor multiple models:

  • Registry of all audited models
  • Compliance status dashboard
  • Failed gate tracking
  • Trend analysis over time

Team Enablement

Support model development teams:

  • Template configurations for common use cases
  • Training materials and best practices
  • Self-service audit generation
  • Clear escalation paths

Typical Workflows

Workflow 1: Establishing Organization-Wide Compliance Policy

Scenario: Define baseline compliance requirements for all credit models in your organization.

Step 1: Define policy requirements

Work with legal, risk, and compliance teams to establish thresholds:

Example requirements discussion:

  • Legal: "ECOA requires no disparate impact - what's an acceptable threshold?"
  • Risk: "What calibration error is acceptable for our risk appetite?"
  • Compliance: "SR 11-7 requires monitoring - what metrics should we track?"

Documented requirements:

# Policy: Credit Models Baseline (v1.0)
# Effective: 2025-01-01
# Applies to: All consumer credit models
# Review frequency: Quarterly

Requirements:
  - Calibration ECE < 5%
  - Demographic parity difference < 10%
  - Equalized odds difference < 15%
  - Minimum group size ≥ 30
  - Statistical power ≥ 0.80
  - Model stability under ±10% demographic shifts

Step 2: Create policy configuration

Translate requirements into policy-as-code:

# policy/org_credit_baseline_v1.yaml
policy_name: "Organization Credit Model Baseline"
version: "1.0"
effective_date: "2025-01-01"
applies_to: ["credit_scoring", "loan_pricing", "risk_assessment"]
citation: "SR 11-7, ECOA, Internal Risk Policy RP-2024-03"

gates:
  # Performance requirements
  - name: "Minimum Model Performance"
    metric: "roc_auc"
    threshold: 0.75
    comparison: "greater_than"
    severity: "error"
    clause: "Internal Risk Policy RP-2024-03 §2.1"
    rationale: "Minimum discriminative ability for credit decisions"

  # Calibration requirements
  - name: "Calibration Quality"
    metric: "expected_calibration_error"
    threshold: 0.05
    comparison: "less_than"
    severity: "error"
    clause: "SR 11-7 §III.B.1"
    rationale: "Predicted probabilities must align with observed outcomes"

  # Fairness requirements
  - name: "Demographic Parity"
    metric: "demographic_parity_difference"
    threshold: 0.10
    comparison: "less_than"
    severity: "error"
    clause: "ECOA fair lending"
    rationale: "Maximum acceptable approval rate difference across protected groups"

  - name: "Equalized Odds"
    metric: "equalized_odds_difference"
    threshold: 0.15
    comparison: "less_than"
    severity: "warning"
    clause: "ECOA fair lending"
    rationale: "TPR/FPR parity across protected groups"

  # Data quality requirements
  - name: "Minimum Statistical Power"
    metric: "min_group_size"
    threshold: 30
    comparison: "greater_than"
    severity: "error"
    clause: "SR 11-7 §III.B.2"
    rationale: "Adequate sample size for statistical significance"

  - name: "Statistical Power for Bias Detection"
    metric: "statistical_power"
    threshold: 0.80
    comparison: "greater_than"
    severity: "warning"
    clause: "SR 11-7 §III.B.2"
    rationale: "Ability to detect 5pp difference with 95% confidence"

  # Robustness requirements
  - name: "Demographic Shift Robustness"
    metric: "max_shift_degradation"
    threshold: 0.10
    comparison: "less_than"
    severity: "error"
    clause: "SR 11-7 §III.A.3"
    rationale: "Model must remain stable under population changes"

reproducibility:
  strict_mode_required: true
  seed_required: true
  manifest_required: true
  git_commit_required: true

review_cycle:
  frequency: "quarterly"
  next_review: "2025-04-01"
  owner: "Chief Risk Officer"

Step 3: Communicate to teams

Create team-facing documentation:

# Credit Model Compliance Policy (v1.0)

## What Changed

Effective January 1, 2025, all credit models must meet baseline compliance gates.

## Required Actions

1. Update your audit config to reference this policy
2. Run audit with `--policy-gates policy/org_credit_baseline_v1.yaml`
3. Address any failed gates before requesting deployment approval

## Policy Gates

- **Calibration**: ECE < 5% (ERROR - blocks deployment)
- **Fairness**: Demographic parity < 10% (ERROR - blocks deployment)
- **Performance**: AUC ≥ 0.75 (ERROR - blocks deployment)
- **Robustness**: Max degradation < 10% under shift (ERROR)
- **Equalized Odds**: < 15% (WARNING - requires documentation)

## How to Use

```bash
glassalpha audit \
  --config your_model_config.yaml \
  --policy-gates policy/org_credit_baseline_v1.yaml \
  --strict \
  --output audit_report.pdf
```

## Getting Help

  • Policy questions: Contact Risk Management Team
  • Technical support: #ml-compliance Slack channel
  • Approval requests: Submit audit report to Model Review Board

Step 4: Pilot with one team

Roll out to one team first:

```bash
# Team validates policy on existing models
cd team-credit-scoring/
glassalpha audit \
  --config models/current_model.yaml \
  --policy-gates policy/org_credit_baseline_v1.yaml \
  --strict
```

Pilot checklist:

  • [ ] Policy gates run successfully
  • [ ] Failed gates are actionable (clear fix path)
  • [ ] Documentation is sufficient for self-service
  • [ ] Performance impact is acceptable (<5 second overhead)
  • [ ] Team understands remediation process

Step 5: Organization-wide rollout

After successful pilot:

  1. Announce: Email + Slack announcement with 2-week lead time
  2. Train: Host training session (live + recorded)
  3. Support: Dedicated support channel for first month
  4. Enforce: Update CI/CD pipelines to require policy gates (see the gate-check sketch below)
  5. Monitor: Track adoption and failed gates
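
For step 4 (Enforce), the simplest mechanism is a small CI helper that reads the audit manifest and fails the build when any gate did not pass. Below is a minimal sketch, assuming the same manifest structure used by the logging script in Workflow 2 (a `gates` list with per-gate `status` fields); adapt the exit handling to your pipeline:

```python
# scripts/check_gates.py (illustrative CI helper; not a GlassAlpha command)
import json
import sys

def check_gates(manifest_path):
    """Return True if every gate in the audit manifest passed."""
    with open(manifest_path) as f:
        manifest = json.load(f)

    failed = [g for g in manifest.get("gates", []) if g.get("status") != "PASS"]
    for gate in failed:
        print(f"FAILED GATE: {gate.get('name', 'unknown')}")
    return not failed

if __name__ == "__main__":
    # Usage: python scripts/check_gates.py audit.manifest.json
    sys.exit(0 if check_gates(sys.argv[1]) else 1)
```

Run as a CI step after the audit, a non-zero exit blocks the merge, so enforcement does not depend on reviewers remembering to open the report.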

Workflow 2: Model Portfolio Tracking

Scenario: Track compliance status across 20+ models in production.

Step 1: Establish audit tracking

Create a centralized audit log using a simple JSONL format:

# Create audit logs directory
mkdir -p audit_logs

# Each audit appends to the log
glassalpha audit \
  --config model_config.yaml \
  --output audits/credit_model_v2.pdf

# Store metadata in JSONL log
echo '{"model_id":"credit_model_v2","timestamp":"2025-01-05","gates_passed":true,"team":"credit-risk"}' >> audit_logs/all_audits.jsonl

Step 2: Configure automated logging in CI/CD

Teams track audits using manifest files:

# CI/CD pipeline configuration
name: Model Audit
on: [push]

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - name: Run audit
        run: |
          glassalpha audit \
            --config model_config.yaml \
            --output audit.pdf

      - name: Log audit result
        run: |
          python scripts/log_audit.py \
            --manifest audit.manifest.json \
            --team ${{ vars.TEAM_NAME }} \
            --log audit_logs/all_audits.jsonl

Log script example:

# scripts/log_audit.py
import json
from pathlib import Path

def log_audit(manifest_path, team, log_path):
    with open(manifest_path) as f:
        manifest = json.load(f)

    log_entry = {
        "model_id": manifest["model_id"],
        "timestamp": manifest["timestamp"],
        "gates_passed": all(g["status"] == "PASS" for g in manifest.get("gates", [])),
        "team": team,
        "audit_file": str(Path(manifest_path).with_suffix('.pdf'))
    }

    with open(log_path, 'a') as f:
        f.write(json.dumps(log_entry) + '\n')

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--manifest", required=True)
    parser.add_argument("--team", required=True)
    parser.add_argument("--log", required=True)
    args = parser.parse_args()
    log_audit(args.manifest, args.team, args.log)

Step 3: Query audit logs for portfolio view

# scripts/query_audits.py
import json
from pathlib import Path
from datetime import datetime, timedelta

def get_recent_audits(log_path, days=30):
    cutoff = datetime.now() - timedelta(days=days)
    audits = []

    with open(log_path) as f:
        for line in f:
            audit = json.loads(line)
            audit_date = datetime.fromisoformat(audit["timestamp"])
            if audit_date > cutoff:
                audits.append(audit)

    return audits

# Get failed audits
audits = get_recent_audits("audit_logs/all_audits.jsonl")
failed = [a for a in audits if not a["gates_passed"]]

print("Failed audits:")
for audit in failed:
    print(f"  {audit['model_id']} ({audit['team']}) - {audit['timestamp']}")

Step 4: Generate executive dashboard

# scripts/generate_dashboard.py
import json
import pandas as pd
import matplotlib.pyplot as plt

# Load audit log
audits = []
with open("audit_logs/all_audits.jsonl") as f:
    audits = [json.loads(line) for line in f]

# Summary metrics
total_models = len(audits)
passed = sum(1 for a in audits if a["gates_passed"])
failed = total_models - passed

# Create summary
summary = pd.DataFrame({
    "Status": ["Passed", "Failed"],
    "Count": [passed, failed],
    "Percentage": [
        passed/total_models*100,
        failed/total_models*100
    ]
})

# Generate plots
fig, ax = plt.subplots(1, 1, figsize=(10, 6))

# Status pie chart
ax.pie(summary["Count"], labels=summary["Status"], autopct='%1.1f%%')
ax.set_title("Model Compliance Status (Last 30 Days)")

plt.tight_layout()
plt.savefig("reports/portfolio_dashboard.pdf")
print(f"Dashboard saved: reports/portfolio_dashboard.pdf")

Step 5: Executive reporting

Monthly report to leadership:

# Model Compliance Report - January 2025

## Executive Summary

- **Total Models**: 24 in production
- **Compliance Status**: 21 passed (87.5%), 2 blocked (8.3%), 1 under review (4.2%)
- **Top Issue**: Calibration quality (2 models failing ECE threshold)
- **Action Required**: Credit Model v2 and Loan Pricing v1 need recalibration

## Portfolio Overview

[Include: portfolio_dashboard.pdf]

## Models Requiring Attention

### Credit Model v2 (BLOCKED)

- **Issue**: Calibration ECE = 0.078 (threshold: 0.05)
- **Impact**: Cannot deploy to production
- **Owner**: Credit Risk Team
- **Timeline**: Remediation in progress, reaudit scheduled 2025-01-15
- **Risk**: Low (model not yet in production)

### Loan Pricing v1 (REVIEW)

- **Issue**: Equalized odds difference = 0.17 (warning threshold: 0.15)
- **Impact**: Requires documentation and monitoring
- **Owner**: Lending Team
- **Action**: Document mitigation strategy, increase monitoring frequency
- **Risk**: Medium (model in production, regulatory review possible)

## Trends

- Fairness metrics improving quarter-over-quarter
- Average audit turnaround time: 3.2 days (down from 5.1 days)
- 95% of models meeting calibration requirements (up from 87%)

## Recommendations

1. Increase training on calibration techniques
2. Update policy to require monthly re-audit for models with warning-level gates
3. Invest in automated monitoring for production models
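
The headline numbers in this report can be pulled directly from the JSONL audit log established in Workflow 2 rather than compiled by hand. Below is a minimal sketch, assuming that log format; model-specific narrative (owners, timelines, risk calls) still needs to be written manually:

```python
# scripts/exec_summary.py
import json
from collections import Counter

with open("audit_logs/all_audits.jsonl") as f:
    audits = [json.loads(line) for line in f]

# Keep only the latest audit per model so re-audits are not double-counted
latest = {}
for audit in sorted(audits, key=lambda a: a["timestamp"]):
    latest[audit["model_id"]] = audit

total = len(latest)
passed = sum(1 for a in latest.values() if a["gates_passed"])
blocked = total - passed
failures_by_team = Counter(a["team"] for a in latest.values() if not a["gates_passed"])

print(f"Total models: {total}")
print(f"Passed: {passed} ({passed / total:.1%})")
print(f"Blocked: {blocked} ({blocked / total:.1%})")
for team, count in failures_by_team.most_common():
    print(f"Failed gates - {team}: {count} model(s)")
```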

Workflow 3: Team Training and Enablement

Scenario: Onboard new team members on model compliance process.

Step 1: Create training materials

Training deck outline:

  1. Why Model Auditing Matters (10 min)
     • Regulatory landscape (SR 11-7, ECOA, EU AI Act)
     • Real-world consequences of model bias
     • Company policy and requirements

  2. GlassAlpha Basics (15 min)
     • Architecture overview
     • Config-driven approach
     • Policy gates and compliance

  3. Hands-On: Your First Audit (20 min)
     • Live demo with German Credit dataset
     • Interpreting audit reports
     • Addressing failed gates

  4. Production Workflow (10 min)
     • CI/CD integration
     • Registry submission
     • Approval process

  5. Q&A and Resources (5 min)

Step 2: Create template repository

# Create org template repo
glassalpha-template-credit/
├── README.md                  # Getting started guide
├── .github/
│   └── workflows/
│       └── model-audit.yml    # Pre-configured CI
├── configs/
│   ├── dev_audit.yaml         # Fast iteration config
│   ├── prod_audit.yaml        # Full audit config
│   └── policy/
│       └── org_baseline.yaml  # Organization policy (symlink)
├── data/
│   └── README.md              # Data requirements
├── models/
│   └── README.md              # Model saving conventions
├── notebooks/
│   └── example_audit.ipynb    # Starter notebook
└── tests/
    └── test_audit.py          # Audit validation tests

README template:

# Credit Model Audit Template

Quick-start template for credit model compliance audits.

## Quick Start

```bash
# 1. Clone this template
# 2. Add your model and data
cp my_model.pkl models/
cp my_data.csv data/

# 3. Update config
vim configs/prod_audit.yaml

# 4. Run audit
glassalpha audit --config prod_audit.yaml --output audit.pdf
```

## Configuration

  • dev_audit.yaml: Fast iteration (reduced samples, skips slow sections)
  • prod_audit.yaml: Full audit (required for deployment approval)

## Policy Gates

This template uses organization baseline policy (v1.0):

  • Calibration ECE < 5%
  • Demographic parity < 10%
  • AUC ≥ 0.75

See Policy Gates section above for details.

## CI/CD

Push to main triggers automatic audit. Failed gates block merge.

Step 3: Conduct training sessions

Session format:

  • Monthly introductory sessions (1 hour)
  • Advanced topics quarterly (deep dives)
  • Office hours weekly (drop-in support)

Track attendance:

```python
# scripts/track_training.py
import json

# Record one session and append it to a JSONL log so attendance can be summarized later
training_log = {
    "date": "2025-01-15",
    "session": "Introduction to Model Auditing",
    "attendees": 12,
    "teams_represented": ["credit-risk", "fraud", "lending"],
    "feedback_score": 4.6
}

with open("training_logs/sessions.jsonl", "a") as f:
    f.write(json.dumps(training_log) + "\n")
```

Step 4: Create self-service resources

Internal wiki:

  • FAQ: Common audit errors and fixes
  • Troubleshooting guide: Step-by-step debugging
  • Best practices: Team-specific recommendations
  • Example audits: Real (anonymized) audit reports

Slack bot for quick help:

/audit-help failed-gate demographic_parity

Response:
📘 Failed Gate: Demographic Parity

Issue: Approval rate difference between protected groups exceeds 10% threshold.

Common causes:
1. Imbalanced training data
2. Proxy features correlated with protected attributes
3. Threshold not optimized for fairness

Recommended fixes:
1. Check for dataset bias: `glassalpha detect-bias --data data.csv`
2. Review feature correlations: See [Feature Selection Guide]
3. Try threshold adjustment: See [Threshold Tuning Guide]
4. Consider fairness constraints: See [Fair Training Guide]

Need more help? Post in #ml-compliance or DM @compliance-team
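
Behind the /audit-help command, the bot can be little more than a lookup table keyed on gate name; the Slack wiring itself is organization-specific and omitted here. Below is a minimal sketch of that lookup, with guidance text mirroring the example above:

```python
# scripts/audit_help.py (illustrative; gate names match the policy file)
GATE_HELP = {
    "demographic_parity": {
        "issue": "Approval rate difference between protected groups exceeds the 10% threshold.",
        "causes": [
            "Imbalanced training data",
            "Proxy features correlated with protected attributes",
            "Threshold not optimized for fairness",
        ],
        "fixes": [
            "Check for dataset bias",
            "Review feature correlations (Feature Selection Guide)",
            "Try threshold adjustment (Threshold Tuning Guide)",
            "Consider fairness constraints (Fair Training Guide)",
        ],
    },
    # One entry per policy gate...
}

def audit_help(gate_name):
    """Build the help text the bot posts for a failed gate."""
    entry = GATE_HELP.get(gate_name)
    if entry is None:
        return f"No guidance for '{gate_name}' yet. Post in #ml-compliance."
    lines = [f"Failed Gate: {gate_name}", f"Issue: {entry['issue']}", "Common causes:"]
    lines += [f"  {i}. {cause}" for i, cause in enumerate(entry["causes"], 1)]
    lines.append("Recommended fixes:")
    lines += [f"  {i}. {fix}" for i, fix in enumerate(entry["fixes"], 1)]
    return "\n".join(lines)

if __name__ == "__main__":
    print(audit_help("demographic_parity"))
```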

Workflow 4: Vendor and Tool Evaluation

Scenario: Evaluate if GlassAlpha meets your organization's needs compared to alternatives.

Evaluation criteria matrix

| Criterion | Weight | GlassAlpha | Vendor A | Vendor B | Internal Build |
|---|---|---|---|---|---|
| **Functionality** | | | | | |
| Fairness metrics | 10% | 9/10 | 8/10 | 9/10 | 6/10 |
| Explainability | 10% | 9/10 | 7/10 | 8/10 | 5/10 |
| Calibration testing | 5% | 9/10 | 6/10 | 7/10 | 4/10 |
| Custom models | 5% | 8/10 | 9/10 | 7/10 | 10/10 |
| **Integration** | | | | | |
| CI/CD integration | 10% | 9/10 | 7/10 | 8/10 | 9/10 |
| Existing ML pipeline | 10% | 8/10 | 6/10 | 7/10 | 10/10 |
| API quality | 5% | 9/10 | 8/10 | 7/10 | 7/10 |
| **Compliance** | | | | | |
| Audit trails | 10% | 10/10 | 8/10 | 7/10 | 5/10 |
| Reproducibility | 10% | 10/10 | 7/10 | 6/10 | 6/10 |
| Regulatory templates | 5% | 8/10 | 9/10 | 8/10 | 3/10 |
| **Operations** | | | | | |
| On-premise deployment | 10% | 10/10 | 5/10 | 6/10 | 10/10 |
| Maintenance burden | 5% | 9/10 | 8/10 | 8/10 | 4/10 |
| Training requirements | 5% | 8/10 | 7/10 | 6/10 | 5/10 |
| **Total Score** | 100% | 90% | 73% | 72% | 67% |
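
The Total Score row is a weighted sum: each criterion's score (out of 10) is scaled by its weight and the results are added. Below is a minimal sketch of that calculation, assuming this scoring rule; it reproduces the totals above to within rounding:

```python
# scripts/score_vendors.py
# Weighted total = sum(weight * score / 10) over all criteria; weights sum to 100%.
criteria = [
    # (weight, {option: score out of 10})
    (0.10, {"GlassAlpha": 9, "Vendor A": 8, "Vendor B": 9, "Internal Build": 6}),   # Fairness metrics
    (0.10, {"GlassAlpha": 9, "Vendor A": 7, "Vendor B": 8, "Internal Build": 5}),   # Explainability
    (0.05, {"GlassAlpha": 9, "Vendor A": 6, "Vendor B": 7, "Internal Build": 4}),   # Calibration testing
    (0.05, {"GlassAlpha": 8, "Vendor A": 9, "Vendor B": 7, "Internal Build": 10}),  # Custom models
    (0.10, {"GlassAlpha": 9, "Vendor A": 7, "Vendor B": 8, "Internal Build": 9}),   # CI/CD integration
    (0.10, {"GlassAlpha": 8, "Vendor A": 6, "Vendor B": 7, "Internal Build": 10}),  # Existing ML pipeline
    (0.05, {"GlassAlpha": 9, "Vendor A": 8, "Vendor B": 7, "Internal Build": 7}),   # API quality
    (0.10, {"GlassAlpha": 10, "Vendor A": 8, "Vendor B": 7, "Internal Build": 5}),  # Audit trails
    (0.10, {"GlassAlpha": 10, "Vendor A": 7, "Vendor B": 6, "Internal Build": 6}),  # Reproducibility
    (0.05, {"GlassAlpha": 8, "Vendor A": 9, "Vendor B": 8, "Internal Build": 3}),   # Regulatory templates
    (0.10, {"GlassAlpha": 10, "Vendor A": 5, "Vendor B": 6, "Internal Build": 10}), # On-premise deployment
    (0.05, {"GlassAlpha": 9, "Vendor A": 8, "Vendor B": 8, "Internal Build": 4}),   # Maintenance burden
    (0.05, {"GlassAlpha": 8, "Vendor A": 7, "Vendor B": 6, "Internal Build": 5}),   # Training requirements
]

for option in ["GlassAlpha", "Vendor A", "Vendor B", "Internal Build"]:
    total = sum(weight * scores[option] / 10 for weight, scores in criteria)
    print(f"{option}: {total:.1%}")
```

Keeping the matrix in code (or a spreadsheet) makes it easy to rerun the comparison when weights change after stakeholder review.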

Pilot program structure

Week 1-2: Setup

  • Install GlassAlpha
  • Configure for 2-3 representative models
  • Train 3-5 team members

Week 3-4: Parallel Run

  • Run GlassAlpha audits alongside existing process
  • Compare outputs
  • Measure time/effort savings

Week 5-6: Evaluation

  • Team feedback survey
  • Cost-benefit analysis
  • Decision presentation to leadership

Success criteria:

  • 80%+ of audits complete successfully
  • 50%+ time savings vs manual process
  • 4/5+ satisfaction score from teams
  • Regulatory-ready outputs

Best Practices

Policy Management

  • Version control: Track all policy changes in Git
  • Review cycles: Quarterly policy reviews with stakeholders
  • Communication: 2-week notice before policy changes
  • Exceptions: Formal exception process with executive approval
  • Documentation: Clear rationale for every threshold

Team Enablement

  • Self-service first: Provide templates and documentation
  • Support channels: Slack + office hours + on-demand training
  • Feedback loops: Regular surveys and retrospectives
  • Champion programs: Identify power users in each team
  • Success stories: Share wins to build momentum

Portfolio Monitoring

  • Automated tracking: Registry integration in all CI/CD pipelines
  • Regular reporting: Monthly portfolio dashboard to leadership
  • Proactive intervention: Alert teams before regulatory review
  • Trend analysis: Track metrics over time to spot systemic issues
  • Resource allocation: Prioritize support based on portfolio risk

Change Management

  • Pilot first: Test with friendly team before org rollout
  • Incremental adoption: Start with new models, gradually expand
  • Clear benefits: Emphasize time savings and risk reduction
  • Executive sponsorship: Ensure leadership visibility and support
  • Celebrate wins: Highlight successful audits and approvals

Common Challenges and Solutions

Challenge: Team pushback on compliance overhead

Solution:

  • Show time savings: "10 hours manual → 30 minutes automated"
  • Emphasize risk reduction: "Avoid regulatory fines"
  • Provide templates: "80% pre-configured, just add your model"
  • Share success stories: "Team X deployed in half the time"

Challenge: Policy too strict, blocking deployments

Solution:

  • Review gates with stakeholders
  • Consider tiered thresholds (warning vs error)
  • Allow documented exceptions with approval
  • Adjust thresholds based on data (e.g., if 80% models fail, threshold may be unrealistic)

Challenge: Teams bypassing audits

Solution:

  • Enforce at CI/CD level (audit required for merge)
  • Require audit report for deployment approval
  • Track non-compliance in performance reviews
  • Provide support to make compliance easy

Challenge: Inconsistent audit quality

Solution:

  • Provide templates and standards
  • Automated validation in CI (config linting)
  • Peer review process for audits
  • Training on common pitfalls

Metrics to Track

Adoption Metrics

  • Number of teams using GlassAlpha
  • Audits per month
  • Template adoption rate
  • Training attendance

Quality Metrics

  • Failed gate rate by team
  • Time from audit to approval
  • Number of reaudits required
  • Audit completeness score
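
Several of the adoption and quality metrics above can be computed directly from the JSONL audit log established in Workflow 2; below is a minimal sketch, assuming that log format. Time-to-approval and completeness would need fields beyond what that log records:

```python
# scripts/portfolio_metrics.py
import json
from collections import Counter, defaultdict

with open("audit_logs/all_audits.jsonl") as f:
    audits = [json.loads(line) for line in f]

# Adoption: teams using the tool and audits per month (timestamp prefix YYYY-MM)
active_teams = {a["team"] for a in audits}
audits_per_month = Counter(a["timestamp"][:7] for a in audits)

# Quality: failed gate rate by team
by_team = defaultdict(lambda: {"total": 0, "failed": 0})
for a in audits:
    by_team[a["team"]]["total"] += 1
    if not a["gates_passed"]:
        by_team[a["team"]]["failed"] += 1

print(f"Teams using GlassAlpha: {len(active_teams)}")
for month, count in sorted(audits_per_month.items()):
    print(f"{month}: {count} audits")
for team, stats in sorted(by_team.items()):
    print(f"{team}: {stats['failed']}/{stats['total']} audits with failed gates")
```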

Impact Metrics

  • Time savings per audit (vs manual)
  • Deployment velocity (time to production)
  • Regulatory findings (should decrease)
  • Compliance incidents (should decrease)

Team Satisfaction

  • Survey scores (quarterly)
  • Support ticket volume
  • Training effectiveness ratings
