Frequently asked questions¶

Common questions about GlassAlpha capabilities, usage, and integration.

General questions¶

What is GlassAlpha?¶

GlassAlpha is an open-source AI compliance toolkit that generates comprehensive audit reports for machine learning models. It provides:

Automated bias detection and fairness analysis
Model explanations using SHAP and other interpretability methods
Reason codes for ECOA-compliant adverse action notices
Professional PDF reports suitable for regulatory review
Complete reproducibility with audit trails and manifests
Regulatory compliance support for GDPR, ECOA, FCRA, and other frameworks

Who should use GlassAlpha?¶

GlassAlpha is designed for:

Data scientists who need to audit ML models for bias and fairness
Compliance teams ensuring regulatory adherence for algorithmic decisions
Risk management professionals assessing model risks
Legal teams preparing for regulatory review or litigation
Academic researchers studying algorithmic fairness and interpretability

How does GlassAlpha ensure audit quality?¶

GlassAlpha maintains audit quality through:

Deterministic execution with fixed random seeds for reproducible results
Complete audit trails tracking all decisions and configurations
Statistical rigor with confidence intervals and significance testing
Professional reporting with publication-quality visualizations
Regulatory alignment with established compliance frameworks

For detailed information about system design and quality assurance, see the Trust & Deployment Guide.
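
As a rough illustration of how a fixed seed and a confidence interval fit together, here is a minimal sketch of a seeded percentile bootstrap. It is not GlassAlpha's internal code; the function name `bootstrap_ci` and its parameters are assumptions made for this example.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=1000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Per-row correctness flags (1 = correct prediction) for a toy model.
correct = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1]
first = bootstrap_ci(correct)
second = bootstrap_ci(correct)  # same seed, so the interval is identical
```

Because the generator is seeded, rerunning the audit on the same data yields the same interval, which is the property a reproducible report depends on.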

Installation & setup¶

What are the system requirements?¶

Minimum Requirements:

Python 3.11 or higher
2GB available RAM
1GB disk space for installation and temporary files

Recommended:

Python 3.11+
8GB+ RAM for large datasets
SSD storage for better performance
Multi-core CPU for parallel processing

Which operating systems are supported?¶

GlassAlpha is tested and supported on:

macOS 10.15+ (Intel and Apple Silicon)
Linux (Ubuntu 20.04+, CentOS 8+, and most modern distributions)
Windows 10/11 (WSL2 recommended)

How do I install GlassAlpha?¶

Standard Installation:

Clone and setup:

git clone https://github.com/GlassAlpha/glassalpha
cd glassalpha

Python 3.11, 3.12, or 3.13 supported:

python3 --version   # should show 3.11.x, 3.12.x, or 3.13.x

Create a virtual environment (recommended):

python3 -m venv .venv
source .venv/bin/activate

Upgrade pip and install in editable mode:

python -m pip install --upgrade pip
pip install -e ".[dev]"

Verify installation:

glassalpha --help

See the Quick Start Guide for detailed instructions.

What dependencies does GlassAlpha require?¶

Core Dependencies (base install):

pandas, numpy (data processing)
scikit-learn (machine learning and LogisticRegression)
Matplotlib, Seaborn (visualizations)
Jinja2, WeasyPrint (HTML/PDF generation)

Optional Dependencies (explain install):

XGBoost, LightGBM (gradient boosting models)
SHAP (TreeSHAP and KernelSHAP explanations)

Install options:

pip install -e .              # Base install (LogisticRegression only)
pip install -e ".[explain]"   # Adds SHAP, XGBoost, LightGBM
pip install -e ".[dev]"       # Development tools

Usage & configuration¶

How do I generate my first audit?¶

Follow the Quick Start guide for a 5-minute introduction
Use the German Credit example:

glassalpha audit \
  --config german_credit_simple.yaml \
  --output my_audit.pdf

For your own data:
See Using Custom Data for a complete tutorial
Start from the custom_template.yaml configuration template (packaged with GlassAlpha), which includes detailed comments. A minimal configuration looks like this:

audit_profile: tabular_compliance
reproducibility:
  random_seed: 42
data:
  path: your_data.csv
  target_column: outcome
model:
  type: xgboost

Run the audit:

glassalpha audit --config your_config.yaml --output audit.pdf

Want more examples? Browse our built-in datasets for automatic dataset fetching.

What file formats are supported for data?¶

GlassAlpha supports:

CSV (most common)
Parquet (recommended for large datasets)
Feather (fast binary format)
Pickle (Python objects)

The format is automatically detected from the file extension.
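
Extension-based detection can be pictured as a lookup from file suffix to reader. The sketch below is an assumption about how such a dispatch might look, not GlassAlpha's actual loader; `READERS`, `reader_for`, and `load_table` are hypothetical names, while the pandas functions they point to are real.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to the name of a pandas reader.
READERS = {
    ".csv": "read_csv",
    ".parquet": "read_parquet",
    ".feather": "read_feather",
    ".pkl": "read_pickle",
    ".pickle": "read_pickle",
}


def reader_for(path):
    """Return the pandas reader name for a path, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported data format: {suffix or '(no extension)'}")
    return READERS[suffix]


def load_table(path):
    """Load a table with the reader selected by reader_for()."""
    import pandas as pd  # imported lazily so reader_for() works without pandas

    return getattr(pd, reader_for(path))(path)
```

Only load Pickle files from sources you trust, since unpickling can execute arbitrary code.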

How do I handle missing data?¶

Configure preprocessing in your audit configuration:

preprocessing:
  handle_missing: true
  # Numeric columns: median, mean, or mode
  # Categorical columns: mode or drop
  missing_strategy: median

GlassAlpha automatically handles most missing value scenarios.
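
To make the strategies concrete, here is a small standard-library sketch of what each one does to a single column. It illustrates the strategy names from the configuration above and is not GlassAlpha's preprocessing code; the `impute` function is hypothetical.

```python
from statistics import mean, median, mode


def impute(column, strategy="median"):
    """Fill None entries in a list of values using the chosen strategy."""
    observed = [value for value in column if value is not None]
    if strategy == "drop":
        # Dropping removes the missing entries instead of filling them.
        return observed
    fill_functions = {"median": median, "mean": mean, "mode": mode}
    fill = fill_functions[strategy](observed)
    return [fill if value is None else value for value in column]


numeric_filled = impute([1, None, 3, 10], strategy="median")
categorical_filled = impute(["a", None, "b", "a"], strategy="mode")
```

On a real table, "drop" removes whole rows, so every column loses the same entries; the single-column version here only shows the idea.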

Can I use my own pre-trained model?¶

Yes, specify the model path in your configuration:

model:
  type: xgboost
  path: models/my_trained_model.pkl

GlassAlpha supports models saved with:

Pickle (most scikit-learn models)
Joblib (scikit-learn and XGBoost)
Native formats (XGBoost .model, LightGBM .txt)

Model support¶

Which machine learning models are supported?¶

Currently Supported:

XGBoost - Gradient boosting with TreeSHAP explanations
LightGBM - Microsoft's gradient boosting framework
Logistic Regression - Linear classification models
Generic Scikit-learn - Most scikit-learn classifiers

Explanation Support:

TreeSHAP - Exact SHAP values for tree-based models (XGBoost, LightGBM)
KernelSHAP - Model-agnostic explanations for any model type

Need help choosing?

See the Model Selection Guide for performance benchmarks and use case recommendations
See the Explainer Deep Dive for guidance on explanation methods

How do I add support for a new model type?¶

Implement the ModelInterface protocol:

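from glassalpha.models import load_model

# Register the class in the load_model() function in models/__init__.py
class MyModel:
    # Declares what the model supports, e.g. SHAP-based explanations
    capabilities = {"supports_shap": True}
    version = "1.0.0"

    def predict(self, X):
        # Return one predicted class label per row of X
        raise NotImplementedError

    def predict_proba(self, X):
        # Return class probabilities for each row of X
        raise NotImplementedError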

See the Trust & Deployment Guide for technical details.
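
If it helps to see the shape of such an interface, the sketch below expresses it with `typing.Protocol` and checks a toy model against it. The protocol definition here is an assumption based on the attributes and methods named in this answer; the real ModelInterface in GlassAlpha may differ, and `ThresholdModel` is a made-up example class.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ModelInterface(Protocol):
    """Assumed shape of the interface: two attributes and two methods."""

    capabilities: dict
    version: str

    def predict(self, X): ...

    def predict_proba(self, X): ...


class ThresholdModel:
    """Toy classifier that treats the first feature as P(class 1)."""

    capabilities = {"supports_shap": False}
    version = "1.0.0"

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def predict_proba(self, X):
        # Clamp the first feature into [0, 1] and use it as the class-1 probability.
        probs = [min(max(row[0], 0.0), 1.0) for row in X]
        return [[1.0 - p, p] for p in probs]

    def predict(self, X):
        return [int(p1 >= self.threshold) for _, p1 in self.predict_proba(X)]


model = ThresholdModel()
conforms = isinstance(model, ModelInterface)  # structural check, no inheritance needed
```

A protocol checks structure rather than inheritance, so an existing model wrapper can conform without subclassing anything.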

Can I use deep learning models?¶

Deep learning models can be supported through the generic model interface, but:

TreeSHAP won't work (only for tree models)
Use KernelSHAP for model-agnostic explanations
Performance may be slower for explanation generation
Consider gradient-based explanations for better deep learning support

What about time series or text models?¶

GlassAlpha currently focuses on tabular data for classification tasks. Other data modalities may be supported in future versions, depending on user demand and community contributions:

Time Series: Under consideration
Text/NLP: Under consideration
Computer Vision: Under consideration

Compliance & regulatory¶

Which regulations does GlassAlpha address?¶

Directly Supported:

GDPR (EU) - Right to explanation, automated decision-making
ECOA (US) - Fair lending, non-discrimination in credit (including reason codes for adverse actions)
FCRA (US) - Accuracy and fairness in credit reporting
EU AI Act - High-risk AI system requirements

Partially Supported:

Fair Housing Act (US) - Housing discrimination
Employment Standards (EEOC, various) - Hiring discrimination

How do I generate ECOA-compliant reason codes?¶

For credit decisions, ECOA requires providing specific reasons for adverse actions. GlassAlpha provides the reasons command:

glassalpha reasons \
  --model models/credit_model.pkl \
  --data data/test.csv \
  --instance 42 \
  --output notices/instance_42.txt

What you get:

Top-N negative feature contributions (ECOA typical: 4 reasons)
Automatic exclusion of protected attributes (age, gender, race)
ECOA-compliant adverse action notice template
Deterministic, reproducible output

Example output:

ADVERSE ACTION NOTICE
DECISION: DENIED

PRINCIPAL REASONS FOR ADVERSE ACTION:
1. Debt: Value of 5000 negatively impacted the decision
2. Duration: Value of 24 negatively impacted the decision
3. Credit History: Value of 2 negatively impacted the decision
4. Savings: Value of 1000 negatively impacted the decision

IMPORTANT RIGHTS UNDER FEDERAL LAW:
[ECOA disclosure text...]

See the Reason Codes Guide for complete documentation.
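
The selection logic behind reason codes can be sketched in a few lines: keep the negative contributions, drop protected attributes, and take the top N. This is an illustration under assumed inputs (a dict of per-feature contributions), not the implementation of the reasons command; `top_adverse_reasons` and the sample values are hypothetical.

```python
PROTECTED = {"age", "gender", "race"}


def top_adverse_reasons(contributions, n=4, protected=PROTECTED):
    """Return up to n feature names with the most negative contributions."""
    negative = [
        (name, value)
        for name, value in contributions.items()
        if value < 0 and name.lower() not in protected
    ]
    # Most negative first; ties are broken by name so the output is deterministic.
    negative.sort(key=lambda item: (item[1], item[0]))
    return [name for name, _ in negative[:n]]


# Made-up contribution values for one denied applicant.
contributions = {
    "debt": -0.42,
    "duration": -0.31,
    "credit_history": -0.18,
    "savings": -0.09,
    "age": -0.50,        # protected, so excluded even though it is the most negative
    "employment": 0.12,  # positive, so not an adverse reason
    "housing": -0.02,
}
reasons = top_adverse_reasons(contributions)
# reasons == ["debt", "duration", "credit_history", "savings"]
```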

What's the difference between audit reports and reason codes?¶

Audit Reports (glassalpha audit):

Comprehensive PDF reports for model validation
Performance metrics, fairness analysis, feature importance
Used for internal compliance and regulatory submission
Analyzes the entire model's behavior

Reason Codes (glassalpha reasons):

Individual explanations for specific decisions
ECOA-compliant adverse action notices
Sent to applicants who were denied
Explains one instance at a time

Use audit reports for model governance and reason codes for applicant communication.

See the Trust & Deployment Guide for compliance framework information.

Can GlassAlpha reports be submitted to regulators?¶

Yes, GlassAlpha reports are designed for regulatory submission:

Professional formatting suitable for legal and regulatory review
Complete audit trails with reproducibility manifests
Statistical rigor with confidence intervals and significance testing
Standardized metrics aligned with regulatory expectations
Comprehensive documentation covering methodology and limitations

How does GlassAlpha handle protected attributes?¶

Protected attributes (race, gender, age) are used for:

Fairness Analysis - Bias detection across demographic groups
Statistical Testing - Demographic parity and equal opportunity
Report Generation - Group-specific performance metrics

Important: Protected attributes are used for analysis only, not model training (unless explicitly configured).
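
For readers new to these metrics, the following sketch computes the two gaps mentioned above on toy data: the difference in selection rates (demographic parity) and the difference in true positive rates (equal opportunity). The function names are assumptions for illustration and do not reflect GlassAlpha's metric API.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate and true positive rate for binary labels."""
    rows = defaultdict(list)
    for truth, pred, group in zip(y_true, y_pred, groups):
        rows[group].append((truth, pred))

    rates = {}
    for group, pairs in rows.items():
        preds_for_positives = [pred for truth, pred in pairs if truth == 1]
        rates[group] = {
            "selection_rate": sum(pred for _, pred in pairs) / len(pairs),
            # Assumes every group contains at least one positive label.
            "tpr": sum(preds_for_positives) / len(preds_for_positives),
        }
    return rates


def gap(rates, key):
    """Largest difference between any two groups for the given rate."""
    values = [group[key] for group in rates.values()]
    return max(values) - min(values)


y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
demographic_parity_gap = gap(rates, "selection_rate")
equal_opportunity_gap = gap(rates, "tpr")
```

Note that the protected attribute appears only in `groups`; it is never passed to the model, which matches the analysis-only use described above.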

What audit evidence does GlassAlpha provide?¶

Generated Evidence:

PDF Audit Reports - Comprehensive analysis with visualizations
Audit Manifests - Complete execution metadata in JSON format
Configuration Records - All settings and parameters used
Individual Explanations - SHAP-based decision explanations
Statistical Analysis - Bias testing with confidence intervals
Reproducibility Data - Seeds, hashes, and version information
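
A manifest of this kind boils down to hashing the inputs and recording the environment. The sketch below shows one way to do that with the standard library; the field names and the `build_manifest` function are assumptions, and the manifest GlassAlpha actually writes may contain different keys.

```python
import hashlib
import json
import platform


def sha256_of(data: bytes) -> str:
    """Hex digest used to fingerprint configuration and data."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(config: dict, data_bytes: bytes, seed: int) -> dict:
    """Collect the values needed to re-run and verify an audit."""
    # Sorting keys makes the hash independent of dictionary ordering.
    canonical_config = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return {
        "random_seed": seed,
        "config_hash": sha256_of(canonical_config.encode("utf-8")),
        "data_hash": sha256_of(data_bytes),
        "python_version": platform.python_version(),
    }


config = {"model": {"type": "xgboost"}, "data": {"target_column": "outcome"}}
manifest = build_manifest(config, b"age,outcome\n30,1\n", seed=42)
manifest_json = json.dumps(manifest, indent=2)
```

Two runs with the same configuration, data, and seed produce the same hashes, so a reviewer can confirm that a report corresponds to specific inputs.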

Performance & limitations¶

How fast is GlassAlpha?¶

Typical Performance:

Small datasets (< 1,000 rows): 1-3 seconds
Medium datasets (1,000-10,000 rows): 3-15 seconds
Large datasets (10,000-100,000 rows): 15-60 seconds

Performance Factors:

Model complexity (tree depth, number of estimators)
Explanation method (TreeSHAP is faster than KernelSHAP)
Number of features affects SHAP computation time
Hardware (CPU cores, memory) impacts parallel processing

What are the dataset size limits?¶

Practical Limits:

Rows: No hard limit, tested up to 1M+ rows
Features: Up to ~1,000 features (SHAP computation becomes slow beyond this)
Memory: Depends on available RAM (8GB recommended for 100K+ rows)

Optimization Options:

# For large datasets
explainers:
  config:
    treeshap:
      max_samples: 100 # Reduce from default 1000
    kernelshap:
      n_samples: 50 # Reduce from default 500

performance:
  low_memory_mode: true
  n_jobs: -1 # Use all CPU cores

Can I run GlassAlpha in production environments?¶

Yes, GlassAlpha is designed for production use:

Production Features:

Deterministic execution for consistent results
Configuration management with version control
Audit trails for compliance and debugging
Error handling with clear error messages
Security considerations (no external network calls)

Integration Options:

CLI automation for scheduled audits
Python API for programmatic integration
Configuration files for different environments
Enterprise deployment with RBAC and monitoring

For comprehensive production deployment guidance, see the Trust & Deployment Guide.

Are there any limitations I should know about?¶

Current Limitations:

Tabular data only - No text, image, or time series support yet
Classification focus - Limited regression support
English documentation - Additional languages may be supported based on demand
Single machine - No distributed computing support

Model Limitations:

TreeSHAP only works with tree-based models (XGBoost, LightGBM)
KernelSHAP can be slow for complex models or large datasets
Fairness metrics require protected attribute data

Enterprise features¶

What enterprise features are available?¶

GlassAlpha is structured so that enterprise features can be added in the future. If you are interested, contact enterprise@glassalpha.com.

Integration & workflow¶

How do I integrate GlassAlpha with my existing ML pipeline?¶

Python API Integration:

Load configuration:

from glassalpha.pipeline import AuditPipeline
from glassalpha.config import AuditConfig

config = AuditConfig.from_yaml("audit_config.yaml")

Run audit:

pipeline = AuditPipeline(config)
results = pipeline.run()

Check results:

if results.success:
    print(f"Audit completed: {results.model_performance}")
else:
    print(f"Audit failed: {results.error_message}")

CLI Integration:

In CI/CD pipeline:

glassalpha validate --config production_config.yaml --strict
glassalpha audit --config production_config.yaml --output audit_report.pdf
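
If your pipeline steps are written in Python rather than shell, the same two commands can be wrapped so that a failed validation stops the build before the audit runs. This is a sketch that uses only the commands and flags shown above; the helper names `build_commands` and `run_gate` are hypothetical.

```python
import subprocess


def build_commands(config_path, output_path):
    """The validate-then-audit sequence, as argument lists."""
    return [
        ["glassalpha", "validate", "--config", config_path, "--strict"],
        ["glassalpha", "audit", "--config", config_path, "--output", output_path],
    ]


def run_gate(config_path, output_path, runner=subprocess.run):
    """Run each command in order and return the first non-zero exit code."""
    for command in build_commands(config_path, output_path):
        completed = runner(command)
        if completed.returncode != 0:
            return completed.returncode  # stop early: do not audit an invalid config
    return 0
```

In a CI job you might call `sys.exit(run_gate("production_config.yaml", "audit_report.pdf"))` so the job fails whenever either step fails. The `runner` parameter exists only to make the function testable without the CLI installed.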

Can I customize the audit reports?¶

Current Customization:

Report sections can be included/excluded
Color schemes and styling options
Company branding (logo, contact information)
Compliance statements for specific regulations

Configuration Example:

report:
  template: standard_audit
  styling:
    color_scheme: professional
    company_name: "Your Company"
    logo_path: "assets/logo.png"
  include_sections:
    - executive_summary
    - model_performance
    - fairness_analysis

How do I handle multiple models or environments?¶

Multiple Configurations:

# Development environment
glassalpha audit --config german_credit_simple.yaml --output dev_audit.pdf

# Production environment (create your own config or use packaged templates)
glassalpha audit --config gdpr_compliance.yaml --output prod_audit.pdf --strict

Configuration Overrides:

# Base configuration with environment-specific overrides
glassalpha audit \
  --config base_config.yaml \
  --override prod_overrides.yaml \
  --output prod_audit.pdf
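
Override files are usually applied as a recursive merge: nested sections are combined, and individual values in the override win. The sketch below shows that common semantics on plain dictionaries. Whether GlassAlpha merges in exactly this way is an assumption, and `deep_merge` is a hypothetical helper.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where values in `override` take precedence over `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys are replaced wholesale.
            merged[key] = copy.deepcopy(value)
    return merged


base = {
    "reproducibility": {"random_seed": 42},
    "model": {"type": "xgboost"},
}
prod_overrides = {
    "reproducibility": {"random_seed": 7},
    "model": {"path": "models/my_trained_model.pkl"},
}
effective = deep_merge(base, prod_overrides)
```

Recording the merged result, rather than only the two source files, makes it clear which settings an audit actually ran with.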

Development & extension¶

How do I contribute to GlassAlpha?¶

Ways to Contribute:

Bug reports and feature requests via GitHub Issues
Code contributions following our development guidelines
Documentation improvements and examples
Testing on different platforms and use cases

See the Contributing Guide for detailed instructions.

Can I build custom metrics?¶

Yes, implement the MetricInterface:

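# Custom metrics are implemented by extending the metrics module directly.

class MyCustomMetric:
    def compute(self, y_true, y_pred, **kwargs):
        # Example calculation: fraction of predictions that match the labels.
        # Replace this with your own metric.
        matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
        result = matches / len(y_true)
        return {"value": result, "interpretation": "higher_is_better"}

# Register in your metrics/__init__.py or use directly in pipeline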

How do I add custom explainers?¶

Implement the ExplainerInterface:

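# Custom explainers are implemented by extending the explain module directly.

class MyExplainer:
    def explain(self, model, X, y=None):
        # Placeholder logic: one zero attribution per feature, per row.
        # Assumes X is a sequence of rows; replace with your own method.
        explanations = [[0.0] * len(row) for row in X]
        return {"explanations": explanations}

    def supports_model(self, model):
        # Only handle models that report a matching type
        return model.get_model_type() in ["my_model_type"]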

Troubleshooting¶

Common installation issues?¶

Python Version:

Ensure Python 3.11+ (check with python --version)
Use virtual environments to avoid conflicts

XGBoost on macOS:

# If you see libomp errors
brew install libomp
pip uninstall xgboost && pip install xgboost

Memory Issues:

Reduce max_samples in explainer configuration
Enable low_memory_mode in performance settings
Use smaller datasets for initial testing

Where can I get help?¶

Support Channels:

Documentation: Comprehensive guides and examples
GitHub Issues: Bug reports and feature requests
Community Discussions: User questions and sharing
Troubleshooting Guide: Common issues and solutions

For immediate help, check the Troubleshooting Guide or search existing GitHub Issues.

Getting started¶

What's the fastest way to start using GlassAlpha?¶

Follow the Quick Start Guide - Get running in under 10 minutes
Try the German Credit example - See all features working
Adapt the configuration - Modify for your specific use case
Read the documentation - Understand advanced features

Essential Resources:

Quick Start Guide - 5-minute introduction
Using Custom Data - Audit your own models
Built-in Datasets - Automatic dataset fetching
Configuration Guide - Complete configuration reference
Model Selection Guide - Choose the right model
Explainer Deep Dive - Understanding explanations
German Credit Tutorial - Complete walkthrough
CLI Reference - Command-line interface

I'm new to ML auditing. Where should I start?¶

Learning Path:

Understand the basics - What is algorithmic bias and fairness?
Review regulations - What compliance frameworks apply to your use case?
Try GlassAlpha - Generate an audit report with sample data
Interpret results - Learn to read audit reports and metrics
Plan implementation - Design your audit workflow and processes

Recommended Reading:

Trust & Deployment Guide - Architecture, licensing, security, and compliance
German Credit Tutorial (detailed interpretation)
Industry guides on algorithmic fairness and bias

This FAQ covers the most common questions about GlassAlpha. If you don't find your answer here, please check our other documentation or reach out via GitHub Issues.