Troubleshooting guide¶
Common issues, error messages, and solutions for GlassAlpha. This guide helps diagnose and resolve problems quickly.
Quick diagnostics¶
If you're experiencing issues, start with these diagnostic commands:
# Verify installation (if package is installed)
glassalpha --version
# Alternative: Use module invocation (development/uninstalled)
PYTHONPATH=src python3 -m glassalpha --version
# Check component availability
glassalpha list
# Alternative: PYTHONPATH=src python3 -m glassalpha list
# Validate configuration
glassalpha validate --config german_credit_simple.yaml
# Alternative: PYTHONPATH=src python3 -m glassalpha validate --config german_credit_simple.yaml
# Test with minimal configuration
glassalpha audit --config german_credit_simple.yaml --output test.pdf --dry-run
# Alternative: PYTHONPATH=src python3 -m glassalpha audit --config german_credit_simple.yaml --output test.pdf --dry-run
CLI command issues¶
glassalpha: command not found¶
This indicates GlassAlpha isn't properly installed or the CLI entry point isn't available.
Solutions:
- Install the package (recommended for users):
- Use module invocation (development/troubleshooting):
cd glassalpha
PYTHONPATH=src python3 -m glassalpha --version
PYTHONPATH=src python3 -m glassalpha audit --config german_credit_simple.yaml --output test.pdf --dry-run
- Check your environment:
# Check if package is installed
pip list | grep glassalpha
# Check Python path
python3 -c "import sys; print('\n'.join(sys.path))"
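The same environment checks can be scripted. This is a minimal sketch that assumes only the package and command name `glassalpha` used in this guide; `check_install` is a hypothetical helper, not part of GlassAlpha.

```python
import importlib.util
import shutil
import sys


def check_install(package="glassalpha", command="glassalpha"):
    """Report whether the package is importable and the CLI is on PATH."""
    return {
        "python": sys.executable,
        "package_importable": importlib.util.find_spec(package) is not None,
        "cli_on_path": shutil.which(command) is not None,
    }


if __name__ == "__main__":
    for key, value in check_install().items():
        print(f"{key}: {value}")
```

If `package_importable` is true but `cli_on_path` is false, the package is probably installed in an environment whose scripts directory is not on your PATH, and module invocation is the quicker workaround.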
Installation issues¶
Python version compatibility¶
Error:
Solution:
# Check Python version
python --version
# Upgrade Python (macOS with Homebrew)
brew install python@3.11
python3.11 -m pip install -e .
# Upgrade Python (Linux)
sudo apt update && sudo apt install python3.11
# Create virtual environment with correct Python
python3.11 -m venv venv
source venv/bin/activate
pip install -e .
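To confirm that the interpreter inside your virtual environment is the one you expect, a short check like the following may help. It is a sketch under one assumption: the minimum of 3.11 simply mirrors the version installed in the commands above, since this guide does not state GlassAlpha's actual requirement. `meets_minimum` is a hypothetical helper.

```python
import sys

# Assumption: 3.11 mirrors the version installed in the commands above;
# the guide does not state GlassAlpha's actual minimum.
ASSUMED_MINIMUM = (3, 11)


def meets_minimum(version=None, minimum=ASSUMED_MINIMUM):
    """Return True when `version` (major, minor, ...) is at least `minimum`."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Meets assumed minimum:", meets_minimum())
```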
XGBoost installation issues (macOS)¶
Error:
Solution:
# Install OpenMP library
brew install libomp
# Reinstall XGBoost
pip uninstall xgboost
pip install xgboost
# Verify installation
python -c "import xgboost; print('XGBoost version:', xgboost.__version__)"
Dependency conflicts¶
Error:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed
Solution:
# Create clean virtual environment
python -m venv clean_env
source clean_env/bin/activate # On Windows: clean_env\Scripts\activate
# Upgrade pip
pip install --upgrade pip
# Install GlassAlpha
pip install -e .
# If conflicts persist, install dependencies individually
pip install pandas numpy scikit-learn
pip install xgboost lightgbm
pip install shap matplotlib seaborn
pip install -e . --no-deps
Missing system dependencies¶
Error (Linux):
Solution (Ubuntu/Debian):
Solution (CentOS/RHEL):
Configuration errors¶
Missing required fields¶
Error:
ValidationError: 1 validation error for AuditConfig
data.target_column
field required (type=value_error.missing)
Solution:
# Add missing required fields
data:
  path: data/dataset.csv
  target_column: outcome # This was missing
model:
  type: xgboost # Ensure model type is specified
Invalid file paths¶
Error:
Solutions:
# Use absolute paths
glassalpha audit --config /full/path/to/config.yaml --output report.pdf
# Config files are packaged with GlassAlpha - no need for configs/ directory
# Just reference by name:
glassalpha audit --config german_credit_simple.yaml --output audit.pdf
Model type not found¶
Error:
Solution:
# Check available components (models, explainers, metrics)
glassalpha list
# Use correct model type
# Valid options: xgboost, lightgbm, logistic_regression, sklearn_generic
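A typo in the model type is easy to catch before running an audit. The sketch below checks a configured value against the four model types listed in this guide and suggests a close match; `suggest_model_type` is a hypothetical helper, and `glassalpha list` remains the authoritative source for what is available.

```python
import difflib

# Model types listed in this guide; `glassalpha list` is the authoritative source.
VALID_MODEL_TYPES = ["xgboost", "lightgbm", "logistic_regression", "sklearn_generic"]


def suggest_model_type(name, valid=VALID_MODEL_TYPES):
    """Return (is_valid, suggestions) for a configured model type."""
    if name in valid:
        return True, []
    # Case or stray whitespace is the most common difference.
    normalized = name.strip().lower()
    if normalized in valid:
        return False, [normalized]
    # Otherwise fall back to fuzzy matching on the spelling.
    return False, difflib.get_close_matches(normalized, valid, n=3, cutoff=0.6)


print(suggest_model_type("xgbost"))
```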
Schema validation errors¶
Error:
ValidationError: 2 validation errors for TabularDataSchema
features
ensure this value has at least 1 characters
target
field required
Solution:
data:
  path: data/dataset.csv
  target_column: target # Must specify target
  feature_columns: # Either specify features explicitly
    - feature1
    - feature2
  # OR let GlassAlpha auto-detect (remove feature_columns)
Model and dependency issues¶
Model not available errors¶
Error messages:
Model 'xgboost' not available. Falling back to 'logistic_regression'.
To enable 'xgboost', run: pip install 'glassalpha[xgboost]'
Or:
Cause: GlassAlpha uses optional dependencies to keep the core installation lightweight. XGBoost and LightGBM are not installed by default.
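To see which of these optional libraries your current environment can import, a small standard-library check is enough. This is a sketch; `available_libraries` is a hypothetical helper, and the module names are the import names of XGBoost, LightGBM, and SHAP.

```python
import importlib.util

# Import names of the optional libraries named in this guide.
OPTIONAL_LIBRARIES = ("xgboost", "lightgbm", "shap")


def available_libraries(names=OPTIONAL_LIBRARIES):
    """Map each top-level module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in available_libraries().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```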
Solutions:
- Use the baseline model (recommended for getting started):
- Install the features you need:
# For SHAP + tree models (includes XGBoost and LightGBM)
pip install 'glassalpha[explain]'
# For all features
pip install 'glassalpha[all]'
- Check what's currently available:
- Disable fallbacks for strict requirements:
Model loading failures¶
Error messages:
Cause: The model library isn't installed in your environment.
Solutions: Same as above - install the required optional dependency.
Fallback model not suitable¶
Error messages:
Cause: The requested model isn't available, and fallback is enabled.
Solutions:
- Accept the fallback (LogisticRegression is a solid baseline)
- Install the requested model using the provided installation command
- Disable fallbacks if you need the specific model:
Explainer errors (SHAP library required)¶
Error messages:
Or:
No explainer from ['treeshap'] is available. Missing dependencies: ['shap'].
Install with: pip install 'glassalpha[explain]'
OR use zero-dependency explainers: ['coefficients', 'permutation']
Cause: Your configuration requests a SHAP-based explainer (TreeSHAP or KernelSHAP) but the SHAP library isn't installed in your environment.
Solutions:
- Use zero-dependency explainers (recommended for getting started):
# In your config.yaml
explainers:
  priority:
    - coefficients # Best for LogisticRegression (no dependencies)
    - permutation # Works with any model (no dependencies)
- Install SHAP for tree model explanations:
# Install SHAP + XGBoost + LightGBM
pip install 'glassalpha[explain]'
# Verify installation
glassalpha doctor
- Use the quickstart config which works without SHAP:
Which explainer should I use?
| Model Type | Best Explainer | Requires SHAP? | Notes |
|---|---|---|---|
| LogisticRegression | coefficients | ❌ No | Zero dependencies, fast |
| XGBoost/LightGBM | treeshap | ✅ Yes | Most accurate for tree models |
| RandomForest | treeshap | ✅ Yes | Handles tree ensembles well |
| Any model | permutation | ❌ No | Universal fallback, model-agnostic |
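The table can be read as a simple decision rule. The sketch below encodes it in plain Python for illustration only: `pick_explainer` and the model-type spellings in `TREE_MODELS` are hypothetical, and GlassAlpha's own selection logic may differ.

```python
# Hypothetical spellings for the tree-based rows of the table.
TREE_MODELS = {"xgboost", "lightgbm", "randomforest", "random_forest"}


def pick_explainer(model_type, shap_installed):
    """Choose an explainer name following the table above."""
    kind = model_type.strip().lower()
    if kind in {"logistic_regression", "logisticregression"}:
        return "coefficients"  # no extra dependencies needed
    if kind in TREE_MODELS and shap_installed:
        return "treeshap"  # requires the SHAP library
    return "permutation"  # universal, model-agnostic fallback


print(pick_explainer("xgboost", shap_installed=False))
```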
Example configs:
For LogisticRegression (works with base install):
For XGBoost (requires pip install 'glassalpha[explain]'):
Data issues¶
Data format mismatch in reasons/recourse commands¶
Error:
Error: Test data missing 15 columns.
Expected 22 columns: age_group, checking_account_status, ...
Actual 67 columns: age_group_middle_aged, age_group_middle_young, ...
Cause:
The reasons and recourse commands expect data in the same format the model was trained with (including preprocessing/one-hot encoding). If you provide the original CSV data, it won't match the model's expected format.
Solution:
Always use the preprocessed test data saved during audit generation:
# ✅ CORRECT - Use preprocessed test data
glassalpha reasons \
--model models/german_credit.pkl \
--data models/test_data.csv \
--instance 0
# ❌ WRONG - Original data needs preprocessing
glassalpha reasons \
--model models/german_credit.pkl \
--data data/german_credit.csv \
--instance 0
Why this happens:
- During audit generation, GlassAlpha applies preprocessing (one-hot encoding categorical features)
- The model is trained on this preprocessed data
- The preprocessed test data is saved to models/test_data.csv
- Reason codes/recourse need the same preprocessed format
Automatic detection (v0.2.0+):
GlassAlpha now automatically detects if data is already preprocessed and skips double preprocessing. You'll see:
If you see "Applying preprocessing to match model training..." but still get errors, use the test_data.csv file instead.
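When the error lists expected and actual columns, it can help to diff the two sets of names yourself. A minimal sketch in plain Python follows; `compare_columns` is a hypothetical helper and the column names in the example are illustrative.

```python
def compare_columns(expected, actual):
    """Return (missing, extra): names expected but absent, and names
    present but not expected. Input order is preserved."""
    expected_set, actual_set = set(expected), set(actual)
    missing = [name for name in expected if name not in actual_set]
    extra = [name for name in actual if name not in expected_set]
    return missing, extra


missing, extra = compare_columns(
    ["age_group", "amount"],
    ["age_group_young", "age_group_old", "amount"],
)
print("Missing:", missing)
print("Extra:", extra)
```

Extra names that look like an expected name plus a suffix (for example `age_group_young`) usually mean one file is one-hot encoded and the other is not.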
Dataset not found¶
Error:
Solutions:
- Check file paths (most common issue):
# Check file existence
ls -la data/missing.csv
# Check current directory
pwd
# Use absolute paths in config
data:
  path: /Users/yourname/data/dataset.csv # Absolute path
  # OR
  path: ~/data/dataset.csv # Home directory expansion
Common Mistake: Relative Paths
Relative paths are resolved from where you run the command, not where the config file is located. Use absolute paths or ~ for home directory.
- Use built-in datasets for testing:
- Browse available datasets:
Need help with data preparation? See Using Custom Data for a complete tutorial.
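To see why a relative data path fails, compare where it lands when resolved from the working directory (which, as noted above, is what applies) with where it would land if resolved from the config file's directory. This is a sketch using `pathlib`; `resolve_candidates` is a hypothetical helper and the paths are illustrative.

```python
from pathlib import Path


def resolve_candidates(data_path, config_path, cwd=None):
    """Resolve `data_path` against the working directory and against
    the directory that contains the config file."""
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    raw = Path(data_path).expanduser()
    if raw.is_absolute():
        # Absolute paths mean the same thing everywhere.
        return {"from_cwd": raw, "from_config_dir": raw}
    config_dir = (cwd / Path(config_path)).parent
    return {"from_cwd": cwd / raw, "from_config_dir": config_dir / raw}


for label, path in resolve_candidates("data/dataset.csv", "configs/audit.yaml").items():
    print(f"{label}: {path} (exists: {path.exists()})")
```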
Column not found¶
Error:
Causes:
- Column name misspelled or incorrect case
- CSV file has no headers
- Column name has spaces or special characters
Solutions:
- Inspect your data:
# Check column names
python -c "import pandas as pd; df = pd.read_csv('data/dataset.csv'); print('Columns:', df.columns.tolist()); print('Shape:', df.shape)"
# Check first few rows
python -c "import pandas as pd; print(pd.read_csv('data/dataset.csv').head())"
- Update configuration with exact column name:
data:
  path: data/dataset.csv
  target_column: "actual_column_name" # Use quotes if spaces/special chars
# Check for common issues:
# - Extra spaces: "target " vs "target"
# - Case sensitivity: "Target" vs "target"
# - Special characters: "target_column" vs "target-column"
- Fix CSV headers:
import pandas as pd
# Read CSV without headers
df = pd.read_csv('data/dataset.csv', header=None)
# Add column names
df.columns = ['feature1', 'feature2', 'target']
# Save with headers
df.to_csv('data/dataset_with_headers.csv', index=False)
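Most column-name mismatches come down to case, stray whitespace, or `-` versus `_`. The sketch below looks for the column you meant while ignoring those differences; `find_column` is a hypothetical helper, and you would pass it `df.columns.tolist()` from the inspection step above.

```python
def find_column(requested, columns):
    """Return the actual column matching `requested` after ignoring case,
    surrounding whitespace, and '-' or ' ' versus '_'. None if no match."""

    def normalize(name):
        return str(name).strip().lower().replace("-", "_").replace(" ", "_")

    target = normalize(requested)
    for column in columns:
        if normalize(column) == target:
            return column
    return None


# The trailing space in "Target " is the kind of difference this catches.
print(repr(find_column("target", ["feature1", "feature2", "Target "])))
```

Copy the returned name into `target_column` exactly as printed, including any spaces.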
Protected attributes not found¶
Error:
Solutions:
- Check attribute names:
# List all columns
python -c "import pandas as pd; print(pd.read_csv('data/dataset.csv').columns.tolist())"
- Update config with correct names:
- If attributes are missing from your data:
data:
  protected_attributes: [] # Empty list if no protected attributes available
  # Note: Fairness metrics will be limited without protected attributes
For more on protected attributes: See Using Custom Data.
Data format issues¶
Error:
Causes:
- Inconsistent number of columns
- Unescaped commas in text fields
- Corrupted file
Solutions:
- Check for delimiter issues:
import pandas as pd
# Try different delimiters
df = pd.read_csv('data/dataset.csv', delimiter='\t') # Tab-separated
df = pd.read_csv('data/dataset.csv', delimiter=';') # Semicolon-separated
- Handle problematic CSV files:
# Read with error handling
df = pd.read_csv('data/dataset.csv',
                 on_bad_lines='skip',  # Skip bad lines
                 encoding='utf-8')  # Specify encoding
- Convert to a supported format:
import pandas as pd
# Read problematic CSV
df = pd.read_csv('data/dataset.csv', on_bad_lines='skip')
# Save as Parquet (more robust)
df.to_parquet('data/dataset.parquet', index=False)
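Before skipping bad lines, it is worth finding out which lines are bad, since `on_bad_lines='skip'` drops them silently. The sketch below uses the standard `csv` module to list rows whose field count differs from the header; `find_inconsistent_rows` is a hypothetical helper and the sample text is illustrative.

```python
import csv
import io


def find_inconsistent_rows(text, delimiter=","):
    """Return (expected_field_count, [(line_number, field_count), ...])
    for rows whose field count differs from the header row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    expected = len(header)
    bad_rows = []
    for row in reader:
        if row and len(row) != expected:  # blank lines are ignored
            bad_rows.append((reader.line_num, len(row)))
    return expected, bad_rows


# Line 3 has an unescaped comma in a text field; line 4 is short one field.
sample = "id,comment,target\n1,fine,0\n2,late, then paid,1\n3,ok\n"
print(find_inconsistent_rows(sample))  # (3, [(3, 4), (4, 2)])
```

For a real file, read it with `open(path, newline='')` and pass the contents as `text`.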
Data type issues¶
Error:
Causes:
- Datetime columns not converted
- Complex object types
- Mixed data types in columns
Solutions:
- Convert datetime columns:
import pandas as pd
df = pd.read_csv('data/dataset.csv')
# Parse once so that both options work from the original values
dates = pd.to_datetime(df['date_column'])
# Option 1: Convert to Unix timestamp in seconds (assumes timezone-naive dates)
df['date_unix'] = (dates - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')
# Option 2: Extract useful features
df['year'] = dates.dt.year
df['month'] = dates.dt.month
df['day_of_week'] = dates.dt.dayofweek
# Drop original datetime column
df = df.drop('date_column', axis=1)
df.to_csv('data/processed_dataset.csv', index=False)
- Handle categorical data:
# Check data types
print(df.dtypes)
# Convert object columns to category or numeric
df['category_column'] = df['category_column'].astype('category').cat.codes
- Identify problematic columns:
# Find columns with mixed types
for col in df.columns:
    unique_types = df[col].apply(type).unique()
    if len(unique_types) > 1:
        print(f"Column '{col}' has mixed types: {unique_types}")
Missing values¶
Error:
Causes:
- Missing data in dataset
- Infinite values from calculations
- Very large numbers causing overflow
Solutions:
- Inspect missing data:
import pandas as pd
df = pd.read_csv('data/dataset.csv')
# Check missing values per column
print(df.isnull().sum())
# Check percentage missing
print(df.isnull().sum() / len(df) * 100)
# Identify rows with missing target
print(df[df['target'].isnull()])
- Handle missing values:
# Option 1: Drop rows with missing target (recommended)
df = df.dropna(subset=['target'])
# Option 2: Drop columns with >30% missing (keep those at or below the threshold)
threshold = 0.3
df = df.loc[:, df.isnull().mean() <= threshold]
# Option 3: Impute missing values
from sklearn.impute import SimpleImputer
# For numeric columns
numeric_cols = df.select_dtypes(include=['float64', 'int64']).columns
imputer = SimpleImputer(strategy='median')
df[numeric_cols] = imputer.fit_transform(df[numeric_cols])
# For categorical columns
categorical_cols = df.select_dtypes(include=['object']).columns
df[categorical_cols] = df[categorical_cols].fillna(df[categorical_cols].mode().iloc[0])
df.to_csv('data/cleaned_dataset.csv', index=False)
- Check for infinite values:
import numpy as np
# Replace infinite values
df = df.replace([np.inf, -np.inf], np.nan)
# Then handle as missing values
df = df.dropna()
More data preparation guidance: See Using Custom Data - Data Cleaning.
Target column issues¶
Error:
Causes:
- Target has more than 2 unique values
- Target is text instead of numeric
- Target has missing values
Solutions:
- Check target distribution:
import pandas as pd
df = pd.read_csv('data/dataset.csv')
print("Target values:", df['target'].unique())
print("Target counts:", df['target'].value_counts())
print("Target type:", df['target'].dtype)
- Convert target to binary:
# Apply only the one conversion that matches your data
# If target is text (e.g., "Yes"/"No")
df['target'] = (df['target'] == 'Yes').astype(int)
# If target is numeric but not 0/1, pick a cutoff that fits your data
cutoff = 0.5  # example value only
df['target'] = (df['target'] > cutoff).astype(int)
# If target is multi-class, convert to binary
# Example: predict if class is "high risk" vs others
df['target'] = (df['target'] == 'high_risk').astype(int)
df.to_csv('data/dataset_binary.csv', index=False)
- Remove missing target values:
df = df.dropna(subset=['target'])  # Same step as Option 1 under Missing values
Insufficient data¶
Error:
Causes:
- Dataset too small
- Too many missing values removed
- Heavy class imbalance with very few positive samples
Solutions:
- Check dataset size:
import pandas as pd
df = pd.read_csv('data/dataset.csv')
print(f"Total rows: {len(df)}")
print(f"Class distribution:\n{df['target'].value_counts()}")
print(f"Minimum class size: {df['target'].value_counts().min()}")
- Use a larger dataset:
  - Browse built-in datasets
  - Combine multiple datasets if possible
  - Use German Credit (1,000 rows) for testing
- If stuck with small data:
# Adjust model parameters for small datasets
model:
  type: logistic_regression # Better for small data than tree models
  params:
    max_iter: 1000
    C: 1.0
# Reduce explainer sample sizes
explainers:
  config:
    treeshap:
      max_samples: 50 # Reduce to match data size
Class imbalance¶
Warning:
Solutions:
- Check imbalance ratio:
import pandas as pd
df = pd.read_csv('data/dataset.csv')
value_counts = df['target'].value_counts()
imbalance_ratio = value_counts.max() / value_counts.min()
print(f"Imbalance ratio: {imbalance_ratio:.1f}:1")
print(f"Class distribution:\n{value_counts}")
- Adjust model for imbalance:
model:
  type: xgboost
  params:
    # Critical for imbalanced data
    scale_pos_weight: 99 # Ratio of negative to positive samples
    objective: binary:logistic
- Use appropriate metrics:
metrics:
  performance:
    metrics:
      - precision # More important than accuracy for imbalanced data
      - recall
      - f1
      - pr_auc # Better than roc_auc for imbalanced data
Example configurations: See credit_card_fraud_template.yaml (packaged with GlassAlpha) for handling 99.8% class imbalance.
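The `scale_pos_weight` value in the XGBoost settings above is the ratio of negative to positive samples, so it should be computed from your own labels and not copied. A minimal sketch using only the standard library follows; the function name is hypothetical and the labels are illustrative.

```python
from collections import Counter


def scale_pos_weight(labels, positive=1):
    """Return (number of negatives) / (number of positives)."""
    counts = Counter(labels)
    positives = counts[positive]
    negatives = sum(counts.values()) - positives
    if positives == 0:
        raise ValueError("no positive samples; the ratio is undefined")
    return negatives / positives


# 990 negatives and 10 positives give the 99:1 ratio used in the config above.
labels = [0] * 990 + [1] * 10
print(scale_pos_weight(labels))  # 99.0
```

With a pandas target column you can pass `df['target'].tolist()` as `labels`.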
Runtime errors¶
Memory issues¶
Error:
Solutions:
# Reduce sample sizes for explainers
explainers:
  config:
    treeshap:
      max_samples: 100 # Reduce from default 1000
    kernelshap:
      n_samples: 50 # Reduce from default 500
# Enable low memory mode
performance:
  low_memory_mode: true
  n_jobs: 1 # Reduce parallelism
Model training failures¶
Error:
Solution:
# Check data consistency
import pandas as pd
df = pd.read_csv('data/dataset.csv')
print(f"Rows: {len(df)}")
print(f"Target values: {df['target'].value_counts()}")
print(f"Missing values: {df.isnull().sum()}")
# Remove rows with missing target values
df = df.dropna(subset=['target'])
df.to_csv('data/cleaned_dataset.csv', index=False)
SHAP computation errors¶
Error:
Solution:
# Use compatible explainer
explainers:
  priority:
    - kernelshap # Model-agnostic alternative
  config:
    kernelshap:
      n_samples: 100
PDF generation issues¶
Error:
Solution (Linux):
Solution (macOS):
Solution (Alternative):
Performance issues¶
Slow execution¶
Problem: Audit takes longer than expected
Solutions:
- Reduce Sample Sizes:
explainers:
  config:
    treeshap:
      max_samples: 100 # Default: 1000
    kernelshap:
      n_samples: 50 # Default: 500
      background_size: 50 # Default: 100
- Enable Parallel Processing:
- Use Simpler Models:
- Skip Expensive Operations:
# Use dry run for config validation
glassalpha audit --config config.yaml --output test.pdf --dry-run
# Test with smaller datasets first
head -n 500 large_dataset.csv > test_dataset.csv
High memory usage¶
Problem: System runs out of memory during audit
Solutions:
- Enable Memory Optimization:
- Process Data in Chunks:
# Pre-process large datasets
import pandas as pd
df = pd.read_csv('large_dataset.csv')
sample = df.sample(n=1000, random_state=42) # Use sample for testing
sample.to_csv('sample_dataset.csv', index=False)
- Optimize Explainer Settings:
explainers:
  config:
    treeshap:
      max_samples: 50 # Very small for memory constraints
    kernelshap:
      n_samples: 20
      background_size: 20
Large PDF files¶
Problem: Generated PDFs are too large
Solutions:
- Optimize Report Configuration:
- Reduce Plot Resolution:
- Exclude Optional Sections:
report:
  include_sections:
    - executive_summary
    - model_performance
    # Remove: local_explanations, detailed_plots
Debugging and diagnostics¶
Enable verbose logging¶
# Enable detailed logging
glassalpha --verbose audit --config config.yaml --output audit.pdf
# Alternatively, set the log level through the environment
export GLASSALPHA_LOG_LEVEL=DEBUG
glassalpha audit --config config.yaml --output audit.pdf
Explainer/model compatibility debugging¶
# Debug explainer availability
python -c "
from glassalpha.explain import select_explainer
# Test each model type
for model_type in ['xgboost', 'lightgbm', 'sklearn']:
    try:
        explainer = select_explainer(model_type)
        print(f'{model_type}: {explainer.__class__.__name__}')
    except Exception as e:
        print(f'{model_type}: ERROR - {e}')
"
Configuration debugging¶
# Debug configuration loading
python -c "
from glassalpha.config import load_config_from_file
config = load_config_from_file('your_config.yaml')
print('Loaded config:', config.model_dump())
"
Data loading debugging¶
# Debug data issues
python -c "
from glassalpha.data import TabularDataLoader
loader = TabularDataLoader()
data = loader.load('data/dataset.csv')
print('Shape:', data.shape)
print('Columns:', data.columns.tolist())
print('Dtypes:', data.dtypes.to_dict())
print('Missing:', data.isnull().sum().to_dict())
"
Getting help¶
Before requesting support¶
- Check this troubleshooting guide
- Run diagnostic commands shown above
- Try with the German Credit example (known working configuration)
- Enable verbose logging for detailed error information
Information to include in support requests¶
# System information
glassalpha --version
python --version
pip list | grep -E "(glassalpha|xgboost|lightgbm|shap|pandas)"
# Configuration (remove sensitive data)
cat your_config.yaml
# Complete error message
glassalpha --verbose audit --config config.yaml --output audit.pdf 2>&1
Support channels¶
- GitHub Issues: https://github.com/GlassAlpha/glassalpha/issues
- Documentation: Complete guides
- Community: GitHub Discussions
Prevention best practices¶
Environment setup¶
# Always use virtual environments
python -m venv glassalpha-env
source glassalpha-env/bin/activate
# Keep dependencies updated
pip install --upgrade pip
pip install --upgrade glassalpha
Configuration management¶
# Always specify explicit seeds for reproducibility
reproducibility:
  random_seed: 42
# Use version control for configurations
# Git commit: "Update audit configuration for production"
# Validate configurations before use
Testing strategy¶
# Test with small datasets first
glassalpha audit --config config.yaml --output test.pdf --dry-run
# Test new configurations incrementally
# Start with minimal config, add features gradually
Monitoring and logging¶
# Enable logging for production
export GLASSALPHA_LOG_LEVEL=INFO
# Log all audit runs
glassalpha audit --config config.yaml --output audit.pdf 2>&1 | tee audit.log
This troubleshooting guide covers the most common issues encountered with GlassAlpha. If you encounter an issue not covered here, please check the GitHub issues or contact support with the diagnostic information requested above.