Configuration API
The config module provides centralized configuration management for the RenalProg pipeline, including paths, hyperparameters, and project-wide constants.
Overview
The configuration module is organized into several components:
- Path Management: Centralized directory structure for data, models, and outputs
- Preprocessing Configuration: Parameters for data preprocessing
- VAE Configuration: Hyperparameters for VAE model training
- Trajectory Configuration: Settings for trajectory generation
- Classification Configuration: Parameters for survival classification
- Enrichment Configuration: Settings for pathway enrichment analysis
Path Structure
The module exposes the project directory structure through the PATHS dictionary, keyed by directory name:
from renalprog.config import PATHS
# Access common paths
data_dir = PATHS['data']
models_dir = PATHS['models']
figures_dir = PATHS['figures']
Available Paths
| Key | Description |
|---|---|
| root | Project root directory |
| data | Main data directory |
| raw | Raw, immutable data |
| interim | Intermediate processed data |
| processed | Final, canonical data sets |
| external | External data sources (pathways, gene lists) |
| models | Trained models and checkpoints |
| reports | Analysis reports |
| figures | Generated figures and plots |
| notebooks | Jupyter notebooks |
| references | Reference materials |
| scripts | Pipeline scripts |
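The keys in the table can be pictured as paths derived from a single project root. The sketch below is an illustration, not the contents of renalprog/config.py: the helper name `build_paths` and the nesting (for example `raw` under `data`, `figures` under `reports`) are assumptions based on a common data-science project layout.

```python
from pathlib import Path


def build_paths(root: Path) -> dict[str, Path]:
    """Hypothetical helper: derive every documented key from one project root."""
    data = root / "data"
    reports = root / "reports"
    return {
        "root": root,
        "data": data,
        "raw": data / "raw",              # assumed to live under data/
        "interim": data / "interim",
        "processed": data / "processed",
        "external": data / "external",
        "models": root / "models",
        "reports": reports,
        "figures": reports / "figures",   # assumed to live under reports/
        "notebooks": root / "notebooks",
        "references": root / "references",
        "scripts": root / "scripts",
    }


# Pure path arithmetic: nothing is created on disk here.
paths = build_paths(Path("/tmp/renalprog_demo"))
print(paths["interim"])
```

Deriving every entry from one root keeps the layout relocatable: moving the project only changes the argument passed to the helper.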
Configuration Classes
PreprocessingConfig
Configuration parameters for data preprocessing steps.
Example Usage:
from renalprog.config import PreprocessingConfig
# Access default preprocessing parameters
config = PreprocessingConfig()
print(f"Mean threshold: {config.mean_threshold}")
print(f"Outlier alpha: {config.outlier_alpha}")
print(f"Test split ratio: {config.test_size}")
VAEConfig
Configuration for VAE model architecture and training, with default hyperparameters.
Example Usage:
from renalprog.config import VAEConfig
# Use default VAE configuration
config = VAEConfig()
# Or customize for your experiment
custom_config = VAEConfig(
    mid_dim=256,
    latent_dim=10,
    learning_rate=0.0001,
    num_epochs=500,
    batch_size=64
)
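When only a few hyperparameters change between runs, it can help to derive a variant from a base configuration and record what differs. The sketch below assumes the configuration behaves like a dataclass, which this page does not confirm. `DemoVAEConfig` is a stand-in whose field names come from the example above, and its default values are placeholders rather than the package defaults.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class DemoVAEConfig:
    """Stand-in for VAEConfig; defaults are placeholders, not the real ones."""
    mid_dim: int = 512
    latent_dim: int = 20
    learning_rate: float = 1e-3
    num_epochs: int = 100
    batch_size: int = 32


base = DemoVAEConfig()
# replace() copies the base config and overrides only the named fields.
tuned = replace(base, latent_dim=10, learning_rate=1e-4)

# Collect the fields whose values differ from the base configuration.
base_values = asdict(base)
changed = {k: v for k, v in asdict(tuned).items() if base_values[k] != v}
print(changed)  # {'latent_dim': 10, 'learning_rate': 0.0001}
```

Logging only the changed fields makes experiment notes shorter and easier to compare than dumping the full configuration each time.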
TrajectoryConfig
Configuration for synthetic trajectory generation from trained VAE models.
Example Usage:
from renalprog.config import TrajectoryConfig
config = TrajectoryConfig(
    num_trajectories=5000,
    trajectory_length=100,
    noise_scale=0.1
)
ClassificationConfig
Configuration for survival classification models.
Parameters for stage classification.
EnrichmentConfig
Configuration parameters for pathway enrichment analysis.
Utility Functions
get_dated_dir
Create a dated directory for organizing time-stamped outputs.
Args:
- base_dir: Base directory path
- prefix: Optional prefix for the directory name
Returns: Path to the dated directory
Source code in renalprog/config.py
Example:
from renalprog.config import get_dated_dir, PATHS
# Create a dated directory for today's model outputs
model_dir = get_dated_dir(PATHS['models'], prefix='VAE_KIRC')
# Returns: models/20251218_VAE_KIRC/
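To make the naming scheme concrete, here is a minimal sketch of how such a helper could work. It is not the code in renalprog/config.py: `dated_dir_sketch` is a hypothetical name, and the `<YYYYMMDD>_<prefix>` pattern is inferred from the example return value above.

```python
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


def dated_dir_sketch(base_dir: Path, prefix: Optional[str] = None) -> Path:
    """Hypothetical re-implementation: create <base_dir>/<YYYYMMDD>[_<prefix>]."""
    stamp = date.today().strftime("%Y%m%d")
    name = f"{stamp}_{prefix}" if prefix else stamp
    out = Path(base_dir) / name
    # exist_ok=True makes repeated calls on the same day safe.
    out.mkdir(parents=True, exist_ok=True)
    return out


# Use a temporary directory so the demo leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = dated_dir_sketch(Path(tmp), prefix="VAE_KIRC")
    print(model_dir.name)
```

A date-first name sorts chronologically in a directory listing, which is presumably why the stamp precedes the prefix.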
KIRC-Specific Paths
The module provides a KIRCPaths class for managing KIRC dataset-specific file locations:
from renalprog.config import KIRCPaths
# Access KIRC-specific data paths
rnaseq = KIRCPaths.RNASEQ_RAW
clinical = KIRCPaths.CLINICAL_RAW
pathways = KIRCPaths.REACTOME_PATHWAYS
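Before starting a long pipeline run, it may be worth checking that the input files actually exist. The sketch below assumes the class attributes are `pathlib.Path` objects, which this page does not state. `DemoKIRCPaths` is a stand-in, its file names are invented for illustration, and `missing_inputs` is a hypothetical helper.

```python
from pathlib import Path


class DemoKIRCPaths:
    """Stand-in for KIRCPaths; the file names are invented placeholders."""
    RNASEQ_RAW = Path("data/raw/rnaseq_placeholder.csv")
    CLINICAL_RAW = Path("data/raw/clinical_placeholder.csv")
    REACTOME_PATHWAYS = Path("data/external/reactome_placeholder.gmt")


def missing_inputs(paths_cls: type) -> list[str]:
    """Return the names of uppercase Path attributes whose files do not exist."""
    return sorted(
        name
        for name, value in vars(paths_cls).items()
        if name.isupper() and isinstance(value, Path) and not value.exists()
    )


missing = missing_inputs(DemoKIRCPaths)
if missing:
    print("Missing input files:", ", ".join(missing))
```

Failing early with a list of missing inputs is usually cheaper than discovering a bad path halfway through preprocessing.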
Best Practices
Using Configuration in Scripts
Always import configuration at the module level:
from renalprog.config import PATHS, VAEConfig, PreprocessingConfig
def main():
    # Access paths
    data_dir = PATHS['interim']
    # Use configuration objects
    vae_config = VAEConfig()
    preproc_config = PreprocessingConfig()
Creating Custom Configurations
For experiments, create configuration variants:
from renalprog.config import VAEConfig
# Base configuration
base_config = VAEConfig()
# Experiment variations
configs = {
    'small': VAEConfig(mid_dim=128, latent_dim=5),
    'medium': VAEConfig(mid_dim=256, latent_dim=10),
    'large': VAEConfig(mid_dim=512, latent_dim=20)
}
Saving Configuration
Always save configuration with trained models:
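from renalprog.config import PATHS, VAEConfig, get_dated_dir
from renalprog.modeling.checkpointing import save_model_config
config = VAEConfig(experiment_name='my_experiment')
# output_dir is wherever the trained model is written; a dated directory is one option
output_dir = get_dated_dir(PATHS['models'], prefix='my_experiment')
save_model_config(config, output_dir)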
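The value of saving the configuration is that a run can be reconstructed later from the files alone. The sketch below illustrates that round trip with the standard library only. It does not show how `save_model_config` works internally: `DemoConfig`, `save_config_sketch`, and the `config.json` file name are all assumptions made for the example.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class DemoConfig:
    """Stand-in config; field values are placeholders."""
    experiment_name: str = "my_experiment"
    latent_dim: int = 10
    learning_rate: float = 1e-4


def save_config_sketch(config: DemoConfig, output_dir: Path) -> Path:
    """Hypothetical helper: write the config as JSON next to the model files."""
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / "config.json"
    target.write_text(json.dumps(asdict(config), indent=2, sort_keys=True))
    return target


# Save, then reload, to confirm nothing is lost in the round trip.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_config_sketch(DemoConfig(), Path(tmp) / "demo_run")
    restored = DemoConfig(**json.loads(saved.read_text()))

print(restored == DemoConfig())  # True
```

A plain-text format such as JSON also lets two runs be compared with an ordinary diff tool.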
See Also
- Dataset API - Data loading and preprocessing
- Training API - Model training with configuration