# Configuration Reference

All training parameters are controlled via YAML config files in `configs/`. The canonical reference is `configs/config.yaml`.
## Config Structure

```yaml
root: "./opf_data/"                   # Dataset root directory
logging_dir: "./logs/"                # Log output directory
checkpoint_dir: "./ddp_checkpoints/"  # Checkpoint save directory
```
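These top-level paths are plain YAML keys, so they can be read with PyYAML. A minimal sketch, assuming PyYAML is installed; the `load_config` helper is hypothetical, not necessarily the repository's own loader:

```python
# Minimal sketch: read the top-level config with PyYAML (assumed dependency).
# `load_config` is a hypothetical helper, not the repository's actual loader.
import yaml

def load_config(path: str = "configs/config.yaml") -> dict:
    with open(path) as f:
        return yaml.safe_load(f)

cfg = load_config()
print(cfg["root"])            # "./opf_data/"
print(cfg["checkpoint_dir"])  # "./ddp_checkpoints/"
```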
### Optimizer

```yaml
optimizer:
  AdamW:
    lr: 1.0e-03          # Learning rate
    betas: [0.9, 0.999]  # Adam beta parameters
    eps: 1.0e-08         # Numerical stability
    weight_decay: 0.01   # L2 regularization
```
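The keys under `AdamW` line up with the keyword arguments of `torch.optim.AdamW`. A hedged sketch, assuming the `cfg` dict from above and an already-built `model`; the repository's actual optimizer factory may differ:

```python
# Sketch: feed the AdamW section straight into torch.optim.AdamW.
# `cfg` and `model` are assumed to exist; this is not necessarily the repo's builder.
import torch

opt_kwargs = cfg["optimizer"]["AdamW"]  # {"lr": 0.001, "betas": [0.9, 0.999], ...}
optimizer = torch.optim.AdamW(model.parameters(), **opt_kwargs)
```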
### Scheduler

```yaml
scheduler:
  type: "cosine"  # Options: "cosine", "step", "plateau"
  t_max: null     # Max iterations for cosine (null = max_epochs)
  eta_min: 0.0    # Minimum learning rate
```
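For `type: "cosine"`, a `t_max` of `null` falls back to `training.max_epochs`, as the comment indicates. A minimal sketch using `torch.optim.lr_scheduler.CosineAnnealingLR`; whether the repository builds the scheduler exactly this way is an assumption:

```python
# Sketch: cosine schedule with the null -> max_epochs fallback described above.
from torch.optim.lr_scheduler import CosineAnnealingLR

sched_cfg = cfg["scheduler"]
t_max = sched_cfg["t_max"] or cfg["training"]["max_epochs"]  # YAML null loads as None
scheduler = CosineAnnealingLR(optimizer, T_max=t_max, eta_min=sched_cfg["eta_min"])
```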
### Data

```yaml
train_split: 0.8  # Fraction of data for training
val_split: 0.1    # Fraction for validation (rest is test)
loader:
  batch_size: 16
  shuffle: true
  num_workers: 0  # DataLoader workers (0 = main process)
```
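With `train_split: 0.8` and `val_split: 0.1`, the remaining 10% becomes the test set. A hedged sketch of how the split and loader settings could be applied with `torch.utils.data`; `dataset` is an assumed, already-constructed dataset and the repository's pipeline may differ:

```python
# Sketch: 80/10/10 split driven by train_split / val_split, plus the loader block.
# `dataset` is an assumed, already-constructed torch Dataset.
from torch.utils.data import DataLoader, random_split

n = len(dataset)
n_train, n_val = int(0.8 * n), int(0.1 * n)
train_ds, val_ds, test_ds = random_split(dataset, [n_train, n_val, n - n_train - n_val])

train_loader = DataLoader(train_ds, **cfg["loader"])  # batch_size=16, shuffle=True, num_workers=0
val_loader = DataLoader(val_ds, batch_size=cfg["loader"]["batch_size"], shuffle=False)
```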
### Training

```yaml
training:
  max_epochs: 10
  patience: 100000                 # Early stopping patience
  max_global_samples: 500000       # Stop after this many samples
  global_batch_size: 8192          # Effective batch size across GPUs
  gradient_clip_val: 1.0           # Gradient norm clipping
  gradient_clip_algorithm: "norm"
  fail_on_nonfinite: false         # Raise on non-finite loss (false = skip the batch)
  accumulate_grad_batches: 1       # Gradient accumulation steps
  log_every_n_steps: 0             # W&B log interval (0 = every step)
  log_every_n_samples: 8192        # Log every N samples
  val_every_n_epochs: 0            # Validate every N epochs (0 = sample-based)
  val_every_n_samples: 40960       # Validate every N samples
  throughput_enabled: false        # Enable throughput measurement
```
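Note that `global_batch_size` is the effective batch across all GPUs, while `loader.batch_size` is per process. A back-of-the-envelope sketch of one way the two could be reconciled via gradient accumulation; whether the code actually derives `accumulate_grad_batches` this way is an assumption:

```python
# Sketch (assumption): relate the per-rank loader batch size to global_batch_size.
world_size = 4                                      # e.g. 4 GPUs under DDP
per_rank_bs = cfg["loader"]["batch_size"]           # 16
global_bs = cfg["training"]["global_batch_size"]    # 8192

accum_steps = global_bs // (per_rank_bs * world_size)  # 8192 // (16 * 4) = 128
```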
### Checkpointing

```yaml
checkpointing:
  monitor: "val/score"  # Metric to monitor for best model
  save_last: true       # Always save last checkpoint
  run_scoped: true      # Scope checkpoints by run name
  every_n_epochs: 0     # Save every N epochs (0 = disabled)
  every_n_samples: 0    # Save every N samples (0 = disabled)
```
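A rough sketch of the `monitor` / `save_last` behaviour, assuming a lower `val/score` is better; the repository's actual checkpoint callback (run scoping, sample-based triggers) is certainly more involved:

```python
# Rough sketch only: keep the best checkpoint by the monitored metric, plus the last one.
# Assumes lower val/score is better; not the repository's actual callback.
import os
import torch

best_score = float("inf")

def maybe_checkpoint(model, metrics, ckpt_dir="./ddp_checkpoints/"):
    global best_score
    os.makedirs(ckpt_dir, exist_ok=True)
    if metrics["val/score"] < best_score:          # monitor: "val/score"
        best_score = metrics["val/score"]
        torch.save(model.state_dict(), os.path.join(ckpt_dir, "best.pt"))
    torch.save(model.state_dict(), os.path.join(ckpt_dir, "last.pt"))  # save_last: true
```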
## Model Configuration

Model configs are in `configs/model/`. Each model type has its own section. The snippet below is a tutorial-friendly override; the shipped `configs/model/heterognn.yaml` defaults to a larger production-scale model (`hidden_channels: 2048`, `num_layers: 8`, `backend: "gat"`):