Hyperparameter Tuning¶
The hyperparameter tuning module integrates Ray Tune with the deterministic four-parameter emulator training workflow. It searches over model architecture, optimizer, learning-rate, normalization, and training settings for FourParamEmulator, then returns Ray Tune results that can be inspected to choose a final configuration.
What This Module Does¶
- Provides a default Ray Tune search space for FourParamEmulator
- Trains one trial at a time with train/validation arrays
- Fits optional normalizers using training arrays only
- Reports validation loss, RMSE, relative error, and best-trial metadata
- Saves checkpoints when a trial reaches a new best validation loss
- Launches a full Ray Tune search and returns the resulting ResultGrid
This module handles one step in the overall workflow: Tune Hyperparameters.
The current tuning helper is scoped to FourParamEmulator. MCDropoutEmulator is part of the stable model and training API, but it is trained with fit(..., evaluation="evaluate_mc_metrics") rather than run_tune_four_param(...).
When To Use It¶
Use tuning when you want to compare many model and optimizer settings systematically. It is useful after the preprocessing pipeline is stable and you are ready to improve the baseline emulator.
Tuning is optional. If you are testing the package, debugging data preparation, or training a quick baseline, start with FourParamEmulator and fit(...) instead. If you need dropout-based predictive-spread estimates, use MCDropoutEmulator with the regular training API. Ray Tune runs many training trials and can take significantly longer than a single training run.
What You Need Before Tuning¶
Before calling the tuning helpers, you need:
- Training arrays X_train and Y_train.
- Validation arrays X_val and Y_val.
- Arrays with shapes compatible with FourParamEmulator: by default X.shape[1] == 4 and Y.shape[1] == 5.
- Ray Tune installed. The package declares ray[tune]>=2.0 as a dependency.
- Enough local disk space for Ray Tune trial artifacts and checkpoints.
You can get the arrays from a condensed HDF5 file with load_training_arrays(...).
Typical Workflow¶
from pathlib import Path
from reionemu import load_training_arrays, run_tune_four_param
h5_path = Path("path/to/condensed.h5")
X, Y, ell = load_training_arrays(h5_path)
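# Simple sequential 80/20 split; see Reproducibility Notes below for a seeded, shuffled split.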
split_idx = int(0.8 * len(X))
X_train, X_val = X[:split_idx], X[split_idx:]
Y_train, Y_val = Y[:split_idx], Y[split_idx:]
results = run_tune_four_param(
X_train=X_train,
Y_train=Y_train,
X_val=X_val,
Y_val=Y_val,
num_samples=20,
max_concurrent_trials=2,
device="cpu",
storage_path="ray_results",
experiment_name="four_param_search",
)
best = results.get_best_result(metric="val_loss", mode="min")
print(best.config)
print(best.metrics["best_val_loss"])
default_param_space¶
default_param_space returns the default Ray Tune search space for the deterministic four-parameter emulator.
Main Entry Point¶
def default_param_space() -> dict:
Default Search Space¶
{
"hidden_dim": tune.choice([16, 32, 64, 128, 256]),
"num_hidden_layers": tune.choice([1, 2, 3, 4]),
"activation": tune.choice(["relu", "gelu", "silu", "tanh"]),
"optimizer": tune.choice(["adam", "adamw"]),
"lr": tune.loguniform(1e-4, 5e-3),
"weight_decay": tune.loguniform(1e-8, 1e-4),
"batch_size": tune.choice([16, 32, 64]),
"epochs": 200,
"early_stopping_patience": 20,
"gradient_clipping": tune.choice([None, 1.0, 5.0]),
"normalize_X": True,
"normalize_Y": False,
"device": "auto",
}
Typical Usage¶
from ray import tune
from reionemu import default_param_space, run_tune_four_param
param_space = default_param_space()
param_space["hidden_dim"] = tune.choice([32, 64])
param_space["epochs"] = 100
results = run_tune_four_param(
X_train=X_train,
Y_train=Y_train,
X_val=X_val,
Y_val=Y_val,
param_space=param_space,
num_samples=10,
)
resolve_device¶
resolve_device converts a device string into a torch.device.
Main Entry Point¶
def resolve_device(device: str = "auto") -> torch.device:
If device is not "auto", the function returns torch.device(device) directly. If device is "auto", it chooses:
- "cuda" when CUDA is available.
- "mps" when Apple Silicon MPS is available.
- "cpu" otherwise.
This helper is used inside tuning trials, but it can also be useful in scripts.
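A small usage example, assuming resolve_device is exported at the package top level like the other helpers on this page:
from reionemu import resolve_device
device = resolve_device("auto")  # torch.device("cuda"), ("mps"), or ("cpu") depending on hardware
print(device)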
train_four_param_tune¶
train_four_param_tune is the Ray Tune trainable for one trial. Most users do not call it directly; run_tune_four_param(...) wraps it with resources, parameters, scheduling, and result handling.
Main Entry Point¶
def train_four_param_tune(
config: dict,
*,
X_train: np.ndarray,
Y_train: np.ndarray,
X_val: np.ndarray,
Y_val: np.ndarray,
) -> None:
What Happens In One Trial¶
For each trial, the trainable:
- Resolves the requested device.
- Optionally fits X and Y normalizers on X_train and Y_train.
- Transforms train and validation arrays using training-only statistics (see the sketch after this list).
- Builds train and validation DataLoader objects.
- Builds a FourParamEmulator using the trial config.
- Builds an Adam or AdamW optimizer.
- Trains for up to config["epochs"] epochs.
- Reports train_loss, val_loss, val_rmse, val_relative_error, best_val_loss, and best_epoch.
- Saves a checkpoint whenever validation loss improves.
- Stops early if early_stopping_patience is set and validation loss does not improve for that many epochs.
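The fit-on-train, transform-both pattern from the normalization steps is sketched below with plain z-score scaling. The module's own normalizer objects encapsulate the same idea, so treat this as an illustration rather than the actual implementation:
import numpy as np
# Fit statistics on the training split only...
X_mean, X_std = X_train.mean(axis=0), X_train.std(axis=0)
# ...then apply those same statistics to both splits, so no validation
# information leaks into the transform.
X_train_n = (X_train - X_mean) / X_std
X_val_n = (X_val - X_mean) / X_std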
Trial Checkpoints¶
When a trial reaches a new best validation loss, the checkpoint contains:
- model.pt: the model state dictionary.
- metadata.pt: the trial config, best validation loss, best epoch, and fitted normalizers.
These checkpoints are managed by Ray Tune and are usually accessed through the best result object returned by run_tune_four_param(...).
run_tune_four_param¶
run_tune_four_param is the main user-facing tuning entry point. It launches Ray Tune with an ASHA scheduler and returns a Ray Tune ResultGrid.
Main Entry Point¶
def run_tune_four_param(
*,
X_train: np.ndarray,
Y_train: np.ndarray,
X_val: np.ndarray,
Y_val: np.ndarray,
param_space: dict | None = None,
num_samples: int = 40,
max_concurrent_trials: int = 4,
device: str = "auto",
storage_path: str | None = None,
experiment_name: str = "train_four_param_tune",
):
| Parameter | Type | Default | Description |
|---|---|---|---|
| X_train | np.ndarray | Required | Training input array |
| Y_train | np.ndarray | Required | Training target array |
| X_val | np.ndarray | Required | Validation input array |
| Y_val | np.ndarray | Required | Validation target array |
| param_space | dict \| None | None | Ray Tune search space; defaults to default_param_space() |
| num_samples | int | 40 | Number of hyperparameter configurations to sample |
| max_concurrent_trials | int | 4 | Maximum number of trials running at once |
| device | str | "auto" | Device used by each trial |
| storage_path | str \| None | None | Ray Tune output directory |
| experiment_name | str | "train_four_param_tune" | Name for the Ray Tune experiment |
Scheduler And Resources¶
The function uses ASHAScheduler with:
- max_t equal to param_space["epochs"].
- grace_period equal to min(15, max_t).
- reduction_factor=2.
Each trial requests:
- 2 CPUs.
- 1 GPU, only when device="cuda" or device="auto" resolves to CUDA.
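The scheduler settings above correspond roughly to the following Ray Tune call. This is a sketch of the equivalent configuration, not the module's exact source; in particular, metric and mode may instead be supplied through TuneConfig:
from ray.tune.schedulers import ASHAScheduler
scheduler = ASHAScheduler(
    metric="val_loss",
    mode="min",
    max_t=param_space["epochs"],                  # budget measured in reported epochs
    grace_period=min(15, param_space["epochs"]),  # minimum epochs before a trial can be stopped
    reduction_factor=2,                           # halve the surviving trials at each rung
)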
Returns¶
run_tune_four_param returns Ray Tune's ResultGrid.
Common operations include:
best = results.get_best_result(metric="val_loss", mode="min")
print(best.config)
print(best.metrics)
print(best.checkpoint)
The tuning objective is validation loss, minimized with metric="val_loss" and mode="min".
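Beyond the single best result, the full grid can be converted to a pandas DataFrame for side-by-side trial comparison. get_dataframe is standard Ray Tune ResultGrid API, and sampled config values appear under config/-prefixed columns:
df = results.get_dataframe(filter_metric="val_loss", filter_mode="min")
print(df.sort_values("val_loss")[["val_loss", "config/hidden_dim", "config/lr"]].head())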
Typical Usage With A Custom Search Space¶
from ray import tune
from reionemu import run_tune_four_param
param_space = {
"hidden_dim": tune.choice([32, 64, 128]),
"num_hidden_layers": tune.choice([2, 3]),
"activation": tune.choice(["relu", "gelu", "silu"]),
"optimizer": tune.choice(["adamw"]),
"lr": tune.loguniform(3e-4, 3e-3),
"weight_decay": tune.loguniform(1e-8, 1e-4),
"batch_size": tune.choice([32, 64]),
"epochs": 150,
"early_stopping_patience": 15,
"gradient_clipping": tune.choice([None, 1.0]),
"normalize_X": True,
"normalize_Y": False,
}
results = run_tune_four_param(
X_train=X_train,
Y_train=Y_train,
X_val=X_val,
Y_val=Y_val,
param_space=param_space,
num_samples=20,
max_concurrent_trials=2,
device="cpu",
storage_path="ray_results",
experiment_name="four_param_search",
)
best = results.get_best_result(metric="val_loss", mode="min")
print(best.config)
Using The Best Result¶
After tuning, use the best config to train a final model with the standard training API or inspect the checkpoint saved by Ray Tune.
Train A Final Model From The Best Config¶
import torch
from torch.utils.data import DataLoader, TensorDataset
from reionemu import FitConfig, build_four_param_model, build_optimizer, fit
best = results.get_best_result(metric="val_loss", mode="min")
best_config = best.config
# Build loaders from the same arrays used during tuning. If the best config
# set normalize_X or normalize_Y, apply the fitted normalizers first.
train_ds = TensorDataset(
    torch.as_tensor(X_train, dtype=torch.float32),
    torch.as_tensor(Y_train, dtype=torch.float32),
)
val_ds = TensorDataset(
    torch.as_tensor(X_val, dtype=torch.float32),
    torch.as_tensor(Y_val, dtype=torch.float32),
)
loaders = {
    "train": DataLoader(train_ds, batch_size=best_config["batch_size"], shuffle=True),
    "val": DataLoader(val_ds, batch_size=best_config["batch_size"]),
}
model = build_four_param_model(best_config)
optimizer = build_optimizer(model, best_config)
history = fit(
    model,
    loaders["train"],
    loaders["val"],
    optimizer,
    torch.nn.MSELoss(),
    config=FitConfig(
        epochs=best_config["epochs"],
        device="cpu",
        early_stopping_patience=best_config.get("early_stopping_patience"),
        gradient_clipping=best_config.get("gradient_clipping"),
    ),
)
Inspect A Best Checkpoint¶
import torch
best = results.get_best_result(metric="val_loss", mode="min")
with best.checkpoint.as_directory() as checkpoint_dir:
metadata = torch.load(f"{checkpoint_dir}/metadata.pt", weights_only=False)  # metadata stores plain Python objects
print(metadata["best_val_loss"])
print(metadata["best_epoch"])
Reproducibility Notes¶
The tuning helpers use train/validation arrays exactly as passed to the function. If you want a reproducible split, create that split with a fixed seed before calling run_tune_four_param(...).
The current default search space does not set a PyTorch random seed inside each trial. Ray Tune will track the sampled hyperparameters and metrics, but exact training curves can still vary across runs because of model initialization, dataloader shuffling, and hardware-level nondeterminism.
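For example, a seeded, shuffled 80/20 split (the ratio and seed here are arbitrary illustration choices, not package defaults):
import numpy as np
rng = np.random.default_rng(42)  # fixed seed makes the split reproducible
perm = rng.permutation(len(X))
split_idx = int(0.8 * len(X))
train_idx, val_idx = perm[:split_idx], perm[split_idx:]
X_train, X_val = X[train_idx], X[val_idx]
Y_train, Y_val = Y[train_idx], Y[val_idx]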
Common Issues¶
- Ray stores outputs in an unexpected place: pass storage_path="ray_results" or another explicit directory.
- Trials are too slow: reduce num_samples, reduce epochs, or lower max_concurrent_trials.
- GPU is not used: pass device="cuda" and confirm CUDA is available in the active PyTorch environment (see the check below).
- param_space["epochs"] is invalid: epochs must be at least 1.
- Tuning feels unnecessary: use FourParamEmulator with fit(...) for a quick deterministic baseline before launching Ray Tune, or use MCDropoutEmulator with evaluate_mc_metrics when dropout-based predictive spread is the goal.
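A quick way to confirm the GPU case, using standard PyTorch calls:
import torch
print(torch.cuda.is_available())  # must be True for device="cuda" to take effect
print(torch.cuda.device_count())  # number of visible CUDA devices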