Getting Started¶
Use this page as the quickest path to a first successful run.
Installation¶
reionemu can be installed via pip with:
pip install reionemu
or from source (editable):
git clone https://github.com/RobertxPearce/reionization-emulator.git
cd reionization-emulator
python -m pip install -e .
Requires Python 3.10+. Package dependencies include NumPy, h5py, PyTorch, and Ray Tune.
To run the test suite:
python -m pip install -e ".[test]"
pytest tests/ -v
Verify installation¶
To confirm the package imports correctly:
import reionemu as remu
print(remu.__all__)
This should print the main public functions, config objects, and model classes exposed by the package.
First run¶
The example below starts from a condensed HDF5 file that already contains a /training group with X, Y, and ell. If you are starting from raw simulation outputs instead, begin with the Simulation I/O workflow first.
After installing, you can load a prepared training dataset, create dataloaders, and train the baseline deterministic 4-parameter emulator:
from pathlib import Path
import torch
import reionemu
# Path to a condensed HDF5 that already has /training (X, Y, ell)
h5_path = Path("path/to/condensed.h5")
# Dataloaders with train/val split and optional normalization
loaders, normalizers, ell = reionemu.make_dataloaders(
h5_path,
split={"train": 0.8, "val": 0.2},
config=reionemu.DataLoaderConfig(batch_size=32, seed=42),
)
# Baseline 4-parameter model, optimizer, loss
model = reionemu.FourParamEmulator()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()
# Train for a few epochs
history = reionemu.fit(
model,
loaders["train"],
loaders["val"],
optimizer,
loss_fn,
config=reionemu.FitConfig(epochs=10, device="cpu"),
)
# Validation loss per epoch
print(history["val_loss"])
artifact_dir = reionemu.save_artifact(
"baseline_four_param",
Path("artifacts"),
dataset_path=h5_path,
dataloader_config=reionemu.DataLoaderConfig(batch_size=32, seed=42),
fit_config=reionemu.FitConfig(epochs=10, device="cpu"),
model_config={"class_name": "FourParamEmulator"},
optimizer_config={"name": "Adam", "lr": 1e-3},
history=history,
normalizers=normalizers,
checkpoint=model.state_dict(),
)
print(artifact_dir)
If this runs successfully, you should see a validation-loss history printed at the end of the script and an artifacts/baseline_four_param/ directory with info.json, configs.json, results.json, and optional binary sidecars.
If you want a dropout-based emulator that can be evaluated with Monte Carlo dropout, swap in MCDropoutEmulator and use the MC evaluation path:
model = reionemu.MCDropoutEmulator(dropout_rate=0.2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
history = reionemu.fit(
model,
loaders["train"],
loaders["val"],
optimizer,
torch.nn.MSELoss(),
config=reionemu.FitConfig(epochs=10, device="cpu"),
evaluation="evaluate_mc_metrics",
n_mc_samples=50,
)
print(history["val_mean_predictive_std"])
Common pitfalls¶
- Make sure the input HDF5 file already contains a
/traininggroup before callingmake_dataloaders. - Use the Simulation I/O pipeline first if you only have raw simulation outputs.
- Confirm that your Python environment includes the package dependencies before running training code.
- Save normalizers and model weights as artifact sidecars rather than putting them directly into JSON.