Getting Started¶

Use this page as the quickest path to a first successful run.

Installation¶

reionemu can be installed via pip with:

pip install reionemu

or from source (editable):

git clone https://github.com/RobertxPearce/reionization-emulator.git
cd reionization-emulator
python -m pip install -e .

Requires Python 3.10+. Package dependencies include NumPy, h5py, PyTorch, and Ray Tune.

To run the test suite:

python -m pip install -e ".[test]"
pytest tests/ -v

Verify installation¶

To confirm the package imports correctly:

import reionemu as remu

print(remu.__all__)

This should print the main public functions, config objects, and model classes exposed by the package.

First run¶

The example below starts from a condensed HDF5 file that already contains a /training group with X, Y, and ell. If you are starting from raw simulation outputs instead, begin with the Simulation I/O workflow first.

After installing, you can load a prepared training dataset, create dataloaders, and train the baseline deterministic 4-parameter emulator:

from pathlib import Path
import torch
import reionemu

# Path to a condensed HDF5 that already has /training (X, Y, ell)
h5_path = Path("path/to/condensed.h5")

# Dataloaders with train/val split and optional normalization
loaders, normalizers, ell = reionemu.make_dataloaders(
    h5_path,
    split={"train": 0.8, "val": 0.2},
    config=reionemu.DataLoaderConfig(batch_size=32, seed=42),
)

# Baseline 4-parameter model, optimizer, loss
model = reionemu.FourParamEmulator()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

# Train for a few epochs
history = reionemu.fit(
    model,
    loaders["train"],
    loaders["val"],
    optimizer,
    loss_fn,
    config=reionemu.FitConfig(epochs=10, device="cpu"),
)

# Validation loss per epoch
print(history["val_loss"])

artifact_dir = reionemu.save_artifact(
    "baseline_four_param",
    Path("artifacts"),
    dataset_path=h5_path,
    dataloader_config=reionemu.DataLoaderConfig(batch_size=32, seed=42),
    fit_config=reionemu.FitConfig(epochs=10, device="cpu"),
    model_config={"class_name": "FourParamEmulator"},
    optimizer_config={"name": "Adam", "lr": 1e-3},
    history=history,
    normalizers=normalizers,
    checkpoint=model.state_dict(),
)

print(artifact_dir)

If this runs successfully, you should see a validation-loss history printed at the end of the script and an artifacts/baseline_four_param/ directory with info.json, configs.json, results.json, and optional binary sidecars.

If you want a dropout-based emulator that can be evaluated with Monte Carlo dropout, swap in MCDropoutEmulator and use the MC evaluation path:

model = reionemu.MCDropoutEmulator(dropout_rate=0.2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

history = reionemu.fit(
    model,
    loaders["train"],
    loaders["val"],
    optimizer,
    torch.nn.MSELoss(),
    config=reionemu.FitConfig(epochs=10, device="cpu"),
    evaluation="evaluate_mc_metrics",
    n_mc_samples=50,
)

print(history["val_mean_predictive_std"])

Common pitfalls¶

Make sure the input HDF5 file already contains a /training group before calling make_dataloaders.
Use the Simulation I/O pipeline first if you only have raw simulation outputs.
Confirm that your Python environment includes the package dependencies before running training code.
Save normalizers and model weights as artifact sidecars rather than putting them directly into JSON.