Data
Scripts for producing training and testing data for Aframe
Environment
The data project environment uses Mamba and Poetry. Mamba is needed to install the LIGO frame-reading libraries python-ldas-tools-framecpp and python-nds2-client (https://anaconda.org/conda-forge/python-nds2-client), which are not available on PyPI.
In the root of the data project, run
apptainer build $AFRAME_CONTAINER_ROOT/data.sif apptainer.def
to build the data container.
The container build first creates an environment from conda-lock.yml and then installs the local dependencies defined in pyproject.toml.
If the dependencies in environment.yaml need to change, conda-lock.yml must be regenerated with
conda-lock -f environment.yaml -p linux-64
and the container image will need to be rebuilt.
Scripts
The data project consists of four main sub-modules:
data/segments - Querying science mode segments
data/fetch - Fetching strain data
data/timeslide_waveforms - Generating waveforms for injection campaigns
data/waveforms - Generating waveforms for training Aframe
Additionally, the main executable of each sub-module is exposed through a CLI in data/cli.py.
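For readers unfamiliar with this pattern, here is one way a single entry point can dispatch to per-sub-module functions, with list-valued options such as --flags passed as JSON strings. It is a minimal sketch using the standard library's argparse. The stand-in functions query and fetch are hypothetical placeholders that only echo their arguments, and the real data/cli.py may use a different argument-parsing library and different signatures.

```python
import argparse
import json


# Hypothetical stand-ins for the sub-module entry points. They only echo
# their arguments so the dispatch logic can be checked without network access.
def query(flags, start, end, output_file):
    return {"flags": flags, "start": start, "end": end, "output_file": output_file}


def fetch(start, end, channels, sample_rate, output_directory):
    return {
        "start": start,
        "end": end,
        "channels": channels,
        "sample_rate": sample_rate,
        "output_directory": output_directory,
    }


def build_parser():
    parser = argparse.ArgumentParser(prog="data")
    subcommands = parser.add_subparsers(dest="command", required=True)

    # List-valued options are decoded from JSON, e.g. --flags='["H1_DATA", "L1_DATA"]'
    q = subcommands.add_parser("query")
    q.add_argument("--flags", type=json.loads, required=True)
    q.add_argument("--start", type=float, required=True)
    q.add_argument("--end", type=float, required=True)
    q.add_argument("--output_file", required=True)
    q.set_defaults(func=query)

    f = subcommands.add_parser("fetch")
    f.add_argument("--start", type=float, required=True)
    f.add_argument("--end", type=float, required=True)
    f.add_argument("--channels", type=json.loads, required=True)
    f.add_argument("--sample_rate", type=float, required=True)
    f.add_argument("--output_directory", required=True)
    f.set_defaults(func=fetch)
    return parser


def main(argv=None):
    args = vars(build_parser().parse_args(argv))
    func = args.pop("func")  # function selected by the subcommand
    args.pop("command")      # subcommand name, not an argument of func
    return func(**args)
```

Registering each sub-module's function with set_defaults keeps the dispatch table in one place, so adding a sub-module only means adding one more subparser.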
Example: generating training data
As an example, let’s build a training dataset using the CLI inside the data container built above.
First, let’s create a data storage directory and query science mode segments from GWOSC:
mkdir -p ~/aframe/data/train/background
apptainer run $AFRAME_CONTAINER_ROOT/data.sif \
python -m data query --flags='["H1_DATA", "L1_DATA"]' --start 1240579783 --end 1241443783 --output_file ~/aframe/data/segments.txt
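Because two flags are passed, the useful segments are presumably the times when both H1_DATA and L1_DATA are active, i.e. the intersection of the two detectors' segment lists. The following is a minimal sketch of that intersection as a two-pointer sweep over sorted [start, end) segments. The function names intersect and coincident are hypothetical, and this is not necessarily how data/segments computes its output.

```python
from functools import reduce
from typing import List, Tuple

Segment = Tuple[float, float]


def intersect(a: List[Segment], b: List[Segment]) -> List[Segment]:
    """Overlap of two sorted, non-overlapping lists of [start, end) segments."""
    i, j = 0, 0
    overlap: List[Segment] = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # keep only non-empty overlaps
            overlap.append((start, end))
        # Advance whichever segment finishes first; the other one may still
        # overlap the next segment of the opposite list.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlap


def coincident(*segment_lists: List[Segment]) -> List[Segment]:
    """Times covered by every list, e.g. one list per detector flag."""
    return reduce(intersect, segment_lists)
```

Each pass is linear in the total number of segments, so intersecting per-detector lists stays cheap even over long query windows.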
Inspecting the output (e.g. with vi ~/aframe/data/segments.txt) shows science mode segments spanning (1240579783, 1240587612) and (1240594562, 1240606748).
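To inspect the segments programmatically instead, the sketch below parses the file and reports each segment's duration. It assumes, hypothetically, that each line holds a start and end GPS time as its first two whitespace- or comma-separated fields; check the actual output format before relying on it. The helper names are illustrative only.

```python
import os
from typing import Iterable, List, Tuple

Segment = Tuple[float, float]


def parse_segments(lines: Iterable[str]) -> List[Segment]:
    """Parse (start, end) pairs, skipping blank lines and '#' comments.

    Assumes (hypothetically) one segment per line, with start and end GPS
    times as the first two whitespace- or comma-separated fields.
    """
    segments: List[Segment] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        start, end = (float(x) for x in line.replace(",", " ").split()[:2])
        segments.append((start, end))
    return segments


def read_segments(path: str) -> List[Segment]:
    """Read a segments file, expanding '~' in the path."""
    with open(os.path.expanduser(path)) as f:
        return parse_segments(f)


def durations(segments: List[Segment], min_duration: float = 0.0) -> List[Tuple[float, float, float]]:
    """(start, end, duration) for every segment at least min_duration seconds long."""
    return [(s, e, e - s) for s, e in segments if e - s >= min_duration]
```

For the two segments listed above, the durations come out to 7829 s and 12186 s.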
Next, let’s fetch strain data for those two segments; one will be used for training and the other for validation:
apptainer run $AFRAME_CONTAINER_ROOT/data.sif \
python -m data fetch \
--start 1240579783 \
--end 1240587612 \
--channels='["H1", "L1"]' \
--sample_rate 2048 \
--output_directory ~/aframe/data/train/background/
apptainer run $AFRAME_CONTAINER_ROOT/data.sif \
python -m data fetch \
--start 1240594562 \
--end 1240606748 \
--channels='["H1", "L1"]' \
--sample_rate 2048 \
--output_directory ~/aframe/data/train/background/
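Judging from the file name passed to --psd in the validation command, fetched files appear to follow the pattern background-{start}-{duration}.hdf5. Assuming one file per fetched segment, this hypothetical helper predicts the expected file names and lists any that are missing from the output directory.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def background_filename(start: float, end: float, prefix: str = "background") -> str:
    """Expected name for one fetched segment: {prefix}-{start}-{duration}.hdf5.

    The pattern is inferred from the --psd file name in the validation step;
    it is an assumption, not documented behavior of data/fetch.
    """
    start_i, end_i = int(start), int(end)
    return f"{prefix}-{start_i}-{end_i - start_i}.hdf5"


def missing_files(directory: str, segments: Iterable[Tuple[float, float]]) -> List[str]:
    """Names of expected background files that are not present in directory."""
    root = Path(directory).expanduser()
    return [
        name
        for name in (background_filename(s, e) for s, e in segments)
        if not (root / name).exists()
    ]
```

For the two fetches above this predicts background-1240579783-7829.hdf5 and background-1240594562-12186.hdf5.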
Finally, let’s generate some waveforms for training:
apptainer run $AFRAME_CONTAINER_ROOT/data.sif \
python -m data training_waveforms \
--num_signals 10000 \
--waveform_duration 8 \
--sample_rate 2048 \
--prior priors.priors.end_o3_ratesandpops \
--minimum_frequency 20 \
--reference_frequency 50 \
--waveform_approximant IMRPhenomXPHM \
--coalescence_time 6 \
--output_file ~/aframe/data/train/train_waveforms.hdf5
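As a quick sanity check on these settings, the sketch below works out the sample layout: 8 s at 2048 Hz gives 16384 samples per waveform, and a coalescence time of 6 s places the merger at sample 12288, leaving 2 s afterwards. It assumes coalescence_time is measured from the start of each waveform. The size estimate additionally assumes plus and cross polarizations stored as float32; both are assumptions, not documented behavior.

```python
def waveform_layout(waveform_duration: float, sample_rate: float, coalescence_time: float) -> dict:
    """Sample counts implied by the training_waveforms settings.

    Assumes (hypothetically) coalescence_time is measured from the start
    of each waveform.
    """
    num_samples = int(round(waveform_duration * sample_rate))
    coalescence_index = int(round(coalescence_time * sample_rate))
    if not 0 <= coalescence_index < num_samples:
        raise ValueError("coalescence_time must fall inside the waveform")
    return {
        "num_samples": num_samples,
        "coalescence_index": coalescence_index,
        "post_coalescence_seconds": waveform_duration - coalescence_time,
    }


def estimated_size_bytes(
    num_signals: int,
    num_samples: int,
    num_polarizations: int = 2,
    bytes_per_sample: int = 4,
) -> int:
    """Rough uncompressed size, assuming (hypothetically) plus and cross
    polarizations stored as float32."""
    return num_signals * num_polarizations * num_samples * bytes_per_sample
```

With --num_signals 10000 this estimate is 1,310,720,000 bytes, roughly 1.3 GB before any compression.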
and for validation. Note that this command uses one of the background files downloaded above, passed via --psd:
apptainer run $AFRAME_CONTAINER_ROOT/data.sif \
python -m data validation_waveforms \
--num_signals 2000 \
--prior priors.priors.end_o3_ratesandpops \
--ifos='["H1", "L1"]' \
--minimum_frequency 20 \
--reference_frequency 50 \
--sample_rate 2048 \
--waveform_duration 8 \
--waveform_approximant IMRPhenomXPHM \
--coalescence_time 6 \
--highpass 32 \
--snr_threshold 4 \
--psd ~/aframe/data/train/background/background-1240579783-7829.hdf5 \
--output_file ~/aframe/data/train/val_waveforms.hdf5
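The --psd, --highpass and --snr_threshold options suggest that validation signals are kept only if their SNR, computed against a PSD estimated from the background file, exceeds the threshold. The sketch below illustrates that idea under stated assumptions: the standard optimal SNR rho^2 = 4 * integral_{f >= highpass} |h(f)|^2 / S_n(f) df, detector SNRs combined in quadrature across --ifos, and rejection sampling until --num_signals are accepted. It is not the project's implementation, and all function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def optimal_snr(hf: Sequence[complex], psd: Sequence[float], df: float, highpass: float) -> float:
    """Single-detector optimal SNR from a one-sided frequency-domain waveform.

    Discretizes rho^2 = 4 * integral_{f >= highpass} |h(f)|^2 / S_n(f) df,
    assuming bin k sits at frequency k * df.
    """
    total = 0.0
    for k, (h, s) in enumerate(zip(hf, psd)):
        if k * df >= highpass:  # ignore bins below the highpass cutoff
            total += abs(h) ** 2 / s
    return math.sqrt(4.0 * total * df)


def network_snr(per_ifo_snrs: Sequence[float]) -> float:
    """Combine detector SNRs (e.g. H1 and L1) in quadrature."""
    return math.sqrt(sum(r * r for r in per_ifo_snrs))


def rejection_sample(
    draw: Callable[[], T],
    snr: Callable[[T], float],
    num_signals: int,
    threshold: float,
    max_draws: int = 1_000_000,
) -> List[T]:
    """Draw parameter sets until num_signals of them reach the SNR threshold."""
    accepted: List[T] = []
    for _ in range(max_draws):
        if len(accepted) == num_signals:
            break
        params = draw()
        if snr(params) >= threshold:
            accepted.append(params)
    if len(accepted) < num_signals:
        raise RuntimeError("too few signals passed the SNR threshold")
    return accepted
```

Capping the number of draws guards against a prior or threshold combination under which almost no signals are detectable.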
Note that the train project assumes these waveform files are named exactly as above. To continue this example, see the Aframe training example.