First Pipeline

Note

It is highly recommended that you have completed the ml4gw quickstart instructions, or installed the equivalent software, before running the sandbox pipeline.

Note

It is assumed that you have already built each project’s container (See projects)

Aframe pipelines strings together luigi / law tasks to run an end-to-end workflow. Here, we will run the Sandbox pipeline, (see also the tuning pipeline). In short, the Sandbox pipeline will

  1. Generate training data

  2. Generate testing data

  3. Train or Tune a model

  4. Export trained weights to TensorRT

  5. Perform inference using Triton

  6. Calculate sensitive volume

Configuration

The Sandbox pipeline is configured by two main configuration files. A .cfg file is used by law, and contains the parameters for the data generation, export, and inference tasks. See here for a complete example.

Training configuration and parsing is handled by PyTorch Lightning, which uses a .yaml file. See here for a complete example

Note

When running pipelines, parameters that are common between the training task and other tasks (e.g. ifos, highpass, fduration) are specified once in the .cfg and automatically passed to the downstream training or tuning config.yaml by luigi/law.

Initialize a Pipeline

The aframe-init command line tool can be used to initialize a directory with configuration files for a fresh run. In the specified directory, aframe-init will create default .cfg and .yaml configuration files, as well as a run.sh file for launching the pipeline.

Tip

When running a new “experiment”, it is recommended to use aframe-init to initialize a new directory. This way, all the configuration associated with the experiment is isolated, and the experiment is reproducible.

While in the root directory, a sandbox pipeline can be initialized with

poetry run aframe-init offline  --mode sandbox --directory ~/aframe/my-first-run/ 

You can also initialize a directory for launching the tune pipeline

poetry run aframe-init offline --mode tune --directory ~/aframe/my-first-tune-run/ 

Now, you can navigate to the experiment directory and edit the configuration files as you wish.

Running the Pipeline

Note

Running the sandbox pipeline out-of-the-box requires access to an enterprise-grade GPU(s) (e.g. P100, V100, T4, A[30,40,100], etc.). There are several nodes on the LIGO Data Grid which meet these requirements_**.

In the experiment directory a run.sh file will be created that looks like

#!/bin/bash
# Export environment variables
export AFRAME_TRAIN_DATA_DIR=/home/albert.einstein/aframe/my-first-run/data/train
export AFRAME_TEST_DATA_DIR=/home/albert.einstein/aframe/my-first-run/data/test
export AFRAME_TRAIN_RUN_DIR=/home/albert.einstein/aframe/my-first-run/training
export AFRAME_CONDOR_DIR=/home/albert.einstein/aframe/my-first-run/condor
export AFRAME_RESULTS_DIR=/home/albert.einsteinaframe/my-first-run/results
export AFRAME_TMPDIR=/home/albert.einsteinaframe/my-first-run/tmp/

# launch pipeline; modify the gpus, workers etc. to suit your needs
# note that if you've made local code changes not in the containers
# you'll need to add the --dev flag!
LAW_CONFIG_FILE=/home/albert.einstein/aframe/my-first-run/sandbox.cfg poetry run --directory /home/albert.einstein/projects/aframev2 law run aframe.pipelines.sandbox.Sandbox --workers 5 --gpus 0

Environment variables are automatically set based on the specified experiment directory. These environment variables are ingested by the law tasks and control where various pipeline artifacts are stored.

  • AFRAME_TRAIN_DATA_DIR Training data storage

  • AFRAME_TEST_DATA_DIR Testing data storage

  • AFRAME_TRAIN_RUN_DIR Training artifact storage

  • AFRAME_CONDOR_DIR Condor submit files and logs

  • AFRAME_RESULTS_DIR Inference and sensitive volume results

  • AFRAME_TMPDIR Intermediate data product storage

The last line of the run.sh contains the command that launches the pipeline. The workers argument specifies how many luigi workers to use. This controls how many concurrent tasks can be launched. It is useful to specify more than 1 worker when you have several tasks that are not dependent on one another. The default of 5 should be plenty.

The gpus argument controls which gpus to use for training and inference. Under the hood, the pipeline is simply setting the CUDA_VISIBLE_DEVICES environment variable. gpus should be specified as a comma separated list (e.g. --gpus 0,1,2).

The pipeline can now be kicked off by executing the run.sh

bash ~/aframe/my-first-run/run.sh

Tip

The end to end pipeline can take a few days to run. If you wish to launch an analysis with the freedom of ending your ssh session, use a tool like tmux or screen

The most time consuming steps are training, and performing inference. If you wish to reduce these timescales for testing the end-to-end analysis, consider altering the following arguments:

  • Number of training epochs, max_epochs, in the training yaml configuration file

  • Batches analyzed each epoch, batches_per_epoch, in the training yaml configuration file

  • Seconds of analyzed background livetime Tb, in the .cfg file

  • Number of injections performed, num_injections, in the .cfg file