First Pipeline
==============
```{eval-rst}
.. note::
    It is highly recommended that you complete the `ml4gw quickstart `_ instructions, or install the equivalent software, before running the sandbox pipeline.
```
```{eval-rst}
.. note::
    It is assumed that you have already built each project's container (see :doc:`projects `).
```
Aframe pipelines string together `luigi` / `law` tasks to run an end-to-end workflow. Here, we will run the `Sandbox` pipeline (see also the {doc}`tuning ` pipeline).
In short, the `Sandbox` pipeline will:
1. Generate training data
2. Generate testing data
3. Train or tune a model
4. Export trained weights to TensorRT
5. Perform inference using Triton
6. Calculate sensitive volume
## Configuration
The `Sandbox` pipeline is configured by two main configuration files. A `.cfg` file is used by `law`, and contains the parameters
for the data generation, export, and inference tasks. See [here](https://github.com/ML4GW/aframe/blob/main/aframe/pipelines/sandbox/configs/bbh.cfg) for a complete example.
Training configuration and parsing are handled by [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/), which
uses a `.yaml` file. See [here](https://github.com/ML4GW/aframe/blob/main/projects/train/configs/bbh.yaml) for a complete example.
```{eval-rst}
.. note::
    When running pipelines, parameters that are common between the training task and other tasks (e.g. :code:`ifos`, :code:`highpass`, :code:`fduration`) are specified once in the :code:`.cfg` and automatically passed to the downstream training or tuning :code:`config.yaml` by :code:`luigi`/:code:`law`.
```
## Initialize a Pipeline
The `aframe-init` command line tool can be used to initialize a directory with configuration files for a fresh run.
In the specified directory, `aframe-init` will create default `.cfg` and `.yaml` configuration files, as well as a `run.sh` file for launching the pipeline.
```{eval-rst}
.. tip::
    When running a new "experiment", it is recommended to use :code:`aframe-init` to initialize a new directory. This way, all the configuration associated with the experiment is isolated, and the experiment is reproducible.
```
From the root directory of the repository, a sandbox pipeline can be initialized with
```console
poetry run aframe-init offline --mode sandbox --directory ~/aframe/my-first-run/
```
You can also initialize a directory for launching the tune pipeline
```console
poetry run aframe-init offline --mode tune --directory ~/aframe/my-first-tune-run/
```
Now, you can navigate to the experiment directory and edit the configuration files as you wish.
## Running the Pipeline
```{eval-rst}
.. note::
    Running the sandbox pipeline out-of-the-box requires access to one or more enterprise-grade GPUs (e.g. P100, V100, T4, A[30,40,100], etc.). There are several nodes on the LIGO Data Grid which meet these requirements.
```
In the experiment directory, `aframe-init` will have created a `run.sh` file that looks like
```bash
#!/bin/bash
# Export environment variables
export AFRAME_TRAIN_DATA_DIR=/home/albert.einstein/aframe/my-first-run/data/train
export AFRAME_TEST_DATA_DIR=/home/albert.einstein/aframe/my-first-run/data/test
export AFRAME_TRAIN_RUN_DIR=/home/albert.einstein/aframe/my-first-run/training
export AFRAME_CONDOR_DIR=/home/albert.einstein/aframe/my-first-run/condor
export AFRAME_RESULTS_DIR=/home/albert.einstein/aframe/my-first-run/results
export AFRAME_TMPDIR=/home/albert.einstein/aframe/my-first-run/tmp/
# launch pipeline; modify the gpus, workers etc. to suit your needs
# note that if you've made local code changes not in the containers
# you'll need to add the --dev flag!
LAW_CONFIG_FILE=/home/albert.einstein/aframe/my-first-run/sandbox.cfg poetry run --directory /home/albert.einstein/projects/aframev2 law run aframe.pipelines.sandbox.Sandbox --workers 5 --gpus 0
```
Environment variables are automatically set based on the specified experiment directory. These environment variables
are ingested by the `law` tasks and control where various pipeline artifacts are stored.
- `AFRAME_TRAIN_DATA_DIR` Training data storage
- `AFRAME_TEST_DATA_DIR` Testing data storage
- `AFRAME_TRAIN_RUN_DIR` Training artifact storage
- `AFRAME_CONDOR_DIR` Condor submit files and logs
- `AFRAME_RESULTS_DIR` Inference and sensitive volume results
- `AFRAME_TMPDIR` Intermediate data product storage
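Every path in the list above hangs off the same experiment directory. As a minimal sketch (not the script that `aframe-init` actually generates), the layout can be reproduced from a single variable; `EXPERIMENT_DIR` is a hypothetical name introduced only for this illustration:

```bash
#!/bin/bash
# Hypothetical variable (not defined by aframe): the experiment directory
EXPERIMENT_DIR="${EXPERIMENT_DIR:-$HOME/aframe/my-first-run}"

# Same layout as the run.sh shown above, derived from one base path
export AFRAME_TRAIN_DATA_DIR="$EXPERIMENT_DIR/data/train"
export AFRAME_TEST_DATA_DIR="$EXPERIMENT_DIR/data/test"
export AFRAME_TRAIN_RUN_DIR="$EXPERIMENT_DIR/training"
export AFRAME_CONDOR_DIR="$EXPERIMENT_DIR/condor"
export AFRAME_RESULTS_DIR="$EXPERIMENT_DIR/results"
export AFRAME_TMPDIR="$EXPERIMENT_DIR/tmp/"

# Print each variable so the layout can be checked before launching
for var in AFRAME_TRAIN_DATA_DIR AFRAME_TEST_DATA_DIR AFRAME_TRAIN_RUN_DIR \
           AFRAME_CONDOR_DIR AFRAME_RESULTS_DIR AFRAME_TMPDIR; do
    printf '%s=%s\n' "$var" "$(printenv "$var")"
done
```

Printing the variables before launching is a cheap way to catch a malformed path, since every artifact location is controlled by these values.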
The last line of the `run.sh` contains the command that launches the pipeline. The `workers` argument specifies how many `luigi` workers to use. This controls how many concurrent tasks can be launched. It is useful to specify more than 1 worker when you have several tasks that are not dependent on one another. The default of 5 should be plenty.
The `gpus` argument controls which GPUs to use for training and inference. Under the hood, the pipeline simply sets
the `CUDA_VISIBLE_DEVICES` environment variable. `gpus` should be specified as a comma-separated list (e.g. `--gpus 0,1,2`).
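To make the effect of `CUDA_VISIBLE_DEVICES` concrete, the sketch below sets it by hand for a child process and reads it back. It only illustrates the environment variable itself and is not how `law` passes it internally; the variable names `gpus` and `visible` are chosen for this example:

```bash
# Expose only GPUs 0 and 1 to a child process, which (per the text above)
# is what passing `--gpus 0,1` amounts to
gpus="0,1"

# The child shell inherits the variable; the parent shell is left unchanged
visible=$(CUDA_VISIBLE_DEVICES="$gpus" sh -c 'echo "$CUDA_VISIBLE_DEVICES"')
echo "child process sees GPUs: $visible"
```

CUDA applications re-index the visible devices starting from 0, so device 0 inside the process is the first entry of the list rather than necessarily physical GPU 0.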
The pipeline can now be kicked off by executing `run.sh`:
```console
bash ~/aframe/my-first-run/run.sh
```
```{eval-rst}
.. tip::
    The end-to-end pipeline can take a few days to run.
    If you wish to launch an analysis with the freedom of ending
    your ssh session, use a tool like `tmux `_ or `screen `_.
```
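As one possible way to follow the tip above with `tmux`, the helper functions below start the pipeline in a detached session and reattach to it later. The session name, the log file location, and the function names are all hypothetical choices for this sketch:

```bash
# Hypothetical names; adjust them to your experiment
SESSION="aframe-first-run"
RUN_SCRIPT="$HOME/aframe/my-first-run/run.sh"
LOG_FILE="$HOME/aframe/my-first-run/run.log"

# Start the pipeline in a detached tmux session and keep a copy of its output
start_run() {
    tmux new-session -d -s "$SESSION" "bash '$RUN_SCRIPT' 2>&1 | tee '$LOG_FILE'"
}

# Reattach to watch progress; detach again with Ctrl-b followed by d
watch_run() {
    tmux attach-session -t "$SESSION"
}
```

After sourcing these definitions, `start_run` launches the job and returns immediately, so the ssh session can be closed. The tmux session ends when `run.sh` exits, which is why the output is also written to a log file.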
The most time-consuming steps are training and inference. If you wish to shorten them when testing the end-to-end analysis, consider altering the following arguments:
- Number of training epochs, `max_epochs`, in the training `yaml` configuration file
- Batches analyzed each epoch, `batches_per_epoch`, in the training `yaml` configuration file
- Seconds of analyzed background livetime, `Tb`, in the `.cfg` file
- Number of injections performed, `num_injections`, in the `.cfg` file
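Before editing, it can help to locate where the four parameters above are currently set. The helper below is a hypothetical convenience (not part of aframe) that assumes the experiment directory contains the `.yaml` and `.cfg` files created by `aframe-init`:

```bash
# Hypothetical helper: print the lines where the runtime-related
# parameters are set, so they can then be edited by hand
show_runtime_params() {
    dir="$1"
    # Training length is set in the training yaml
    grep -n -E 'max_epochs|batches_per_epoch' "$dir"/*.yaml
    # Background livetime and injection count are set in the law .cfg
    grep -n -w -E 'Tb|num_injections' "$dir"/*.cfg
}
```

For example, `show_runtime_params ~/aframe/my-first-run` prints each matching line with its line number. Appropriate reduced values depend on your use case and are not prescribed here.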