# Running Custom Workflows Using Samples

## Running Custom Workflows using samples

Workbench allows you to run custom workflows using samples from your connected instrument data. This will enable you to launch any workflow on your targeted samples with a single click from the UI.

### Prerequisites

Before a custom workflow can be launched using Samples, it must meet the following prerequisites:

1. The workflow must be sample-enabled. This means that the workflow must be designed to accept samples as input.
2. All inputs must be supplied either by a set of defaults or non-interactively by a workflow transformation.

### Create a custom workflow

Use either the [UI](https://docs.omics.ai/products/workbench/getting-started/finding-or-importing-a-workflow) or the [CLI](https://docs.omics.ai/products/command-line-interface/reference/workbench/workflows-create) to create a custom workflow with a meaningful name.

### Configure Defaults

If the workflow requires inputs that are not dynamic, you can specify the default values to use when launching the workflow. You can have multiple different defaults for the same workflow so long as they use different selectors which are evaluated at runtime. The following selectors are supported:

* `engine_id`
* `cloud`
* `region`

When resolving defaults for a launch, Workbench uses the set with the most specific matching selector. For example, suppose you have two sets of defaults for your workflow with the following selectors:

1. `engine_id` = `my-engine` (an engine that runs on GCP)
2. `cloud` = `GCP`

If you launch the workflow against the engine `my-engine`, both selectors match, but the defaults attached to the `engine_id` selector are used because it is the more specific of the two.
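
The resolution rule can be pictured with a short sketch. It is illustrative only and not Workbench's implementation: the `resolveDefaults` helper, the shape of the defaults objects, the `reference` input, and the specificity order (`engine_id` over `region` over `cloud`) are all assumptions made for this example.

```javascript
// Assumed specificity ranking: a higher number wins. The exact order is a guess.
const SPECIFICITY = { engine_id: 3, region: 2, cloud: 1 };

// Hypothetical helper: return the values of the matching defaults set
// whose selector is the most specific.
function resolveDefaults(defaultsSets, target) {
    // Keep only the sets whose selector matches the launch target.
    const matching = defaultsSets.filter(set =>
        Object.entries(set.selector).every(([key, value]) => target[key] === value)
    );
    const score = set =>
        Math.max(0, ...Object.keys(set.selector).map(key => SPECIFICITY[key] ?? 0));
    matching.sort((a, b) => score(b) - score(a));
    return matching[0]?.values;
}

const defaultsSets = [
    { selector: { engine_id: "my-engine" }, values: { reference: "engine-specific" } },
    { selector: { cloud: "GCP" }, values: { reference: "gcp-wide" } },
];

const resolved = resolveDefaults(defaultsSets, { engine_id: "my-engine", cloud: "GCP" });
console.log(resolved); // the set selected by engine_id wins over the cloud-wide set
```

Launching against a different GCP engine would fall back to the `cloud` = `GCP` set in this sketch, since the `engine_id` selector no longer matches.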

Defaults are JSON values, and you can use the [CLI](https://docs.omics.ai/products/command-line-interface/reference/workbench/workflows-versions-defaults-create) to set them.

```bash
omics workbench workflows versions defaults create \
  --workflow <YOUR_WORKFLOW_ID> \
  --version <YOUR_WORKFLOW_VERSION> \
  --engine my-engine \
  --values @defaults.json \
  my-defaults-id
```

### Enable support for Samples

To enable support for Samples, you need to define a transformation that maps the sample metadata to the workflow inputs. Transformations are defined using the [CLI](https://docs.omics.ai/products/command-line-interface/reference/workbench/worfklows-versions-transformations-create). Each transformation is a JavaScript function that takes the execution request as input and returns a modified execution request as output. Workbench merges the result of the transformation with the original execution request to generate the final execution request.

Only changes to the following fields are allowed in the transformation (all other fields are ignored):

* `workflow_params`
* `tags`
* `workflow_engine_parameters`
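
To illustrate how this restriction shapes the final request, the sketch below mimics the merge step locally. It is an assumption about the behavior rather than Workbench's actual code: the `applyTransformation` helper is hypothetical, the `reference` input is made up, and the shallow per-field merge is only a guess at the real merge semantics.

```javascript
// Fields a transformation is allowed to change (from the list above).
const ALLOWED_FIELDS = ["workflow_params", "tags", "workflow_engine_parameters"];

// Hypothetical helper: apply a transformation and merge its result into the request.
// Assumes a shallow merge per allowed field; the real semantics may differ.
function applyTransformation(transformation, request) {
    const result = transformation(request) ?? {};
    const merged = { ...request };
    for (const field of ALLOWED_FIELDS) {
        merged[field] = { ...request[field], ...result[field] };
    }
    return merged;
}

const request = {
    samples: [{ id: "sample-1" }],
    workflow_params: { reference: "GRCh38" },
    tags: {},
    workflow_engine_parameters: {},
};

const merged = applyTransformation(
    context => ({
        workflow_params: { sample_id: context.samples[0].id },
        samples: [], // not an allowed field, so this change is ignored
    }),
    request
);

console.log(merged.workflow_params); // holds both the original and the new input
console.log(merged.samples.length);  // the original samples are untouched
```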

The following is an example execution request that contains sample metadata:

```json
{
  "samples": [
    {
      "id": "sample-1",
      "family_id": "family-1",
      "father_id": "father",
      "mother_id": "mother",
      "sex": "MALE",
      "phenotypes": [
        {
          "type": {
            "id": "HP:0000001"
          }
        }
      ],
      "files": [
        {
          "path": "gs://my-bucket/sample-1_R1.fastq.gz",
          "size": 123122
        },
        {
          "path": "gs://my-bucket/sample-1_R2.fastq.gz",
          "size": 123122
        }
      ]
    },
    {
      "id": "father",
      "family_id": "family-1",
      "sex": "MALE",
      "phenotypes": [],
      "files": [
        {
          "path": "gs://my-bucket/father-R1.fastq.gz",
          "size": 123122
        },
        {
          "path": "gs://my-bucket/father_R2.fastq.gz",
          "size": 123122
        }
      ]
    },
    {
      "id": "mother",
      "family_id": "family-1",
      "sex": "FEMALE",
      "phenotypes": [],
      "files": [
        {
          "path": "gs://my-bucket/mother-R1.fastq.gz",
          "size": 123122
        },
        {
          "path": "gs://my-bucket/mother_R2.fastq.gz",
          "size": 123122
        }
      ]
    }
  ],
  "workflow_params": {
  },
  "tags": {
  },
  "workflow_engine_parameters": {
  }
}
```

#### Example Transformation

Using the above execution request as input, the following transformation will map the sample files to the workflow inputs:

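```javascript
(context) => {
    return {
        workflow_params: {
            // Build one workflow input entry per sample in the execution request.
            samples: context.samples?.map(sample => {
                return {
                    sample_id: sample.id,
                    // Pass the storage path of every file attached to the sample.
                    bam_files: sample.files.map(sampleFile => sampleFile.path),
                    father_id: sample.father_id,
                    mother_id: sample.mother_id,
                    sex: sample.sex,
                    // Fall back to false when the sample has no `affected` field.
                    affected: sample.affected ?? false
                }
            }),
        }
    }
}
```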
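
Before registering a transformation, it can help to run it locally against a sample execution request. The harness below is a sketch that assumes Node.js is available: it assigns the example transformation to a variable named `transformation` (a name chosen here for illustration) and calls it with a trimmed-down version of the example request.

```javascript
// The example transformation, assigned to a variable so it can be called locally.
const transformation = (context) => {
    return {
        workflow_params: {
            samples: context.samples?.map(sample => {
                return {
                    sample_id: sample.id,
                    bam_files: sample.files.map(sampleFile => sampleFile.path),
                    father_id: sample.father_id,
                    mother_id: sample.mother_id,
                    sex: sample.sex,
                    affected: sample.affected ?? false
                }
            }),
        }
    }
};

// Trimmed-down execution request based on the example above.
const exampleRequest = {
    samples: [
        {
            id: "sample-1",
            father_id: "father",
            mother_id: "mother",
            sex: "MALE",
            files: [
                { path: "gs://my-bucket/sample-1_R1.fastq.gz", size: 123122 },
                { path: "gs://my-bucket/sample-1_R2.fastq.gz", size: 123122 },
            ],
        },
    ],
    workflow_params: {},
    tags: {},
    workflow_engine_parameters: {},
};

// Print the fields the transformation would contribute to the final request.
const output = transformation(exampleRequest);
console.log(JSON.stringify(output, null, 2));
```

Inspecting the printed `workflow_params` is a quick way to confirm that every input the workflow expects is populated before the transformation is registered.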

#### Registering the Transformation

You can use the CLI to register the transformation. In order for the workflow to show up in the UI, you need to specify the appropriate labels when creating the transformation. The following labels are required:

* `samples`
* one of: `secondary` or `tertiary`

```bash
omics workbench workflows versions transformations create \
  --workflow <YOUR_WORKFLOW_ID> \
  --version <YOUR_WORKFLOW_VERSION> \
  --label samples --label secondary \
  "$(cat transformation.js)"
```

### Launching a custom workflow

At this point, the workflow should appear in the UI when you view your samples. To launch it, follow the steps described in [Running a workflow using samples](https://docs.omics.ai/products/workbench/instruments/running-workflows-using-samples).

## Running a custom Tertiary Workflow

Workbench also simplifies the process of running a custom tertiary workflow by chaining and submitting all of its dependencies automatically. Many of the steps described above still apply, but you need to specify the `tertiary` label when creating the transformation and register a [workflow dependency](https://docs.omics.ai/products/command-line-interface/reference/workbench/workflows-versions-dependencies-create) so that Workbench can chain the workflows.

### Create your Secondary Workflow

Follow the same steps as described above to create your secondary workflow. Upstream workflows know nothing about the downstream tertiary workflow, so you do not need to modify them.

### Create your Tertiary Workflow

Use either the [UI](https://docs.omics.ai/products/workbench/getting-started/finding-or-importing-a-workflow) or the [CLI](https://docs.omics.ai/products/command-line-interface/reference/workbench/workflows-create) to create a custom workflow with a meaningful name. You can choose to also create defaults for the tertiary workflow if you would like to prepopulate any non-dynamic inputs.

### Register the Dependencies

Dependencies allow you to chain workflows together, specifying which workflows are prerequisites for the current one. A single workflow can have multiple dependencies, all of which must be satisfied before the current workflow can be launched through instruments.

Creating dependencies is done using the [CLI](https://docs.omics.ai/products/command-line-interface/reference/workbench/workflows-versions-dependencies-create) and simply requires specifying the `workflow_id` and `workflow_version` of the workflow that you would like to chain.

```bash
omics workbench workflows versions dependencies create \
  --workflow <YOUR_TERTIARY_WORKFLOW_ID> \
  --version <YOUR_TERTIARY_WORKFLOW_VERSION> \
  --name "Secondary Workflows" \
  --dependency "<YOUR_SECONDARY_WORKFLOW_ID>/<YOUR_SECONDARY_WORKFLOW_VERSION>" # You can specify multiple dependencies by repeating this flag 
```

### Create your Transformation

Follow the same steps as described above to create your transformation, but be sure to specify the `tertiary` label in addition to the `samples` label. Sample metadata is provided when launching through instruments in the same way as described above. In addition, the outputs of all dependencies are added to the execution request. For each dependency, the first run that matches its `workflow_id` and `workflow_version` and has the status `COMPLETE` is used. For example:

```json
{
  "dependencies": [
    {
      "run_id": "51A5B9C1-9425-4F94-8C5E-5CCC0EC9C6A2",
      "workflow_id": "secondary_workflow_id",
      "workflow_version": "secondary_workflow_version",
      "outputs": {
        "vcf_file": "gs://my-bucket/vcf_file.vcf.gz",
        "vcf_index": "gs://my-bucket/vcf_file.vcf.gz.tbi"
      }
    },
    {
      "run_id": "CDDDAC0D-245E-47F7-8A8B-797E879B765",
      "workflow_id": "stats_workflow_id",
      "workflow_version": "stats_workflow_version",
      "outputs": {
        "stats_file": "gs://my-bucket/stats_file.txt"
      }
    }
  ],
  "samples": []
}
```

You can then map the outputs of the dependencies to the inputs of the tertiary workflow, using JavaScript functions to identify the appropriate output for each input.

```javascript
(context) => {
    return {
        workflow_params: {
            vcf_files: context.dependencies?.find(dependency => dependency.workflow_id === 'secondary_workflow_id')?.outputs.vcf_file,
            stats_files: context.dependencies?.find(dependency => dependency.workflow_id === 'stats_workflow_id')?.outputs.stats_file,
        }
    }
}
```
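
When several inputs come from dependency outputs, the repeated lookups can be factored into a small helper inside the transformation. The variant below is a sketch, not a required pattern: the `outputOf` helper and the `vcf_indexes` input name are hypothetical, and the local check at the end reuses the dependency outputs from the example request above.

```javascript
const tertiaryTransformation = (context) => {
    // Return the named output of the first dependency with the given workflow ID,
    // or undefined when no such dependency or output exists.
    const outputOf = (workflowId, outputName) =>
        context.dependencies
            ?.find(dependency => dependency.workflow_id === workflowId)
            ?.outputs?.[outputName];

    return {
        workflow_params: {
            vcf_files: outputOf("secondary_workflow_id", "vcf_file"),
            vcf_indexes: outputOf("secondary_workflow_id", "vcf_index"),
            stats_files: outputOf("stats_workflow_id", "stats_file"),
        }
    };
};

// Local check using the dependency outputs from the example request.
const tertiaryRequest = {
    dependencies: [
        {
            workflow_id: "secondary_workflow_id",
            workflow_version: "secondary_workflow_version",
            outputs: {
                vcf_file: "gs://my-bucket/vcf_file.vcf.gz",
                vcf_index: "gs://my-bucket/vcf_file.vcf.gz.tbi",
            },
        },
        {
            workflow_id: "stats_workflow_id",
            workflow_version: "stats_workflow_version",
            outputs: { stats_file: "gs://my-bucket/stats_file.txt" },
        },
    ],
    samples: [],
};

const tertiaryParams = tertiaryTransformation(tertiaryRequest).workflow_params;
console.log(tertiaryParams);
```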

#### Registering the Transformation

You can use the CLI to register the transformation. In order for the workflow to show up in the UI, you need to specify the appropriate labels when creating the transformation. The following labels are required:

* `samples`
* `tertiary`

```bash
omics workbench workflows versions transformations create \
  --workflow <YOUR_TERTIARY_WORKFLOW_ID> \
  --version <YOUR_TERTIARY_WORKFLOW_VERSION> \
  --label samples --label tertiary \
  "$(cat tertiary-transformation.js)"
```

#### Note on Launching Tertiary Workflows

When you launch a tertiary workflow, the UI will automatically launch all of its dependencies. If any of the dependencies fail, the tertiary workflow will not be launched and will also be marked as failed. Tertiary workflows will not be resubmitted if they are already in a `COMPLETE` state.

Once a tertiary workflow is submitted, Workbench waits for all of its dependencies to complete before launching it. Until then, the tertiary workflow is kept in the `QUEUED` state.
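
The launch rules in this note can be summarized as a small decision function. This is a sketch of the described behavior, not Workbench's scheduler: the `nextTertiaryAction` helper and its return values are hypothetical, and the state names other than `COMPLETE` and `QUEUED` (here `FAILED` and `RUNNING`) are assumed.

```javascript
// Hypothetical helper: decide what happens to a tertiary workflow
// given the current states of its dependencies.
function nextTertiaryAction(dependencyStates) {
    if (dependencyStates.some(state => state === "FAILED")) {
        return "FAIL"; // a failed dependency fails the tertiary workflow too
    }
    if (dependencyStates.every(state => state === "COMPLETE")) {
        return "LAUNCH"; // all prerequisites are satisfied
    }
    return "STAY_QUEUED"; // keep waiting in the QUEUED state
}

console.log(nextTertiaryAction(["COMPLETE", "RUNNING"]));  // STAY_QUEUED
console.log(nextTertiaryAction(["COMPLETE", "COMPLETE"])); // LAUNCH
console.log(nextTertiaryAction(["COMPLETE", "FAILED"]));   // FAIL
```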
