Running Custom Workflows Using Samples

Workbench allows you to run custom workflows using samples from your connected instrument data. This lets you launch any workflow on the samples you target with a single click from the UI.

Prerequisites

To launch a custom workflow using Samples, the workflow must meet the following prerequisites:

The workflow must be sample-enabled. This means that the workflow must be designed to accept samples as input.
All inputs must be supplied either by a set of defaults or non-interactively by a workflow transformation.

Create a custom workflow

Use either the UI or the CLI to create a custom workflow with a meaningful name.

Configure Defaults

If the workflow requires inputs that are not dynamic, you can specify the default values to use when launching the workflow. You can have multiple different defaults for the same workflow so long as they use different selectors which are evaluated at runtime. The following selectors are supported:

engine_id
cloud
region

The most specific selector is used when resolving the default values for a workflow. For example, suppose you have two sets of defaults for your workflow with the following selectors:

engine_id = my-engine (also on GCP)
cloud = GCP

If you launch the workflow against the engine my-engine, which runs on GCP, both selectors match. The defaults registered under engine_id = my-engine are used because engine_id is more specific than cloud.
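The resolution rule can be illustrated with a minimal sketch. It is not Workbench's implementation: the function resolveDefaults, the shape of the default sets, the input name disk_size_gb, and the specificity order engine_id, then region, then cloud are all assumptions made for illustration. The source only states that the most specific matching selector wins.

```javascript
// Assumed specificity order, most specific first (hypothetical).
const SPECIFICITY = ["engine_id", "region", "cloud"];

// Returns the values of the most specific default set that matches the target.
function resolveDefaults(defaultSets, target) {
  // Keep only sets whose selector keys all match the launch target.
  const matching = defaultSets.filter(({ selector }) =>
    Object.entries(selector).every(([key, value]) => target[key] === value)
  );
  // A lower rank means a more specific selector.
  const rank = ({ selector }) => {
    const index = SPECIFICITY.findIndex((key) => key in selector);
    return index === -1 ? SPECIFICITY.length : index;
  };
  matching.sort((a, b) => rank(a) - rank(b));
  return matching.length > 0 ? matching[0].values : {};
}

// Two hypothetical default sets mirroring the selectors in the example above.
const defaultSets = [
  { selector: { cloud: "GCP" }, values: { disk_size_gb: 100 } },
  { selector: { engine_id: "my-engine" }, values: { disk_size_gb: 500 } },
];

const resolved = resolveDefaults(defaultSets, {
  engine_id: "my-engine",
  cloud: "GCP",
  region: "us-central1",
});
console.log(resolved);
```

Launching against a different engine on GCP would fall back to the cloud-level set in this sketch, because only that selector matches.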

Defaults are JSON values, and you can use the CLI to set them.

omics workbench workflows versions defaults create \
  --workflow <YOUR_WORKFLOW_ID> \
  --version <YOUR_WORKFLOW_VERSION> \
  --engine my-engine \
  --values @defaults.json \
  my-defaults-id
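For reference, the defaults.json file passed to --values could be generated as in the sketch below. The layout (a flat JSON object keyed by workflow input name) and the input names reference_fasta, threads and run_qc are assumptions for illustration; use the inputs your workflow actually declares.

```javascript
// Hypothetical input names and values; the flat object layout is an assumption.
const defaults = {
  reference_fasta: "gs://my-bucket/reference/genome.fasta",
  threads: 8,
  run_qc: true,
};

// Serialize with indentation so the file is easy to review.
const defaultsJson = JSON.stringify(defaults, null, 2);
console.log(defaultsJson);

// To write the file, uncomment the following line:
// require("fs").writeFileSync("defaults.json", defaultsJson);
```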

Enable support for Samples

To enable support for Samples, define a transformation that maps the sample metadata to the workflow inputs. A transformation is a JavaScript function, registered using the CLI, that takes the execution request as input and returns a modified execution request as output. Workbench merges the result of the transformation with the original execution request to generate the final execution request.

Only changes to the following fields are allowed in the transformation (all other fields are ignored):

workflow_params
tags
workflow_engine_parameters
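One way to picture the merge is the sketch below. It is an illustration only: the function applyTransformation and the per-field shallow merge are assumptions, since the source does not specify how Workbench combines the two objects beyond ignoring changes to other fields.

```javascript
// The only fields a transformation is allowed to change.
const ALLOWED_FIELDS = ["workflow_params", "tags", "workflow_engine_parameters"];

// Hypothetical merge: apply the transformation, then copy allowed fields only.
function applyTransformation(request, transformation) {
  const result = transformation(request) ?? {};
  const merged = { ...request };
  for (const field of ALLOWED_FIELDS) {
    if (result[field] !== undefined) {
      // Assumed behavior: shallow merge, with transformation values winning.
      merged[field] = { ...(request[field] ?? {}), ...result[field] };
    }
  }
  return merged;
}

const request = {
  samples: [{ id: "sample-1" }],
  workflow_params: { threads: 8 },
  tags: {},
};

const finalRequest = applyTransformation(request, (context) => ({
  workflow_params: { sample_ids: context.samples.map((sample) => sample.id) },
  samples: [], // not an allowed field, so this change is ignored
}));
console.log(JSON.stringify(finalRequest, null, 2));
```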

The following is an example execution request that contains sample metadata:

{
  "samples": [
    {
      "id": "sample-1",
      "family_id": "family-1",
      "father_id": "father",
      "mother_id": "mother",
      "sex": "MALE",
      "phenotypes": [
        {
          "type": {
            "id": "HP:0000001"
          }
        }
      ],
      "files": [
        {
          "path": "gs://my-bucket/sample-1_R1.fastq.gz",
          "size": 123122
        },
        {
          "path": "gs://my-bucket/sample-1_R2.fastq.gz",
          "size": 123122
        }
      ]
    },
    {
      "id": "father",
      "family_id": "family-1",
      "sex": "MALE",
      "phenotypes": [],
      "files": [
        {
          "path": "gs://my-bucket/father-R1.fastq.gz",
          "size": 123122
        },
        {
          "path": "gs://my-bucket/father_R2.fastq.gz",
          "size": 123122
        }
      ]
    },
    {
      "id": "mother",
      "family_id": "family-1",
      "sex": "FEMALE",
      "phenotypes": [],
      "files": [
        {
          "path": "gs://my-bucket/mother-R1.fastq.gz",
          "size": 123122
        },
        {
          "path": "gs://my-bucket/mother_R2.fastq.gz",
          "size": 123122
        }
      ]
    }
  ],
  "workflow_params": {
  },
  "tags": {
  },
  "workflow_engine_parameters": {
  }
}

Example Transformation

Using the above execution request as input, the following transformation will map the sample files to the workflow inputs:


(context) => {
    return {
        workflow_params: {
            samples: context.samples?.map(sample => {
                return {
                    sample_id: sample.id,
                    bam_files: sample.files.map(sampleFile => sampleFile.path),
                    father_id: sample.father_id,
                    mother_id: sample.mother_id,
                    sex: sample.sex,
                    affected: sample.affected ?? false
                }
            }),
        }
    }
}
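Before registering a transformation, it can help to run it locally against a sample request. The sketch below assigns the transformation above to a variable and applies it to a trimmed copy of the example request, using plain Node.js. The variable names and the trimmed request are illustrative and not part of Workbench.

```javascript
// The example transformation, assigned to a variable for local testing.
const transformation = (context) => {
  return {
    workflow_params: {
      samples: context.samples?.map((sample) => {
        return {
          sample_id: sample.id,
          bam_files: sample.files.map((sampleFile) => sampleFile.path),
          father_id: sample.father_id,
          mother_id: sample.mother_id,
          sex: sample.sex,
          affected: sample.affected ?? false,
        };
      }),
    },
  };
};

// Trimmed copy of the example execution request (two of the three samples).
const request = {
  samples: [
    {
      id: "sample-1",
      family_id: "family-1",
      father_id: "father",
      mother_id: "mother",
      sex: "MALE",
      files: [
        { path: "gs://my-bucket/sample-1_R1.fastq.gz", size: 123122 },
        { path: "gs://my-bucket/sample-1_R2.fastq.gz", size: 123122 },
      ],
    },
    {
      id: "father",
      family_id: "family-1",
      sex: "MALE",
      files: [{ path: "gs://my-bucket/father-R1.fastq.gz", size: 123122 }],
    },
  ],
  workflow_params: {},
  tags: {},
  workflow_engine_parameters: {},
};

const result = transformation(request);
console.log(JSON.stringify(result, null, 2));
```

Fields that are absent from a sample, such as father_id on the father's own record, come through as undefined and are dropped when the result is serialized to JSON.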

Registering the Transformation

You can use the CLI to register the transformation. In order for the workflow to show up in the UI, you need to specify the appropriate labels when creating the transformation. The following labels are required:

samples
one of: secondary or tertiary

omics workbench workflows versions transformations create \
  --workflow <YOUR_WORKFLOW_ID> \
  --version <YOUR_WORKFLOW_VERSION> \
  --label samples --label secondary \
  "$(cat transformation.js)"

Launching a custom workflow

At this point, you should be able to see the workflow in the UI when viewing your samples. To launch it, follow the same steps as described in "Running a workflow using samples".

Running a custom Tertiary Workflow

Workbench also simplifies the process of running a custom tertiary workflow by chaining and submitting all of its dependencies automatically. Many of the steps described above still apply, but you need to specify the tertiary label when creating the transformation and register a workflow dependency so that the workflows can be chained.

Create your Secondary Workflow

Follow the same steps as described above to create your secondary workflow. Upstream workflows know nothing about the downstream tertiary workflow, so you do not need to modify them.

Create your Tertiary Workflow

Use either the UI or the CLI to create a custom workflow with a meaningful name. You can choose to also create defaults for the tertiary workflow if you would like to prepopulate any non-dynamic inputs.

Register the Dependencies

Dependencies allow you to chain workflows together, specifying which workflows are prerequisites for the current one. A single workflow can have multiple dependencies, all of which must be satisfied before the current workflow can be launched through instruments.

Creating dependencies is done using the CLI and simply requires specifying the workflow_id and workflow_version of the workflow that you would like to chain.

omics workbench workflows versions dependencies create \
  --workflow <YOUR_TERTIARY_WORKFLOW_ID> \
  --version <YOUR_TERTIARY_WORKFLOW_VERSION> \
  --name "Secondary Workflows" \
  --dependency "<YOUR_SECONDARY_WORKFLOW_ID>/<YOUR_SECONDARY_WORKFLOW_VERSION>" # You can specify multiple dependencies by repeating this flag

Create your Transformation

Follow the same steps as described above to create your transformation, but be sure to specify the tertiary label in addition to the samples label. Sample metadata is provided when launching through instruments in the same way as described above. In addition, the outputs of all dependencies are added to the execution request. For each dependency, the first run that matches its workflow_id and workflow_version and has the status COMPLETE is used. For example:

{
  "dependencies": [
    {
      "run_id": "51A5B9C1-9425-4F94-8C5E-5CCC0EC9C6A2",
      "workflow_id": "secondary_workflow_id",
      "workflow_version": "secondary_workflow_version",
      "outputs": {
        "vcf_file": "gs://my-bucket/vcf_file.vcf.gz",
        "vcf_index": "gs://my-bucket/vcf_file.vcf.gz.tbi"
      }
    },
    {
      "run_id": "CDDDAC0D-245E-47F7-8A8B-797E879B765",
      "workflow_id": "stats_workflow_id",
      "workflow_version": "stats_workflow_version",
      "outputs": {
        "stats_file": "gs://my-bucket/stats_file.txt"
      }
    }
  ],
  "samples": []
}
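The rule for choosing which run supplies a dependency's outputs can be sketched as follows. This is an illustration under assumptions: the helper pickDependencyRun, the run record shape, the state field name, and the idea that "first" means list order are all hypothetical. The source only says that the first matching run with the status COMPLETE is used.

```javascript
// Hypothetical helper: pick the first COMPLETE run that matches a dependency.
function pickDependencyRun(runs, dependency) {
  return runs.find(
    (run) =>
      run.workflow_id === dependency.workflow_id &&
      run.workflow_version === dependency.workflow_version &&
      run.state === "COMPLETE" // field name is an assumption
  );
}

// Hypothetical run records, in the order they would be searched.
const runs = [
  { run_id: "run-1", workflow_id: "secondary_workflow_id", workflow_version: "v1", state: "FAILED" },
  { run_id: "run-2", workflow_id: "secondary_workflow_id", workflow_version: "v1", state: "COMPLETE" },
  { run_id: "run-3", workflow_id: "secondary_workflow_id", workflow_version: "v1", state: "COMPLETE" },
];

const selected = pickDependencyRun(runs, {
  workflow_id: "secondary_workflow_id",
  workflow_version: "v1",
});
console.log(selected?.run_id);
```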

You can then map the outputs of the dependencies to the inputs of the tertiary workflow. You can use JavaScript functions to identify the appropriate output for each input.

(context) => {
    return {
        workflow_params: {
            vcf_files: context.dependencies?.find(dependency => dependency.workflow_id === 'secondary_workflow_id')?.outputs.vcf_file,
            stats_files: context.dependencies?.find(dependency => dependency.workflow_id === 'stats_workflow_id')?.outputs.stats_file,
        }
    }
}
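Because optional chaining silently produces undefined when a dependency or output is absent, a local dry run can help catch mapping mistakes before registration. The sketch below applies the transformation above to a request shaped like the earlier example and lists any inputs that failed to resolve. The missing-input check is an illustrative addition, not a Workbench feature.

```javascript
// The tertiary transformation, assigned to a variable for local testing.
const transformation = (context) => {
  return {
    workflow_params: {
      vcf_files: context.dependencies?.find(
        (dependency) => dependency.workflow_id === "secondary_workflow_id"
      )?.outputs.vcf_file,
      stats_files: context.dependencies?.find(
        (dependency) => dependency.workflow_id === "stats_workflow_id"
      )?.outputs.stats_file,
    },
  };
};

// Request shaped like the example above, trimmed to the fields that are used.
const request = {
  dependencies: [
    {
      workflow_id: "secondary_workflow_id",
      workflow_version: "secondary_workflow_version",
      outputs: { vcf_file: "gs://my-bucket/vcf_file.vcf.gz" },
    },
    {
      workflow_id: "stats_workflow_id",
      workflow_version: "stats_workflow_version",
      outputs: { stats_file: "gs://my-bucket/stats_file.txt" },
    },
  ],
  samples: [],
};

const result = transformation(request);

// Collect the names of inputs that resolved to undefined.
const missing = Object.entries(result.workflow_params)
  .filter(([, value]) => value === undefined)
  .map(([name]) => name);

console.log(result.workflow_params, "missing inputs:", missing);
```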

Registering the Transformation

Register the transformation using the CLI, as described above. The following labels are required:

samples
tertiary

omics workbench workflows versions transformations create \
  --workflow <YOUR_TERTIARY_WORKFLOW_ID> \
  --version <YOUR_TERTIARY_WORKFLOW_VERSION> \
  --label samples --label tertiary \
  "$(cat tertiary-transformation.js)"

Note on Launching Tertiary Workflows

When you launch a tertiary workflow, the UI will automatically launch all of its dependencies. If any of the dependencies fail, the tertiary workflow will not be launched and will also be marked as failed. Tertiary workflows will not be resubmitted if they are already in a COMPLETE state.

After you launch a tertiary workflow, Workbench waits for all dependencies to complete before starting it. The tertiary workflow is kept in the QUEUED state until all dependencies are complete.
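The launch behavior described in this note can be summarized with a small decision function. This is only a sketch of the stated rules: the helper tertiaryState is hypothetical, and the state names FAILED, RUNNING and READY_TO_LAUNCH are assumptions. Only COMPLETE and QUEUED appear in the text above.

```javascript
// Hypothetical helper: derive the tertiary workflow's state from its dependencies.
function tertiaryState(dependencyStates) {
  // Any failed dependency means the tertiary workflow is not launched.
  if (dependencyStates.some((state) => state === "FAILED")) {
    return "FAILED";
  }
  // All dependencies complete: the tertiary workflow can be started.
  if (dependencyStates.every((state) => state === "COMPLETE")) {
    return "READY_TO_LAUNCH";
  }
  // Otherwise it stays queued while dependencies are still in progress.
  return "QUEUED";
}

console.log(tertiaryState(["COMPLETE", "RUNNING"]));
console.log(tertiaryState(["COMPLETE", "COMPLETE"]));
console.log(tertiaryState(["COMPLETE", "FAILED"]));
```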
