On Google Cloud Storage

Understanding Cromwell on Google Cloud

Cromwell is the Broad Institute's open-source workflow execution engine, designed to run Workflow Description Language (WDL) workflows on local or cloud infrastructure. Google Cloud provides the Cloud Life Sciences API, which:

  • Manages, processes, and transforms life sciences data

  • Orchestrates task execution on Google Compute Engine instances

  • Was formerly known as the Pipelines API

How It Works

When using Cromwell on Google Cloud with Workbench, the process flows as follows:

  1. Workbench submits the workflow to Cromwell

  2. Cromwell generates individual task definitions

  3. Cromwell dispatches tasks to Google Compute Engine via the Cloud Life Sciences API

  4. Tasks execute on Google Compute Engine instances using specified containers and resources

  5. Outputs write to Google Cloud Storage

  6. Cromwell returns results to Workbench

Workbench monitors and reports status throughout this process, as detailed in the User Guide.
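To illustrate the monitoring step, here is a minimal Python sketch of how a client could check a workflow's state against Cromwell's REST API. It assumes the standard `/api/workflows/v1/{id}/status` endpoint is reachable at a hypothetical `base_url`. How Workbench itself polls Cromwell is not described here, so treat this as an illustration rather than Workbench's implementation.

```python
from urllib.parse import quote

# Workflow states after which Cromwell performs no further work.
TERMINAL_STATES = {"Succeeded", "Failed", "Aborted"}


def status_url(base_url: str, workflow_id: str) -> str:
    """Build the status URL for one workflow (Cromwell REST API v1).

    base_url is a hypothetical Cromwell endpoint, not a value from this guide.
    """
    return f"{base_url.rstrip('/')}/api/workflows/v1/{quote(workflow_id, safe='')}/status"


def is_finished(status_response: dict) -> bool:
    """Return True when the parsed status JSON reports a terminal state."""
    return status_response.get("status") in TERMINAL_STATES
```

A caller would fetch `status_url(...)` at intervals, parse the JSON body, and stop once `is_finished` returns True.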

Deployment

For simplified setup, we provide a Terraform-based installer script that supports both single-project and multi-project architectures.

The deployment uses Cloud Run as the ingress point for Cromwell requests, because the Cromwell server is not directly reachable from the internet. Each request must carry a GCP identity token issued to an authorized principal (a user or service account).
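As a rough sketch of what an authenticated call might look like, the Python snippet below builds a request that carries the identity token in an `Authorization: Bearer` header, which is how Cloud Run expects ID tokens to be presented. The service URL is a hypothetical placeholder. The sketch also assumes that the ingress forwards Cromwell's `/engine/v1/version` path unchanged; check the installer README for the actual routing. One way to obtain a token for the active account is `gcloud auth print-identity-token`.

```python
import urllib.request


def build_cromwell_request(
    service_url: str,
    identity_token: str,
    path: str = "/engine/v1/version",
) -> urllib.request.Request:
    """Build (but do not send) a GET request to Cromwell through the ingress.

    service_url: hypothetical Cloud Run URL for the deployment (assumption).
    identity_token: a GCP identity token for an authorized principal.
    path: Cromwell REST path; assumed to be forwarded as-is by the ingress.
    """
    url = service_url.rstrip("/") + path
    headers = {"Authorization": f"Bearer {identity_token}"}
    return urllib.request.Request(url, headers=headers, method="GET")
```

Passing the returned object to `urllib.request.urlopen` sends the request. A 401 or 403 response usually means the token is missing, expired, or issued to a principal that is not authorized to invoke the service.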

For complete deployment instructions, refer to the installer README.
