© DNAstack. All rights reserved.

On AWS HealthOmics

Last updated 3 months ago

AWS HealthOmics Engine

AWS HealthOmics (https://aws.amazon.com/healthomics/) is a cloud-based service that provides a secure, compliant, and scalable platform for analyzing and sharing genomic data. Built on Amazon Web Services (AWS), it helps researchers, clinicians, and other stakeholders store, analyze, and share genomic data at scale.

This guide explains how to set up AWS HealthOmics as an engine in Workbench.

Setting up Your AWS Environment

Before using AWS HealthOmics in Workbench, you need to configure your AWS account with the necessary permissions, because HealthOmics requires initial configuration before it can access your resources. For detailed information, refer to the official AWS HealthOmics user guide.

You can configure AWS HealthOmics in two ways:

  1. Use DNAstack's Terraform engine installer (recommended)

  2. Create resources manually through the AWS console or CLI

Using the Installer

The installer uses Terraform to create the necessary AWS resources for you.

For complete configuration options, see the installer README.

Clone the installer repository:

git clone https://github.com/DNAstack/aws-healthomics-engine-installer.git
cd aws-healthomics-engine-installer

Create a variables.tfvars file:

output_bucket_name = "my-healthomics-bucket"
region             = "ap-southeast-1"

Initialize and apply the Terraform environment:

terraform init
terraform apply -var-file=variables.tfvars

Retrieve the output values from the Terraform state. Retain these values for later.

terraform output

Retrieve the sensitive output values from the Terraform state. Retain these values for later and ensure they are kept in a secure location.

terraform output --raw secret_access_key

Creating Resources Directly

If you prefer to use the AWS console or the AWS CLI to create the necessary resources, the easiest way to get started is to follow the AWS HealthOmics Getting Started Guide. In addition to the resources required by HealthOmics, Workbench requires an access key and secret for an IAM user that has the necessary permissions to access the following services:

  • HealthOmics

  • CloudWatch

  • S3

You can create an IAM user by following the Managing Users guide, and the access key and secret key by following the AWS Account and Access Keys guide. Once you have created the user, assign it the necessary permissions using an IAM policy similar to the following:

{
  "Statement": [
    {
      "Action": "iam:PassRole",
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": "omics.amazonaws.com"
        }
      },
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "AllowPassRole"
    },
    {
      "Action": "omics:*",
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "AllowOmicsActions"
    },
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::${bucket_name}/*",
        "arn:aws:s3:::${bucket_name}"
      ],
      "Sid": "AllowS3ReadOnlyAccess"
    },
    {
      "Action": [
        "logs:GetLogEvents",
        "logs:DescribeLogStreams"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:logs:${region}:${account_id}:log-group:/aws/omics/*",
      "Sid": "AllowReadLogs"
    }
  ],
  "Version": "2012-10-17"
}
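
The policy above contains three placeholders (${bucket_name}, ${region}, and ${account_id}) that must be replaced before the policy can be attached to a user. The following is a minimal sketch, not part of the installer, of filling them in with Python. The function name build_workbench_user_policy and the example values are hypothetical.

```python
import json


def build_workbench_user_policy(bucket_name: str, region: str, account_id: str) -> dict:
    """Return the IAM policy shown above with its placeholders filled in.

    The function name and argument names are illustrative assumptions.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the user hand the service role to HealthOmics only.
                "Sid": "AllowPassRole",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"iam:PassedToService": "omics.amazonaws.com"}
                },
            },
            {
                "Sid": "AllowOmicsActions",
                "Effect": "Allow",
                "Action": "omics:*",
                "Resource": "*",
            },
            {
                # Read-only access to the bucket and to the objects inside it.
                "Sid": "AllowS3ReadOnlyAccess",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",
                    f"arn:aws:s3:::{bucket_name}",
                ],
            },
            {
                "Sid": "AllowReadLogs",
                "Effect": "Allow",
                "Action": ["logs:GetLogEvents", "logs:DescribeLogStreams"],
                "Resource": f"arn:aws:logs:{region}:{account_id}:log-group:/aws/omics/*",
            },
        ],
    }


if __name__ == "__main__":
    # Example values only; substitute your own bucket, region and account ID.
    policy = build_workbench_user_policy(
        "my-healthomics-bucket", "ap-southeast-1", "123456789012"
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM policy editor or saved to a file for use with the AWS CLI.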

Configuring AWS HealthOmics as an Engine in Workbench

Once you have set up your AWS environment and configured the necessary permissions, you can add AWS HealthOmics as an engine to your Workbench account. To do this, you will need to provide the following information (in addition to the usual configuration):

  • Region: The AWS region where the HealthOmics service is deployed.

  • Access Key ID and Secret Access Key: These are the credentials for the IAM user that has the necessary permissions to access the required AWS services.

  • Output Bucket Name: The name of the S3 bucket where the outputs of the workflows will be stored.

  • Role ARN: The ARN of the IAM role that will be assumed by the HealthOmics service to access the required resources.

If you used the installer to set up HealthOmics, you can retrieve the necessary information by running the terraform output command:

terraform output

The output should look similar to the following:

access_key_id = "AK2813KA123MD01"
output_bucket = "my-healthomics-bucket"
role_arn = "arn:aws:iam::123456789012:role/HealthOmicsRole"
secret_access_key = <sensitive>

To retrieve the value of the secret access key, you can run:

terraform output --raw secret_access_key
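
  1. From the AWS HealthOmics Engine Configuration page, ensure all of the general fields are filled out as per the usual configuration.

  2. Fill in the additional fields with the information retrieved from the Terraform output.

  3. Click Save to add the engine to your Workbench account.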
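
If you would rather collect the engine fields with a script than copy them by hand, Terraform can emit its outputs as JSON with terraform output -json. In that format each output is an object with a value key, and sensitive values are included in plain text. The sketch below assumes the output names shown in the sample above (access_key_id, output_bucket, role_arn, secret_access_key). The function name and the keys of the returned dictionary are hypothetical and are not Workbench field identifiers.

```python
import json


def engine_fields_from_terraform(output_json: str, region: str) -> dict:
    """Map `terraform output -json` text to the values the engine form asks for.

    The region is passed in separately because it is not among the outputs
    shown in the sample above.
    """
    outputs = json.loads(output_json)

    def value(name: str) -> str:
        # Each Terraform output is an object of the form {"value": ..., ...}.
        return outputs[name]["value"]

    return {
        "region": region,
        "access_key_id": value("access_key_id"),
        "secret_access_key": value("secret_access_key"),
        "output_bucket_name": value("output_bucket"),
        "role_arn": value("role_arn"),
    }


if __name__ == "__main__":
    # Stand-in for the text printed by `terraform output -json`.
    sample = json.dumps(
        {
            "access_key_id": {"sensitive": False, "type": "string", "value": "EXAMPLEKEYID"},
            "output_bucket": {"sensitive": False, "type": "string", "value": "my-healthomics-bucket"},
            "role_arn": {"sensitive": False, "type": "string", "value": "arn:aws:iam::123456789012:role/HealthOmicsRole"},
            "secret_access_key": {"sensitive": True, "type": "string", "value": "example-secret"},
        }
    )
    fields = engine_fields_from_terraform(sample, "ap-southeast-1")
    # Avoid printing the secret; show only the field names that were collected.
    print(sorted(fields))
```

Because the JSON output contains the secret access key in plain text, treat it with the same care as the key itself.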

Advanced Configuration

Installer Reference

The installer is highly configurable and allows you to customize the resources created for HealthOmics. Configuration is done by modifying values in your variables file.

output_bucket_name (required)

Name of the S3 bucket in which to store output data, without the s3:// prefix. This bucket will be created if it does not already exist.

output_bucket_name = "my-healthomics-bucket"

region (required)

AWS region in which to create the resources and run workflows. Because HealthOmics is region-specific, this must be one of the following regions: us-east-1, us-west-2, ap-southeast-1, eu-central-1, eu-west-1, eu-west-2, il-central-1.

region = "ap-southeast-1"
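
A typo in the region only surfaces once Terraform runs, so it can be worth checking the value up front. The following is a small sketch of such a check. The region set is copied from the list above and may not reflect the regions AWS currently supports, and the names SUPPORTED_REGIONS and check_region are hypothetical.

```python
# Regions copied from the list in this guide; treat this as a snapshot.
SUPPORTED_REGIONS = frozenset(
    {
        "us-east-1",
        "us-west-2",
        "ap-southeast-1",
        "eu-central-1",
        "eu-west-1",
        "eu-west-2",
        "il-central-1",
    }
)


def check_region(region: str) -> str:
    """Return the region unchanged, or raise ValueError if it is not listed."""
    if region not in SUPPORTED_REGIONS:
        allowed = ", ".join(sorted(SUPPORTED_REGIONS))
        raise ValueError(f"{region!r} is not in the listed regions: {allowed}")
    return region


if __name__ == "__main__":
    print(check_region("ap-southeast-1"))
```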

additional_buckets

Names of additional S3 buckets that the service role and the generated IAM user are granted read access to. These buckets must already exist and must be in the same account and region as the HealthOmics service.

additional_buckets = ["my-second-bucket", "my-third-bucket"]

workbench_service_account_name

Name of the IAM user that will be generated for Workbench to access the AWS services. This defaults to workbench-health-omics. Change this if you want to use a different name.

workbench_service_account_name = "my-workbench-user"

health_omics_user_policy_name

The name of the policy to attach to the HealthOmics user. This policy will contain all the permissions needed by the generated IAM user to access AWS services. This defaults to HealthOmicsUserPolicy.

health_omics_user_policy_name = "MyHealthOmicsUserPolicy"

health_omics_service_policy_name

The name of the policy to attach to the HealthOmics service. This policy will contain all the permissions needed by HealthOmics to read from S3 buckets and write to CloudWatch. This defaults to HealthOmicsServicePolicy.

health_omics_service_policy_name = "MyHealthOmicsServicePolicy"

health_omics_role_name

The name of the IAM role to create for the HealthOmics service. This defaults to HealthOmicsRole.

health_omics_role_name = "MyHealthOmicsRole"

ecr_repositories

A list of ECR repository names to create and attach the appropriate IAM policies to. This is useful if you want to use HealthOmics to run workflows that require Docker images stored in ECR. For more information on configuring ECR repositories and uploading images, see the Configuring Container Images section.

ecr_repositories = ["ubuntu", "my-custom-image"]

external_ecr_accounts

HealthOmics can be configured to read from ECR repositories in other accounts. This is useful if you want to use a central repository for your Docker images. This list should contain the account IDs of the accounts you want to allow HealthOmics to read from.

Note: This will only allow HealthOmics to pull images from the specified accounts. You will still need to configure the policies on the ECR repositories to allow HealthOmics to access them.

external_ecr_accounts = ["123456789012", "222213132314"]
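
When the same settings are needed for several environments, the variables file can be generated instead of edited by hand. The sketch below is one hypothetical way to do that and is not part of the installer. It handles only the two value shapes used in this reference (strings and lists of strings), and it relies on JSON string and list literals also being valid in a .tfvars file for simple values like these.

```python
import json


def render_tfvars(values: dict) -> str:
    """Render a dict of strings and string lists as variables.tfvars text."""
    lines = []
    for key, value in values.items():
        is_string = isinstance(value, str)
        is_string_list = isinstance(value, list) and all(
            isinstance(item, str) for item in value
        )
        if not (is_string or is_string_list):
            raise TypeError(f"{key}: only strings and lists of strings are supported")
        # json.dumps yields "text" or ["a", "b"], matching the examples above.
        lines.append(f"{key} = {json.dumps(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_tfvars(
            {
                "output_bucket_name": "my-healthomics-bucket",
                "region": "ap-southeast-1",
                "ecr_repositories": ["ubuntu", "my-custom-image"],
            }
        ),
        end="",
    )
```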

Configuring Container Images

HealthOmics uses container images to run workflows. All images must be located within a private container registry within the same region as the HealthOmics service. You can use the Amazon Elastic Container Registry (ECR) to store your images.

To upload an image to ECR, you can use the following commands:

aws ecr create-repository --repository-name ubuntu --region ap-southeast-1
aws ecr get-login-password --region ap-southeast-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.ap-southeast-1.amazonaws.com
docker tag ubuntu:latest 123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/ubuntu:latest
docker push 123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/ubuntu:latest
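
The registry host in the commands above follows the pattern account ID, then .dkr.ecr., then the region, then .amazonaws.com. The following is a minimal sketch of composing the registry host and a full image reference from those parts. The function names are hypothetical.

```python
def ecr_registry(account_id: str, region: str) -> str:
    """Return the private ECR registry host for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Return the full image reference used with docker tag and docker push."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


if __name__ == "__main__":
    # Example values only; substitute your own account ID and region.
    print(ecr_image_uri("123456789012", "ap-southeast-1", "ubuntu"))
```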

Once a repository is created, you will need to ensure HealthOmics has the necessary permissions to access the images.

  1. Go to the AWS Console and navigate to the ECR service.

  2. Ensure you are in the correct region.

  3. Under Private registry, select Repositories.

  4. Select the repository you want to grant access to (ubuntu in the example above).

  5. Select the Permissions option from the sidebar.

  6. Click Edit Policy JSON, paste the following content into the editor, and then click Save.

    • Note: If there is already a policy, simply append the statements to the existing JSON.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "OmicsAccessPrincipal",
      "Effect": "Allow",
      "Principal": {
        "Service": "omics.amazonaws.com"
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ]
    }
  ]
}
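
If a repository already has a policy, the statement above has to be appended to it rather than pasted over it. The following is a minimal sketch of that merge, assuming the existing policy is available as a parsed JSON object. The names OMICS_STATEMENT and add_omics_statement are hypothetical.

```python
import copy
import json

# The statement from the repository policy shown above.
OMICS_STATEMENT = {
    "Sid": "OmicsAccessPrincipal",
    "Effect": "Allow",
    "Principal": {"Service": "omics.amazonaws.com"},
    "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer",
    ],
}


def add_omics_statement(existing_policy=None) -> dict:
    """Return a policy that includes the HealthOmics statement exactly once.

    The input is never modified; a new policy is created when none exists.
    """
    if not existing_policy:
        return {"Version": "2012-10-17", "Statement": [copy.deepcopy(OMICS_STATEMENT)]}

    merged = copy.deepcopy(existing_policy)
    statements = merged.get("Statement", [])
    if isinstance(statements, dict):
        # IAM accepts a single statement object; normalise it to a list.
        statements = [statements]
    if not any(s.get("Sid") == OMICS_STATEMENT["Sid"] for s in statements):
        statements.append(copy.deepcopy(OMICS_STATEMENT))
    merged["Statement"] = statements
    return merged


if __name__ == "__main__":
    print(json.dumps(add_omics_statement(), indent=2))
```

Matching on the Sid makes the merge safe to run more than once against the same repository policy.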

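Each image needs its own repository in ECR, and each repository needs its own permissions policy. New tags of an existing image are pushed to the same repository, but if you want to upload a different image, you must create a new repository for it and apply the policy above to that repository as well.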

If you do not want to manually upload images to ECR, you can use the Amazon ECR Helper provided by the HealthOmics team, which allows you to copy images from a public registry to your private ECR.

Troubleshooting

Workflow fails to submit: ECR Access Denied

ECR access denied (omics.amazonaws.com): 123124123123123.dkr.ecr.us-east-1.amazonaws.com/dockerhub

When writing a workflow that uses a Docker image stored in ECR, it is common practice to parameterize the image name within the WDL and pass it as an input to the workflow. This lets you switch easily between image repositories and helps keep the workflow cloud- and region-agnostic.

version 1.0

task echo {
  input {
    String image_repository
  }

  command <<<
    echo "Hello, World!"
  >>>

  runtime {
    # The repository prefix is supplied as an input
    docker: "~{image_repository}/ubuntu:latest"
  }
}

workflow say_hello {
  input {
    String image_repository
  }

  call echo {
    input: image_repository = image_repository
  }
}

If you are using the Amazon ECR Helper to copy images or a pull through cache to sync images from upstream repositories, ECR adds a namespace prefix to the image name. For example, if you want to configure a pull through cache of dockerhub to pull the ubuntu:latest image, your input to the above workflow may look like the following:

{
  "inputs": {
    "say_hello.image_repository": "123124123123123.dkr.ecr.us-east-1.amazonaws.com/dockerhub"
  }
}

Unfortunately, this does not work as expected. HealthOmics performs a preemptive check on any image repository that is passed as an input to the workflow. Because the actual repository name is dockerhub/ubuntu and dockerhub is only the namespace, the check fails and the workflow submission is rejected.

To fix this, split the input into two fields, one for the registry and one for the namespace. You can then combine them in the WDL and pass the full repository name to the runtime.

workflow say_hello {
  input {
    String registry_name
    String? namespace
  }

  # Join the registry and the optional namespace inside the workflow
  String image_repository = if defined(namespace) then "${registry_name}/${namespace}" else registry_name

  call echo {
    input: image_repository = image_repository
  }
}
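
With the inputs split this way, an existing combined value has to be divided into its registry and namespace parts. The following is a minimal sketch of that conversion, using the input names registry_name and namespace from the workflow above. The function name is hypothetical, and it assumes the value has the form registry or registry/namespace with no image name or tag attached.

```python
import json


def split_image_repository(image_repository: str) -> dict:
    """Split "registry/namespace" into the two inputs the workflow expects."""
    # Everything before the first "/" is the registry host; the rest, if any,
    # is the namespace that ECR added as a prefix.
    registry, _, namespace = image_repository.partition("/")
    inputs = {"say_hello.registry_name": registry}
    if namespace:
        inputs["say_hello.namespace"] = namespace
    return inputs


if __name__ == "__main__":
    combined = "123124123123123.dkr.ecr.us-east-1.amazonaws.com/dockerhub"
    print(json.dumps({"inputs": split_image_repository(combined)}, indent=2))
```

The namespace key is omitted when there is no prefix, which matches the optional String? namespace input.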

Image in external account not accessible

 ERROR ECR image 123124123123123.dkr.ecr.us-east-1.amazonaws.com/ubuntu:latest is not accessible, not in the same account

If you are trying to run a workflow with an image in an external account, you may encounter the above error if the permissions are not set correctly WITHIN the current account. You need to explicitly grant the HealthOmics role permissions to submit requests to external ECR repositories in addition to granting the role permissions at the repository level.

The simplest way to do this is to add the external account ID to the external_ecr_accounts list in your variables file and re-run the installer. This adds the permissions the HealthOmics role needs to access the external ECR repositories.
