
Using Sample Data in a Workflow


Connecting sample data to Workbench allows it to be discovered and indexed; however, this alone does not provide the underlying engine with the necessary permissions to access and use the data during workflows. To use sample data within a workflow, ensure that your cloud environment’s engine has the appropriate permissions to access the storage resources.

Configuring Access for AWS

For AWS environments, use the Workbench engine installer to grant access to additional buckets:

1. Modify the additional_buckets Variable:

  • Add the name of any additional buckets that the engine needs to access.

For example:

additional_buckets = [
  "my-genomics-bucket"
]

2. Apply the Terraform Configuration:

  • Run terraform plan to preview changes.

  • Use terraform apply to update the IAM policies and grant the engine permissions to access the specified buckets.
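
If you run the installer's Terraform by hand, the sequence might look like this (a sketch; the plan file name is arbitrary):

terraform plan -out=tfplan    # preview the IAM policy changes
terraform apply tfplan        # apply exactly the plan you reviewed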

3. Verify Permissions:

  • Ensure that the engine can list and read objects from the new buckets by testing access through a Workbench workflow.
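
As a quick spot check from the command line (a sketch, not part of the installer: run it with the engine's IAM role credentials; the bucket name comes from the example above, and the object path is hypothetical):

aws s3 ls s3://my-genomics-bucket/                      # list objects
aws s3 cp s3://my-genomics-bucket/path/to/object.txt -  # read an object to stdout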

Configuring Access for GCP

For GCP environments, use the Workbench engine installer to grant access to additional buckets:

1. Add the Additional Buckets:

  • Update the Terraform configuration to include the new bucket in the IAM policy bindings. For example:

additional_buckets = [
    "my-additional-bucket"
]

2. Apply the Terraform Configuration:

  • Run terraform plan to review changes.

  • Execute terraform apply to update permissions.

3. Verify Configuration:

  • Test the engine’s ability to access the additional bucket by running a workflow in Workbench.
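
A quick spot check from the command line (a sketch; the bucket name comes from the example above, and the service account email is hypothetical, so substitute the one your engine actually runs as):

gcloud storage ls gs://my-additional-bucket/ \
  --impersonate-service-account=engine-sa@my-project.iam.gserviceaccount.com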

Configuring Access for Azure

For Azure environments, CromwellOnAzure is used to configure access to additional storage accounts or containers. Follow the steps below that match your access requirements, depending on whether or not the container is public:

If the Container is Not Public

  1. Generate a SAS Token:

    • Obtain a SAS token for the desired container (read-only or read-write based on usage requirements).

    • Follow the official Azure documentation for generating a SAS URL, or use the Azure CLI as sketched below.

    • Copy the SAS token for use in the next step.
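
For example, the Azure CLI can generate a read-and-list container SAS (a sketch; the placeholders match those used in the YAML block below, and the expiry date is arbitrary):

az storage container generate-sas \
  --account-name <StorageAccountToConnect> \
  --name <ContainerName> \
  --permissions rl \
  --expiry 2026-01-01T00:00Z \
  --https-only \
  --auth-mode key \
  --output tsv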

  2. Update the aksValues.yaml File:

    • Navigate to the configuration container in the default storage account linked to your CromwellOnAzure installation.

    • The storage account will be located within the designated resource group.

    • Locate the aksValues.yaml file in the container and click the ellipsis (...) at the end of the row.

    • Select View/Edit from the menu.

    • Add a YAML block to the file in the following format, replacing placeholders with actual values:

externalSasContainers:
- accountName: <StorageAccountToConnect>
  containerName: <ContainerName>
  sasToken: <SAS Token>
  • The SAS token should look similar to:

si=public&spr=https&sv=2021-06-08&sr=c&sig=o6OkcqvWlGcGOOr8I8gCA%2BJwlpA%2FYshz0DpB8CCtCJk%3D
  • Click Save once you have finished editing the file.

  3. Run the CromwellOnAzure Update:

    • Ensure that the Azure CLI is installed.

    • Download the latest CromwellOnAzure binary client.

    • Retrieve the Subscription ID and Resource Group Name used when installing CromwellOnAzure.

    • Run the update command to apply the new values:

./deploy-cromwell-on-azure-linux --SubscriptionId "<SubscriptionId>" \
  --HelmBinaryPath /usr/bin/helm \
  --ResourceGroupName <ResourceGroupName> --update true

If the Container is Public

  • If the containers you wish to connect to are public, no additional configuration is needed.

  • CromwellOnAzure can read directly from public containers using their HTTPS URIs.

  • Note: Attempting to configure a public bucket using the SAS token method may prevent Cromwell from reading files correctly.
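
For reference, a public blob is addressed as https://<account>.blob.core.windows.net/<container>/<path>. A quick reachability check (all names hypothetical):

curl -sI "https://mygenomicsaccount.blob.core.windows.net/public-data/reads/sample1.fastq.gz"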
