Samples

Overview

The samples group of commands allows users to manage biological samples in Workbench and sync them from a linked storage account.

Samples Commands

Samples List

List all samples:

omics workbench samples list
  [--max-results]
  [--page]
  [--page-size]
  [--sort]
  [--platform]
  [--storage-id]

Flag Parameters

| Flag | Description | Example |
| --- | --- | --- |
| --max-results | Limit the total number of results returned. | --max-results 10 |
| --page | Specify the page of results to retrieve. | --page 2 |
| --page-size | The number of results to return in a single page. | --page-size 50 |
| --sort | Sort results by column and direction (ASC or DESC). | --sort platform:ASC |
| --platform | Filter results by the sequencing platform associated with the sample. | --platform pacbio |
| --storage-id | Filter results by the storage account linked to the samples. | --storage-id my-gcp-account |
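The filter, sort, and paging flags above can be combined in a single call. The following is a minimal sketch that wraps such a call in a shell function. The function name `list_pacbio_samples` and the fixed page size are illustrative choices, and the sketch assumes the flags may be used together and that the `omics` CLI is on the PATH.

```shell
# Hypothetical wrapper (not part of the CLI):
#   list_pacbio_samples STORAGE_ID PAGE
list_pacbio_samples() {
  if [ "$#" -ne 2 ]; then
    echo "usage: list_pacbio_samples STORAGE_ID PAGE" >&2
    return 2
  fi
  # Filter to PacBio samples from one storage account, sorted by platform,
  # and fetch the requested page at 50 results per page.
  omics workbench samples list \
    --platform pacbio \
    --storage-id "$1" \
    --sort platform:ASC \
    --page-size 50 \
    --page "$2"
}
```

For example, `list_pacbio_samples my-gcp-account 2` would request the second page of results for that storage account.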

Samples Describe

Describe one or more samples:

omics workbench samples describe SAMPLE_ID [...SAMPLE_ID]

Positional Parameters

| Parameter | Description | Example |
| --- | --- | --- |
| SAMPLE_ID | The ID of the sample to describe. | sample-123 |
| [...SAMPLE_ID] | Optionally, additional sample IDs to describe. | sample-456 |
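Because the command takes one required sample ID followed by any number of additional IDs, a script can pass a whole list through in one call. The sketch below assumes the `omics` CLI is on the PATH; the function name `describe_samples` is hypothetical.

```shell
# Hypothetical wrapper (not part of the CLI):
#   describe_samples SAMPLE_ID [SAMPLE_ID...]
describe_samples() {
  # Mirror the synopsis: at least one sample ID is required.
  if [ "$#" -lt 1 ]; then
    echo "usage: describe_samples SAMPLE_ID [SAMPLE_ID...]" >&2
    return 2
  fi
  # "$@" forwards every ID as its own argument, preserving order.
  omics workbench samples describe "$@"
}
```

Calling `describe_samples sample-123 sample-456` describes both samples in a single invocation.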

Sample Files List

List the files associated with a given sample across all storage accounts and platforms they were synced from. Optionally, filter the results with the flags described below.

omics workbench samples files list SAMPLE_ID
  [--max-results]
  [--page]
  [--page-size]
  [--sort]
  [--platform]
  [--storage]
  [--instrument-id]
  [--platform-type]
  [--provider]
  [--search]

Positional Parameters

| Parameter | Description | Example |
| --- | --- | --- |
| SAMPLE_ID | The ID of the sample to list files for. | sample-123 |

Flag Parameters

| Flag | Description | Example |
| --- | --- | --- |
| --max-results | Limit the total number of results returned. | --max-results 10 |
| --page | Specify the page of results to retrieve. | --page 2 |
| --page-size | The number of results to return in a single page. | --page-size 50 |
| --sort | Sort results by column and direction (ASC or DESC). | --sort path:ASC |
| --platform | Filter results by the sequencing platform associated with the sample. | --platform pacbio |
| --storage | Filter results by the storage account linked to the sample. | --storage my-gcp-account |
| --instrument-id | Filter results by the instrument ID associated with the sample. | --instrument-id 80243 |
| --platform-type | Filter results by the platform type. | --platform-type pacbio |
| --provider | Filter results by the cloud provider. | --provider aws |
| --search | Filter results with a full-text search. | --search ".bam" |
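As an illustration of how the file filters combine, the sketch below lists the files of one sample that match a search term, sorted by path. The function name `find_sample_files` is hypothetical, and the sketch assumes that `--search` and `--sort` can be used together and that the `omics` CLI is on the PATH.

```shell
# Hypothetical wrapper (not part of the CLI):
#   find_sample_files SAMPLE_ID SEARCH_TERM
find_sample_files() {
  if [ "$#" -ne 2 ]; then
    echo "usage: find_sample_files SAMPLE_ID SEARCH_TERM" >&2
    return 2
  fi
  # Quote the search term so values such as ".bam" reach the CLI unchanged.
  omics workbench samples files list "$1" \
    --search "$2" \
    --sort path:ASC
}
```

For example, `find_sample_files sample-123 ".bam"` would list the files for sample-123 that match ".bam".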

Runs Submit

The runs submit command already exists and provides a powerful mechanism that enables a user to submit one or more workflows to Workbench. This change illustrates how a user can submit a workflow using samples as the sole input to “sample-enabled” workflows.

This change adds a --samples flag to support running a workflow with samples. Supported workflows can use samples to fill in their inputs, so the user does not have to define those inputs as JSON. If the --samples flag is used on a workflow that does not support it, the user will be presented with a helpful error message.

To support multiple versions of a single workflow, and to prevent issues if and when a workflow is updated, a backend component will handle the logic for mapping samples into the inputs.

Data Localization

If the samples flag is used, we can ensure a given set of sample inputs matches the target regions and providers that the engine is running on.

If there are files from multiple regions or providers, only the supported provider/region files will be used. The user can modify this behavior by manually defining the sample inputs.
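The selection rule above can be pictured as a simple filter. The sketch below is only an illustration of the idea, not the backend's implementation: the input format (one file per line as "PROVIDER REGION PATH"), the function name `filter_localized_files`, and the bucket paths are all made up for the example.

```shell
# Illustration only: keep the files whose provider and region match the engine.
# Reads "PROVIDER REGION PATH" lines on stdin and prints the matching paths.
filter_localized_files() {
  awk -v provider="$1" -v region="$2" \
    '$1 == provider && $2 == region { print $3 }'
}

# Made-up file listing for one sample, spread over two providers and three regions.
printf '%s\n' \
  'aws us-east-1 s3://example-bucket-a/HG001.bam' \
  'gcp us-central1 gs://example-bucket-b/HG001.bam' \
  'aws eu-west-1 s3://example-bucket-c/HG001.bam' |
  filter_localized_files aws us-east-1
# prints: s3://example-bucket-a/HG001.bam
```

With an engine running on aws in us-east-1, only the first file would be used; the other two would need to be supplied by defining the inputs manually.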

Defaults

Default values will be handled for specific workflows so that a user does not need to define them. Defaults can be specific to the engine, workflow, provider, or region (or any combination thereof) and will be defined within the backend service.

omics workbench runs submit
  --url "PacificBiosciences:HiFi-human-assembly-WDL--HiFi-human-assembly-WDL/dockstore" 
  --samples [SAMPLE-NAMES]
  [--engine]

Flag Parameters

| Flag | Description | Example |
| --- | --- | --- |
| --samples | An optional flag that accepts a comma-separated list of sample IDs to use in the given workflow, for example HG001,HG002,HG003. | --samples HG001 |
| | | --samples HG001,HG002,HG003 |
| | To preserve the relationships between the proband and the parents, the user can add the father and mother prefixes to indicate which samples are the parents. All unprefixed samples are considered to be the children of the defined parents. | --samples father:HG001,mother:HG002,HG003 |
| | This flag is not meant to capture every complexity that could occur; at most, it solves for the trio use case. | |
| --engine | An existing flag that allows the user to define the specific engine they would like to run a workflow on. | --engine aws-healthomics-2 |
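The father and mother prefix convention is easy to assemble in a script. The following is a minimal sketch of a helper that builds the --samples value for a trio; the function name `build_trio_samples` is hypothetical and not part of the CLI.

```shell
# Hypothetical helper (not part of the CLI):
#   build_trio_samples FATHER MOTHER CHILD [CHILD...]
build_trio_samples() {
  if [ "$#" -lt 3 ]; then
    echo "usage: build_trio_samples FATHER MOTHER CHILD [CHILD...]" >&2
    return 2
  fi
  # The first two arguments are the parents and receive a prefix.
  value="father:$1,mother:$2"
  shift 2
  # Remaining IDs stay unprefixed, so they are treated as the children.
  for child in "$@"; do
    value="${value},${child}"
  done
  printf '%s\n' "$value"
}

build_trio_samples HG001 HG002 HG003
# prints: father:HG001,mother:HG002,HG003
```

The output can then be passed to the flag with command substitution, as in `--samples "$(build_trio_samples HG001 HG002 HG003)"`.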

Examples

# Simple case of mapping a single sample
omics workbench runs submit \
  --url HifiSolves-single-sample/latest \
  --samples HG001

# Simple case of mapping multiple samples
omics workbench runs submit \
  --url HifiSolves-multiple-sample/latest \
  --samples HG001,HG002,HG003

# Complex case of mapping multiple samples with relationships
omics workbench runs submit \
  --url HifiSolves-single-sample/latest \
  --samples father:HG001,mother:HG002,HG003
