# Using Sample Data in a Workflow

Connecting sample data to Workbench allows it to be discovered and indexed; however, this alone does not provide the underlying engine with the necessary permissions to access and use the data during workflows. To use sample data within a workflow, ensure that your cloud environment’s engine has the appropriate permissions to access the storage resources.

## Configuring Access for AWS

For AWS environments, use the Workbench engine installer [Terraform module](https://github.com/DNAstack/aws-healthomics-engine-installer) to grant access to additional buckets:

### **1. Modify the `additional_buckets` Variable:**

* Add the name of any additional buckets that the engine needs to access.

For example:

```
additional_buckets = [
  "my-genomics-bucket"
]
```
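
If the bucket list is produced by a script rather than edited by hand, the same variable can be written as a JSON variable file, which Terraform loads automatically when the file name ends in `.auto.tfvars.json`. The following is a minimal sketch, assuming the module is applied from the directory that receives the file; the helper name `render_additional_buckets` is hypothetical and not part of the installer.

```python
import json


def render_additional_buckets(bucket_names):
    """Return Terraform JSON variable text that sets `additional_buckets`."""
    # Trim whitespace, drop blanks, and remove duplicates while keeping order.
    cleaned = (name.strip() for name in bucket_names)
    unique = list(dict.fromkeys(name for name in cleaned if name))
    return json.dumps({"additional_buckets": unique}, indent=2) + "\n"


if __name__ == "__main__":
    # Prints the variable file content; redirect it to a file of your choice,
    # for example one ending in `.auto.tfvars.json` (the name is up to you).
    print(render_additional_buckets(["my-genomics-bucket"]), end="")
```

Keeping the list in one generated file makes it easier to review which buckets the engine is allowed to read before running `terraform plan`.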

### 2. Apply the Terraform Configuration:

* Run <mark style="color:green;">`terraform plan`</mark> to preview changes.
* Use <mark style="color:green;">`terraform apply`</mark> to update the IAM policies and grant the engine permissions to access the specified buckets.
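
The plan-then-apply step can also be scripted so that the apply uses exactly the plan that was reviewed. The sketch below assumes `terraform` is on your `PATH` and has already been initialized in the working directory; the plan file name `tfplan` and both function names are illustrative choices, not something the installer requires.

```python
import subprocess

PLAN_FILE = "tfplan"  # hypothetical plan file name


def terraform_commands(plan_file=PLAN_FILE):
    """Build the commands that save a plan and then apply that saved plan."""
    return [
        ["terraform", "plan", "-input=false", f"-out={plan_file}"],
        ["terraform", "apply", "-input=false", plan_file],
    ]


def run_plan_then_apply(workdir="."):
    """Run `terraform plan`, ask for confirmation, then apply the saved plan."""
    plan_cmd, apply_cmd = terraform_commands()
    subprocess.run(plan_cmd, cwd=workdir, check=True)
    answer = input("Apply this plan? [y/N] ")
    if answer.strip().lower() == "y":
        subprocess.run(apply_cmd, cwd=workdir, check=True)
    else:
        print("Plan was not applied.")
```

Applying a saved plan file avoids the case where the configuration changes between the preview and the apply.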

### 3. Verify Permissions:

* Ensure that the engine can list and read objects from the new buckets by testing access through a Workbench workflow.

## Configuring Access for GCP

For GCP environments, use the Workbench engine installer [Terraform modules](https://github.com/DNAstack/cromwell-on-gcp-workbench-engine-installer) to grant access to additional buckets:

### **1. Add the Additional Buckets**:

* Update the Terraform configuration to include the new bucket in the IAM policy bindings. For example:

```
additional_buckets = [
    "my-additional-bucket"
]
```

### **2. Apply the Terraform Configuration**:

* Run <mark style="color:green;">`terraform plan`</mark> to review changes.
* Execute <mark style="color:green;">`terraform apply`</mark> to update permissions.

### 3. Verify Configuration:

* Test the engine’s ability to access the additional bucket by running a workflow in Workbench.
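
Before launching a workflow, a quick listing check from a terminal can catch a misspelled bucket name or a missing binding. This is only a rough sketch: it assumes the `gcloud` CLI is installed and authenticated, and the service account email is a value you must supply yourself (it is not defined anywhere in this guide). Without impersonation the check runs with your own credentials, so running a workflow in Workbench remains the authoritative test of the engine's access.

```python
import subprocess


def gcs_listing_command(bucket_name, service_account=None):
    """Build a `gcloud storage ls` command for the given bucket."""
    command = ["gcloud", "storage", "ls", f"gs://{bucket_name}/"]
    if service_account:
        # Optional: run the listing as the engine's service account
        # (assumption: you are allowed to impersonate that account).
        command.append(f"--impersonate-service-account={service_account}")
    return command


def can_list_bucket(bucket_name, service_account=None):
    """Return True if the listing command exits successfully."""
    try:
        result = subprocess.run(
            gcs_listing_command(bucket_name, service_account),
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # The gcloud CLI is not installed or not on PATH.
        return False
    return result.returncode == 0
```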

## Configuring Access for Azure

For Azure environments, CromwellOnAzure is used to configure access to additional storage accounts or containers. Follow the steps that match your situation, depending on whether the container is public:

### **If the Container is Not Public**

1. **Generate a SAS Token**:
   * Obtain a SAS token for the desired container (read-only or read-write based on usage requirements).
   * Follow the official Azure documentation for generating a SAS URL.
   * Copy the SAS token for use in the next step.
2. **Update the** <mark style="color:green;">`aksValues.yaml`</mark> **File**:
   * Navigate to the **configuration** container in the default storage account linked to your CromwellOnAzure installation.
   * The storage account will be located within the designated **resource group**.
   * Locate the <mark style="color:green;">`aksValues.yaml`</mark> file in the container and click the ellipsis (...) at the end of the row.
   * Select **View/Edit** from the menu.
   * Add a YAML block to the file in the following format, replacing placeholders with actual values:

```
externalSasContainers:
- accountName: <StorageAccountToConnect>
  containerName: <ContainerName>
  sasToken: <SAS Token>
```

* **The SAS token should look similar to:**

```
si=public&spr=https&sv=2021-06-08&sr=c&sig=o6OkcqvWlGcGOOr8I8gCA%2BJwlpA%2vYshz0DpB8CCtCJk%3D
```

* Click **Save** once you have finished editing the file.
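
Before pasting a token into the file, it can help to confirm what the token actually grants. The sketch below uses only the Python standard library to split a SAS token into its query fields (for example `sr=c` for a container-scoped token, `sp` for permissions, `se` for expiry) and to render the YAML block in the format shown above. Both function names are hypothetical helpers, and the leading `?` is stripped on the assumption that the file expects the token in the form shown in the example.

```python
from urllib.parse import parse_qs


def summarize_sas_token(sas_token):
    """Return the SAS query fields as a flat dict, with the signature hidden."""
    fields = {key: values[0] for key, values in parse_qs(sas_token.lstrip("?")).items()}
    if "sig" in fields:
        fields["sig"] = "<redacted>"  # never print the signature itself
    return fields


def render_external_sas_block(account_name, container_name, sas_token):
    """Render the externalSasContainers block in the format shown above."""
    return (
        "externalSasContainers:\n"
        f"- accountName: {account_name}\n"
        f"  containerName: {container_name}\n"
        f"  sasToken: {sas_token.lstrip('?')}\n"
    )


if __name__ == "__main__":
    # Placeholders only; substitute your own values.
    print(render_external_sas_block("<StorageAccountToConnect>", "<ContainerName>", "<SAS Token>"), end="")
```

A token that references a stored access policy (an `si` field, as in the example above) may not carry `sp` or `se` fields itself, because those values come from the policy.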

3. **Run the CromwellOnAzure Update:**
   * Ensure that the **Azure CLI** is installed.
   * Download the latest **CromwellOnAzure** binary client.
   * Retrieve the **Subscription ID** and **Resource Group Name** used when installing CromwellOnAzure.
   * Run the update command to apply the new values:

```
./deploy-cromwell-on-azure-linux \
  --SubscriptionId "<SubscriptionId>" \
  --HelmBinaryPath /usr/bin/helm \
  --ResourceGroupName "<ResourceGroupName>" \
  --update true
```

### **If the Container is Public**

* If the containers you wish to connect to are **public**, no additional configuration is needed.
* CromwellOnAzure can read directly from public containers using their HTTPS URIs.
* **Note**: Attempting to configure a public container using the SAS token method may prevent Cromwell from reading files correctly.
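
For reference, the HTTPS URI of a blob in a public container follows a fixed pattern. The sketch below builds that URI with the Python standard library; it assumes the Azure public cloud endpoint suffix `blob.core.windows.net`, and the function name and sample values are illustrative only.

```python
from urllib.parse import quote


def public_blob_url(account_name, container_name, blob_path):
    """Build the HTTPS URI for a blob in a public Azure Storage container."""
    # Percent-encode the blob path but keep "/" so virtual folders survive.
    encoded_path = quote(blob_path.lstrip("/"))
    return f"https://{account_name}.blob.core.windows.net/{container_name}/{encoded_path}"


if __name__ == "__main__":
    # Hypothetical account, container, and blob names.
    print(public_blob_url("mystorageaccount", "samples", "reads/sample1.fastq.gz"))
```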


