BigQuery

Overview

This guide describes how to configure a BigQuery data source in Publisher and how to set up BigQuery permissions for both dataset-specific and project-wide access. It explains the core concepts and the minimum permissions required for each scenario.

Data source configuration

From the Publisher interface, select Data Sources in the navigation bar and click Connect Data Source. Choose Trino from the available connectors.

In the Data Source Configuration screen, set up the Catalog properties field. Here's an example:

connector.name=enhanced_bigquery
bigquery.project-id=<data_project_id>
bigquery.parent-project-id=<billing_project_id>
bigquery.credentials-key=<base64_encoded_credentials_key>
bigquery.views-enabled=false
bigquery.parallelism=16
# enhanced_bigquery-specific property
enhanced-bigquery.include-datasets=dataset_1 + dataset_2
  • The Trino BigQuery documentation describes the standard properties above. Custom properties provided by the enhanced_bigquery connector are described below:

    • Set enhanced-bigquery.include-datasets to the dataset(s) that should be included. Multiple datasets can be concatenated using +:

      enhanced-bigquery.include-datasets=dataset_1 + dataset_2
  • After downloading the JSON key you created in the Creating Credentials steps, open a new terminal window and run the command for your platform below to base64-encode the key. You can paste the contents of the output file as the value for the bigquery.credentials-key property.

# Windows
certutil -encode <input_file> <output_file>
# macOS
base64 -i <path_to_service_account_key.json> -o <output_file>
# Linux
base64 <path_to_service_account_key.json> > <output_file>
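
Note that encoder output varies by platform: GNU base64 on Linux wraps lines at 76 characters by default, and certutil wraps its output in BEGIN/END CERTIFICATE header lines that must be removed before pasting. On Linux, this sketch produces a single unwrapped line:

base64 -w 0 <path_to_service_account_key.json> > <output_file>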

BigQuery Permissions

Concepts

  • Service Account (SA): This is an identity generated in Google Cloud that can be used to interact with GCP services. Each SA has a unique email address (identity) and one or more JSON keys.

  • Data Project: A project that houses the target dataset.

  • Quota Project: The project that BigQuery usage is billed to and whose quota resources are consumed.

  • Permission: Granular actions that a user can perform on a given resource.

  • Role: A collection of permissions typically needed for specific interactions against one or more resources: “Data Viewer,” “Storage Reader,” etc.

Minimal Configuration

This setup represents the minimal permissions needed to interact with a specific BigQuery data source without creating custom roles. It conceptually segregates the Quota Project and the Data Project, but they can be the same project.

Creating Credentials

Create a new Service Account (SA) in any project.

gcloud iam service-accounts create publisher-connectors \
    --display-name="Publisher Data Connections"

Generate a new JSON key and download it.

gcloud iam service-accounts keys create publisher-connectors-key.json \
    --iam-account=publisher-connectors@${PROJECT}.iam.gserviceaccount.com
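
To confirm the key was created, you can list the keys on the SA (a quick sanity check, assuming the same account name and ${PROJECT} as above):

gcloud iam service-accounts keys list \
    --iam-account=publisher-connectors@${PROJECT}.iam.gserviceaccount.com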

Adding Permissions to Quota Project

  1. Navigate to the quota project's IAM & Admin page.

  2. Grant the following roles to the SA you created (equivalent gcloud commands are sketched after this list):

    1. Service Usage Consumer — This allows the SA to consume resources in the quota project.

    2. BigQuery Read Session User — This allows the SA to start a BigQuery read session with the Storage API. Without this role, the SA can still interact with BQ via the normal query interface and retrieve table metadata, but cannot list a table's contents through our DLCON.

    3. BigQuery Job User — This allows the SA to run jobs.
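
As a sketch, the same grants can be applied with gcloud (hypothetical shell variables; substitute your own quota project ID and the SA email from the Creating Credentials step):

SA_EMAIL=publisher-connectors@${PROJECT}.iam.gserviceaccount.com
QUOTA_PROJECT=<quota_project_id>
gcloud projects add-iam-policy-binding ${QUOTA_PROJECT} \
    --member="serviceAccount:${SA_EMAIL}" \
    --role="roles/serviceusage.serviceUsageConsumer"
gcloud projects add-iam-policy-binding ${QUOTA_PROJECT} \
    --member="serviceAccount:${SA_EMAIL}" \
    --role="roles/bigquery.readSessionUser"
gcloud projects add-iam-policy-binding ${QUOTA_PROJECT} \
    --member="serviceAccount:${SA_EMAIL}" \
    --role="roles/bigquery.jobUser"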

Data Project

  1. Navigate to the BigQuery console.

  2. Open the dataset you want to grant access to.

  3. Click “Sharing,” then click “Permissions.”

  4. Add the SA from the "Creating Credentials" step above and grant it the following role (a CLI alternative is sketched after this list).

    1. BigQuery Data Viewer — This grants the SA the ability to read metadata and table data from any table in this dataset.
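
If you prefer the command line, dataset-level access can also be edited with the bq CLI; this is a sketch assuming the SA email from the Creating Credentials step (a "READER" entry in a dataset's access list grants dataset-level read access):

bq show --format=prettyjson <data_project_id>:<dataset> > dataset.json
# Edit dataset.json and add an entry to the "access" array, e.g.:
#   {"role": "READER", "userByEmail": "publisher-connectors@${PROJECT}.iam.gserviceaccount.com"}
bq update --source dataset.json <data_project_id>:<dataset>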

Generalized Project Configuration

This setup represents the permissions needed to expose all BigQuery datasets in a project.

Creating Credentials

  1. Create a new Service Account (SA) in any project.

  2. Generate a new JSON key and download it.

Adding Permissions to Project

  1. Navigate to the project's IAM & Admin page.

  2. Grant the following roles to the SA you created (gcloud equivalents are sketched after this list):

    1. Service Usage Consumer — This allows the SA to consume resources in the project.

    2. BigQuery Read Session User — This allows the SA to start a BigQuery read session with the Storage API. Without this role, the SA can still interact with BQ via the normal query interface and retrieve table metadata, but cannot list a table's contents through our DLCON.

    3. BigQuery Job User — This allows the SA to run jobs.

    4. BigQuery Data Viewer — This grants the SA the ability to read metadata and table data from any table in any dataset in the project.
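
As with the minimal setup, these project-level grants can be scripted (a sketch with hypothetical variable names; substitute your own project ID and SA email):

SA_EMAIL=publisher-connectors@${PROJECT}.iam.gserviceaccount.com
for ROLE in roles/serviceusage.serviceUsageConsumer \
            roles/bigquery.readSessionUser \
            roles/bigquery.jobUser \
            roles/bigquery.dataViewer; do
  gcloud projects add-iam-policy-binding ${PROJECT} \
      --member="serviceAccount:${SA_EMAIL}" \
      --role="${ROLE}"
done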

For more information on creating a new service account within GCP, please refer to Google's IAM Guide.
