BigQuery

Overview

This guide describes how to configure a BigQuery data source in Publisher and how to set up BigQuery permissions correctly for both specific dataset access and project-wide configurations, explaining the core concepts and minimum permissions required for each scenario.

Data source configuration

From the Publisher interface, select Data Sources in the navigation bar and click Connect Data Source. Choose Trino from the available connectors.

In the Data Source Configuration screen, set up the Catalog properties fields. Select the configuration that matches your currently running Trino version:

connector.name=enhanced_bigquery
bigquery.project-id=<data_project_id>
bigquery.parent-project-id=<billing_project_id>
bigquery.credentials-key=<base64_encoded_credentials_key>
bigquery.views-enabled=false
bigquery.parallelism=16
# enhanced_bigquery-specific property
enhanced-bigquery.include-datasets=dataset_1 + dataset_2

There is no cost-effective row count query available. If you still want to collect a row count please use:

SELECT COUNT(*) AS row_count FROM "{{catalog}}"."{{schema}}"."{{table}}"

This query will incur high costs on large datasets

connector.name=enhanced_bigquery
bigquery.project-id=<data_project_id>
bigquery.parent-project-id=<billing_project_id>
bigquery.credentials-key=<base64_encoded_credentials_key>
bigquery.views-enabled=false
bigquery.max-parallelism=16
# enhanced_bigquery-specific property
enhanced-bigquery.include-datasets=dataset_1 + dataset_2

Here is an example for a cost-effective row count query

SELECT * FROM TABLE(
   "{{catalog}}".system.query(
    query => CONCAT('SELECT row_count FROM `project-id.','{{schema}}','`.__TABLES__ WHERE table_id=','''','{{table}}','''')
  )
)

The project-id in the query should be replaced with the same value you used for bigquery.project-id property.

When your dataset contains table names with uppercase letters, enable case-insensitive name matching by setting bigquery.case-insensitive-name-matching=true. Your row count queries must convert table names to lowercase for proper matching. Ensure all table names are unique when compared case-insensitively. Here is the query for a dataset with case-insensitive table names:

SELECT * FROM TABLE(
   "{{catalog}}".system.query(
    query => CONCAT('SELECT row_count FROM `project-id.','{{schema}}','`.__TABLES__ WHERE lower(table_id)=','''','{{table}}','''')
  )
)

The Trino BigQuery documentation documents the properties above. Custom properties provided by the enhanced_bigquery are described below:
- Set enhanced-bigquery.include-datasets to the dataset(s) that should be included. Multiple datasets can be concatenated using +:
  enhanced-bigquery.include-datasets=dataset_1 + dataset_2
After downloading the JSON key you created in the Creating Credentials steps, open a new terminal window and run the command below to base64 encode the key. You can paste the contents of the output file as the value for the bigquery.credentials-key property.

certutil -encode <input_file> <output_file>

BigQuery Permissions

Concepts

Service Account (SA): This is an identity generated in Google Cloud that can be used to interact with GCP services. Each SA has a unique email address (identity) and one or more JSON keys.
Data Project: A project that houses the target dataset.
Quota project: A project that you would like to bill BigQuery to and consume quota resources.
Permission: Granular actions that a user can perform on a given resource.
Role: A collection of permissions typically needed for specific interactions against one or more resources: “Data Viewer,” “Storage Reader,” etc.

Minimal Configuration

This setup represents the minimal permissions needed to interact with a SPECIFIC BigQuery Data Source without creating custom roles. It conceptually segregates the Quota Project and the Data Project, but they can be the same project.

Creating Credentials

Create a new Service Account (SA) in any project.

For more information on creating a new service account within GCP, please refer to Google's IAM Guide.

gcloud iam service-accounts create publisher-connectors \ 
    --display-name="Publisher Data Connections"

Generate a new JSON key and download it.

gcloud iam service-accounts keys create \
  publisher-connectors-key.json \
--iam-account=publisher-connectors@${PROJECT}.iam.gserviceaccount.com

Adding Permissions to Quota Project

Navigate to the quota project's IAM and Admin Page.
Grant the following Roles to the SA you created:
1. Service Usage Consumer — This allows the SA to consume resources.
2. BigQuery Read Session User — This allows the SA to start a BigQuery read session with the storage API. Without this, they can interact with BQ via the normal Query interface but cannot list the table's contents through our DLCON (although they will get the metadata).
3. BigQuery Job User — This allows the user to run jobs.

Data Project

Navigate to the BigQuery console.
Open the dataset you want to grant access to.
Click on “Sharing." then click on “Permissions.”
Add the SA from the "Creating Credentials" step above and grant it the following role.
1. BigQuery Data Viewer — This grants the SA the ability to read metadata and table data from any table in this dataset.

Generalized Project Configuration

This setup represents the permissions needed to expose all BigQuery DataSets in a project.

Creating Credentials

Create a new Service Account (SA) in any project.
Generate a new JSON key and download it.

Adding Permissions to Project

Navigate to the quota project's IAM and Admin Page.
Grant the following Roles to the SA you created:
1. Service Usage Consumer — This allows the SA to consume resources.
2. BigQuery Read Session User — This allows the SA to start a BigQuery read session with the storage API. Without this, they can interact with BQ via the normal Query interface but cannot list the table's contents through our DLCON (although they will get the metadata).
3. BigQuery Job User — This allows the user to run jobs.
4. BigQuery Data Viewer — This grants the SA the ability to read metadata and table data from any table in this dataset.

PreviousConnecting to other sources via Trino NextCollections

Last updated 4 months ago

Was this helpful?