# BigQuery

## Overview

This guide describes how to configure a BigQuery data source in Publisher and how to set up BigQuery permissions correctly for both specific dataset access and project-wide configurations, explaining the core concepts and minimum permissions required for each scenario.

## Data source configuration

From the Publisher interface, select ***Data Sources*** in the navigation bar and click ***Connect Data Source***. Choose ***Trino*** from the available connectors.

In the **Data Source Configuration** screen, set up the **Catalog properties** fields. Select the configuration that matches your currently running Trino version:

{% tabs %}
{% tab title="Trino Version 445" %}

```
connector.name=enhanced_bigquery
bigquery.project-id=<data_project_id>
bigquery.parent-project-id=<billing_project_id>
bigquery.credentials-key=<base64_encoded_credentials_key>
bigquery.views-enabled=false
bigquery.parallelism=16
# enhanced_bigquery-specific property
enhanced-bigquery.include-datasets=dataset_1 + dataset_2
```

There is no cost-effective row count query available. If you still want to collect a row count please use:

```sql
SELECT COUNT(*) AS row_count FROM "{{catalog}}"."{{schema}}"."{{table}}"
```

{% hint style="warning" %}
This query will incur high costs on large datasets
{% endhint %}
{% endtab %}

{% tab title="Trino Version 475" %}

```
connector.name=enhanced_bigquery
bigquery.project-id=<data_project_id>
bigquery.parent-project-id=<billing_project_id>
bigquery.credentials-key=<base64_encoded_credentials_key>
bigquery.views-enabled=false
bigquery.max-parallelism=16
# enhanced_bigquery-specific property
enhanced-bigquery.include-datasets=dataset_1 + dataset_2
```

Here is an example for a cost-effective row count query

```sql
SELECT * FROM TABLE(
   "{{catalog}}".system.query(
    query => CONCAT('SELECT row_count FROM `project-id.','{{schema}}','`.__TABLES__ WHERE table_id=','''','{{table}}','''')
  )
)
```

{% hint style="info" %}
The `project-id` in the query should be replaced with the same value you used for `bigquery.project-id` property.
{% endhint %}

When your dataset contains table names with uppercase letters, enable case-insensitive name matching by setting `bigquery.case-insensitive-name-matching=true`. Your row count queries must convert table names to lowercase for proper matching. Ensure all table names are unique when compared case-insensitively. Here is the query for a dataset with case-insensitive table names:

```sql
SELECT * FROM TABLE(
   "{{catalog}}".system.query(
    query => CONCAT('SELECT row_count FROM `project-id.','{{schema}}','`.__TABLES__ WHERE lower(table_id)=','''','{{table}}','''')
  )
)
```

{% endtab %}
{% endtabs %}

* The [Trino BigQuery documentation](https://trino.io/docs/445/connector/bigquery.html) documents the properties above. Custom properties provided by the `enhanced_bigquery` are described below:
  * Set `enhanced-bigquery.include-datasets` to the dataset(s) that should be included. Multiple datasets can be concatenated using `+`:

    ```
    enhanced-bigquery.include-datasets=dataset_1 + dataset_2
    ```
* After downloading the JSON key you created in the [#creating-credentials](#creating-credentials "mention") steps, open a new terminal window and run the command below to base64 encode the key. You can paste the contents of the output file as the value for the `bigquery.credentials-key` property.

{% tabs %}
{% tab title="Windows" %}

```
certutil -encode <input_file> <output_file>
```

{% endtab %}

{% tab title="macOS" %}

```
base64 -i <path_to_service_account_key.json> -o <output_file>
```

{% endtab %}

{% tab title="Linux" %}

```
base64 <path_to_service_account_key.json> > <output_file>
```

{% endtab %}
{% endtabs %}

## BigQuery Permissions

### Concepts

* **Service Account (SA):** This is an identity generated in Google Cloud that can be used to interact with GCP services. Each SA has a unique email address (identity) and one or more JSON keys.
* **Data Project**: A project that houses the target dataset.
* **Quota projec**t: A project that you would like to bill BigQuery to and consume quota resources.
* **Permission**: Granular actions that a user can perform on a given resource.
* **Role:** A collection of permissions typically needed for specific interactions against one or more resources: “Data Viewer,” “Storage Reader,” etc.

### Minimal Configuration

{% hint style="info" %}
This setup represents the minimal permissions needed to interact with a **SPECIFIC** BigQuery Data Source without creating custom roles. It conceptually segregates the Quota Project and the Data Project, but they can be the same project.
{% endhint %}

#### Creating Credentials

Create a new Service Account (SA) in any project.

{% hint style="info" %}
For more information on creating a new service account within GCP, please refer to Google's [IAM Guide](https://cloud.google.com/iam/docs/service-accounts-create).
{% endhint %}

```
gcloud iam service-accounts create publisher-connectors \ 
    --display-name="Publisher Data Connections"
```

Generate a new JSON key and download it.

```
gcloud iam service-accounts keys create \
  publisher-connectors-key.json \
--iam-account=publisher-connectors@${PROJECT}.iam.gserviceaccount.com
```

#### Adding Permissions to Quota Project

1. Navigate to the quota project's IAM and Admin Page.
2. Grant the following Roles to the SA you created:
   1. ***Service Usage Consumer*** — This allows the SA to consume resources.
   2. ***BigQuery Read Session User*** — This allows the SA to start a BigQuery read session with the storage API. Without this, they can interact with BQ via the normal Query interface but cannot list the table's contents through our DLCON (although they will get the metadata).
   3. ***BigQuery Job User*** — This allows the user to run jobs.

#### Data Project

1. Navigate to the BigQuery console.
2. Open the dataset you want to grant access to.
3. Click on “Sharing." then click on “Permissions.”
4. Add the SA from the "**Creating Credentials**" step above and grant it the following role.
   1. ***BigQuery Data Viewer*** — This grants the SA the ability to read metadata and table data from any table in this dataset.

### Generalized Project Configuration

{% hint style="info" %}
This setup represents the permissions needed to expose all BigQuery DataSets in a project.
{% endhint %}

#### Creating Credentials

1. Create a new Service Account (SA) in any project.
2. Generate a new JSON key and download it.

#### Adding Permissions to Project

1. Navigate to the quota project's IAM and Admin Page.
2. Grant the following Roles to the SA you created:
   1. ***Service Usage Consumer*** — This allows the SA to consume resources.
   2. ***BigQuery Read Session User*** — This allows the SA to start a BigQuery read session with the storage API. Without this, they can interact with BQ via the normal Query interface but cannot list the table's contents through our DLCON (although they will get the metadata).
   3. ***BigQuery Job User*** — This allows the user to run jobs.
   4. ***BigQuery Data Viewer*** — This grants the SA the ability to read metadata and table data from any table in this dataset.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.omics.ai/products/publisher/data-sources/connectors/trino-connectors/bigquery-permissions.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
