Connecting to other sources via Trino


Publisher can connect to additional data sources via Trino. This guide explains how to configure Trino connectors in Publisher.

Configuration Process

Initial Setup

From the Publisher interface, select "Data Sources" in the navigation bar and click "Connect Data Source". Choose "Trino" from the available connectors.

Catalog Properties

In Trino, catalog properties define how Trino connects to your data source. A catalog references a data source through a connector and contains that source's schemas, forming the foundation of your data access configuration.

When you configure a Trino data source, you need to specify the appropriate catalog properties for the type of data source you're connecting to; a sample catalog file is shown after this list. These properties typically include:

  • Connection details (hostname, port)

  • Authentication credentials

  • Schema information

  • Performance settings

  • Security configurations
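
Catalog properties are entered when you configure the data source in Publisher. As a point of reference, in a standalone Trino deployment the same properties live in a catalog file such as etc/catalog/<name>.properties. A minimal sketch for Trino's PostgreSQL connector follows; all connection values are placeholders:

# Hypothetical catalog file for Trino's PostgreSQL connector
connector.name=postgresql
# JDBC URL, user, and password for the source database (placeholders)
connection-url=jdbc:postgresql://example-host:5432/example_db
connection-user=example_user
connection-password=example_password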

Configuration Instructions

  • For the connector types below, refer to the relevant instructions page:

    • BigQuery

  • For all other connector types, refer to the documentation for your specific connector.

When you connect tables via Trino in Publisher, they initially appear in your Library collection. From there, you can add these tables to one or more Collections. To access tables in these Collections, use the fully-qualified name format: collections.{collection_slug_name}.{table_name}
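
For example, assuming a collection whose slug is research_data and a table named variants (both names hypothetical), you could query the table as follows:

-- Query a collection table by its fully-qualified name (names are placeholders)
SELECT * FROM collections.research_data.variants LIMIT 10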

Row count query

The Row count query field allows you to specify a SQL query that counts the number of rows in a table. You can use template placeholders in your query, which will be automatically replaced with the appropriate values when the query is executed. Placeholders are surrounded by double braces ({{ and }}).

Available placeholders:

  • {{catalog}} - The catalog name

  • {{schema}} - The schema name

  • {{table}} - The table name

Example Queries

You can use placeholders as identifiers:

SELECT COUNT(*) AS row_count FROM "{{catalog}}"."{{schema}}"."{{table}}"
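
With hypothetical values catalog = mycatalog, schema = myschema, and table = mytable, this expands at execution time to:

SELECT COUNT(*) AS row_count FROM "mycatalog"."myschema"."mytable"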

Or as string literals in more complex queries, for example one that reads a precomputed count from a metadata table (the metadata table and its row_count column below are illustrative; substitute whatever metadata your source actually exposes):

SELECT row_count FROM "{{catalog}}".INFORMATION_SCHEMA.TABLES WHERE table_schema = '{{schema}}' AND table_name = '{{table}}'

The Row count query will be executed frequently to provide up-to-date row counts for tables in your data source. As the data source owner, you are responsible for monitoring the cost and performance impact of the query.

The standard COUNT(*) query works for most data sources, but where a more efficient option exists (such as reading a precomputed count from source metadata, as in the second example above), prefer it: it can significantly reduce resource usage and cost.

Permissions required

Trino connectors require specific permissions to access and interact with various data sources. These permissions ensure that Trino can read, write, and manage data as needed. The exact permissions depend on the data source type and the operations that Trino needs to perform.

  • File-Based Data Sources: For connectors accessing file-based data sources like HDFS, S3, or Azure Blob Storage, Trino needs permissions to list, read, and write files. This typically includes permissions like s3:ListBucket and s3:GetObject for S3, or equivalent permissions for other storage services (a minimal example policy for S3 appears after this list).

  • Database Connectors: For relational databases such as MySQL, PostgreSQL, or SQL Server, Trino requires permissions to execute SQL queries. This includes SELECT, INSERT, UPDATE, and DELETE permissions on the relevant tables and schemas.

  • NoSQL and Other Data Stores: For NoSQL databases like Cassandra or MongoDB, Trino needs permissions to read and write data. This usually involves permissions to query collections or tables and manage indexes.

  • Cloud Services: Trino needs appropriate API access permissions when accessing cloud services like Google BigQuery or AWS Athena. This includes roles or policies that allow data querying and management.

  • Snowflake: For Snowflake, Trino requires permissions to execute SQL queries and manage data. This includes USAGE on the database and schema and SELECT on the tables and views.

  • Apache Iceberg: For Apache Iceberg tables, Trino requires permissions to access the underlying storage system (e.g., S3, HDFS). This includes permissions to list, read, and write files in the storage locations.

Note: If you plan to perform any ETL/ELT operations with Iceberg tables, Trino will need write permissions to the storage system in addition to read permissions.
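
As a sketch of the file-based case above, a minimal AWS IAM policy granting the S3 read permissions mentioned might look like the following; the bucket name is a placeholder, and write actions such as s3:PutObject would be added only if Trino must write:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::example-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}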

For detailed information on the specific permissions required for each connector, refer to the official Trino documentation:

Trino Connectors
Trino Documentation