Connecting to other sources via Trino


Publisher can connect to additional data sources via Trino. This guide explains how to configure Trino connectors in Publisher.

Configuration Process

Initial Setup

From the Publisher interface, select "Data Sources" in the navigation bar and click "Connect Data Source". Choose "Trino" from the available connectors.

Catalog Properties

In Trino, catalog properties define how Trino connects to your data source. A catalog references a data source through a connector and contains that source's schemas, forming the foundation of your data access configuration.

When you configure a Trino data source, you need to specify the appropriate catalog properties for the type of data source you're connecting to; a sample catalog file is shown after this list. These properties typically include:

  • Connection details (hostname, port)

  • Authentication credentials

  • Schema information

  • Performance settings

  • Security configurations
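
Catalog properties are entered when you configure the data source in Publisher. As a point of reference, in a standalone Trino deployment the same properties live in a catalog file such as etc/catalog/<name>.properties. A minimal sketch for Trino's PostgreSQL connector follows; all connection values are placeholders:

# Hypothetical catalog file for Trino's PostgreSQL connector
connector.name=postgresql
# JDBC URL, user, and password for the source database (placeholders)
connection-url=jdbc:postgresql://example-host:5432/example_db
connection-user=example_user
connection-password=example_password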

Configuration Instructions

  • For the connector types below, refer to the relevant instructions page:

    • BigQuery

  • For all other connector types, refer to the documentation for your specific connector.

When you connect tables via Trino in Publisher, they initially appear in your Library collection. From there, you can add these tables to one or more Collections. To access tables in these Collections, use the fully-qualified name format: collections.{collection_slug_name}.{table_name}
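
For example, assuming a collection whose slug is research_data and a table named variants (both names hypothetical), you could query the table as follows:

-- Query a collection table by its fully-qualified name (names are placeholders)
SELECT * FROM collections.research_data.variants LIMIT 10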

Row count query

The Row count query field allows you to specify a SQL query that counts the number of rows in a table. You can use template placeholders in your query, which will be automatically replaced with the appropriate values when the query is executed. Placeholders are surrounded by double braces ({{ and }}).

Available placeholders:

  • {{catalog}} - The catalog name

  • {{schema}} - The schema name

  • {{table}} - The table name

Example Queries

You can use placeholders as identifiers:

SELECT COUNT(*) AS row_count FROM "{{catalog}}"."{{schema}}"."{{table}}"
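
With hypothetical values catalog = mycatalog, schema = myschema, and table = mytable, this expands at execution time to:

SELECT COUNT(*) AS row_count FROM "mycatalog"."myschema"."mytable"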

Or as string literals in more complex queries, for example one that reads a precomputed count from a metadata table (the metadata table and its row_count column below are illustrative; substitute whatever metadata your source actually exposes):

SELECT row_count FROM "{{catalog}}".INFORMATION_SCHEMA.TABLES WHERE table_schema = '{{schema}}' AND table_name = '{{table}}'

The Row count query will be executed frequently to provide up-to-date row counts for tables in your data source. As the data source owner, you are responsible for monitoring the cost and performance impact of the query.

The standard COUNT(*) query works for most data sources, but where a more efficient option exists (such as reading a precomputed count from source metadata, as in the second example above), prefer it: it can significantly reduce resource usage and cost.

Permissions required

Trino connectors require specific permissions to access and interact with various data sources. These permissions ensure that Trino can read, write, and manage data as needed. The exact permissions depend on the data source type and the operations that Trino needs to perform.

  • File-Based Data Sources: For connectors accessing file-based data sources like HDFS, S3, or Azure Blob Storage, Trino needs permissions to list, read, and write files. This typically includes permissions like s3:ListBucket and s3:GetObject for S3, or equivalent permissions for other storage services (a minimal example policy for S3 appears after this list).

  • Database Connectors: For relational databases such as MySQL, PostgreSQL, or SQL Server, Trino requires permissions to execute SQL queries. This includes SELECT, INSERT, UPDATE, and DELETE permissions on the relevant tables and schemas.

  • NoSQL and Other Data Stores: For NoSQL databases like Cassandra or MongoDB, Trino needs permissions to read and write data. This usually involves permissions to query collections or tables and manage indexes.

  • Cloud Services: Trino needs appropriate API access permissions when accessing cloud services like Google BigQuery or AWS Athena. This includes roles or policies that allow data querying and management.

  • Snowflake: For Snowflake, Trino requires permissions to execute SQL queries and manage data. This includes USAGE on the database and schema and SELECT on the tables and views.

  • Apache Iceberg: For Apache Iceberg tables, Trino requires permissions to access the underlying storage system (e.g., S3, HDFS). This includes permissions to list, read, and write files in the storage locations.

Note: If you plan to perform any ETL/ELT operations with Iceberg tables, Trino will need write permissions to the storage system in addition to read permissions.
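
As a sketch of the file-based case above, a minimal AWS IAM policy granting the S3 read permissions mentioned might look like the following; the bucket name is a placeholder, and write actions such as s3:PutObject would be added only if Trino must write:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::example-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}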

For detailed information on the specific permissions required for each connector, refer to the official Trino documentation:

Trino Connectors
Trino Documentation