# Python Library

DNAstack provides a Python client library, `dnastack-client-library` 3.1, for interacting with DNAstack from Python scripts and Jupyter notebooks.

## Prerequisite

* Python 3.11 or newer
* `pip` 21.3 or newer

Only for **Windows**

* PowerShell

## Installation

See the [installation instructions](/products/command-line-interface/installation.md#installing-the-clilibrary-package) to install the library.

## Usage

{% hint style="info" %}
Before we start...

Some functions and methods automatically trigger the authorization process when it is required. Where anonymous access is supported, you can request it by setting the `no_auth` argument to `True`.
{% endhint %}

### Set up a client factory with Explorer or a service registry

To get started, we retrieve the endpoints from a service registry by specifying the **hostname** of a service that implements the GA4GH Service Registry API.

In this example, we use the `use` function to set up a client factory for Viral AI (Explorer).

```python
from dnastack import use

factory = use('viral.ai')
```

{% hint style="info" %}
The `use` function allows anonymous access by setting the `no_auth` argument to `True`. For example:

```python
factory = use('viral.ai', no_auth=True)
```

{% endhint %}

The `factory` has two main methods:

* `factory.all()` returns the list of `dnastack.ServiceEndpoint` objects,
* `factory.get(id: str)` instantiates a service client for the requested endpoint.

The `factory.get` method relies on the `type` property of the `ServiceEndpoint` object to determine which client class to use. The mapping is as follows.

It will instantiate a `dnastack.CollectionServiceClient` for:

* `com.dnastack:collection-service:1.0.0`
* `com.dnastack.explorer:collection-service:1.1.0`

It will instantiate a `dnastack.DataConnectClient` for:

* `org.ga4gh:data-connect:1.0.0`

It will instantiate a `dnastack.DrsClient` for:

* `org.ga4gh:drs:1.1.0`
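
The dispatch described above can be pictured as a lookup from service type to client class. The following is a minimal sketch of that idea, not the library's actual implementation. `CLIENT_CLASS_BY_TYPE` and `resolve_client_class_name` are hypothetical names, and the client classes are represented by their names as plain strings so that the snippet runs without the library installed.

```python
from typing import Optional

# Hypothetical lookup table built from the service types listed above.
# Client classes are given as names (strings) for illustration only.
CLIENT_CLASS_BY_TYPE = {
    'com.dnastack:collection-service:1.0.0': 'CollectionServiceClient',
    'com.dnastack.explorer:collection-service:1.1.0': 'CollectionServiceClient',
    'org.ga4gh:data-connect:1.0.0': 'DataConnectClient',
    'org.ga4gh:drs:1.1.0': 'DrsClient',
}


def resolve_client_class_name(service_type: str) -> Optional[str]:
    """Return the client class name for a service type, or None if unknown."""
    return CLIENT_CLASS_BY_TYPE.get(service_type)


print(resolve_client_class_name('org.ga4gh:data-connect:1.0.0'))
```

A service type that is not in the table yields `None` in this sketch; how the library itself treats unknown types is not covered here.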

### Interact with Collection Service API

Now that the factory holds the endpoint information from the service registry, we can create a client for the collection service.

```python
from dnastack import CollectionServiceClient
collection_service_client = factory.get_one_of(client_class=CollectionServiceClient)
```

And this is how to list all available collections.

```python
import json

collections = collection_service_client.list_collections()

print(json.dumps(
    [
        {
            'id': c.id,
            'slugName': c.slugName,
            'itemsQuery': c.itemsQuery,
        }
        for c in collections
    ],
    indent=2
))
```

where `slugName` is the alternative ID of a collection and `itemsQuery` is the SQL query of items in the collection.

{% hint style="info" %}
The `list_collections` method allows anonymous access by setting the `no_auth` argument to `True`.
{% endhint %}
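
Once the collections are listed, you will often want to pick one by its `slugName`. The sketch below shows one way to do that without raising an error when nothing matches. It uses stand-in objects with made-up values instead of a live `list_collections()` result, and `find_collection` is a hypothetical helper, not part of the library.

```python
from types import SimpleNamespace

# Stand-in objects exposing the same attributes used above
# (id, slugName, itemsQuery). The values are made up for illustration.
collections = [
    SimpleNamespace(id='c1', slugName='ncbi-sra', itemsQuery='SELECT * FROM a'),
    SimpleNamespace(id='c2', slugName='other', itemsQuery='SELECT * FROM b'),
]


def find_collection(collections, slug_name):
    """Return the first collection with the given slugName, or None."""
    return next((c for c in collections if c.slugName == slug_name), None)


collection = find_collection(collections, 'ncbi-sra')
print(collection.id if collection else 'not found')
```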

### Set up a client for Data Connect Service

In this section, we switch to a Data Connect client.

Suppose that you know which collection you want to work with. Then, use `factory` to get the Data Connect client for the corresponding collection where the service ID is `data-connect-<collection.slugName>`.

```python
from dnastack import DataConnectClient

data_connect_client: DataConnectClient = factory.get('data-connect-<collection.slugName>')
```

For example, if the collection is `ncbi-sra`, it will look like this.

```python
data_connect_client: DataConnectClient = factory.get('data-connect-ncbi-sra')
```

where `data-connect-ncbi-sra` is the service ID of the Data Connect service that corresponds to the collection.
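
The naming convention above is simple enough to capture in a small helper. This is only a sketch of the `data-connect-<collection.slugName>` pattern described in this section; `data_connect_service_id` is a hypothetical name, not a library function.

```python
def data_connect_service_id(slug_name: str) -> str:
    """Build the Data Connect service ID for a collection's slugName."""
    if not slug_name:
        raise ValueError('slug_name must be a non-empty string')
    return f'data-connect-{slug_name}'


print(data_connect_service_id('ncbi-sra'))
```

The resulting string is what you would pass to `factory.get`.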

### List all accessible tables

Before we can run a query, we need to get the list of available tables (`dnastack.client.data_connect.TableInfo` objects).

```python
tables = data_connect_client.list_tables()

print(json.dumps(
    [
        dict(
            name=table.name
        )
        for table in tables
    ],
    indent=2
))
```

where the `name` property of each item (`TableInfo` object) in `tables` is the name of the table that we can use in the query.

{% hint style="info" %}
**Note**

Depending on the implementation of the `/tables` endpoint, the `TableInfo` objects in the list may be incomplete. For example, the data model (`data_model`) may contain only a reference URL instead of an object schema. To get more complete information, use `Table`, which is described in the next section.

The `list_tables` method allows anonymous access by setting the `no_auth` argument to `True`.
{% endhint %}

### Get the table information and data

To get started, we use the `table` method, which returns a table wrapper object (`dnastack.client.data_connect.Table`). In this example, we use the first table available.

```python
table = data_connect_client.table(tables[0])
```

The `table` method also accepts a string, which it treats as the name of the table, e.g.,

```python
table = data_connect_client.table(tables[0].name)
```

or

```python
table = data_connect_client.table('cat.sch.tbl')
```

A `Table` object also has the `name` property, which is the table name (same as `TableInfo.name`). In addition, it provides two properties:

* The `info` property provides the more complete table information as a `TableInfo` object,
* The `data` property provides an iterator to the actual table data.

{% hint style="info" %}
The `table` method allows anonymous access by setting the `no_auth` argument to `True`.
{% endhint %}

### Integrate a Table object with pandas.DataFrame

You can instantiate a `pandas.DataFrame` object as shown below:

```python
import pandas

csv_df = pandas.DataFrame(table.data)
```

where `table` is a `Table` object.

### Query data

Now, let’s select up to 10 rows from the first table.

```python
result_iterator = data_connect_client.query(f'SELECT * FROM {table.name} LIMIT 10')
```

The `query` method returns an iterator over the result, where each item is a dictionary that maps column names (strings) to values of any type.

{% hint style="info" %}
The `query` method allows anonymous access by setting the `no_auth` argument to `True`.
{% endhint %}
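
Python iterators are generally single-pass, so it is worth materializing the rows you need before reusing them. The sketch below illustrates this with a stand-in generator, `fake_query_result`, that yields made-up dictionaries in place of a real `query` call; whether the library's iterator can be restarted is not stated here, so the example assumes it cannot.

```python
from itertools import islice


def fake_query_result():
    """Stand-in for the iterator returned by `query` (made-up rows)."""
    for i in range(25):
        yield {'id': i, 'name': f'row-{i}'}


result_iterator = fake_query_result()

# Take the first 10 rows without consuming the rest of the iterator.
first_rows = list(islice(result_iterator, 10))

# The same iterator continues from where it stopped.
remaining_rows = list(result_iterator)

print(len(first_rows), len(remaining_rows))
```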

**Integrate the query result (iterator) with pandas.DataFrame**

You can instantiate a `pandas.DataFrame` object as shown below:

```python
import pandas

csv_df = pandas.DataFrame(result_iterator)
```

### Download blobs with DRS API

To download a blob, you first need to find the blobs in a collection that you have access to. To get the list of available blob items, run the collection's items query with a Data Connect client.

In this example, suppose that the `ncbi-sra` collection has blobs. We fetch the first 20 items and keep only the blobs.

```python
blob_collection = [c for c in collections if c.slugName == 'ncbi-sra'][0]
items = [i
         for i in data_connect_client.query(blob_collection.itemsQuery + ' LIMIT 20')
         if i['type'] == 'blob']
```

{% hint style="info" %}
**Tip**

The items query may contain both “table” and “blob” items. You may want to filter them.
{% endhint %}
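
One way to handle mixed results is to group the items by their `type` value in a single pass. The sketch below uses made-up item dictionaries with the `id` and `type` keys used in this section; `group_by_type` is a hypothetical helper, not part of the library.

```python
from collections import defaultdict

# Made-up items shaped like the rows returned by the items query.
items = [
    {'id': 'a', 'type': 'blob'},
    {'id': 'b', 'type': 'table'},
    {'id': 'c', 'type': 'blob'},
]


def group_by_type(items):
    """Group item dictionaries by their 'type' value."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.get('type')].append(item)
    return dict(grouped)


grouped = group_by_type(items)
print([item['id'] for item in grouped.get('blob', [])])
```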

Here is how to get a blob object.

```python
from dnastack import DrsClient

drs_client: DrsClient = factory.get("drs")
blob = drs_client.get_blob(items[0]['id'])
```

{% hint style="info" %}
**Tip**

If you have an external DRS URL, you can use it by setting the `url` parameter instead of `id`. For example,

```python
blob = drs_client.get_blob(url='drs://viral.ai/fmyfkmy1230-3rhbfa8weyf')
```

If the endpoint is publicly accessible, you can set `no_auth` to `True` to ensure that the client never initiates the authentication procedure.

```python
blob = drs_client.get_blob(..., no_auth=True)
```

{% endhint %}
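
If you need to inspect a DRS URL before using it, the standard library can split it into its host and object ID. This sketch assumes the hostname-based form `drs://<hostname>/<id>` shown in the tip above; `split_drs_url` is a hypothetical helper, not part of the library.

```python
from urllib.parse import urlsplit


def split_drs_url(drs_url: str):
    """Split a hostname-based DRS URL into (hostname, object_id)."""
    parts = urlsplit(drs_url)
    if parts.scheme != 'drs':
        raise ValueError(f'not a DRS URL: {drs_url!r}')
    # The path starts with a slash; the rest is treated as the object ID.
    return parts.netloc, parts.path.lstrip('/')


host, object_id = split_drs_url('drs://viral.ai/fmyfkmy1230-3rhbfa8weyf')
print(host, object_id)
```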

Here is how to download the blob data.

```python
blob.data
```

where the `data` property returns a byte array.
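
Because the data arrives as bytes, you decide how to decode it. The sketch below parses CSV content with the standard library, using a made-up byte string in place of `blob.data`. It assumes the blob holds UTF-8 encoded CSV, which will not be true for every blob; `rows_from_csv_bytes` is a hypothetical helper.

```python
import csv
import io

# Made-up stand-in for `blob.data`; a real blob may not be CSV at all.
data = b'sample,reads\nS1,100\nS2,250\n'


def rows_from_csv_bytes(data: bytes, encoding: str = 'utf-8'):
    """Decode CSV bytes and return the rows as a list of dictionaries."""
    text = io.StringIO(data.decode(encoding))
    return list(csv.DictReader(text))


rows = rows_from_csv_bytes(data)
print(rows[0]['sample'], rows[0]['reads'])
```

Note that `csv.DictReader` leaves every value as a string; convert numeric columns yourself if you need numbers.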

**Integrate Blob objects with pandas.DataFrame**

You can instantiate a `pandas.DataFrame` object as shown below:

```python
import pandas

csv_df = pandas.read_csv(blob.get_download_url())
```

where `blob.get_download_url()` returns the access URL.


