Scripting Galaxy using the API and BioBlend

Author(s)	Nicola Soranzo
Editor(s)	Clare Sloggett, Nitesh Turaga, Helena Rasche

Overview
Questions:

What is a REST API?

How to interact with Galaxy programmatically?

Why and when should I use BioBlend?

Objectives:

Interact with Galaxy via BioBlend.

Requirements:

Time estimation: 2 hours

Level: Introductory

Supporting Materials:

Slides

Jupyter Notebook

FAQs

Published: Jun 23, 2022

Last modification: Nov 16, 2023

License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT

PURL: https://gxy.io/GTN:T00111

Rating: 2.0 (0 recent ratings, 2 all time)

Revision: 7

Best viewed in a Jupyter Notebook

This tutorial is best viewed in a Jupyter notebook! You can load this notebook in one of the following ways:

Launching the notebook in Jupyter in Galaxy

Instructions to Launch JupyterLab

Open a Terminal in JupyterLab with File -> New -> Terminal

Run wget https://training.galaxyproject.org/training-material/topics/dev/tutorials/bioblend-api/dev-bioblend-api.ipynb

Select the notebook that appears in the list of files on the left.

Downloading the notebook

Right click one of these links: Jupyter Notebook (With Solutions), Jupyter Notebook (Without Solutions)

Save Link As...

BioBlend (Sloggett et al. 2013) is a Python library to enable simple interaction with Galaxy (Afgan et al. 2018) via the command line or scripts.

Agenda

In this tutorial, we will cover:

Interacting with histories in Galaxy API

Exercise: Galaxy API

Interacting with histories in BioBlend

Exercise: BioBlend

Interacting with histories in BioBlend.objects

Exercise: BioBlend.objects

Optional Extra Exercises

Interacting with histories in Galaxy API

We are going to use the requests Python library to communicate via HTTP with the Galaxy server. To start, let’s define the connection parameters.

You need to insert the API key for your Galaxy server in the cell below:

Open the Galaxy server in another browser tab
Click on “User” on the top menu, then “Preferences”
Click on “Manage API key”
Generate an API key if needed, then copy the alphanumeric string and paste it as the value of the api_key variable below.

import json
from pprint import pprint
from urllib.parse import urljoin

import requests

server = 'https://usegalaxy.eu/'
api_key = ''
base_url = urljoin(server, 'api')
base_url
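Note that urljoin resolves its second argument the way a browser resolves a relative link, so the result depends on whether the server URL ends with a slash. The sketch below illustrates this with a made-up server installed under a /galaxy prefix (example.org is a placeholder, not a real Galaxy instance):

```python
from urllib.parse import urljoin

# The server URL used in this tutorial ends with a slash and has no path.
print(urljoin("https://usegalaxy.eu/", "api"))

# Hypothetical server installed under a /galaxy prefix (placeholder host).
with_slash = urljoin("https://example.org/galaxy/", "api")
without_slash = urljoin("https://example.org/galaxy", "api")
print(with_slash)     # the /galaxy prefix is kept
print(without_slash)  # the last path segment is replaced by "api"
```

If your Galaxy server lives under a path prefix, it is therefore safer to write the server URL with a trailing slash.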

We now make a GET request to retrieve all histories owned by a user:

headers = {"Content-Type": "application/json", "x-api-key": api_key}
r = requests.get(base_url + "/histories", headers=headers)
print(r.text)
hists = r.json()
pprint(hists)

As you can see, GET requests to the Galaxy API return JSON strings, which need to be deserialized into Python data structures. In particular, GETting a resource collection returns a list of dictionaries.

Each dictionary returned when GETting a resource collection gives basic info about a resource, e.g. for a history you have:

id: the unique identifier of the history, needed for all specific requests about this resource
name: the name of this history as given by the user
deleted: whether the history has been deleted.
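The deserialization step can be tried without a server. The following is a minimal sketch on a made-up response body shaped like the one returned for the histories collection; the ids and names are placeholders, and only the three keys listed above are included:

```python
import json

# Made-up response body, shaped like the list returned by GET /api/histories.
# The ids and names are placeholders, not real Galaxy data.
response_text = """
[
  {"id": "aaa111", "name": "Unnamed history", "deleted": false},
  {"id": "bbb222", "name": "RNA-seq analysis", "deleted": true}
]
"""

# json.loads turns the JSON string into a list of dictionaries,
# comparable to calling r.json() on a real response.
histories = json.loads(response_text)
active_names = [h["name"] for h in histories if not h["deleted"]]
print(type(histories).__name__, type(histories[0]).__name__)
print(active_names)
```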

There is no readily-available filtering capability, but it’s not difficult to filter histories by name:

pprint([_ for _ in hists if _['name'] == 'Unnamed history'])

If you are interested in more details about a given resource, you just need to append its id to the previous collection request, e.g. to get more info about a history:

hist0_id = hists[0]['id']
print(hist0_id)
r = requests.get(base_url + "/histories/" + hist0_id, headers=headers)
pprint(r.json())

As you can see, there are many more entries in the returned dictionary, e.g.:

create_time
size: total disk space used by the history
state_ids: ids of history datasets for each possible state.
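As an illustration of how state_ids can be used, the sketch below counts the datasets in each state. The dictionary is a made-up excerpt with placeholder ids; a real history reports more states and more keys:

```python
# Made-up excerpt of a detailed history dictionary; ids are placeholders.
history_details = {
    "name": "Unnamed history",
    "state_ids": {
        "ok": ["d1", "d2", "d3"],
        "running": ["d4"],
        "error": [],
    },
}

# Count the datasets in each state, skipping states with no datasets.
counts = {
    state: len(ids)
    for state, ids in history_details["state_ids"].items()
    if ids
}
print(counts)
```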

To get the list of datasets contained in a history, simply append /contents to the previous resource request.

r = requests.get(base_url + "/histories/" + hist0_id + "/contents", headers=headers)
hdas = r.json()
pprint(hdas)

The dictionaries returned when GETting the history content give basic info about each dataset, e.g.: id, name, deleted, state, url…

To get the details about a specific dataset, you can use the datasets controller:

hda0_id = hdas[0]['id']
print(hda0_id)
r = requests.get(base_url + "/datasets/" + hda0_id, headers=headers)
pprint(r.json())

Some of the interesting additional dictionary entries are:

create_time
creating_job: id of the job which created this dataset
download_url: URL to download the dataset
file_ext: the Galaxy data type of this dataset
file_size
genome_build: the genome build (dbkey) associated to this dataset.

New resources are created with POST requests. The uploaded data needs to be serialized in a JSON string. For example, to create a new history:

data = {'name': 'New history'}
r = requests.post(base_url + "/histories", data=json.dumps(data), headers=headers)
new_hist = r.json()
pprint(new_hist)

The return value of a POST request is a dictionary with detailed info about the created resource.

To update a resource, make a PUT request, e.g. to change the history name:

data = {'name': 'Updated history'}
r = requests.put(base_url + "/histories/" + new_hist["id"], data=json.dumps(data), headers=headers)
print(r.status_code)
pprint(r.json())

The return value of a PUT request is usually a dictionary with detailed info about the updated resource.

Finally, to delete a resource, make a DELETE request, e.g.:

r = requests.delete(base_url + "/histories/" + new_hist["id"], headers=headers)
print(r.status_code)

Exercise: Galaxy API

Goal: Upload a file to a new history, import a workflow and run it on the uploaded dataset.

Question: Initialise

First, define the connection parameters. What variables do you need?

import json
from pprint import pprint
from urllib.parse import urljoin

import requests

server = 'https://usegalaxy.eu/'
api_key = ''
base_url = urljoin(server, 'api')

# Try it out here!

Question: New History

Next, create a new Galaxy history via POST to the correct API.

headers = {"Content-Type": "application/json", "x-api-key": api_key}
data = {"name": "New history"}
r = requests.post(base_url + "/histories", data=json.dumps(data), headers=headers)
new_hist = r.json()
pprint(new_hist)

# Try it out here!

Question: Upload a dataset

Upload the local file 1.txt to the new history. You need to run the special upload1 tool by making a POST request to /api/tools. You don’t need to pass any inputs to it apart from attaching the file as files_0|file_data. Also, note that when attaching a file the payload should not be serialized to a JSON string and you need to drop Content-Type from the request headers.

Download the 1.txt file from the following URL first:
https://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/1.txt
data = {
    "history_id": new_hist["id"],
    "tool_id": "upload1"
}
with open("1.txt", "rb") as f:
    files = {"files_0|file_data": f}
    r = requests.post(base_url + "/tools", data=data, files=files, headers={"x-api-key": api_key})
ret = r.json()
pprint(ret)

# Try it out here!

Question: Find the dataset in your history

Find the new uploaded dataset, either from the dict returned by the POST request above or from the history contents.
hda = ret['outputs'][0]
pprint(hda)

# Try it out here!

Question: Import a workflow

Import a workflow from the local file convert_to_tab.ga by making a POST request to /api/workflows. The only needed data is workflow, which must be a deserialized JSON representation of the workflow .ga file.

Download the convert_to_tab.ga file from the following URL first:
https://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/convert_to_tab.ga
with open("convert_to_tab.ga", "r") as f:
    workflow_dict = json.load(f)
data = {"workflow": workflow_dict}
r = requests.post(base_url + "/workflows", data=json.dumps(data), headers=headers)
wf = r.json()
pprint(wf)

# Try it out here!

Question: View the workflow details

View the details of the imported workflow by making a GET request to /api/workflows/WORKFLOW_ID.
r = requests.get(base_url + "/workflows/" + wf["id"], headers=headers)
wf = r.json()
pprint(wf)

# Try it out here!

Question: Invoke the workflow

Run the imported workflow on the uploaded dataset inside the same history by making a POST request to /api/workflows/WORKFLOW_ID/invocations. The only needed data are history and inputs.
inputs = {0: {'id': hda['id'], 'src': 'hda'}}
data = {
    'history': 'hist_id=' + new_hist['id'],
    'inputs': inputs}
r = requests.post(base_url + "/workflows/" + wf["id"] + "/invocations", data=json.dumps(data), headers=headers)
pprint(r.json())

# Try it out here!
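One detail of the solution above is worth checking: the inputs dictionary uses the integer key 0, but JSON object keys are always strings, so json.dumps sends it as "0". A small sketch with placeholder ids (standing in for hda['id'] and new_hist['id']) shows the round trip:

```python
import json

# Placeholder ids standing in for hda['id'] and new_hist['id'].
inputs = {0: {"id": "dataset123", "src": "hda"}}
data = {"history": "hist_id=history456", "inputs": inputs}

payload = json.dumps(data)
print(payload)

# After a round trip the key is the string "0", not the integer 0.
decoded = json.loads(payload)
print(list(decoded["inputs"].keys()))
```

This is why writing the key as 0 or as '0' makes no difference to the serialized request body.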

Question: View the results

View the results on the Galaxy server with your web browser. Were you successful? Did it run?

Interacting with histories in BioBlend

If you need to install BioBlend into your Jupyter environment, you can execute:

import sys
!{sys.executable} -m pip install bioblend

You need to insert the API key for your Galaxy server in the cell below:

Open the Galaxy server in another browser tab
Click on “User” on the top menu, then “Preferences”
Click on “Manage API key”
Generate an API key if needed, then copy the alphanumeric string and paste it as the value of the api_key variable below.

The user interacts with a Galaxy server through a GalaxyInstance object:

from pprint import pprint

import bioblend.galaxy

server = 'https://usegalaxy.eu/'
api_key = ''
gi = bioblend.galaxy.GalaxyInstance(url=server, key=api_key)

The GalaxyInstance object gives you access to the various controllers, i.e. the resources you are dealing with, like histories, tools and workflows. Therefore, method calls will have the format gi.controller.method(). For example, the call to retrieve all histories owned by the current user is:

pprint(gi.histories.get_histories())

As you can see, methods in BioBlend do not return JSON strings, but deserialize them into Python data structures. In particular, get_ methods return a list of dictionaries.

Each dictionary gives basic info about a resource, e.g. for a history you have:

id: the unique identifier of the history, needed for all specific requests about this resource
name: the name of this history as given by the user
deleted: whether the history has been deleted.

New resources are created with create_ methods, e.g. the call to create a new history is:

new_hist = gi.histories.create_history(name='BioBlend test')
pprint(new_hist)

As you can see, to make POST requests in BioBlend it is not necessary to serialize data; you just pass it explicitly as parameters. The return value is a dictionary with detailed info about the created resource.

get_ methods usually have filtering capabilities, e.g. it is possible to filter histories by name:

pprint(gi.histories.get_histories(name='BioBlend test'))

To upload the local file 1.txt to the new history, you can run the special upload tool by calling the upload_file method of the tools controller.

Download the 1.txt file from the following URL first:

https://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/1.txt

hist_id = new_hist["id"]
pprint(gi.tools.upload_file("1.txt", hist_id))

If you are interested in more details about a given resource for which you know the id, you can use the corresponding show_ method. For example, to get more info about the history we have just populated:

pprint(gi.histories.show_history(history_id=hist_id))

As you can see, there are many more entries in the returned dictionary, e.g.:

create_time
size: total disk space used by the history
state_ids: ids of history datasets for each possible state.

To get the list of datasets contained in a history, simply add contents=True to the previous call.

hdas = gi.histories.show_history(history_id=hist_id, contents=True)
pprint(hdas)

The dictionaries returned when showing the history content give basic info about each dataset, e.g.: id, name, deleted, state, url…
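Because the history contents are plain dictionaries, ordinary list comprehensions are enough to select datasets. The sketch below works on made-up data with placeholder ids and names, restricted to the keys mentioned above:

```python
# Made-up history contents, shaped like the list returned by
# show_history(..., contents=True); ids and names are placeholders.
hdas = [
    {"id": "d1", "name": "1.txt", "deleted": False, "state": "ok"},
    {"id": "d2", "name": "Convert on data 1", "deleted": False, "state": "running"},
    {"id": "d3", "name": "old upload", "deleted": True, "state": "ok"},
]

# Keep only datasets that are ready to use: finished and not deleted.
ready_ids = [d["id"] for d in hdas if d["state"] == "ok" and not d["deleted"]]
print(ready_ids)
```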

To get the details about a specific dataset, you can use the datasets controller:

hda0_id = hdas[0]['id']
print(hda0_id)
pprint(gi.datasets.show_dataset(hda0_id))

Some of the interesting additional dictionary entries are:

create_time
creating_job: id of the job which created this dataset
download_url: URL to download the dataset
file_ext: the Galaxy data type of this dataset
file_size
genome_build: the genome build (dbkey) associated to this dataset.

To update a resource, use the update_ method, e.g. to change the name of the new history:

pprint(gi.histories.update_history(new_hist['id'], name='Updated history'))

The return value of update_ methods is usually a dictionary with detailed info about the updated resource.

Finally, to delete a resource, use the delete_ method, e.g.:

pprint(gi.histories.delete_history(new_hist['id']))

Exercise: BioBlend

Goal: Upload a file to a new history, import a workflow and run it on the uploaded dataset.

Question: Initialise

Create a GalaxyInstance object.

from pprint import pprint

import bioblend.galaxy

server = 'https://usegalaxy.eu/'
api_key = ''
gi = bioblend.galaxy.GalaxyInstance(url=server, key=api_key)

# Try it out here!

Question: New History

Create a new Galaxy history.
new_hist = gi.histories.create_history(name='New history')
pprint(new_hist)

# Try it out here!

Question: Upload a dataset

Upload the local file 1.txt to the new history using tools.upload_file() .

Download the 1.txt file from the following URL first:
https://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/1.txt
ret = gi.tools.upload_file("1.txt", new_hist["id"])
pprint(ret)

# Try it out here!

Question: Find the dataset in your history

Find the new uploaded dataset, either from the dict returned by tools.upload_file() or from the history contents.
hda = ret['outputs'][0]
pprint(hda)

# Try it out here!

Question: Import a workflow

Import a workflow from the local file convert_to_tab.ga using workflows.import_workflow_from_local_path() .

Download the convert_to_tab.ga file from the following URL first:
https://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/convert_to_tab.ga
wf = gi.workflows.import_workflow_from_local_path("convert_to_tab.ga")
pprint(wf)

# Try it out here!

Question: View the workflow details

View the details of the imported workflow using workflows.show_workflow().
wf = gi.workflows.show_workflow(wf['id'])
pprint(wf)

# Try it out here!

Question: Invoke the workflow

Run the imported workflow on the uploaded dataset inside the same history using workflows.invoke_workflow() .
inputs = {0: {'id': hda['id'], 'src': 'hda'}}
ret = gi.workflows.invoke_workflow(wf['id'], inputs=inputs, history_id=new_hist['id'])
pprint(ret)

# Try it out here!

Question: View the results

View the results on the Galaxy server with your web browser. Were you successful? Did it run?

Interacting with histories in BioBlend.objects

You need to insert the API key for your Galaxy server in the cell below:

Open the Galaxy server in another browser tab
Click on “User” on the top menu, then “Preferences”
Click on “Manage API key”
Generate an API key if needed, then copy the alphanumeric string and paste it as the value of the api_key variable below.

The user interacts with a Galaxy server through a GalaxyInstance object:

from pprint import pprint

import bioblend.galaxy.objects

server = 'https://usegalaxy.eu/'
api_key = ''
gi = bioblend.galaxy.objects.GalaxyInstance(url=server, api_key=api_key)

All GalaxyInstance method calls have the client.method() format, where client is the name of the resources you are dealing with. There are 2 methods to get the list of resources:

get_previews(): lightweight (one GET request), retrieves basic resources’ info, returns a list of preview objects
list(): one GET request for each resource, retrieves full resources’ info, returns a list of full objects.
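To make the cost difference concrete, here is a toy sketch that does not use BioBlend at all. FakeServer is a hypothetical stand-in that only counts GET requests, and it assumes that full objects are fetched with one collection request followed by one request per resource:

```python
class FakeServer:
    """Toy stand-in for a Galaxy server that counts the GET requests it receives."""

    def __init__(self, names):
        self._histories = {f"id{i}": name for i, name in enumerate(names)}
        self.get_count = 0

    def get_collection(self):
        # One request returns basic info for every history.
        self.get_count += 1
        return [{"id": i, "name": n} for i, n in self._histories.items()]

    def get_one(self, history_id):
        # One request returns full info for a single history.
        self.get_count += 1
        return {"id": history_id, "name": self._histories[history_id], "size": 0}


fake = FakeServer(["first", "second", "third"])

# Previews: a single collection request.
previews = fake.get_collection()
requests_for_previews = fake.get_count

# Full objects: the collection request plus one request per history.
fake.get_count = 0
full = [fake.get_one(p["id"]) for p in fake.get_collection()]
requests_for_full = fake.get_count

print(requests_for_previews, requests_for_full)
```

With many histories the second approach grows linearly in the number of requests, which is why previews are usually the better starting point.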

For example, the call to retrieve previews of all histories owned by the current user is:

pprint(gi.histories.get_previews())

New resources are created with create() methods, e.g. to create a new history:

new_hist = gi.histories.create(name='BioBlend test')
pprint(new_hist)

As you can see, the create() methods in BioBlend.objects return an object, not a dictionary.

Both get_previews() and list() methods usually have filtering capabilities, e.g. it is possible to filter histories by name:

pprint(gi.histories.list(name='BioBlend test'))

To upload the local file 1.txt to the new history, you can run the special upload tool by calling the upload_file method of the History object.

Download the 1.txt file from the following URL first:

https://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/1.txt

hda = new_hist.upload_file("1.txt")
pprint(hda)

Please note that with BioBlend.objects there is no need to find the uploaded dataset, since upload_file() already returns a HistoryDatasetAssociation object.

Both HistoryPreview and History objects have many of their properties available as attributes, e.g. the id.

If you need to specify the unique id of the resource to retrieve, you can use the get() method, e.g. to get back the history we created before:

gi.histories.get(new_hist.id)

To get the list of datasets contained in a history, simply look at the content_infos attribute of the History object.

pprint(new_hist.content_infos)

To get the details about one dataset, you can use the get_dataset() method of the History object:

new_hist.get_dataset(hda.id)

You can also filter history datasets by name using the get_datasets() method of History objects.
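As a minimal sketch of that filtering, the helper below wraps the get_datasets() call. The name datasets_named is made up for this example, and the history argument is assumed to be a BioBlend.objects History such as new_hist:

```python
def datasets_named(history, name):
    """Return the datasets of a BioBlend.objects History with the given name.

    `history` is assumed to be a History object such as new_hist;
    the helper name is hypothetical and not part of BioBlend.
    """
    return history.get_datasets(name=name)


# Example call, to be run against a real History object:
# pprint(datasets_named(new_hist, "1.txt"))
```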

To update a resource, use the update() method of its object, e.g. to change the history name:

new_hist.update(name='Updated history')

The return value of update() methods is the updated object.

Finally, to delete a resource, you can use the delete() method of the object, e.g.:

new_hist.delete()

Exercise: BioBlend.objects

Goal: Upload a file to a new history, import a workflow and run it on the uploaded dataset.

Question: Initialise

Create a GalaxyInstance object.

from pprint import pprint

import bioblend.galaxy.objects

server = 'https://usegalaxy.eu/'
api_key = ''
gi = bioblend.galaxy.objects.GalaxyInstance(url=server, api_key=api_key)

# Try it out here!

Question: New History

Create a new Galaxy history.
new_hist = gi.histories.create(name='New history')
pprint(new_hist)

# Try it out here!

Question: Upload a dataset

Upload the local file 1.txt to the new history using the upload_file() method of History objects.

Download the 1.txt file from the following URL first:
https://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/1.txt
hda = new_hist.upload_file("1.txt")
pprint(hda)

# Try it out here!

Question: Import a workflow

Import a workflow from the local file convert_to_tab.ga using workflows.import_new()

Download the convert_to_tab.ga file from the following URL first:
https://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/convert_to_tab.ga
with open("convert_to_tab.ga", "r") as f:
    wf_string = f.read()
wf = gi.workflows.import_new(wf_string)
pprint(wf)

# Try it out here!

Question: View the workflow inputs

View the inputs of the imported workflow using the inputs attribute of Workflow objects.
pprint(wf.inputs)

# Try it out here!

Question: Invoke the workflow

Run the imported workflow on the uploaded dataset inside the same history using the invoke() method of Workflow objects.
inputs = {'0': hda}
wf.invoke(inputs=inputs, history=new_hist)

# Try it out here!

Question: View the results

View the results on the Galaxy server with your web browser. Were you successful? Did it run?

Optional Extra Exercises

If you have completed the exercise, you can try to perform these extra tasks with the help of the online documentation:

Download the workflow result to your computer
Publish your history
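One possible approach to these two tasks with the plain BioBlend client is sketched below. The helper names publish_history and save_dataset are made up for this example, gi is assumed to be a bioblend.galaxy.GalaxyInstance like the one created earlier, and the ids must come from your own history; check the online documentation for the options supported by your BioBlend version:

```python
def publish_history(gi, history_id):
    """Make a history public (hypothetical helper name).

    `gi` is assumed to be a bioblend.galaxy.GalaxyInstance.
    """
    return gi.histories.update_history(history_id, published=True)


def save_dataset(gi, dataset_id, directory="."):
    """Download a dataset into `directory` (hypothetical helper name).

    The file name suggested by Galaxy is used for the saved file.
    """
    return gi.datasets.download_dataset(
        dataset_id, file_path=directory, use_default_filename=True
    )


# Example calls, to be run against a real server:
# publish_history(gi, new_hist["id"])
# save_dataset(gi, "ID_OF_THE_WORKFLOW_OUTPUT_DATASET")
```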


You've Finished the Tutorial

Key points

The API allows you to use Galaxy’s capabilities programmatically.

BioBlend makes using the Galaxy API from Python easier.

BioBlend.objects is an object-oriented interface for interacting with Galaxy.

Frequently Asked Questions

Have questions about this tutorial? Have a look at the available FAQ pages and support channels

References

Sloggett, C., N. Goonasekera, and E. Afgan, 2013 BioBlend: automating pipeline analyses within Galaxy and CloudMan. Bioinformatics 29: 1685–1686. 10.1093/bioinformatics/btt199
Afgan, E., D. Baker, B. Batut, M. van den Beek, D. Bouvier et al., 2018 The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Research 46: W537–W544. 10.1093/nar/gky379


Citing this Tutorial

Nicola Soranzo, Scripting Galaxy using the API and BioBlend (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/dev/tutorials/bioblend-api/tutorial.html Online; accessed TODAY
Hiltemann, Saskia, Rasche, Helena et al., 2023 Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology 10.1371/journal.pcbi.1010752
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

@misc{dev-bioblend-api,
author = "Nicola Soranzo",
	title = "Scripting Galaxy using the API and BioBlend (Galaxy Training Materials)",
	year = "",
	month = "",
	day = "",
	url = "\url{https://training.galaxyproject.org/training-material/topics/dev/tutorials/bioblend-api/tutorial.html}",
	note = "[Online; accessed TODAY]"
}
@article{Hiltemann_2023,
	doi = {10.1371/journal.pcbi.1010752},
	url = {https://doi.org/10.1371%2Fjournal.pcbi.1010752},
	year = 2023,
	month = {jan},
	publisher = {Public Library of Science ({PLoS})},
	volume = {19},
	number = {1},
	pages = {e1010752},
	author = {Saskia Hiltemann and Helena Rasche and Simon Gladman and Hans-Rudolf Hotz and Delphine Larivi{\`{e}}re and Daniel Blankenberg and Pratik D. Jagtap and Thomas Wollmann and Anthony Bretaudeau and Nadia Gou{\'{e}} and Timothy J. Griffin and Coline Royaux and Yvan Le Bras and Subina Mehta and Anna Syme and Frederik Coppens and Bert Droesbeke and Nicola Soranzo and Wendi Bacon and Fotis Psomopoulos and Crist{\'{o}}bal Gallardo-Alba and John Davis and Melanie Christine Föll and Matthias Fahrner and Maria A. Doyle and Beatriz Serrano-Solano and Anne Claire Fouilloux and Peter van Heusden and Wolfgang Maier and Dave Clements and Florian Heyl and Björn Grüning and B{\'{e}}r{\'{e}}nice Batut and},
	editor = {Francis Ouellette},
	title = {Galaxy Training: A powerful framework for teaching!},
	journal = {PLoS Comput Biol}
}

