+ - 0:00:00
Notes for current slide

Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press P again to switch presenter notes off

Press C to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other.

Useful when presenting.

Notes for next slide



Tool development and integration into Galaxy



last_modification Updated:   purlPURL: gxy.io/GTN:S00051

text-document Plain-text slides |

Tip: press P to view the presenter notes | arrow-keys Use arrow keys to move between slides
1 / 91

Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press P again to switch presenter notes off

Press C to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other.

Useful when presenting.

question Questions

  • What is a tool for Galaxy?

  • How to write a best-practice tool?

  • How to deal with the tool environment?

2 / 91

objectives Objectives

  • Learn what a tool is and its structure

  • Use the Planemo utilities to develop a tool

  • Deal with the dependencies

  • Write functional tests

  • Make a tool ready for publishing in a ToolShed

3 / 91

Galaxy tools

4 / 91

Tools in the Galaxy UI

Screenshot of galaxy with the three main panels labelled list of available tools on left, 'wrapper' in center, and history with results as datasets on right

5 / 91

Galaxy tool / wrapper

Screenshot of tool interface in Galaxy for GraPhlAn showing a variety of input types like file selection, select, text, numbers.

graphlan.py --format png --size 7 'input_tree.txt' 'png_image.png'
6 / 91

So what is a tool?

Link between the Galaxy UI and the underlying tool:

  • Description of the user interface
  • How to invoke the tool
  • Which files and options to pass
  • Which files the tool will produce as output
7 / 91

Tool execution

A flowchart is depicated with the galaxy interface pointing to a bowtie2-wrapper.xml file which has a command, inputs, and outputs. Inputs points back to the tool interface. The command block points to the Operating System with an image of servers and the bowtie2 binary. This points back to outputs, and back to the history within the galaxy interface

8 / 91
  1. <inputs> (datasets and parameters) specified in the tool XML are exposed in the Galaxy tool UI
  2. When the user fills the form and click the Execute button, Galaxy fills the <command> template in the XML with the inputs entered by the user and execute the Cheetah code, producing a script as output
  3. Galaxy creates a job for the generated script and executes it somewhere (bowtie2 is run in this case)
  4. Some (not necessarily all) output files become new history datasets, as specified in the <outputs> XML tag set

Tool execution

An XML file as an image. The tool id is on the first line, then a description element, a command block running "echo Hello World $mystring to $output", an inputs section with a mystring text input, and an output1 tabular data file. A help block is shown last.

9 / 91
  • CDATA tags are used to prevent the interpretation of ampersands and less-than signs as XML special characters
  • The tool name and description are combined in the left panel (tool menu), keep them short!

Tool execution

The previous image, but the input parameter named 'mystring' is shown pointing to its place in the command block. Same for the output pointing to the command block. An overlay shows the Job Command Line /bin/echo Hello world you are amazing > a/path.dat

10 / 91

Tool execution

The previous image but now there is an overlay showing the output text Hello world you are amazing

11 / 91

Tool XML

Galaxy tool XML format is formally defined in a XML Schema Definition (XSD), used to generate the corresponding online documentation

12 / 91

XML Editor

You are free to use your prefered code editor to write Galaxy tools.

If you use Visual Studio Code (or Codium), we recommend to install the dedicated extension.

It provides XML validation, tags and attributes completion, help/documentation on hover, and other smart features to assist in following best practices.

An animation showing someone writing a tool XML file, while being assisted by the VSCode extension.

13 / 91

tool

<tool id="graphlan" name="GraPhlAn" version="1.1.3+galaxy2" profile="22.05">
  • id: unique identifier of your tool, should contain only [a-z0-9_-]
  • name: shown to the user, displayed in the tool box
  • version: the version of the wrapped tool, followed by a +galaxyX suffix for wrapper version
  • profile: minimum Galaxy version that should be required to run this tool (IUC recommends not older than 1 year)
14 / 91
  • The top level tool tag defines the tool naming and version
  • The id attribute is the unique identifier of your tool, it should contain only letters, digits, underscores or dashes
  • The name attribute is shown to the user and displayed in the tool box
  • The version attribute contains the version of the wrapped tool, followed by a +galaxyX suffix for wrapper version
  • The profile attribute should be set to the minimum Galaxy version that should be required to run this tool (IUC recommends not older than 1 year)

command

How to invoke the tool?

<requirements>
<requirement type="package" version="1.1.3">graphlan</requirement>
</requirements>
<command><![CDATA[
graphlan.py
--format $format
...
]]></command>

If the script is provided with the tool xml:

<requirements>
<requirement type="package" version="2.7">python</requirement>
</requirements>
<command><![CDATA[
python '$__tool_directory__/graphlan.py'
--format $format
...
]]></command>
15 / 91
  • In the first case, graphlan.py is expected to be on the PATH and executable when the job executes. This is usually accomplished by specifying some <requirement/> tags.
  • In the second case, $__tool_directory__ is a special variable which is substituted by Galaxy with the directory where the tool XML is

inputs > param to command

Parameters are directly linked to variables in <command> by the name or argument attribute

Parameters can be optional or required.

<command><![CDATA[
graphlan.py
...
#if str($dpi):
--dpi $dpi
#end if
'$input_tree'
...
]]></command>
<inputs>
<param name="input_tree" type="data" label="..."/>
<param argument="--dpi" type="integer" optional="true" label="..."
help="For non vectorial formats" />
</inputs>
  • The #if ... #end if syntax comes from the Cheetah template language, which has a Python-like syntax
16 / 91
  • The name or argument attribute identifies a parameter (details of argument later).
  • Parameters have different types (data, data_collection, integer, float, text, select, boolean, color, data_column,...) and can be optional.

inputs > param > data

Screenshot of the file selection input in Galaxy permitting selection of a single file, multiple files, or a collection. In bold text a label appears above the component describing its use. Below the component in light grey is a help message. This applies to every input in Galaxy

<param name="..." type="data" format="txt" label="..." help="..." />

List of possible formats

17 / 91

inputs > param > integer | float

Screenshot of an integer input, it's just an input field set to 7.

<param name="..." type="integer" value="7" label="..." help="..."/>

Screenshot of a float input from the cd_hit tool. This time also a slider appears due to addition of min and max

<param name="..." type="float" min="0.4" max="1.0" value="0.9" label="..." help="..."/>
18 / 91
  • In the first case, Galaxy creates a text box which accepts only integer values
  • In the second case, since both min and max are specified, a slider is shown in addition

inputs > param > text

Screenshot of a textbox from the cd_hit tool.

<param name="..." type="text" value="..." label="..." help="..."/>
19 / 91

inputs > param > select

Screenshot of a select drop down with several image formats from the bedtools_bedtoigv tool.

<param name="..." type="select" label="...">
<option value="png" selected="true">PNG</option>
<option value="eps">EPS</option>
<option value="svg">SVG</option>
</param>

If no option has selected="true", the first one is selected by default.

20 / 91

inputs > param > select

The select is now a set of checkboxes.

<param name="..." type="select" display="radio" label="..." help="...">
<option value="min" selected="true">Minimum</option>
<option value="mean">Mean</option>
<option value="max">Max</option>
<option value="sum">Sum</option>
</param>
21 / 91

inputs > param > select

A select/unselect all checkbox appears before a box with numerous selections inside, appearing as badges that can be added or removed.

<param name="..." type="select" multiple="true" label="..." help="...">
<option value="ld" selected="true">Length distribution</option>
<option value="gc" selected="true">GC content distribution</option>
...
</param>
22 / 91

inputs > param > boolean

A yes/no switch from the bg_sortmerna tool.

<param name="..." type="boolean" checked="false" truevalue="--log" falsevalue=""
label="..." help="..." />
23 / 91

inputs > param > conditional

Screenshot of a conditional select box set to Paired, and two file inputs appear below it. Screenshot of a conditional select box set to Single, and only a single file input appears below it.


<command><![CDATA[
#if $fastq_input.selector == 'paired':
'$fastq_input.input1' '$fastq_input.input2'
#else:
'$fastq_input.input'
#end if
]]></command>
<inputs>
<conditional name="fastq_input">
<param name="selector" type="select" label="Single or paired-end reads?">
<option value="paired">Paired-end</option>
<option value="single">Single-end</option>
</param>
<when value="paired">
<param name="input1" type="data" format="fastq" label="Forward reads" />
<param name="input2" type="data" format="fastq" label="Reverse reads" />
</when>
<when value="single">
<param name="input" type="data" format="fastq" label="Single reads" />
</when>
</conditional>
</inputs>
24 / 91

inputs > param > repeat

Two boxes appear labelled 1: Series and 2: Series, with an insert series button below them. Each series box has two inputs in it, a file input and a select box.

<command><![CDATA[
#for $i, $s in enumerate($series):
rank_of_series=$i
input_path=${s.input}
x_column=${s.xcol}
#end for
]]></command>
<inputs>
<repeat name="series" title="Series">
<param name="input" type="data" format="tabular" label="Dataset"/>
<param name="xcol" type="data_column" data_ref="input" label="Column for x axis"/>
</repeat>
</inputs>
25 / 91

It makes sense to use a <repeat> block only if it contains multiple related parameters, otherwise adding multiple="true" is preferable.

outputs

Screenshot of a galaxy history with two outputs.

<outputs>
<data name="tree" format="txt" label="${tool.name} on ${on_string}: Tree" />
<data name="annotation" format="txt"
label="${tool.name} on ${on_string}: Annotation" />
</outputs>
26 / 91

${tool.name} on ${on_string} is the default output label, need to modify this if the tool generates more than 1 output

outputs > filter

Output is collected only if the filter evaluates to True

<inputs>
<param type="select" name="format" label="Output format">
<option value="png">PNG</option>
<option value="pdf">PDF</option>
</param>
</inputs>
<outputs>
<data name="png_output" format="png" label="${tool.name} on ${on_string}: PNG">
<filter>format == "png"</filter>
</data>
<data name="pdf_output" format="pdf" label="${tool.name} on ${on_string}: PDF">
<filter>format == "pdf"</filter>
</data>
</outputs>
27 / 91

N.B. If the filter expression raises an Exception, the dataset will NOT be filtered out

detect_errors

Legacy tools (i.e. with profile unspecified or less than 16.04) by default fail only if the tool writes to stderr

Non-legacy tools by default fail if the tool exit code is not 0, which is equivalent to specify:

<command detect_errors="exit_code"> ... </command>

To fail if either the tool exit code is not 0 or "Exception:"/"Error:" appears in standard error/output:

<command detect_errors="aggressive"> ... </command>
28 / 91

stdio

If you need more precision:

<stdio>
<exit_code range=":-2" level="warning" description="Low disk space" />
<exit_code range="1:" level="fatal" />
<regex match="Error:" level="fatal" />
</stdio>
<command> ... </command>

"Warning" level allows to add information to stderr without marking the dataset as failed

29 / 91

help

Screenshot of a help block in a galaxy tool, it shows the below text block rendered according to restructured text rules. What it does is bold, and user manual is a hyperlink to the bitbucket url.

<help><![CDATA[
**What it does**
GraPhlAn is a software tool for producing high-quality circular
representations of taxonomic and phylogenetic trees. GraPhlAn focuses
on concise, integrative, informative, and publication-ready
representations of phylogenetically- and taxonomically-driven
investigation.
For more information, check the `user manual
<https://bitbucket.org/nsegata/graphlan/overview>`_.
]]></help>

Content should be in reStructuredText markup format

30 / 91

citations

Screenshot of the citations box showing 5 nicely formatted citations with italics, and hyperlinked DOIs.

<citations>
<citation type="doi">10.1093/bioinformatics/bts611</citation>
<citation type="doi">10.1093/nar/gks1219</citation>
<citation type="doi">10.1093/nar/gks1005</citation>
<citation type="doi">10.1093/bioinformatics/btq461</citation>
<citation type="doi">10.1038/nbt.2198</citation>
</citations>

If no DOI is available, a BibTeX citation can be specified with type="bibtex"

31 / 91

Quoting params

Always quote text and data parameters and output data in <command>

<command><![CDATA[
graphlan.py
...
'$input_tree'
'$png_output_image'
]]></command>
  • For security reasons
  • Paths may contain spaces
  • Prefer single quotes over double quotes
32 / 91

Multiple commands

Use && to concatenate them

<command><![CDATA[
graphlan.py
--format '$format'
&&
echo "Yeah it worked!"
]]></command>

The job will exit on the first error encountered.

You can use && or ; if using a profile >= 20.09 (the set -e shell directive is now used by default).

33 / 91

Param argument

Use the argument tag when a param name reflects the command line argument

<param argument="--size" type="integer" value="7" label="..." help="..."/>
  • It will be appended at the end of the displayed param help
  • When argument is specified and name is not, name is derived from argument by removing the initial dashes and replacing internal dashes with underscores
34 / 91

section

Use sections to group related parameters

Screenshot of the same section twice, in the first it shows Additional Options and is collapsed. In the second it is expanded and an integer input can be seen.

<section name="advanced" title="Advanced options" expanded="False">
<param argument="--size" type="integer" value="7" label="..." help="..."/>
...
</section>
35 / 91

Planemo logo, the E mimics the galaxy logo with three bars, the bottom most offset

Command-line utilities to assist in building and publishing Galaxy tools.

36 / 91

Planemo logo

An overly complicated flowchart with 11 steps and a three level hierarchy. The gist is that planemo tool_init lets a wrapper be created, planemo lint is then used. Planemo conda installs packages from a conda repository. This is then run with planemo test and planemo serve. Afterwards planemo shed_test, shed_create, and shed_update upload the wrapper to the galaxy toolshed. Then it is installed to a galaxy instance where it can be tested, and fetches the conda env from conda.

37 / 91

planemo logo again

planemo tool_init

Creates a skeleton of xml file

$ mkdir new_tool
$ cd new_tool
$ planemo tool_init --id 'some_short_id' --name 'My super tool'

Complicated version:

$ planemo tool_init --id 'samtools_sort' --name 'Samtools sort' \
--description 'order of storing aligned sequences' \
--requirement 'samtools@1.3.1' \
--example_command "samtools sort -o '1_sorted.bam' '1.bam'" \
--example_input 1.bam \
--example_output 1_sorted.bam \
--test_case \
--version_command 'samtools --version | head -1' \
--help_from_command 'samtools sort' \
--doi '10.1093/bioinformatics/btp352'
38 / 91

planemo logo again

planemo lint: Checks the syntax of a tool

$ planemo lint
Linting tool /opt/galaxy/tools/seqtk_seq.xml
Applying linter tests... CHECK
.. CHECK: 1 test(s) found.
Applying linter output... CHECK
.. INFO: 1 outputs found.
Applying linter inputs... CHECK
.. INFO: Found 1 input parameters.
Applying linter help... CHECK
.. CHECK: Tool contains help section.
.. CHECK: Help contains valid reStructuredText.
Applying linter general... CHECK
.. CHECK: Tool defines a version [0.1.0].
.. CHECK: Tool defines a name [Convert to FASTA (seqtk)].
.. CHECK: Tool defines an id [seqtk_seq].
Applying linter command... CHECK
.. INFO: Tool contains a command.
Applying linter citations... CHECK
.. CHECK: Found 1 likely valid citations.
39 / 91

planemo logo again

planemo serve

View your new tool in a local Galaxy instance

$ planemo serve

Open http://127.0.0.1:9090/ in your web browser to view your new tool

40 / 91

planemo logo again

Building Galaxy Tools

flowchart with planemo tool_init creating a wrapper tool.xml and planemo lint being run repeatedly.

41 / 91

Functional tests

42 / 91

Functional tests

  • Functional testing is a quality assurance (QA) process.
  • The tests comfort developers and users that the tools can run across different servers/architectures. And that the latest modifications don't break older features.
  • Tools are tested by feeding them inputs and parameters and verifying the outputs (typically a diff)
43 / 91

tests

<tests>
<test>
<param name="input_tree" value="input_tree.txt"/>
<param name="format" value="png"/>
<param name="dpi" value="100"/>
<param name="size" value="7"/>
<param name="pad" value="2"/>
<output name="png_output_image" file="png_image.png" />
</test>
</tests>

input_tree.txt and png_image.png must be in the test-data/ directory

44 / 91

Tool directory tree

graphlan/
├── graphlan.xml
├── graphlan.py
└── test-data/
├── input_tree.txt
└── png_image.png
45 / 91

Comparing to an expected result

<output ... compare="diff|re_match|sim_size|contains|re_match_multiline" ... />
<output name="out_file1" file="cf_maf2fasta_concat.dat" ftype="fasta" />
<output ... md5="68b329da9893e34099c7d8ad5cb9c940" />
<output ... lines_diff="4" />
<output ... compare="sim_size" delta="1000" />

Complete documentation

46 / 91
  • diff is the default
  • ftype also checks the output datatype
  • With md5 the test output file doesn't need to be distributed (useful for big output files)
  • lines_diff is useful for tools that output version number, current date, ...
  • sim_size is useful for binary files that vary at each execution (e.g. PDF)

Checking the output content

<output name="out_file1">
<assert_contents>
<has_text text="chr7" />
<not_has_text text="chr8" />
<has_text_matching expression="1274\d+53" />
<has_line_matching expression=".*\s+127489808\s+127494553" />
<!-- &#009; is XML escape code for tab -->
<has_line line="chr7&#009;127471195&#009;127489808" />
<has_n_columns n="3" />
</assert_contents>
</output>

Complete documentation

47 / 91

Checking tool stdout/stderr

<assert_stdout>
<has_text text="Step 1... determine cutoff point" />
<has_text text="Step 2... estimate parameters of null distribution" />
</assert_stdout>

Complete documentation

48 / 91

Nested inputs in test

<tests>
<test>
<section name="advanced">
<repeat name="names">
<param name="first" value="Abraham"/>
<param name="last" value="Lincoln"/>
</repeat>
<repeat name="names">
<param name="first" value="Donald"/>
<param name="last" value="Trump"/>
</repeat>
<conditional name="image">
<param name="output_image" value="yes"/>
<param name="format" value="png"/>
</conditional>
</section>
...
</test>
</tests>
49 / 91

planemo logo yet again

planemo test

Runs all functional tests

$ planemo test

An HTML report (tool_test_output.html) is automatically created with logs in case of failing test

50 / 91

planemo logo

Test Galaxy Tools

flowchart with planemo tool_init creating a wrapper tool.xml and planemo lint being run repeatedly, and now planemo test as well.

51 / 91

Dependencies

52 / 91

Dependencies

How Galaxy will deal with dependencies?

schematic of a galaxy server with dependency resolution via requirement tags at the top. On the left is the tool box with a number of xml files listed like seqtk_seq and seqtk_subseq. On the right is applications & libraries showing only a few tools like seqtk, all of the 3 multipoe subtools were collapsed

53 / 91

requirements

<requirements>
<requirement type="package" version="1.66">biopython</requirement>
<requirement type="package" version="1.0.0">graphlan</requirement>
</requirements>

Local installation using Conda packages

54 / 91

Conda logo, the C is textured.

  • Package, dependency and environment manager
  • Based on recipes describing how to install the software which are then built for their distribution
  • No compilation at installation: binaries with their dependencies, libraries...
  • Not restricted to Galaxy

See Tool Dependencies and Conda

55 / 91

Advanced features

56 / 91

configfiles

A configfile creates a text file which can then be used inside the command as:

  • A script or a module
  • A file needed to run the tool (e.g. a config file)

Cheetah code and param/output variables can be used inside configfile (like inside command).

57 / 91

configfiles

<command><![CDATA[ mb $script_nexus ]]></command>
<configfiles>
<configfile name="script_nexus"><![CDATA[
set autoclose = yes;
execute $input_data;
#if str($data_type.type) == "nuc“
lset nst=$data_type.lset_params.lset_Nst;
#end if
mcmcp ngen=$mcmcp_ngen;
mcmc;
quit
]]></configfile>
</configfiles>
set autoclose = yes;
execute dataset_42.dat;
lset nst=2 ;
mcmcp ngen=100000;
mcmc;
quit
58 / 91

macros > xml

macros.xml

<macros>
<xml name="requirements">
<requirements>
<requirement type="package" version="2.5.0">blast</requirement>
</requirements>
</xml>
<xml name="stdio">
<stdio>
<exit_code range="1" level="fatal" />
</stdio>
</xml>
</macros>

ncbi_blastn_wrapper.xml

<macros>
<import>macros.xml</import>
</macros>
<expand macro="requirements"/>
<expand macro="stdio"/>
60 / 91

macros > token

macros.xml

<macros>
<token name="@THREADS@">-num_threads "\${GALAXY_SLOTS:-8}"</token>
</macros>

ncbi_blastn_wrapper.xml

<command>
blastn -query '$query' @THREADS@ [...]
</command>
61 / 91

macros > xml > yield

macros.xml

<macros>
<xml name="requirements">
<requirements>
<requirement type="package" version="2.2.0">trinity</requirement>
<yield/>
</requirements>
</xml>
</macros>

trinity.xml

<expand macro="requirements">
<requirement type="package" version="1.1.2">bowtie</requirement>
</expand>
62 / 91

@TOOL_VERSION@ token

<macros>
<token name="@TOOL_VERSION@">1.2</token>
<token name="@VERSION_SUFFIX@">3</token>
</macros>
<tool id="seqtk_seq" name="Convert to FASTA" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@">
<requirements>
<requirement type="package" version="@TOOL_VERSION@">seqtk</requirement>
</requirements>

This means: the 3rd revision of the Galaxy tool for Seqtk 1.2 .

Best practice documentation

63 / 91

command > Reserved variables

<command><![CDATA[
# Email’s numeric ID (id column of galaxy_user table in the database)
echo '$__user_id__'
# User’s email address
echo '$__user_email__'
# The galaxy.app.UniverseApplication instance, gives access to all other configuration file variables.
# Should be used as a last resort, may go away in future releases.
echo '$__app__.config.user_library_import_dir'
# Check a dataset type
#if $input1.is_of_type('gff'):
echo 'input1 type is ${input1.datatype}'
#end if
]]></command>

Reserved Variables List

64 / 91

Multiple inputs - Mapping over

<param name="..." type="data" format="txt" label="..." help="..." />

File selector input

Possible to select multiple dataset:

File selector input screenshot, but now the middle "multiple files" button is checked.

  • Useful to launch the same tool on multiple datasets independently
  • One job per dataset
65 / 91

Multiple inputs - Single execution

<param name="..." type="data" format="txt" multiple="true" label="..." help="..." />

A multi-select file input field is shown, different than the normal file input there is no single file option.

In the command:

<command><![CDATA[
...
#for $input in $inputs
--input "$input"
#end for
]]></command>

One job for all selected dataset

66 / 91

Multiple outputs

<outputs>
<data name="output" format="txt">
<discover_datasets pattern="__designation_and_ext__"
directory="output_dir" visible="true" />
</data>
</outputs>
  • __designation_and_ext__: a predefined regexp,

  • catches the dataset identifier + the datatype

If the output file extension is not present/usable:

<outputs>
<data name="output" format="txt">
<discover_datasets pattern="__designation__" format="txt"
directory="output_dir" visible="true" />
</data>
</outputs>
67 / 91

Dataset collections

A dataset collection combines numerous datasets in a single entity that can be manipulated together

  • list: a simple list of datasets
  • paired: a pair of datasets, forward and reverse for NGS
  • composite: e.g. list:paired for a list of dataset pairs

Usage

  • Useful to launch a workflow on many samples
  • Sample names are kept along the workflow: element_identifier
  • Galaxy tools are available to manipulate collections
68 / 91

Dataset collections as input

Mapping over (1 job per collection element):

<param name="inputs" type="data" format="bam" label="Input BAM(s)" />

The normal file selector is shown however now the collection input is clicked.

Single execution:

  • accepted with multiple="true" as described in previous slides
  • or you can accept only collections:
<param name="inputs" type="data_collection" collection_type="list|paired|list:paired|..."
format="bam" label="Input BAM(s)" />
<command><![CDATA[ ...
#for $input in $inputs
--input '$input'
--sample_name '$input.element_identifier'
#end for
]]></command>
69 / 91

Dataset collections as output

A single paired collection:

<collection name="paired_output" type="paired" label="Split Pair">
<data name="forward" format="txt" />
<data name="reverse" format_source="input1" from_work_dir="reverse.txt" />
</collection>

Unknown number of files:

<collection name="output" type="list" label="Unknown number of files">
<discover_datasets pattern="__name_and_ext__" directory="outputs" />
</collection>
  • __name_and_ext__: a predefined regexp,
  • catches the dataset identifier + the datatype
70 / 91

Using multiple CPUs

Screenshot of two xml files. In the top is the jb_conf.xml where a command line job submission specification indicates that it will be submitted with 4 threads. A single tool, ncbi blastn wrapper is assigned to that destination. In the second xml file the blastn command uses the GALAXY_SLOTS variable to control how many threads are supplied to the tool.

blastn -query foo_bar -num_threads 4

8 is the default value if not set in destination

71 / 91

Data tables

  • They list all reference data used by tools
  • e.g.: Blast databases, BWA indexes, Fasta files
  • Stored in .loc files
  • Populated by hand or using Data Managers
  • Data Managers are dedicated kind of Galaxy tools
72 / 91

Using a data table in a tool

Three xml files are shown. At the top is the tool data table conf which mentions a tool-data/bowtie2_indices.loc. Below is that bowtie2 loc file which indicates that hg19 will be found at a specific location in the /db directory. And the third is the bowtie2 wrapper which loads options from a data table, and points to the bowtie2_indexes named table in the first xml.

73 / 91

Datatypes

  • Every Galaxy dataset is associated with a datatype
  • Datatype can be detected or user specified
  • Gain of usability

Documentation: Adding Datatypes

74 / 91

Publishing tools

75 / 91

Contributing to a community

Many tools developed by the community on GitHub repositories

Added value:

  • Easier development
  • Easier contribution for user
  • Avoid duplications of efforts
  • Automated tests on each contribution
  • Automated publishing to ToolShed
  • Principle of many eyes: if something is visible to many people then, collectively, they are more likely to find errors in it
76 / 91

IUC: Intergalactic Utilities Commission

IUC logo

  • A team maintaining high quality tools
  • Establishing and following best practices for tool development
  • Open to contributions: bug fixes, new tools, ...

https://github.com/galaxyproject/tools-iuc

77 / 91

How should I publish my tool?

Adding to an existing GitHub repository (IUC, GalaxyP, ...)

  • Read the guidelines
  • Open a pull request
  • Respond to review comments
78 / 91

How should I publish my tool?

Using your own GitHub repository

  • Reasons: ownership, specific practices, exotic tools
  • Follow the same structure as IUC
  • Automate tests and ToolShed publishing by reusing .github/ configuration
79 / 91

How should I publish my tool?

Using planemo by hand

  • Ok for few tools
  • Makes contributing harder
  • Not recommended

Check out our tutorial to publish to the ToolShed using Planemo

80 / 91

Continuous Integration

Github logo

Planemo logo

Conda logo

81 / 91

Continuous Integration

  • Create a Pull Request on a GitHub repository
  • Tests are automatically run on GitHub Actions
  • Other contributors review your tool
  • The Pull Request is accepted when all the lights are green
  • The tool is automatically uploaded to the ToolShed
82 / 91

GitHub Actions configuration

GitHub Actions configured in the .github/ directory

Screenshot of a CI configuration file. It runs several planemo commands during before install, install, and script portions.

Uses a standard GitHub Action developed on https://github.com/galaxyproject/planemo-ci-action

83 / 91

GitHub Actions execution

Screenshot of a CI execution showing a tool test passing. Screenshot of a CI execution showing details of test logs.

84 / 91

GitHub Actions: test reports

Downloadable HTML report, open it with a web browser to see the details

Screenshot of a CI execution artifacts, with a link to download an HTML report of the tests.

Screenshot of an example of HTML report of the tests.

85 / 91

ToolShed

  • Need to create a .shed.yml file in the tool directory of the GitHub repository:
categories: [Sequence Analysis]
description: Tandem Repeats Finder description
long_description: A long long description.
name: tandem_repeats_finder_2
owner: gandres
planemo shed_init --name="tandem_repeats_finder_2"
--owner="gandres"
--description="Tandem Repeats Finder description"
--long_description="A long long description."
--category="Sequence Analysis"
[--remote_repository_url=<URL to .shed.yml on github>]
[--homepage_url=<Homepage for tool.>]
86 / 91

Tool suites

A tool suite is a group of related tools that can all be installed at once.

Defined in .shed.yml: implicitly define repositories for each individual tool in the directory and build a suite for those tools.

Example: trinity/.shed.yml

[...]
auto_tool_repositories:
name_template: ""
description_template: " (from the Trinity tool suite)"
suite:
name: "suite_trinity"
description: Trinity tools to assemble transcript sequences from Illumina RNA-Seq data.
87 / 91

Check

planemo shed_lint --tools --ensure_metadata
Linting repository […]/tandem_repeats_finder
Applying linter expansion... CHECK
.. INFO: Included files all found.
Applying linter tool_dependencies_xsd... CHECK
.. INFO: tool_dependencies.xml found and appears to be valid XML
Applying linter tool_dependencies_actions... CHECK
.. INFO: Parsed tool dependencies.
Applying linter repository_dependencies... CHECK
.. INFO: No repository_dependencies.xml, skipping.
Applying linter shed_yaml... CHECK
.. INFO: .shed.yml found and appears to be valid YAML.
Applying linter readme... CHECK
.. INFO: No README found skipping.
+Linting tool […]/tandem_repeats_finder/tandem_repeats_finder_wrapper.xml
[…]
88 / 91

keypoints Key points

  • Galaxy Tool Syntax

  • Use Planemo

  • Use Conda

  • Use GitHub

  • Use GitHub Actions

  • No more excuse to develop crappy Galaxy tools

89 / 91

curriculum Do you want to extend your knowledge?

Follow one of our recommended follow-up trainings:

90 / 91

Thank You!

This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors!

Galaxy Training Network

Tutorial Content is licensed under Creative Commons Attribution 4.0 International License.

91 / 91

question Questions

  • What is a tool for Galaxy?

  • How to write a best-practice tool?

  • How to deal with the tool environment?

2 / 91
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow