Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.
Press P again to switch presenter notes off.
Press C to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other.
Useful when presenting.
<inputs> (datasets and parameters) specified in the tool XML are exposed in the Galaxy tool UI.
When the user clicks the Execute button, Galaxy fills the <command> template in the XML with the inputs entered by the user and executes the Cheetah code, producing a script as output.
Output datasets are defined by the <outputs> XML tag set.
The Galaxy tool XML format is formally defined in an XML Schema Definition (XSD), which is used to generate the corresponding online documentation.
You are free to use your preferred code editor to write Galaxy tools.
If you use Visual Studio Code (or Codium), we recommend installing the dedicated extension.
It provides XML validation, tags and attributes completion, help/documentation on hover, and other smart features to assist in following best practices.
tool
<tool id="graphlan" name="GraPhlAn" version="1.1.3+galaxy2" profile="22.05">
id: unique identifier of your tool, should contain only [a-z0-9_-]
name: shown to the user, displayed in the tool box
version: the version of the wrapped tool, followed by a +galaxyX suffix for wrapper version
profile: minimum Galaxy version that should be required to run this tool (IUC recommends not older than 1 year)
The tool tag defines the tool naming and version.
The id attribute is the unique identifier of your tool; it should contain only letters, digits, underscores or dashes.
The name attribute is shown to the user and displayed in the tool box.
The version attribute contains the version of the wrapped tool, followed by a +galaxyX suffix for wrapper version.
The profile attribute should be set to the minimum Galaxy version that should be required to run this tool (IUC recommends not older than 1 year).
command
How to invoke the tool?
<requirements>
    <requirement type="package" version="1.1.3">graphlan</requirement>
</requirements>
<command><![CDATA[
graphlan.py
--format $format
...
]]></command>
If the script is provided with the tool XML:
<requirements>
    <requirement type="package" version="2.7">python</requirement>
</requirements>
<command><![CDATA[
python '$__tool_directory__/graphlan.py'
--format $format
...
]]></command>
graphlan.py is expected to be on the PATH and executable when the job executes. This is usually accomplished by specifying some <requirement/> tags.
$__tool_directory__ is a special variable which Galaxy substitutes with the directory where the tool XML is located.
inputs > param to command
Parameters are directly linked to variables in <command> by the name or argument attribute
Parameters can be optional or required.
<command><![CDATA[
graphlan.py
...
#if str($dpi):
    --dpi $dpi
#end if
'$input_tree'
...
]]></command>
<inputs>
    <param name="input_tree" type="data" label="..."/>
    <param argument="--dpi" type="integer" optional="true" label="..." help="For non vectorial formats" />
</inputs>
The #if ... #end if syntax comes from the Cheetah template language, which has a Python-like syntax.
The name or argument attribute identifies a parameter (details of argument later).
Parameters have a type (data, data_collection, integer, float, text, select, boolean, color, data_column, ...) and can be optional.
inputs > param > data
<param name="..." type="data" format="txt" label="..." help="..." />
For integer and float parameters, if min and max are specified, a slider is shown in addition.
inputs > param > conditional
<command><![CDATA[
#if $fastq_input.selector == 'paired':
    '$fastq_input.input1' '$fastq_input.input2'
#else:
    '$fastq_input.input'
#end if
]]></command>
<inputs>
    <conditional name="fastq_input">
        <param name="selector" type="select" label="Single or paired-end reads?">
            <option value="paired">Paired-end</option>
            <option value="single">Single-end</option>
        </param>
        <when value="paired">
            <param name="input1" type="data" format="fastq" label="Forward reads" />
            <param name="input2" type="data" format="fastq" label="Reverse reads" />
        </when>
        <when value="single">
            <param name="input" type="data" format="fastq" label="Single reads" />
        </when>
    </conditional>
</inputs>
inputs > param > repeat
<command><![CDATA[
#for $i, $s in enumerate($series):
    rank_of_series=$i
    input_path=${s.input}
    x_column=${s.xcol}
#end for
]]></command>
<inputs>
    <repeat name="series" title="Series">
        <param name="input" type="data" format="tabular" label="Dataset"/>
        <param name="xcol" type="data_column" data_ref="input" label="Column for x axis"/>
    </repeat>
</inputs>
It makes sense to use a <repeat> block only if it contains multiple related parameters; otherwise, adding multiple="true" to the parameter is preferable.
${tool.name} on ${on_string} is the default output label; modify it if the tool generates more than one output.
outputs > filter
An output is collected only if the filter evaluates to True.
<inputs>
    <param type="select" name="format" label="Output format">
        <option value="png">PNG</option>
        <option value="pdf">PDF</option>
    </param>
</inputs>
<outputs>
    <data name="png_output" format="png" label="${tool.name} on ${on_string}: PNG">
        <filter>format == "png"</filter>
    </data>
    <data name="pdf_output" format="pdf" label="${tool.name} on ${on_string}: PDF">
        <filter>format == "pdf"</filter>
    </data>
</outputs>
N.B. If the filter expression raises an Exception, the dataset will NOT be filtered out
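The filter behaviour described above can be pictured with a small Python sketch. This is a simplified model and not Galaxy's own implementation: the function name keep_output and the use of eval over a plain dictionary of parameter values are assumptions made for illustration only.

```python
def keep_output(filter_expr, params):
    """Return True if the output should be collected (simplified model)."""
    try:
        # Evaluate the filter with the parameter values available as names.
        return bool(eval(filter_expr, {"__builtins__": {}}, dict(params)))
    except Exception:
        # As noted above: if the expression raises, the dataset is NOT filtered out.
        return True


params = {"format": "png"}
print(keep_output('format == "png"', params))
print(keep_output('format == "pdf"', params))
print(keep_output('missing_param == "x"', params))
```

The last call shows why a typo in a filter is easy to miss: the expression fails, so the output is kept rather than hidden.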
detect_errors
Legacy tools (i.e. with profile unspecified or less than 16.04) by default fail only if the tool writes to stderr.
Non-legacy tools by default fail if the tool exit code is not 0, which is equivalent to specifying:
<command detect_errors="exit_code"> ... </command>
To fail if either the tool exit code is not 0 or "Exception:"/"Error:" appears in standard error/output:
<command detect_errors="aggressive"> ... </command>
stdio
If you need more precision:
<stdio>
    <exit_code range=":-2" level="warning" description="Low disk space" />
    <exit_code range="1:" level="fatal" />
    <regex match="Error:" level="fatal" />
</stdio>
<command> ... </command>
The "warning" level lets the tool add information to stderr without marking the dataset as failed.
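To make the range notation in the exit_code tags concrete, here is a hedged Python sketch of how values such as ":-2" and "1:" could be interpreted. It assumes both bounds are inclusive and that an empty side means "unbounded"; the helper names parse_range and in_range are hypothetical and this is not Galaxy's parsing code.

```python
def parse_range(spec):
    """Parse 'N', 'N:', ':M' or 'N:M' into (low, high) bounds, assumed inclusive."""
    if ":" not in spec:
        value = int(spec)
        return value, value
    low, high = spec.split(":", 1)
    return (
        int(low) if low else float("-inf"),   # empty left side: no lower bound
        int(high) if high else float("inf"),  # empty right side: no upper bound
    )


def in_range(exit_code, spec):
    """Return True if exit_code falls inside the range described by spec."""
    low, high = parse_range(spec)
    return low <= exit_code <= high


print(in_range(-3, ":-2"))  # exit code -3 against the "warning" range
print(in_range(1, "1:"))    # exit code 1 against the "fatal" range
print(in_range(0, "1:"))    # exit code 0 matches neither range
```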
help
<help><![CDATA[
**What it does**

GraPhlAn is a software tool for producing high-quality circular
representations of taxonomic and phylogenetic trees. GraPhlAn focuses
on concise, integrative, informative, and publication-ready
representations of phylogenetically- and taxonomically-driven
investigation.

For more information, check the `user manual <https://bitbucket.org/nsegata/graphlan/overview>`_.
]]></help>
Content should be in reStructuredText markup format
citations
<citations>
    <citation type="doi">10.1093/bioinformatics/bts611</citation>
    <citation type="doi">10.1093/nar/gks1219</citation>
    <citation type="doi">10.1093/nar/gks1005</citation>
    <citation type="doi">10.1093/bioinformatics/btq461</citation>
    <citation type="doi">10.1038/nbt.2198</citation>
</citations>
If no DOI is available, a BibTeX citation can be specified with type="bibtex"
Use the argument attribute when a param name reflects the command-line argument
<param argument="--size" type="integer" value="7" label="..." help="..."/>
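When only argument is given, the parameter is still reachable in <command> under a derived name: the leading dashes are removed and internal dashes become underscores. A minimal Python sketch of that rule follows; the function name name_from_argument is hypothetical and this is not Galaxy's own code.

```python
def name_from_argument(argument):
    """Derive a parameter name from a command-line flag such as '--min-length'."""
    # Strip the leading dashes, then turn remaining dashes into underscores.
    return argument.lstrip("-").replace("-", "_")


for flag in ("--size", "--dpi", "--min-length"):
    print(flag, "->", name_from_argument(flag))
```

With this rule, the --size parameter above would be referenced as $size in the command template.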
If argument is specified and name is not, name is derived from argument by removing the initial dashes and replacing internal dashes with underscores.
Command-line utilities to assist in building and publishing Galaxy tools.
planemo tool_init
Creates a skeleton XML file
$ mkdir new_tool
$ cd new_tool
$ planemo tool_init --id 'some_short_id' --name 'My super tool'
Complicated version:
$ planemo tool_init --id 'samtools_sort' --name 'Samtools sort' \
    --description 'order of storing aligned sequences' \
    --requirement 'samtools@1.3.1' \
    --example_command "samtools sort -o '1_sorted.bam' '1.bam'" \
    --example_input 1.bam \
    --example_output 1_sorted.bam \
    --test_case \
    --version_command 'samtools --version | head -1' \
    --help_from_command 'samtools sort' \
    --doi '10.1093/bioinformatics/btp352'
planemo lint: checks the syntax of a tool
$ planemo lint
Linting tool /opt/galaxy/tools/seqtk_seq.xml
Applying linter tests... CHECK
.. CHECK: 1 test(s) found.
Applying linter output... CHECK
.. INFO: 1 outputs found.
Applying linter inputs... CHECK
.. INFO: Found 1 input parameters.
Applying linter help... CHECK
.. CHECK: Tool contains help section.
.. CHECK: Help contains valid reStructuredText.
Applying linter general... CHECK
.. CHECK: Tool defines a version [0.1.0].
.. CHECK: Tool defines a name [Convert to FASTA (seqtk)].
.. CHECK: Tool defines an id [seqtk_seq].
Applying linter command... CHECK
.. INFO: Tool contains a command.
Applying linter citations... CHECK
.. CHECK: Found 1 likely valid citations.
planemo serve
View your new tool in a local Galaxy instance
$ planemo serve
Open http://127.0.0.1:9090/ in your web browser to view your new tool
tests
<tests>
    <test>
        <param name="input_tree" value="input_tree.txt"/>
        <param name="format" value="png"/>
        <param name="dpi" value="100"/>
        <param name="size" value="7"/>
        <param name="pad" value="2"/>
        <output name="png_output_image" file="png_image.png" />
    </test>
</tests>
input_tree.txt and png_image.png must be in the test-data/ directory
<output ... compare="diff|re_match|sim_size|contains|re_match_multiline" ... />
<output name="out_file1" file="cf_maf2fasta_concat.dat" ftype="fasta" />
<output ... md5="68b329da9893e34099c7d8ad5cb9c940" />
<output ... lines_diff="4" />
<output ... compare="sim_size" delta="1000" />
diff is the default
ftype also checks the output datatype
With md5, the test output file doesn't need to be distributed (useful for big output files)
lines_diff is useful for tools that output version number, current date, ...
sim_size is useful for binary files that vary at each execution (e.g. PDF)
<output name="out_file1">
    <assert_contents>
        <has_text text="chr7" />
        <not_has_text text="chr8" />
        <has_text_matching expression="1274\d+53" />
        <has_line_matching expression=".*\s+127489808\s+127494553" />
        <!-- &#009; is XML escape code for tab -->
        <has_line line="chr7&#009;127471195&#009;127489808" />
        <has_n_columns n="3" />
    </assert_contents>
</output>
<assert_stdout>
    <has_text text="Step 1... determine cutoff point" />
    <has_text text="Step 2... estimate parameters of null distribution" />
</assert_stdout>
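Two of the comparison options listed above, sim_size with delta and md5, boil down to simple checks. The Python sketch below illustrates the idea only: the function names are hypothetical, and treating delta as an inclusive bound on the size difference in bytes is an assumption, not a statement about Galaxy's test framework internals.

```python
import hashlib


def sim_size_ok(expected_size, actual_size, delta):
    """Sizes are considered similar if they differ by at most delta bytes (assumed inclusive)."""
    return abs(expected_size - actual_size) <= delta


def md5_ok(data, expected_md5):
    """Compare the MD5 hex digest of the produced bytes with the expected digest."""
    return hashlib.md5(data).hexdigest() == expected_md5


# A PDF whose size drifts by 900 bytes between runs passes with delta="1000".
print(sim_size_ok(50_000, 50_900, 1000))
# Only the digest is stored with the test, not the expected file itself.
print(md5_ok(b"", hashlib.md5(b"").hexdigest()))
```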
test
<tests>
    <test>
        <section name="advanced">
            <repeat name="names">
                <param name="first" value="Abraham"/>
                <param name="last" value="Lincoln"/>
            </repeat>
            <repeat name="names">
                <param name="first" value="Donald"/>
                <param name="last" value="Trump"/>
            </repeat>
            <conditional name="image">
                <param name="output_image" value="yes"/>
                <param name="format" value="png"/>
            </conditional>
        </section>
        ...
    </test>
</tests>
See Tool Dependencies and Conda
configfiles
<command><![CDATA[
mb $script_nexus
]]></command>
<configfiles>
    <configfile name="script_nexus"><![CDATA[
set autoclose = yes;
execute $input_data;
#if str($data_type.type) == "nuc"
    lset nst=$data_type.lset_params.lset_Nst;
#end if
mcmcp ngen=$mcmcp_ngen;
mcmc;
quit
    ]]></configfile>
</configfiles>
Resulting script:
set autoclose = yes;
execute dataset_42.dat;
lset nst=2 ;
mcmcp ngen=100000;
mcmc;
quit
macros > xml
macros.xml
<macros>
    <xml name="requirements">
        <requirements>
            <requirement type="package" version="2.5.0">blast</requirement>
        </requirements>
    </xml>
    <xml name="stdio">
        <stdio>
            <exit_code range="1" level="fatal" />
        </stdio>
    </xml>
</macros>
ncbi_blastn_wrapper.xml
<macros>
    <import>macros.xml</import>
</macros>
<expand macro="requirements"/>
<expand macro="stdio"/>
macros > xml > yield
macros.xml
<macros>
    <xml name="requirements">
        <requirements>
            <requirement type="package" version="2.2.0">trinity</requirement>
            <yield/>
        </requirements>
    </xml>
</macros>
trinity.xml
<expand macro="requirements">
    <requirement type="package" version="1.1.2">bowtie</requirement>
</expand>
@TOOL_VERSION@ token
<macros>
    <token name="@TOOL_VERSION@">1.2</token>
    <token name="@VERSION_SUFFIX@">3</token>
</macros>
<tool id="seqtk_seq" name="Convert to FASTA" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@">
    <requirements>
        <requirement type="package" version="@TOOL_VERSION@">seqtk</requirement>
    </requirements>
This means: the 3rd revision of the Galaxy tool for Seqtk 1.2.
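Token expansion can be thought of as plain text substitution performed before the tool XML is interpreted. The following Python sketch models that idea with the two tokens defined above; the function name expand_tokens is hypothetical and the snippet is an illustration, not Galaxy's macro engine.

```python
def expand_tokens(text, tokens):
    """Replace every @TOKEN@ occurrence in text with its value."""
    for name, value in tokens.items():
        text = text.replace(name, value)
    return text


tokens = {"@TOOL_VERSION@": "1.2", "@VERSION_SUFFIX@": "3"}
version = expand_tokens("@TOOL_VERSION@+galaxy@VERSION_SUFFIX@", tokens)
print(version)  # 1.2+galaxy3
```

Keeping the wrapped tool version in a single token means the tool version attribute and the requirement version cannot drift apart.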
command > Reserved variables
<command><![CDATA[
# User's numeric ID (id column of galaxy_user table in the database)
echo '$__user_id__'
# User's email address
echo '$__user_email__'
# The galaxy.app.UniverseApplication instance, gives access to all other configuration file variables.
# Should be used as a last resort, may go away in future releases.
echo '$__app__.config.user_library_import_dir'
# Check a dataset type
#if $input1.is_of_type('gff'):
    echo 'input1 type is ${input1.datatype}'
#end if
]]></command>
<outputs>
    <data name="output" format="txt">
        <discover_datasets pattern="__designation_and_ext__" directory="output_dir" visible="true" />
    </data>
</outputs>
__designation_and_ext__: a predefined regexp, catches the dataset identifier + the datatype
If the output file extension is not present/usable:
<outputs>
    <data name="output" format="txt">
        <discover_datasets pattern="__designation__" format="txt" directory="output_dir" visible="true" />
    </data>
</outputs>
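To illustrate what a pattern like __designation_and_ext__ captures, here is a Python sketch using a regular expression with two named groups. The expression below is an assumed approximation written for this example; the exact regexp Galaxy ships may differ, and the function name parse_output_name is hypothetical.

```python
import re

# Assumed approximation: everything before the last dot is the designation,
# the remainder (without dots or underscores) is the extension/datatype.
PATTERN = re.compile(r"(?P<designation>.*)\.(?P<ext>[^._]+)")


def parse_output_name(filename):
    """Split a discovered file name into (designation, ext), or None if no extension."""
    match = PATTERN.fullmatch(filename)
    if match is None:
        return None
    return match.group("designation"), match.group("ext")


print(parse_output_name("sample1.txt"))
print(parse_output_name("sample.1.tabular"))
print(parse_output_name("README"))
```

A file without a usable extension does not match at all, which is the situation where the __designation__ pattern with an explicit format attribute is used instead.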
A dataset collection combines numerous datasets in a single entity that can be manipulated together
list: a simple list of datasets
paired: a pair of datasets, forward and reverse for NGS
list:paired: a list of dataset pairs
Usage
element_identifier
Mapping over (1 job per collection element):
<param name="inputs" type="data" format="bam" label="Input BAM(s)" />
Single execution:
multiple="true", as described in previous slides
<param name="inputs" type="data_collection" collection_type="list|paired|list:paired|..." format="bam" label="Input BAM(s)" />
<command><![CDATA[
...
#for $input in $inputs
    --input '$input'
    --sample_name '$input.element_identifier'
#end for
]]></command>
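The loop over the collection above produces one pair of flags per element. As a rough Python sketch of the resulting command line, assume each element is represented as an (element_identifier, path) tuple; that representation and the function name build_args are assumptions for illustration, not how Galaxy models collections internally.

```python
import shlex


def build_args(elements):
    """Build the repeated --input/--sample_name arguments for each collection element."""
    args = []
    for identifier, path in elements:
        args += ["--input", path, "--sample_name", identifier]
    return args


elements = [("sample A", "/data/1.bam"), ("sample_B", "/data/2.bam")]
# shlex.join quotes values containing spaces, like the single quotes in the template.
print(shlex.join(build_args(elements)))
```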
A single paired collection:
<collection name="paired_output" type="paired" label="Split Pair">
    <data name="forward" format="txt" />
    <data name="reverse" format_source="input1" from_work_dir="reverse.txt" />
</collection>
Unknown number of files:
<collection name="output" type="list" label="Unknown number of files">
    <discover_datasets pattern="__name_and_ext__" directory="outputs" />
</collection>
__name_and_ext__: a predefined regexp
Documentation: Adding Datatypes
Many tools developed by the community on GitHub repositories
Added value:
https://github.com/galaxyproject/tools-iuc
Using planemo by hand
Check out our tutorial to publish to the ToolShed using Planemo
GitHub Actions configured in the .github/ directory
Uses a standard GitHub Action developed on https://github.com/galaxyproject/planemo-ci-action
.shed.yml file in the tool directory of the GitHub repository:
categories: [Sequence Analysis]
description: Tandem Repeats Finder description
long_description: A long long description.
name: tandem_repeats_finder_2
owner: gandres
planemo shed_init --name="tandem_repeats_finder_2" --owner="gandres" --description="Tandem Repeats Finder description" --long_description="A long long description." --category="Sequence Analysis" [--remote_repository_url=<URL to .shed.yml on github>] [--homepage_url=<Homepage for tool.>]
A tool suite is a group of related tools that can all be installed at once.
Defined in .shed.yml: implicitly defines repositories for each individual tool in the directory and builds a suite for those tools.
Example: trinity/.shed.yml
[...]
auto_tool_repositories:
  name_template: ""
  description_template: " (from the Trinity tool suite)"
suite:
  name: "suite_trinity"
  description: Trinity tools to assemble transcript sequences from Illumina RNA-Seq data.
$ planemo shed_lint --tools --ensure_metadata
Linting repository […]/tandem_repeats_finder
Applying linter expansion... CHECK
.. INFO: Included files all found.
Applying linter tool_dependencies_xsd... CHECK
.. INFO: tool_dependencies.xml found and appears to be valid XML
Applying linter tool_dependencies_actions... CHECK
.. INFO: Parsed tool dependencies.
Applying linter repository_dependencies... CHECK
.. INFO: No repository_dependencies.xml, skipping.
Applying linter shed_yaml... CHECK
.. INFO: .shed.yml found and appears to be valid YAML.
Applying linter readme... CHECK
.. INFO: No README found skipping.
+Linting tool […]/tandem_repeats_finder/tandem_repeats_finder_wrapper.xml
[…]
Follow one of our recommended follow-up trainings:
This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors!
Tutorial Content is licensed under Creative Commons Attribution 4.0 International License.