Getting data into Galaxy



PURL: gxy.io/GTN:S00062


Requirements

Before diving into this slide deck, we recommend you to have a look at:


Questions

  • How do I get my data into Galaxy?

  • How do I get public data into Galaxy?


Getting data into Galaxy


Many ways to get data into your workspace

  1. Import using Get Data sources e.g. UCSC, SRA
  2. Import from a Galaxy Data Library
  3. Import using Upload File
    • Import from your computer
    • Directly enter text
    • Import from a URL
    • Import using FTP
    • Import directly into Collection
    • Import using Rule Builder
  • To do analysis in Galaxy you first need data to work on.
  • There are many ways and sources for getting data into your history.
  • This tutorial will cover all of the techniques listed here.

Best method depends on where the data is, and how big it is

Flowchart for getting data into Galaxy: SRA datasets should use the upload tool; if you have many or large datasets, use FTP; if the data is on the web, use the URL upload.

Source: Galaxy Community Hub
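
The three rules in the flowchart description can be written down as a small decision function. This is only a sketch: the function name, the order in which the rules are checked, and the numeric cut-offs for "many" and "big" are assumptions made for illustration, since the slide gives no numbers.

```python
# Hypothetical cut-offs; the flowchart itself does not define them.
LARGE_FILE_BYTES = 2 * 1024**3  # assumption: files above 2 GiB count as "big"
MANY_FILES = 20                 # assumption: more than 20 files counts as "many"


def suggest_import_method(location, n_files=1, largest_file_bytes=0):
    """Suggest an import route for data at `location` ("sra", "web" or "local").

    Assumed rule order: where the data lives is checked first, and the
    size rules are applied only to data on your own computer.
    """
    if location == "sra":
        return "upload tool"
    if location == "web":
        return "URL upload"
    if n_files > MANY_FILES or largest_file_bytes > LARGE_FILE_BYTES:
        return "FTP"
    return "upload from your computer"


if __name__ == "__main__":
    print(suggest_import_method("local", n_files=50))
```

On a real server the right cut-offs depend on your connection and on the limits the administrator has configured.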

6 / 47

1. The Get Data toolbox section

  • Click on the Get Data section in the toolbox (the left panel)

Click on Get Data to expand it


A typical list of data sources

  • Expands to show data sources

    • E.g. UCSC, NCBI, UniProt, ...
    • The specific data sources available on your Galaxy instance are determined by the server's administrator
  • All of these data sources can bring datasets (files) into your Galaxy workspace (history)


This shows the list of data sources that were available on usegalaxy.org in mid 2017.

Two large data sources you can access through Galaxy are UCSC and SRA

Screenshot of the toolbox with "ucsc" entered in the search box. Screenshot of the Galaxy toolbox with "sra" entered in the search box.


2. Import from Shared Data Library

  • Top menu bar -> Shared Data -> Data Libraries

  • Configured by a Galaxy Administrator

  • Can be imported directly into your history

  • Example: all GTN tutorial data

galaxy top menu dropdown shared data, showing Data Libraries


You can select the files you want and send them to your history as datasets or as a collection.

data library screenshot with a number of datasets selected and export to history menu open


3. Upload from your computer


click on upload button

Upload file form

  • The Upload File data source can import data:
    • from your computer
    • by directly entering text
    • using a URL
    • and via FTP

This is probably the most commonly used tool for bringing data into Galaxy, and it is installed on almost every Galaxy server.

Choose files

Options for importing files from your laptop

  • Drag and drop is supported
  • as is the standard file selection using your browser.

Set Metadata

  • Datatype (e.g. FastQ, VCF, BAM, tabular, ..)
    • Galaxy will autodetect by default (sometimes guesses wrong)
  • Genome Build (e.g. hg19, mm9, ..)
    • must be set manually (can be done later as well)
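
Because autodetection sometimes guesses wrong, it can help to decide the datatype for each file before opening the upload form. The sketch below is a hypothetical planning helper, not Galaxy's own detection logic: the suffix table and the datatype names (`fastqsanger`, `fastqsanger.gz`, `gtf`, `vcf`, `bam`, `tabular`) are assumptions to check against the datatype list in your server's upload dialog, and `auto` stands for leaving the choice to Galaxy.

```python
from pathlib import PurePath

# Assumed mapping from filename suffix to a Galaxy datatype name.
SUFFIX_TO_DATATYPE = {
    ".gtf": "gtf",
    ".vcf": "vcf",
    ".bam": "bam",
    ".tsv": "tabular",
    ".fastq": "fastqsanger",
    ".fq": "fastqsanger",
}


def plan_datatype(filename):
    """Return the datatype to set by hand, or "auto" to let Galaxy detect it."""
    name = filename.lower()
    compressed = name.endswith(".gz")
    if compressed:
        name = name[: -len(".gz")]
    datatype = SUFFIX_TO_DATATYPE.get(PurePath(name).suffix)
    if datatype is None:
        return "auto"
    if compressed:
        # Only gzipped FASTQ gets an explicit name in this sketch;
        # other compressed files are left to autodetection.
        return "fastqsanger.gz" if datatype == "fastqsanger" else "auto"
    return datatype


if __name__ == "__main__":
    for example in ["genes.gtf", "sample_R1.fastq.gz", "notes.txt"]:
        print(example, "->", plan_datatype(example))
```

A filename suffix is only a hint, so the result is a starting point for the datatype column in the upload form rather than a guarantee.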

upload dialog from galaxy with a number of files queued.

  • Here we have selected 13 files to import:

    • one genome annotation file in GTF format
    • 12 paired-end read files from an RNA-Seq experiment (from UC Davis Training Material)
    • We could import them now and let Galaxy guess their file types.
  • The datatype can be set for all files at once:

Set datatype for all imported datasets

  • Or per file:

Manually set datatype for one dataset

  • Here we are manually setting the first dataset's datatype to GTF, a common genome annotation format.

Start upload process:

  • Once everything is ready, click the Start button

Ready to upload files. Click on start

  • Data transfer does not start until you click Start.
You can then close the form

Ready to upload files. Click on start


All the items will appear in your history

Files are loaded into your current history and are ready to use when they turn green.


Directly enter text

  • Sometimes it's useful to enter file content directly.
    • only works if your dataset is tiny
    • choose Paste/Fetch data

Select Paste/Fetch data


Enter the data by typing (or pasting) it in the input box:

Select Paste/Fetch data

You can also set the datatype and build. Click Start, and then Close, and the new item shows up as Pasted Entry in your history.


Import using URL


The data might already be available on a web server somewhere. To avoid downloading data to your computer and uploading to Galaxy in two steps, you can instruct Galaxy to directly fetch the data from a given URL.

Select Paste/Fetch data


Enter the URLs (one per line) into the input box:

Select Paste/Fetch data

Click Start, and then Close, and the new items show up in your history with the URL as their name.
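
Since the input box expects one URL per line, a long list is easier to prepare outside the browser and paste in. The helper below is a minimal sketch of that preparation step; the function name is hypothetical, and the accepted schemes (`http`, `https`, `ftp`) are an assumption about what a server will fetch.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = ("http", "https", "ftp")  # assumption, not a Galaxy setting


def prepare_url_list(urls):
    """Return the URLs as one-per-line text, trimmed and without duplicates."""
    cleaned = []
    seen = set()
    for raw in urls:
        url = raw.strip()
        if not url or url in seen:
            continue  # skip blank lines and repeats
        parts = urlsplit(url)
        if parts.scheme not in ALLOWED_SCHEMES or not parts.netloc:
            raise ValueError(f"not a fetchable URL: {url!r}")
        seen.add(url)
        cleaned.append(url)
    return "\n".join(cleaned)


if __name__ == "__main__":
    # example.org addresses are placeholders, not real datasets.
    print(prepare_url_list([
        "https://example.org/reads_1.fastq",
        "https://example.org/reads_2.fastq",
    ]))
```

The returned text can be pasted into the Paste/Fetch data box as is.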


Import using FTP

  • Why use FTP?

    • Older Galaxy versions did not support uploading files larger than 2 GB
    • Many people are very comfortable using FTP to upload large datasets and you can sometimes resume interrupted uploads.
  • How to use FTP


Make sure you have an FTP client installed

FileZilla

  • FileZilla is a free FTP client that is available on Windows, macOS, and Linux
  • There are many other options
  • If you don't already have an FTP client, download and install FileZilla.

Establish FTP connection to your Galaxy server

  • Provide
    • the instance's FTP server name (e.g. usegalaxy.org, ftp.usegalaxy.eu)
    • your full username (usually an email address) and password

FTP Connection Params


Successfully connect

Successfully connected


Navigate to the files you want to transfer

Right click on the files and upload them.
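
The same connect-and-upload steps can also be scripted instead of using a graphical client. The sketch below uses Python's standard `ftplib`; the helper names are hypothetical, and whether your Galaxy server expects plain FTP or FTP over TLS is an assumption to check with its documentation. The host, username and password are the ones listed above.

```python
import ftplib
from pathlib import Path


def connect(host, user, password, use_tls=True):
    """Open an FTP session; use_tls=True is an assumption about the server."""
    ftp = ftplib.FTP_TLS(host) if use_tls else ftplib.FTP(host)
    ftp.login(user, password)
    if use_tls:
        ftp.prot_p()  # also encrypt the data channel
    return ftp


def upload_via_ftp(ftp, paths):
    """Upload each local file into the session's current remote directory."""
    uploaded = []
    for path in map(Path, paths):
        with path.open("rb") as handle:
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


# Usage (placeholders, not real credentials):
#   ftp = connect("ftp.usegalaxy.eu", "you@example.org", "your-password")
#   upload_via_ftp(ftp, ["sample_R1.fastq.gz", "sample_R2.fastq.gz"])
#   ftp.quit()
```

Files uploaded this way still have to be imported into a history through the upload form.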


FTP transfer in progress

FTP Transfer in progress...


FTP transfer complete

... and transfer complete.


Where did my files go?

  • File Upload menu -> Choose FTP files

choose FTP files

  • Select files to import into your history
  • Click Start

choose FTP files


As you can see, this dialog gives connection settings too

Import directly into Collection

  • Select Collection tab at top of upload menu
  • Add files as before (upload from computer, paste/fetch, FTP)

Direct collection Start

  • Choose collection type (at bottom)
  • Set metadata (file type, genome build)
  • Click "Build"

Direct collection Build

  • Name your collection
  • Click Create button

Direct collection Name

  • Collection is now imported in your history
  • Click on it to expand it and view all files in collection

Direct collection History
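
When the files are paired-end reads, a paired collection only makes sense if every sample has both a forward and a reverse file. The sketch below checks that before you build the collection. It is an illustration, not part of Galaxy: the function name is hypothetical, and it assumes files are named `<sample>_R1` / `<sample>_R2` (or `_1` / `_2`) with a FASTQ extension.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <sample>_R1.fastq(.gz) / <sample>_R2.fastq(.gz),
# also accepting _1 / _2 and the .fq extension.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R?(?P<read>[12])\.f(?:ast)?q(?:\.gz)?$"
)


def pair_reads(filenames):
    """Group filenames into {sample: {"forward": ..., "reverse": ...}}."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = READ_PATTERN.match(name)
        if match is None:
            raise ValueError(f"cannot tell read direction of {name!r}")
        direction = "forward" if match["read"] == "1" else "reverse"
        pairs[match["sample"]][direction] = name
    incomplete = sorted(s for s, reads in pairs.items() if len(reads) != 2)
    if incomplete:
        raise ValueError(f"samples missing a mate: {incomplete}")
    return dict(sorted(pairs.items()))


if __name__ == "__main__":
    print(pair_reads(["ctrl1_R1.fastq.gz", "ctrl1_R2.fastq.gz"]))
```

Run over twelve paired-end files from six samples, this would give six pairs, one per sample.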


Import using Rule Based uploader

  • When you want to import many files from URLs or Accession IDs directly into collection(s)
  • Supports advanced "rules" for creating collections from sample sheets
  • Click Rule-based tab at top of file upload window

Rule Uploader
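
A sample sheet for the rule-based uploader is, in its simplest form, a small table with one row per file. The sketch below writes such a table as tab-separated text. The two-column layout (sample name, URL) and the function name are assumptions for illustration; the columns you actually need depend on the rules you define.

```python
import csv
import io


def build_sample_sheet(samples):
    """Return tab-separated text with one "name<TAB>url" row per sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for name, url in samples:
        writer.writerow([name, url])
    return buffer.getvalue()


if __name__ == "__main__":
    # example.org addresses are placeholders, not real datasets.
    print(build_sample_sheet([
        ("ctrl1", "https://example.org/ctrl1.fastq"),
        ("treat1", "https://example.org/treat1.fastq"),
    ]), end="")
```

The resulting text is meant to be pasted into the rule-based tab as tabular data.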


Import using Rule Based uploader

Learn how to use it in the dedicated Rule Based Uploader tutorial


Thank You!

This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors!

Galaxy Training Network

Tutorial Content is licensed under Creative Commons Attribution 4.0 International License.
