Nanopore Preprocessing


Author(s): Bérénice Batut, Engy Nasr, Paul Zierep
Version: 3
Last updated: Jun 7, 2024
License: MIT
Tags: name:Collection, name:microGalaxy, name:PathoGFAIR, name:Nanopore, name:IWC

Features
Tutorial: Pathogen detection from (direct Nanopore) sequencing data using Galaxy - Foodborne Edition
Workflow Testing
Tests: ✅
Results: Not yet automated
FAIRness PURL: https://gxy.io/GTN:W00143
flowchart TD
  0["ℹ️ Input Parameter\nsamples_profile"];
  style 0 fill:#ded,stroke:#393,stroke-width:4px;
  1["ℹ️ Input Collection\ncollection_of_all_samples"];
  style 1 stroke:#2c3143,stroke-width:4px;
  2["Porechop"];
  1 -->|output| 2;
  34ea26db-11cb-41ee-85c3-75af8a53a2c0["Output\nporechop_output_trimmed_reads"];
  2 --> 34ea26db-11cb-41ee-85c3-75af8a53a2c0;
  style 34ea26db-11cb-41ee-85c3-75af8a53a2c0 stroke:#2c3143,stroke-width:4px;
  3["NanoPlot"];
  1 -->|output| 3;
  15ecf5b1-e0eb-405a-ac3a-359feb66d4cd["Output\nnanoplot_qc_on_reads_before_preprocessing_nanostats"];
  3 --> 15ecf5b1-e0eb-405a-ac3a-359feb66d4cd;
  style 15ecf5b1-e0eb-405a-ac3a-359feb66d4cd stroke:#2c3143,stroke-width:4px;
  304110f9-60d0-4ba2-8b3b-fae0e2a49554["Output\nnanoplot_on_reads_before_preprocessing_nanostats_post_filtering"];
  3 --> 304110f9-60d0-4ba2-8b3b-fae0e2a49554;
  style 304110f9-60d0-4ba2-8b3b-fae0e2a49554 stroke:#2c3143,stroke-width:4px;
  f2bd0a1f-cd60-4a36-a7d0-8025cc19ea2e["Output\nnanoplot_qc_on_reads_before_preprocessing_html_report"];
  3 --> f2bd0a1f-cd60-4a36-a7d0-8025cc19ea2e;
  style f2bd0a1f-cd60-4a36-a7d0-8025cc19ea2e stroke:#2c3143,stroke-width:4px;
  4["FastQC"];
  1 -->|output| 4;
  d0a64624-05d0-4068-835b-a025fc011760["Output\nfastqc_quality_check_before_preprocessing_html_file"];
  4 --> d0a64624-05d0-4068-835b-a025fc011760;
  style d0a64624-05d0-4068-835b-a025fc011760 stroke:#2c3143,stroke-width:4px;
  e61fef5d-1bc8-4c8e-be6a-f74e210e9920["Output\nfastqc_quality_check_before_preprocessing_text_file"];
  4 --> e61fef5d-1bc8-4c8e-be6a-f74e210e9920;
  style e61fef5d-1bc8-4c8e-be6a-f74e210e9920 stroke:#2c3143,stroke-width:4px;
  5["fastp"];
  2 -->|outfile| 5;
  a2219483-50b7-4aed-98dd-333ad2e12eb8["Output\nnanopore_sequenced_reads_processed_with_fastp_after_host_removal"];
  5 --> a2219483-50b7-4aed-98dd-333ad2e12eb8;
  style a2219483-50b7-4aed-98dd-333ad2e12eb8 stroke:#2c3143,stroke-width:4px;
  2a9a8b4d-458b-40e7-9a21-fb7108d5bbe4["Output\nnanopore_sequenced_reads_processed_with_fastp_after_host_removal_html_report"];
  5 --> 2a9a8b4d-458b-40e7-9a21-fb7108d5bbe4;
  style 2a9a8b4d-458b-40e7-9a21-fb7108d5bbe4 stroke:#2c3143,stroke-width:4px;
  6["MultiQC"];
  4 -->|text_file| 6;
  ebffe782-a56c-431f-8af4-c0cb8d7a02fc["Output\nmultiQC_stats_before_preprocessing"];
  6 --> ebffe782-a56c-431f-8af4-c0cb8d7a02fc;
  style ebffe782-a56c-431f-8af4-c0cb8d7a02fc stroke:#2c3143,stroke-width:4px;
  0f92196d-047d-4918-819d-c0ff7cd3ae85["Output\nmultiQC_html_report_before_preprocessing"];
  6 --> 0f92196d-047d-4918-819d-c0ff7cd3ae85;
  style 0f92196d-047d-4918-819d-c0ff7cd3ae85 stroke:#2c3143,stroke-width:4px;
  7["Map with minimap2"];
  0 -->|output| 7;
  5 -->|out1| 7;
  9d7bb3b7-09a1-401f-a132-bb35a53375ea["Output\nbam_map_to_host"];
  7 --> 9d7bb3b7-09a1-401f-a132-bb35a53375ea;
  style 9d7bb3b7-09a1-401f-a132-bb35a53375ea stroke:#2c3143,stroke-width:4px;
  8["NanoPlot"];
  5 -->|out1| 8;
  b5899290-4c57-4662-ad22-860654652ade["Output\nnanoplot_qc_on_reads_after_preprocessing_html_report"];
  8 --> b5899290-4c57-4662-ad22-860654652ade;
  style b5899290-4c57-4662-ad22-860654652ade stroke:#2c3143,stroke-width:4px;
  949bfdf5-3d79-4dad-bdd8-c3a25e6af4cf["Output\nnanoplot_on_reads_after_preprocessing_nanostats_post_filtering"];
  8 --> 949bfdf5-3d79-4dad-bdd8-c3a25e6af4cf;
  style 949bfdf5-3d79-4dad-bdd8-c3a25e6af4cf stroke:#2c3143,stroke-width:4px;
  42db7f93-919e-4bbb-81a1-06411a9da410["Output\nnanoplot_qc_on_reads_after_preprocessing_nanostats"];
  8 --> 42db7f93-919e-4bbb-81a1-06411a9da410;
  style 42db7f93-919e-4bbb-81a1-06411a9da410 stroke:#2c3143,stroke-width:4px;
  9["FastQC"];
  5 -->|out1| 9;
  09306471-afa0-4106-9cc7-259b93dfc862["Output\nfastqc_quality_check_after_preprocessing_text_file"];
  9 --> 09306471-afa0-4106-9cc7-259b93dfc862;
  style 09306471-afa0-4106-9cc7-259b93dfc862 stroke:#2c3143,stroke-width:4px;
  084f982f-20f1-457e-8012-91ebbb85633d["Output\nfastqc_quality_check_after_preprocessing_html_file"];
  9 --> 084f982f-20f1-457e-8012-91ebbb85633d;
  style 084f982f-20f1-457e-8012-91ebbb85633d stroke:#2c3143,stroke-width:4px;
  10["Split BAM by reads mapping status"];
  7 -->|alignment_output| 10;
  14a53fe2-f296-43aa-86b7-243278c1050c["Output\nnon_host_sequences_bam"];
  10 --> 14a53fe2-f296-43aa-86b7-243278c1050c;
  style 14a53fe2-f296-43aa-86b7-243278c1050c stroke:#2c3143,stroke-width:4px;
  3b1e626f-6bc1-484c-be01-366534361b73["Output\nhost_sequences_bam"];
  10 --> 3b1e626f-6bc1-484c-be01-366534361b73;
  style 3b1e626f-6bc1-484c-be01-366534361b73 stroke:#2c3143,stroke-width:4px;
  11["Select"];
  9 -->|text_file| 11;
  a809853b-119f-44d2-986b-8d2006439fbe["Output\ntotal_sequences_before_hosts_sequences_removal"];
  11 --> a809853b-119f-44d2-986b-8d2006439fbe;
  style a809853b-119f-44d2-986b-8d2006439fbe stroke:#2c3143,stroke-width:4px;
  12["Samtools fastx"];
  10 -->|mapped| 12;
  10d4eaec-81d8-444e-8075-7b77a1fb6870["Output\nhost_sequences_fastq"];
  12 --> 10d4eaec-81d8-444e-8075-7b77a1fb6870;
  style 10d4eaec-81d8-444e-8075-7b77a1fb6870 stroke:#2c3143,stroke-width:4px;
  13["Samtools fastx"];
  10 -->|unmapped| 13;
  0c2dd74d-ac4f-45cf-839c-50386a7ece28["Output\nnon_host_sequences_fastq"];
  13 --> 0c2dd74d-ac4f-45cf-839c-50386a7ece28;
  style 0c2dd74d-ac4f-45cf-839c-50386a7ece28 stroke:#2c3143,stroke-width:4px;
  14["Collapse Collection"];
  11 -->|out_file1| 14;
  15["Filter failed datasets"];
  12 -->|output| 15;
  16["Kraken2"];
  13 -->|output| 16;
  203d303e-8f3a-4242-971f-b345842ebdb8["Output\nkraken2_with_kalamri_database_output"];
  16 --> 203d303e-8f3a-4242-971f-b345842ebdb8;
  style 203d303e-8f3a-4242-971f-b345842ebdb8 stroke:#2c3143,stroke-width:4px;
  843afd4d-23a8-46e7-b945-8b67dd7ae341["Output\nkraken2_with_kalamri_database_report"];
  16 --> 843afd4d-23a8-46e7-b945-8b67dd7ae341;
  style 843afd4d-23a8-46e7-b945-8b67dd7ae341 stroke:#2c3143,stroke-width:4px;
  17["Cut"];
  14 -->|output| 17;
  d07be9f1-d250-4008-91ee-59a68521eb56["Output\nquality_retained_all_reads"];
  17 --> d07be9f1-d250-4008-91ee-59a68521eb56;
  style d07be9f1-d250-4008-91ee-59a68521eb56 stroke:#2c3143,stroke-width:4px;
  18["FastQC"];
  15 -->|output| 18;
  b0ee6e31-0eb1-437d-8c04-fc3640b9a0b7["Output\nhosts_qc_text_file"];
  18 --> b0ee6e31-0eb1-437d-8c04-fc3640b9a0b7;
  style b0ee6e31-0eb1-437d-8c04-fc3640b9a0b7 stroke:#2c3143,stroke-width:4px;
  b72ff57b-0921-43bf-a817-6cd444c8f3cb["Output\nhosts_qc_html"];
  18 --> b72ff57b-0921-43bf-a817-6cd444c8f3cb;
  style b72ff57b-0921-43bf-a817-6cd444c8f3cb stroke:#2c3143,stroke-width:4px;
  19["Krakentools: Extract Kraken Reads By ID"];
  5 -->|out1| 19;
  16 -->|report_output| 19;
  16 -->|output| 19;
  57e3b725-8e13-40b2-9acc-31fd56ebc80a["Output\ncollection_of_preprocessed_samples"];
  19 --> 57e3b725-8e13-40b2-9acc-31fd56ebc80a;
  style 57e3b725-8e13-40b2-9acc-31fd56ebc80a stroke:#2c3143,stroke-width:4px;
  20["Select"];
  18 -->|text_file| 20;
  3ba35c71-32f0-4741-98d4-ea8522e27500["Output\ntotal_sequences_after_hosts_sequences_removal"];
  20 --> 3ba35c71-32f0-4741-98d4-ea8522e27500;
  style 3ba35c71-32f0-4741-98d4-ea8522e27500 stroke:#2c3143,stroke-width:4px;
  21["Collapse Collection"];
  20 -->|out_file1| 21;
  22["Cut"];
  21 -->|output| 22;
  cef36c68-4549-4fd6-b7c8-71fb21df012f["Output\nquality_retained_hosts_reads"];
  22 --> cef36c68-4549-4fd6-b7c8-71fb21df012f;
  style cef36c68-4549-4fd6-b7c8-71fb21df012f stroke:#2c3143,stroke-width:4px;
  23["Column join"];
  17 -->|out_file1| 23;
  22 -->|out_file1| 23;
  24["Compute"];
  23 -->|tabular_output| 24;
  25["Column Regex Find And Replace"];
  24 -->|out_file1| 25;
  470892ee-dab9-48d7-ad97-45dbd52afaa7["Output\nremoved_hosts_percentage_tabular"];
  25 --> 470892ee-dab9-48d7-ad97-45dbd52afaa7;
  style 470892ee-dab9-48d7-ad97-45dbd52afaa7 stroke:#2c3143,stroke-width:4px;
  26["MultiQC"];
  9 -->|text_file| 26;
  25 -->|out_file1| 26;
  0b1b5a73-36ee-42a2-a220-1ced6ec7378b["Output\nmultiQC_html_report_after_preprocessing"];
  26 --> 0b1b5a73-36ee-42a2-a220-1ced6ec7378b;
  style 0b1b5a73-36ee-42a2-a220-1ced6ec7378b stroke:#2c3143,stroke-width:4px;
  13cbf6c7-6954-4458-aa66-a5b020c63822["Output\nmultiQC_stats_after_preprocessing"];
  26 --> 13cbf6c7-6954-4458-aa66-a5b020c63822;
  style 13cbf6c7-6954-4458-aa66-a5b020c63822 stroke:#2c3143,stroke-width:4px;
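
The last steps of the flowchart (Select, Cut, Column join, Compute) reduce FastQC text reports to a per-sample host-removal percentage. The Python sketch below illustrates that idea only. It assumes the Select steps keep the `Total Sequences` line of each FastQC report and that the percentage is host reads divided by all quality-retained reads. The exact expression used in the Compute step is not shown on this page, and the function names and sample numbers are hypothetical.

```python
def total_sequences(fastqc_text: str) -> int:
    """Return the 'Total Sequences' count from a FastQC text report."""
    for line in fastqc_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[0] == "Total Sequences":
            return int(fields[1])
    raise ValueError("no 'Total Sequences' line found")


def removed_hosts_percentage(all_reads: int, host_reads: int) -> float:
    """Assumed formula: share of quality-retained reads that mapped to the host."""
    if all_reads == 0:
        return 0.0
    return 100.0 * host_reads / all_reads


# Illustrative (made-up) report fragments for one sample
after_preprocessing = "Filename\tsample.fastq\nTotal Sequences\t2000\n"
host_only = "Filename\tsample_host.fastq\nTotal Sequences\t500\n"

pct = removed_hosts_percentage(
    total_sequences(after_preprocessing), total_sequences(host_only)
)
print(f"{pct:.1f}% of reads assigned to the host")
```

In the workflow itself this is done per collection element with tabular tools, not with a script; the sketch only makes the arithmetic explicit.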

Inputs

Input                      Label
Input parameter            samples_profile
Input dataset collection   collection_of_all_samples

Outputs

From Output Label
toolshed.g2.bx.psu.edu/repos/iuc/porechop/porechop/0.2.4+galaxy0 Porechop
toolshed.g2.bx.psu.edu/repos/iuc/nanoplot/nanoplot/1.42.0+galaxy1 NanoPlot
toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.74+galaxy0 FastQC
toolshed.g2.bx.psu.edu/repos/iuc/fastp/fastp/0.23.4+galaxy0 fastp
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy1 MultiQC
toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.28+galaxy0 Map with minimap2
toolshed.g2.bx.psu.edu/repos/iuc/nanoplot/nanoplot/1.42.0+galaxy1 NanoPlot
toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.74+galaxy0 FastQC
toolshed.g2.bx.psu.edu/repos/iuc/bamtools_split_mapped/bamtools_split_mapped/2.5.2+galaxy2 Split BAM by reads mapping status
Grep1 Select
toolshed.g2.bx.psu.edu/repos/iuc/samtools_fastx/samtools_fastx/1.15.1+galaxy2 Samtools fastx
toolshed.g2.bx.psu.edu/repos/iuc/samtools_fastx/samtools_fastx/1.15.1+galaxy2 Samtools fastx
toolshed.g2.bx.psu.edu/repos/iuc/kraken2/kraken2/2.1.1+galaxy1 Kraken2
Cut1 Cut
toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.74+galaxy0 FastQC
toolshed.g2.bx.psu.edu/repos/iuc/krakentools_extract_kraken_reads/krakentools_extract_kraken_reads/1.2+galaxy1 Krakentools: Extract Kraken Reads By ID
Grep1 Select
Cut1 Cut
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regexColumn1/1.0.3 Column Regex Find And Replace
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy1 MultiQC

Tools

Tool Links
Cut1
Grep1
__FILTER_FAILED_DATASETS__
toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.0 View in ToolShed
toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.74+galaxy0 View in ToolShed
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regexColumn1/1.0.3 View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/bamtools_split_mapped/bamtools_split_mapped/2.5.2+galaxy2 View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/collection_column_join/collection_column_join/0.0.3 View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/fastp/fastp/0.23.4+galaxy0 View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/kraken2/kraken2/2.1.1+galaxy1 View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/krakentools_extract_kraken_reads/krakentools_extract_kraken_reads/1.2+galaxy1 View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.28+galaxy0 View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy1 View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/nanoplot/nanoplot/1.42.0+galaxy1 View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/porechop/porechop/0.2.4+galaxy0 View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/samtools_fastx/samtools_fastx/1.15.1+galaxy2 View in ToolShed
toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0 View in ToolShed
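
The ToolShed identifiers in the table above follow the pattern `<tool shed host>/repos/<owner>/<repository>/<tool>/<version>`, while built-in tools such as `Cut1` or `Grep1` are bare names. A small Python sketch of splitting such an identifier into its parts, under that assumption (the `ShedTool` type and `parse_tool_id` helper are hypothetical names, not part of Galaxy):

```python
from typing import NamedTuple, Optional


class ShedTool(NamedTuple):
    tool_shed: str
    owner: str
    repository: str
    tool: str
    version: str


def parse_tool_id(tool_id: str) -> Optional[ShedTool]:
    """Split a ToolShed tool ID; return None for built-in tools like 'Cut1'."""
    parts = tool_id.split("/")
    if len(parts) != 6 or parts[1] != "repos":
        return None
    shed, _, owner, repository, tool, version = parts
    return ShedTool(shed, owner, repository, tool, version)


porechop = parse_tool_id(
    "toolshed.g2.bx.psu.edu/repos/iuc/porechop/porechop/0.2.4+galaxy0"
)
print(porechop.owner, porechop.repository, porechop.version)
print(parse_tool_id("Cut1"))
```

Note that the repository and the tool name can differ, as in `column_maker/Add_a_column1`, which is why both fields are kept.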

To use this workflow in Galaxy, either download the workflow file, or copy the workflow's link and paste it into Galaxy's workflow import form.

Importing into Galaxy

Below are the instructions for importing these workflows directly into your Galaxy server of choice to start using them!
Hands-on: Importing a workflow
  • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
  • Click on Import at the top-right of the screen
  • Provide your workflow
    • Option 1: Paste the URL of the workflow into the box labelled “Archived Workflow URL”
    • Option 2: Upload the workflow file in the box labelled “Archived Workflow File”
  • Click the Import workflow button

Below is a short video demonstrating how to import a workflow from GitHub using this procedure:

Video: Importing a workflow from URL

Version History

Version Commit Time Comments
5 cdd93376a 2024-06-06 12:00:29 adding tags to some of the workflow outputs, updating the training with the latest PathoGFAIR workflows updates
4 e230001f4 2024-05-29 11:33:18 updating preprocessing workflow and allele based workflow with a single user input parameter and adjusting the md file accordingly
3 211b69394 2024-05-26 09:45:27 adding workflow reports to the workflows of the training to match the latest version of the IWC PR
2 d320748c5 2024-05-20 18:17:48 Foodborne training update 2024
1 0e0a2f2cc 2024-01-10 15:47:09 Rename metagenomics topic to microbiome

For Admins

Installing the workflow tools

wget https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/nanopore_preprocessing.ga -O workflow.ga
workflow-to-tools -w workflow.ga -o tools.yaml
shed-tools install -g GALAXY -a API_KEY -t tools.yaml
workflow-install -g GALAXY -a API_KEY -w workflow.ga --publish-workflows
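
In the commands above, `GALAXY` and `API_KEY` stand for the URL of your Galaxy server and an admin API key. To preview which ToolShed repositories a workflow needs before installing anything, the `.ga` file can be inspected directly, since it is JSON. The Python sketch below assumes the usual layout in which each step may carry a `tool_shed_repository` entry with `owner`, `name` and `tool_shed` keys. It is not the implementation of `workflow-to-tools`; the `shed_repositories` helper and the tiny example document are hypothetical, and subworkflows are not handled.

```python
import json


def shed_repositories(workflow: dict) -> list:
    """Collect unique (owner, name, tool_shed) triples from workflow steps."""
    seen = set()
    for step in workflow.get("steps", {}).values():
        repo = step.get("tool_shed_repository")
        if not repo:  # inputs and built-in tools carry no ToolShed metadata
            continue
        seen.add((repo["owner"], repo["name"], repo["tool_shed"]))
    return sorted(seen)


# Hand-written stand-in for a .ga file, reduced to the fields used here
example = json.loads("""
{
  "steps": {
    "0": {"tool_id": null, "type": "data_collection_input"},
    "1": {"tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/porechop/porechop/0.2.4+galaxy0",
          "tool_shed_repository": {"owner": "iuc", "name": "porechop",
                                   "tool_shed": "toolshed.g2.bx.psu.edu"}},
    "2": {"tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.74+galaxy0",
          "tool_shed_repository": {"owner": "devteam", "name": "fastqc",
                                   "tool_shed": "toolshed.g2.bx.psu.edu"}},
    "3": {"tool_id": "Cut1"}
  }
}
""")

for owner, name, shed in shed_repositories(example):
    print(f"{shed}/repos/{owner}/{name}")
```

For the downloaded file, the same function can be applied to `json.load(open("workflow.ga"))`.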