forked from geraldinepascal/FROGS-wrappers
-
Notifications
You must be signed in to change notification settings - Fork 0
/
affiliation_OTU.xml
198 lines (141 loc) · 8.35 KB
/
affiliation_OTU.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
<?xml version="1.0"?>
<!--
# Copyright (C) 2015 INRA
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
-->
<tool id="FROGS_affiliation_OTU" name="FROGS Affiliation OTU" version="3.1">
<description>Taxonomic affiliation of each OTU's seed by RDPtools and BLAST</description>
<requirements>
<requirement type="package" version="3.1.0">frogs</requirement>
</requirements>
<stdio>
<exit_code range="1:" />
<exit_code range=":-1" />
</stdio>
<command>
#set $reference_filename = str( $ref_file.fields.path )
affiliation_OTU.py
--reference "${reference_filename}"
--input-biom $biom_abundance
--input-fasta $fasta_sequences
--output-biom $biom_affiliation
--summary $summary
--nb-cpus \${GALAXY_SLOTS:-1}
--java-mem $mem
#if $rdp
--rdp
#end if
</command>
<inputs>
<!-- JOB Parameter -->
<param name="mem" type="hidden" label="Memory allocation" help="The number of Go to allocation for java" value="20"></param>
<!-- Database Choice -->
<param name="ref_file" type="select" label="Using reference database" help="Select reference from the list">
<options from_data_table="frogs_db"></options>
<validator type="no_options" message="A built-in database is not available" />
</param>
<param name="rdp" type="boolean" label="Also perform RDP assignation?" help="Taxonomy affiliation will be perform thanks to Blast. This option allow you to perform it also with RDP classifier (default No)" />
<!-- Files -->
<param format="fasta" name="fasta_sequences" type="data" label="OTU seed sequence" help="OTU sequences (format: fasta)."/>
<param format="biom1" name="biom_abundance" type="data" label="Abundance file" help="OTU abundances (format: BIOM)."/>
</inputs>
<outputs>
<data format="biom1" name="biom_affiliation" label="${tool.name}: affiliation.biom" from_work_dir="affiliation.biom" />
<data format="html" name="summary" label="${tool.name}: report.html" from_work_dir="report.html"/>
</outputs>
<tests>
<test>
<param name="ref_file" value="ITS1_test"/>
<param name="fasta_sequences" value="references/04-filters.fasta"/>
<param name="biom_abundance" value="references/04-filters.biom"/>
<!-- <param name="fasta_sequences" value="swarm.fasta"/>
<param name="biom_abundance" value="swarm.biom"/> -->
<output name="biom_affiliation" file="references/06-affiliation.biom"/>
<output name="summary" file="references/06-affiliation.html" compare="sim_size" delta="0" />
</test>
</tests>
<help>
.. image:: static/images/frogs_images/FROGS_logo.png
:height: 144
:width: 110
.. class:: infomark page-header h2
What it does
Add taxonomic affiliation in abundance file.
.. class:: infomark page-header h2
Inputs/outputs
.. class:: h3
Inputs
**Sequence file**:
The sequences (format `FASTA <https://en.wikipedia.org/wiki/FASTA_format>`_).
**Abundance file**:
The abundance of each OTU in each sample (format `BIOM <http://biom-format.org/>`_).
.. class:: h3
Outputs
**Abundance file** (tax_affiliation.biom):
The abundance file with affiliation (format `BIOM <http://biom-format.org/>`_).
**Summary file** (report.html):
This file presents the number of sequences affiliated by blast, and the number of multi-affiliation (format `HTML <https://en.wikipedia.org/wiki/HTML>`_).
.. image:: static/images/frogs_images/FROGS_affiliation_summary.png
:height: 800
:width: 600
.. class:: infomark page-header h2
Reference database
All the databases we format (on demand) for RDPClassifier and NCBI Blast+ are inventoried here: http://genoweb.toulouse.inra.fr/frogs_databanks/assignation/readme.txt
.. class:: infomark page-header h2
How it works
.. csv-table::
:header: "Steps", "Description"
:widths: 5, 150
:class: table table-striped
"1", "`RDPClassifier <https://rdp.cme.msu.edu/tutorials/classifier/RDPtutorial_RDP_classifier.html>`_ may be used with database to associate to each OTU a taxonomy and a bootstrap (example: *Bacteria;(1.0);Firmicutes;(1.0);Clostridia;(1.0);Clostridiales;(1.0);Clostridiaceae 1;(1.0);Clostridium sensu stricto;(1.0);*)."
"2", "`blastn+ <http://blast.ncbi.nlm.nih.gov/>`_ or `needlall <http://emboss.sourceforge.net/apps/release/6.6/emboss/apps/needleall.html>`_ is used to find alignment between each OTU and the database. Only the bests hits with the same score are reported. blastn+ is used for merged read pair, and needall is used for artificially combined sequence. For each alignment returned, several metrics are computed: identity percentage, coverage percentage, and alignment length"
"3", "For each OTU with several blastn+/needlall alignment results a consensus is determined on each taxonomic level. If all the taxa in a taxonomic rank are identical the taxon name is reported otherwise *Multi-affiliation* is reported. By example, if you have an OTU with two corresponding sequences, the first is a *Bacteria;Proteobacteria;Gamma Proteobacteria;Enterobacteriales*, the second is a *Bacteria;Proteobacteria;Beta Proteobacteria;Methylophilales*, the consensus will be *Bacteria;Proteobacteria;Multi-affiliation;Multi-affiliation*."
.. class:: infomark page-header h2
Alignment metrics details on identity percentage calculation
With classicala %id computation method, we will obtain for overlapped amplicon sequence and for artificially combined amplicon sequence :
* **Case 1: a sequencing of overlapping sequences i.e. 16S V3-V4 amplicon MiSeq sequencing**
.. image:: static/images/frogs_images/FROGS_affiliation_overlapped_percent_id.png
:height: 325
:width: 807
* **Case 2 : a sequencing of non-overlapping sequences: case of ITS1 amplicon MiSeq sequencing**
.. image:: static/images/frogs_images/FROGS_affiliation_combined_percent_id.png
:height: 310
:width: 887
* **Finally, how percentage identity is computed ?**
With the classical method of %id calculation, filtering on %id will systematically removed “FROGS combined” OTUs. So, we proposed to replace the classical %id by a %id computed on the sequenced bases only.
.. image:: static/images/frogs_images/FROGS_affiliation_percent_id_formula.png
:height: 36
:width: 637
For the precedent use cases we will obtain:
* Case 1: 16S V3V4 overlapped sequence
% sequenced bases identity = 400 matches / 400 bp = 100%
* Case 2: very large ITS1 “FROGS combined” shorter than the real sequence
% sequenced bases identity = (250 + 250 ) / (600 - 100) = 100%
This calculation allows to return 100% of identity on sequenced bases for “FROGS combined” shorter or longer than reality in case of perfect sequencing, and a smaller percentage of identity in the case of small overlap repeat kept in FROGS combined sequence.
.. class:: infomark page-header h2
Advices
This tool can take large time. It is recommended to filter your abundance and your sequence file before this tool (see **FROGS Filters**).
As you can see the affiliation of each OTU is not human readable in outputed abundance file. We provide a tools to convert these BIOM file in tabulated file, see the **FROGS BIOM to TSV** tool.
----
**Contact**
Contacts: [email protected]
Repository: https://github.com/geraldinepascal/FROGS
website: http://frogs.toulouse.inra.fr/
Please cite the **FROGS article**: *Escudie F., et al. Bioinformatics, 2018. FROGS: Find, Rapidly, OTUs with Galaxy Solution.*
</help>
<citations>
<citation type="doi">10.1093/bioinformatics/btx791</citation>
</citations>
</tool>