<?xml version="1.0"?>
<!--
# Copyright (C) 2016 INRA
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
-->
<tool id="FROGS_tsv_to_biom" name="FROGS TSV_to_BIOM" version="3.1">
<description>Converts a TSV file into a BIOM file.</description>
<requirements>
<requirement type="package" version="3.1.0">frogs</requirement>
</requirements>
<stdio>
<exit_code range="1:" />
<exit_code range=":-1" />
</stdio>
<command>
tsv_to_biom.py
--input-tsv $tsv_file
--output-biom $biom_file
#if $multi_affi_file
--input-multi-affi $multi_affi_file
#end if
#if $extract_fasta
--output-fasta $sequence_file
#end if
</command>
<inputs>
<!-- Files -->
<param format="tabular" name="tsv_file" type="data" label="Abundance TSV File" help="Your FROGS abundance TSV file. Take care to keep the original column names." optional="false"/>
<param format="tabular" name="multi_affi_file" type="data" label="Multi_hits TSV File" help="TSV file describing multi-hit BLAST results." optional="true" />
<!-- Parameters -->
<param name="extract_fasta" type="boolean" label="Extract seeds in FASTA file" help="If your TSV table contains a 'seed_sequence' column, you can extract the seed sequences into a separate FASTA file." />
</inputs>
<outputs>
<data format="biom1" name="biom_file" label="${tool.name}: abundance.biom" from_work_dir="abundance.biom" />
<data format="fasta" name="sequence_file" label="${tool.name}: sequences.fasta" from_work_dir="seed.fasta" >
<filter>extract_fasta</filter>
</data>
</outputs>
<tests>
<test>
<param name="tsv_file" value="references/10-biom2tsv.tsv" />
<param name="multi_affi_file" value="references/10-biom2tsv-affiliation_multihit.tsv" />
<param name="extract_fasta" value="true"/>
<output name="biom_file" file="references/12-tsv2biom.biom" compare="sim_size" delta="0"/>
<output name="sequence_file" file="references/12-tsv2biom.fasta" />
</test>
</tests>
<help>

.. image:: static/images/frogs_images/FROGS_logo.png
   :height: 144
   :width: 110

.. class:: infomark page-header h2

What it does

This tool converts a TSV file into a BIOM file.


.. class:: h3

Inputs

**Abundance file**:

The table with the abundance of each cluster in each sample and other details concerning the clusters (format TSV).

Authorised column names: rdp_tax_and_bootstrap, blast_taxonomy, blast_subject, blast_perc_identity, blast_perc_query_coverage, blast_evalue, blast_aln_length, seed_id, seed_sequence, observation_name, observation_sum

**Multiple affiliation file**:

The file that stores the multiple BLAST hits (multi_hit TSV file).


.. class:: h3

Outputs

**Abundance file**:

The abundance of each cluster in each sample and their metadata (format `BIOM <http://biom-format.org/>`_).

**Sequence file [optional]**:

If the "Extract seeds in FASTA file" option is checked, the seed sequences are extracted from the TSV file into a file in `FASTA <https://en.wikipedia.org/wiki/FASTA_format>`_ format.
For this option, make sure that your TSV file contains the seed_sequence column.


.. class:: infomark page-header h2

How it works

FROGS TSV_to_BIOM detects the metadata (columns before "observation_name") and the sample names (columns after "observation_sum").
It then rebuilds the BIOM abundance file: for each "observation_name", it adds the associated metadata and the count in each sample.
If blast_taxonomy is included in the metadata and blast_subject is equal to "multi-subject", it parses the multi_hit TSV file and extracts the list of blast_affiliations that contain the non-ambiguous blast_taxonomy.

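
The column rules above can be illustrated with the following Python sketch. It is a hypothetical simplification, not the code of tsv_to_biom.py: the function name split_frogs_tsv is invented for this example, counts are assumed to be integers, and the multi_hit TSV file is not parsed; observations that would need it are only flagged.

```python
import csv
import io


def split_frogs_tsv(tsv_text):
    """Hypothetical sketch: split a FROGS abundance TSV into per-observation
    metadata and per-sample counts, following the column rules described above."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    name_idx = header.index("observation_name")
    sum_idx = header.index("observation_sum")
    metadata_cols = header[:name_idx]    # metadata: columns before observation_name
    sample_names = header[sum_idx + 1:]  # samples: columns after observation_sum
    observations = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        metadata = dict(zip(metadata_cols, row[:name_idx]))
        # assumption: sample counts are integers
        counts = {s: int(c) for s, c in zip(sample_names, row[sum_idx + 1:])}
        observations.append({
            "id": row[name_idx],
            "metadata": metadata,
            "counts": counts,
            # the actual multi-hit affiliations would come from the multi_hit TSV file
            "needs_multi_hit": ("blast_taxonomy" in metadata
                                and metadata.get("blast_subject") == "multi-subject"),
        })
    return sample_names, observations
```

With a header such as blast_taxonomy, blast_subject, observation_name, observation_sum, S1, S2, the sketch returns S1 and S2 as sample names and treats the two blast_* columns as metadata.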

.. class:: infomark page-header h2

Advice

This tool is useful if you have modified your abundance TSV file and want to generate rarefaction curves or a sunburst with the FROGS affiliation_stat tool.

If you modify your abundance TSV file:

* do not modify column names;
* do not remove columns;
* take care to choose a taxonomy available in your multi_hit TSV file;
* if you delete lines from the multi_hit file, do not remove a complete cluster without also removing all the corresponding "multi tags" in your abundance TSV file;
* if you rename a taxon level (e.g. genus "Ruminiclostridium 5;" to genus "Ruminiclostridium;"), remember to modify your multi_hit TSV file as well.

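
Before converting an edited file, a quick header check can catch renamed columns. The Python sketch below is hypothetical (unexpected_metadata_columns is not part of FROGS); it assumes that every column up to "observation_sum" must be one of the authorised names listed in the Inputs section, while the sample columns after it may have any name.

```python
# Authorised column names, as listed in the Inputs section
AUTHORISED = {
    "rdp_tax_and_bootstrap", "blast_taxonomy", "blast_subject",
    "blast_perc_identity", "blast_perc_query_coverage", "blast_evalue",
    "blast_aln_length", "seed_id", "seed_sequence",
    "observation_name", "observation_sum",
}


def unexpected_metadata_columns(header):
    """Hypothetical check: return the columns up to observation_sum
    that are not authorised names."""
    for required in ("observation_name", "observation_sum"):
        if required not in header:
            raise ValueError("missing required column: " + required)
    sum_idx = header.index("observation_sum")
    # assumption: sample columns follow observation_sum and may have any name
    return [col for col in header[: sum_idx + 1] if col not in AUTHORISED]
```

An empty list means every metadata column keeps an authorised name; any returned name is a column that was probably renamed or added by hand.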

----

**Contact**

Contacts: [email protected]

Repository: https://github.com/geraldinepascal/FROGS

Website: http://frogs.toulouse.inra.fr/

Please cite the **FROGS article**: *Escudie F., et al. Bioinformatics, 2018. FROGS: Find, Rapidly, OTUs with Galaxy Solution.*

</help>
<citations>
<citation type="doi">10.1093/bioinformatics/btx791</citation>
</citations>
</tool>