tsv_to_biom.xml

<?xml version="1.0"?>
<!--
# Copyright (C) 2016 INRA
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
-->
<tool id="FROGS_tsv_to_biom" name="FROGS TSV_to_BIOM" version="3.1">
	<description>Converts a TSV file in a BIOM file.</description>
        <requirements>
                <requirement type="package" version="3.1.0">frogs</requirement>
        </requirements>
        <stdio> 
                <exit_code range="1:" />
                <exit_code range=":-1" />
        </stdio>
	<command>
		tsv_to_biom.py 
		                --input-tsv $tsv_file
		                --output-biom $biom_file
		              #if $multi_affi_file
		                --input-multi-affi $multi_affi_file
		              #end if
		              #if $extract_fasta
		                --output-fasta $sequence_file
		              #end if
	</command>
	<inputs>
		<!-- Files -->
		<param format="tabular" name="tsv_file" type="data" label="Abundance TSV File" help="Your FROGS abundance TSV file. Take care to keep original column names." optional="false"/>
		<param format="tabular" name="multi_affi_file" type="data" label="Multi_hits TSV File" help="TSV file describing multi_hit blast results." optional="true" />
		<!-- Parameters -->
		<param name="extract_fasta" type="boolean" label="Extract seeds in FASTA file" help="If there is a 'seed_sequence' column in your TSV table, you can extract seed sequences in a separated FASTA file." />
	</inputs>
	<outputs>
		<data format="biom1" name="biom_file" label="${tool.name}: abundance.biom" from_work_dir="abundance.biom" />
		<data format="fasta" name="sequence_file" label="${tool.name}: sequences.fasta" from_work_dir="seed.fasta" >
			<filter>extract_fasta</filter>
		</data>
	</outputs>
	<tests>
		<test>
			<param name="tsv_file" value="references/10-biom2tsv.tsv" />
			<param name="multi_affi_file" value="references/10-biom2tsv-affiliation_multihit.tsv" />
			<param name="extract_fasta" value="true"/>
			<output name="biom_file" file="references/12-tsv2biom.biom" compare="sim_size" delta="0"/>
			<output name="sequence_file" file="references/12-tsv2biom.fasta" />
		</test>
	</tests>
	<help>
.. image:: static/images/frogs_images/FROGS_logo.png
   :height: 144
   :width: 110


.. class:: infomark page-header h2

What it does

This tool converts a TSV file in a BIOM file.

.. class:: h3

Inputs

**Abundance file**:
 
The table with abundances each cluster in each sample and other details conerning the cluster (format TSV).

Authorised column names :  rdp_tax_and_bootstrap, blast_taxonomy, blast_subject, blast_perc_identity, blast_perc_query_coverage, blast_evalue, blast_aln_length, seed_id, seed_sequence, observation_name, observation_sum

**Multiple affiliation file**:

The file that stores the multiple blast hits.

.. class:: h3

Outputs

**Abundance file**:

 The abundance of each cluster in each sample and their metadata (format `BIOM &lt;http://biom-format.org/&gt;`_).

**Sequence file [optional]**:

 By checking the "Extract seed FASTA file" option, the sequences will be extract from TSV to create a file in `FASTA &lt;https://en.wikipedia.org/wiki/FASTA_format&gt;`_ format.
 For this option, be sure that your TSV file contains the seed_sequence column.


.. class:: infomark page-header h2

How it works

FROGS TSV_to_BIOM detects any metadata (columns before "observation_name") and names of samples (columns after "observation_sum").

Then it reconstructs the BIOM abundance file : for each "observation_name" it adds the associated metadata and the count of samples.

If blast_taxonomy is included in metadata and if blast_subject is equal to "multi-subject", it parses multi_hit TSV file. Then it extracts the list of blast_affiliations that contains the non ambiguous blast_taxonomy.


.. class:: infomark page-header h2

Advices

This tool is usefull if you have modified your abundance TSV file and  that you want to generate rarefaction curve or sunburst with the FROGS affiliation_stat tool.

If you modify your abundance TSV file

    * -do not modify column names
    * -do not remove columns
    * -take care to choose a taxonomy available in your multi_hit TSV file
    * -if you delete lines of the multi_hit file, take care to not remove a complete cluster whithout removing all "multi tags" in you abundance TSV file. 
    * -if you want to rename a taxon level (ex : genus "Ruminiclostridium 5;" to genus "Ruminiclostridium;"), do not forget to modify also your multi_hit TSV file.

----

**Contact**

Contacts: frogs@inra.fr

Repository: https://github.com/geraldinepascal/FROGS
website: http://frogs.toulouse.inra.fr/

Please cite the **FROGS article**: *Escudie F., et al. Bioinformatics, 2018. FROGS: Find, Rapidly, OTUs with Galaxy Solution.*

	</help>
	<citations>
		<citation type="doi">10.1093/bioinformatics/btx791</citation>
	</citations>

</tool>