New Features
We now use fastp (https://github.com/OpenGene/fastp) to filter raw data, which is much faster.
By default, MitoZ only uses a subset (5 Gbp) of the raw fastq data for mitogenome assembly. You can change the amount of data to be used via the --data_size_for_mt_assembly option, or set --data_size_for_mt_assembly 0 to tell MitoZ to use all raw fastq data for assembly.
In MitoZ >=3.5, the option takes the form --data_size_for_mt_assembly <float1>,<float2>, which subsamples the raw data (but not the clean data!). float1 is the size (Gbp) of raw data to be subsampled. float2 is the minimum size (Gbp) of clean data: if the clean data is smaller than float2 Gbp, MitoZ will STOP running! When only float1 is set, float2 is assumed to be 0.
(1) Set float1 to 0 if you want to use ALL raw data;
(2) Set 0,0 if you want to use ALL raw data and do NOT want MitoZ to stop even if you get very little clean data.
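The rules above for the <float1>,<float2> value can be sketched in a few lines of Python. This is only an illustration of the described behaviour for raw data (without --skip_filter), not MitoZ's own code. The function names parse_data_size and describe are hypothetical.

```python
def parse_data_size(value):
    """Split a '<float1>[,<float2>]' string into (raw_gbp, min_clean_gbp).

    Hypothetical helper: it mirrors the rules described above,
    it is not taken from the MitoZ source code.
    """
    parts = value.split(",")
    if len(parts) not in (1, 2):
        raise ValueError("expected '<float1>' or '<float1>,<float2>'")
    raw_gbp = float(parts[0])
    # When only float1 is given, float2 is assumed to be 0.
    min_clean_gbp = float(parts[1]) if len(parts) == 2 else 0.0
    if raw_gbp < 0 or min_clean_gbp < 0:
        raise ValueError("sizes must not be negative")
    return raw_gbp, min_clean_gbp


def describe(value):
    """Return a one-line, human-readable reading of the option value."""
    raw_gbp, min_clean_gbp = parse_data_size(value)
    if raw_gbp == 0:
        use = "all raw data"
    else:
        use = f"{raw_gbp:g} Gbp of raw data"
    if min_clean_gbp == 0:
        stop = "no minimum clean-data size"
    else:
        stop = f"stop if clean data < {min_clean_gbp:g} Gbp"
    return f"{use}; {stop}"


if __name__ == "__main__":
    for example in ("5", "0", "0,0", "5,2"):
        print(example, "->", describe(example))
```
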
Don't forget to cite fastp when you use this feature:
- Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu; fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 1 September 2018, Pages i884–i890, https://doi.org/10.1093/bioinformatics/bty560
In MitoZ 3.5, if you want to subsample your input clean data, use --skip_filter and --data_size_for_mt_assembly <float1>,<float2> at the same time. For example, --skip_filter --data_size_for_mt_assembly 0,5 will extract 5 Gbp of the input clean data.
We now include Megahit (https://github.com/voutcn/megahit) and SPAdes (https://github.com/ablab/spades) for mitogenome assembly. Both programs try multiple kmers during a single assembly run, which can sometimes give better results where MitoAssemble does not, or vice versa. See the warning here: https://github.com/linzhi2013/MitoZ/wiki/Known-issues#8-megahit-gets-very-long-sequences.
- With the two new de novo assemblers and the multi-kmer mode for MitoAssemble (see below), you are also more likely to get good mitogenome assemblies from UCE/target-enrichment/hybrid-enrichment/transcriptome/etc. data (https://github.com/linzhi2013/MitoZ/wiki/FAQ#can-i-use-mitoz-for-mitogenome-assembly-based-on-ucetarget-enrichmenttranscriptome-data).
- Multiple-species pooling dataset? See https://github.com/linzhi2013/MitoZ/wiki/FAQ#can-i-apply-mitoz-to-metagenomic-multiple-species-dataset
- The fastq input to MitoZ can now be larger than in previous versions of MitoZ (in case you want to use a larger dataset for mitogenome assembly).
The two programs also support limiting RAM usage via the --memory option, which is useful when your server does not have enough memory. But remember that if this value is too small (especially when you use a large --thread_number, e.g. 24), the two programs may fail to run.
To choose a specific assembler, use the --assembler option.
===Warning: --assembler spades only accepts paired-end data, which means that you need to provide both --fq1 and --fq2, and they must be paired!===
If your fq1 and fq2 are not properly paired, you may get errors like those reported in https://github.com/linzhi2013/MitoZ/issues/193, https://github.com/ablab/spades/issues/420, https://www.biostars.org/p/311603/ and https://www.biostars.org/p/9514582/.
One possible solution is to use the fastq-pair tool (https://github.com/linsalrob/fastq-pair) to fix your fastq files.
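Before running SPAdes, you may want to check quickly whether two fastq files look properly paired. The Python sketch below is only an illustration and is not part of MitoZ or fastq-pair. It assumes plain 4-line fastq records (optionally gzipped) and read names that differ at most by a trailing /1 or /2 or by a comment after the first whitespace. The function names read_names and first_mismatch are hypothetical.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of every 4-line fastq record in `path`."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            # Header lines are every 4th line; skip blank trailing lines.
            if i % 4 == 0 and line.strip():
                name = line.split()[0]      # drop any comment after a space
                if name.startswith("@"):
                    name = name[1:]
                if name.endswith(("/1", "/2")):
                    name = name[:-2]        # drop the mate suffix
                yield name


def first_mismatch(fq1, fq2):
    """Return the 1-based index of the first record whose names differ.

    Returns None when both files hold the same read names in the same
    order. A missing record in the shorter file counts as a mismatch.
    """
    pairs = zip_longest(read_names(fq1), read_names(fq2))
    for index, (name1, name2) in enumerate(pairs, start=1):
        if name1 != name2:
            return index
    return None
```

For example, first_mismatch("sample_1.fq.gz", "sample_2.fq.gz") (hypothetical file names) returns None when the files appear paired, and otherwise the number of the first record at which they disagree.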
Don't forget to cite them when you use them for mitogenome assembly:
- Li, Dinghua, et al. "MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph." Bioinformatics 31.10 (2015): 1674-1676. https://doi.org/10.1093/bioinformatics/btv033
- Nurk, Sergey, et al. "metaSPAdes: a new versatile metagenomic assembler." Genome research 27.5 (2017): 824-834. http://www.genome.org/cgi/doi/10.1101/gr.213959.116.
See also https://github.com/linzhi2013/MitoZ/wiki/Tutorial.
You can now set multiple kmers for the MitoAssemble program, for example, --kmers 51 71 91. Depending on your data, some kmers may give better results than others. In this case, MitoZ simply performs independent runs of MitoAssemble with the different kmers.
Sometimes, MitoZ fails to annotate some protein-coding genes (e.g. ATP8 is very divergent for some clades), mainly due to a lack of a more closely related annotation database. When this is the case, users now can easily customize the annotation database for their own samples. Please refer to https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ's-database for more details.
--template_sbt <file>
The sqn template to generate the resulting genbank file. Go to
https://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/#Template to generate your own template
file if you like. ['/home/gmeng/dev/MitoZ_private/mitoz/annotate/script/template.sbt']