The all subcommand
You can provide this subcommand with single-end or paired-end fastq data (raw data or clean data), and MitoZ will try to give you annotated mitogenomes directly.
$ mitoz all -h
usage: mitoz all [-h] [--outprefix <str>] [--thread_number <int>] [--workdir <directory>]
[--clade {Chordata,Arthropoda,Echinodermata,Annelida-segmented-worms,Bryozoa,Mollusca,Nematoda,Nemertea-ribbon-worms,Porifera-sponges}]
[--genetic_code <INT>] [--species_name <STR>] [--template_sbt <file>] --fq1 <file>
[--fq2 <file>] [--phred64] [--insert_size <INT>] [--fastq_read_length <INT>]
[--data_size_for_mt_assembly <float1>,<float2>] [--skip_filter] [--filter_other_para <str>]
[--assembler {mitoassemble,spades,megahit}] [--tmp_dir <STR>] [--kmers <INT> [<INT> ...]]
[--kmers_megahit <INT> [<INT> ...]] [--kmers_spades <INT> [<INT> ...]] [--memory <INT>]
[--resume_assembly] [--profiles_dir <STR>] [--slow_search] [--filter_by_taxa] --requiring_taxa
<STR> [--requiring_relax {0,1,2,3,4,5,6}] [--min_abundance <float>]
Run all steps for mitochondrial genome anlysis from input fastq files.
optional arguments:
-h, --help show this help message and exit
Common arguments:
--outprefix <str> output prefix [out]
--thread_number <int>
thread number [8]
--workdir <directory>
working directory [./]
--clade {Chordata,Arthropoda,Echinodermata,Annelida-segmented-worms,Bryozoa,Mollusca,Nematoda,Nemertea-ribbon-worms,Porifera-sponges}
which clade does your species belong to? [Arthropoda]
--genetic_code <INT> which genetic code table to use? 'auto' means determined by '--clade' option. [auto]
--species_name <STR> species name to use in output genbank file ['Test sp.']
--template_sbt <file>
The sqn template to generate the resulting genbank file. Go to
https://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/#Template to generate your own template
file if you like.
['/home/gmeng/.conda/envs/mybase/envs/mitozEnv.test3.6/lib/python3.8/site-
packages/mitoz/annotate/script/template.sbt']
Input fastq information:
--fq1 <file> Fastq1 file [required]
--fq2 <file> Fastq2 file [optional]
--phred64 Are the fastq phred64 encoded? [False]
--insert_size <INT> insert size of input fastq files [250]
--fastq_read_length <INT>
read length of fastq reads, used by the filter subcommand and mitoAssemble. [150]
--data_size_for_mt_assembly <float1>,<float2>
Data size (Gbp) used for mitochondrial genome assembly, usually between 2~8 Gbp is
enough. The float1 means the size (Gbp) of raw data to be subsampled, while the float2
means the size of clean data must be >= float2 Gbp, otherwise MitoZ will STOP running!
When only float1 is set, float2 is assumed to be 0. (1) Set float1 to be 0 if you want
to use ALL raw data; (2) Set 0,0 if you want to use ALL raw data and do NOT interrupt
MitoZ even if you got very little clean data. If you got missing mitochondrial genes,
try (1) differnt kmers; (2)different assembler; (3) increase <float1>,<float2> [2,0]
--skip_filter Skip the rawdata filtering step, assuming input fastq are clean data. To subsample such
clean data, set <float2> of the --data_size_for_mt_assembly option to be larger than 0
(using all input clean data by default). [False]
--filter_other_para <str>
other parameter for filtering. []
Assembly arguments:
--assembler {mitoassemble,spades,megahit}
Assembler to be used. [megahit]
--tmp_dir <STR> Set temp directory for megahit if necessary (See
https://github.com/linzhi2013/MitoZ/issues/176)
--kmers <INT> [<INT> ...]
kmer size(s) to be used. Multiple kmers can be used, separated by space [71]
--kmers_megahit <INT> [<INT> ...]
kmer size(s) to be used. Multiple kmers can be used, separated by space. Only for
megahit [43 71 99]
--kmers_spades <INT> [<INT> ...]
kmer size(s) to be used. Multiple kmers can be used, separated by space. Only for spades
['auto']
--memory <INT> memory size limit for spades/megahit, no enough memory will make the two programs halt
or exit [50]
--resume_assembly to resume previous assembly running [False]
Search mitochondrial sequences arguments:
--profiles_dir <STR> Directory cotaining 'CDS_HMM/', 'MT_database/' and 'rRNA_CM/'.
[/home/gmeng/.conda/envs/mybase/envs/mitozEnv.test3.6/lib/python3.8/site-
packages/mitoz/profiles]
--slow_search By default, we firstly use tiara to perform quick sequence classification (100 times
faster than usual!), however, it is valid only when your mitochondrial sequences are >=
3000 bp. If you have missing genes, set '--slow_search' to use the tradicitiona search
mode. [False]
--filter_by_taxa filter out non-requiring_taxa sequences by mito-PCGs annotation to do taxa
assignment.[True]
--requiring_taxa <STR>
filtering out non-requiring taxa sequences which may be contamination [required]
--requiring_relax {0,1,2,3,4,5,6}
The relaxing threshold for filtering non-target-requiring_taxa. The larger digital means
more relaxing. [0]
--min_abundance <float>
the minimum abundance of sequence required. Set this to any value <= 0 if you do NOT
want to filter sequences by abundance [10]
MitoZ now supports three de novo assemblers: MitoAssemble, Megahit, and SPAdes. Users are encouraged to test different assemblers when one of them fails to deliver a good mitogenome. If your server does not have enough memory, you can try setting the --memory option and using Megahit or SPAdes for assembly. For example, --memory 50 means limiting the assembler to use a maximum of 50 GB RAM.
To specify a specific assembler, use the --assembler option.
Warning:
- --assembler spades only accepts paired-end data, which means that you need to provide both --fq1 and --fq2!
- Use --data_size_for_mt_assembly 0 if you want to use ALL your fastq data for mitogenome assembly (no matter which assembler you are going to use)!
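The first warning can be checked before launching a long run. The following is a minimal sketch, not part of MitoZ: `check_assembler_inputs` is a hypothetical helper name, and it only encodes the rule stated above (SPAdes needs both --fq1 and --fq2).

```shell
# Hypothetical helper (not a MitoZ command): reject SPAdes when no second
# fastq file is given. Runs in a subshell, so "exit" only sets the status.
# Usage: check_assembler_inputs <assembler> <fq1> [<fq2>]
check_assembler_inputs() (
    assembler=$1
    fq1=$2
    fq2=${3:-}
    if [ -z "$fq1" ]; then
        echo "error: --fq1 is required" >&2
        exit 1
    fi
    if [ "$assembler" = "spades" ] && [ -z "$fq2" ]; then
        echo "error: --assembler spades needs both --fq1 and --fq2" >&2
        exit 1
    fi
    exit 0
)

# Example: single-end input with SPAdes is rejected, so pick MEGAHIT instead.
if check_assembler_inputs spades /path/to/read.1.fq.gz 2>/dev/null; then
    echo "spades"
else
    echo "megahit"
fi
```

The helper checks only whether the arguments are present; it does not look at the files themselves.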
Firstly, create a directory for the analysis of your sample (it is better if each sample has its own directory):
mkdir -p /home/gmeng/work/sampleID # change this path to your own working path
cd /home/gmeng/work/sampleID
- PE data works with all three assemblers (--assembler megahit, --assembler spades and --assembler mitoassemble)
source activate mitozEnv
fq1=/path/to/read.1.fq.gz
fq2=/path/to/read.2.fq.gz
out=YourSampleID
mitoz all \
--outprefix $out \
--clade Chordata \
--requiring_taxa Chordata \
--genetic_code 2 \
--fq1 $fq1 \
--fq2 $fq2 \
--assembler megahit
Or use --skip_filter if you want to skip the raw data filter step (assuming your data is already clean data):
source activate mitozEnv
fq1=/path/to/read.1.fq.gz
fq2=/path/to/read.2.fq.gz
out=YourSampleID
mitoz all \
--outprefix $out \
--clade Chordata \
--requiring_taxa Chordata \
--genetic_code 2 \
--fq1 $fq1 \
--fq2 $fq2 \
--assembler megahit \
--skip_filter
Or, if you want to limit the memory the software is going to use (--memory works with --assembler megahit and --assembler spades only):
source activate mitozEnv
fq1=/path/to/read.1.fq.gz
fq2=/path/to/read.2.fq.gz
out=YourSampleID
mitoz all \
--outprefix $out \
--clade Chordata \
--requiring_taxa Chordata \
--genetic_code 2 \
--fq1 $fq1 \
--fq2 $fq2 \
--assembler megahit \
--memory 80
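If you are unsure which value to pass to --memory, one option is to derive it from the machine's total RAM. The sketch below is only an illustration: `memory_cap_gb` is a hypothetical helper, the 80 percent share is an arbitrary choice, and reading MemTotal from /proc/meminfo works on Linux only.

```shell
# Hypothetical helper (not a MitoZ command): turn a RAM total given in kB
# into a whole-GB cap suitable for --memory.
# Usage: memory_cap_gb <total_kb> <percent>
memory_cap_gb() (
    total_kb=$1
    percent=$2
    # 1 GB is treated as 1024 * 1024 kB here; integer division rounds down.
    echo $(( total_kb * percent / 100 / 1048576 ))
)

# On Linux, MemTotal in /proc/meminfo reports the total RAM in kB.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
else
    total_kb=67108864   # assumed 64 GB machine, for illustration only
fi
mem=$(memory_cap_gb "$total_kb" 80)
echo "--memory $mem"
```

On a machine with very little RAM the result can round down to 0, in which case the value should be set by hand.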
By default, MitoZ subsamples only part of the raw data for mitogenome assembly (2 Gbp, per the --data_size_for_mt_assembly default of 2,0 shown in the help output above). To force MitoZ to use all your input fastq data for assembly, use the --data_size_for_mt_assembly 0 option:
source activate mitozEnv
fq1=/path/to/read.1.fq.gz
fq2=/path/to/read.2.fq.gz
out=YourSampleID
mitoz all \
--outprefix $out \
--clade Chordata \
--requiring_taxa Chordata \
--genetic_code 2 \
--fq1 $fq1 \
--fq2 $fq2 \
--assembler megahit \
--data_size_for_mt_assembly 0
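To decide on values for --data_size_for_mt_assembly, it helps to know how many Gbp your input actually contains. The following sketch is not part of MitoZ: `fastq_gbp` is a hypothetical helper that assumes plain 4-line fastq records on standard input, and the demo reads are made up.

```shell
# Hypothetical helper (not a MitoZ command): sum the sequence lengths of
# 4-line fastq records read from stdin and print the total in bp and in Gbp
# (1 Gbp = 1e9 bp).
fastq_gbp() {
    awk 'NR % 4 == 2 { bases += length($0) }
         END { printf "%.0f bp (%.9f Gbp)\n", bases, bases / 1e9 }'
}

# Tiny made-up fastq with two 8 bp reads, for illustration only.
demo_fq=$(mktemp)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nTTTTCCCC\n+\nIIIIIIII\n' > "$demo_fq"
fastq_gbp < "$demo_fq"

# For gzipped input one could use: gzip -dc /path/to/read.1.fq.gz | fastq_gbp
```

For paired-end data, add the totals of both files before comparing them with the Gbp values you plan to pass to the option.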
- SE data does not work with the --assembler spades option.
Use mitoassemble for assembly:
source activate mitozEnv
fq1=/path/to/read.1.fq.gz
out=YourSampleID
mitoz all \
--outprefix $out \
--clade Chordata \
--requiring_taxa Chordata \
--genetic_code 2 \
--fq1 $fq1 \
--assembler mitoassemble \
--fastq_read_length 151 \
--kmers 91 71 51
Or use megahit for assembly:
source activate mitozEnv
fq1=/path/to/read.1.fq.gz
out=YourSampleID
mitoz all \
--outprefix $out \
--clade Chordata \
--requiring_taxa Chordata \
--genetic_code 2 \
--fq1 $fq1 \
--assembler megahit
For PE data, mitoassemble can also be run with several kmers:
source activate mitozEnv
fq1=/path/to/read.1.fq.gz
fq2=/path/to/read.2.fq.gz
out=YourSampleID
mitoz all \
--outprefix $out \
--clade Chordata \
--requiring_taxa Chordata \
--genetic_code 2 \
--fq1 $fq1 \
--fq2 $fq2 \
--assembler mitoassemble \
--fastq_read_length 151 \
--kmers 91 71 51
Please change --fastq_read_length 151, --clade Chordata, --requiring_taxa Chordata, and --genetic_code 2 according to your fastq files and samples. You can also try other kmer sizes (odd numbers), say 65, or use more kmers, say --kmers 91 65 71 51.
The above command runs mitochondrial genome assembly with kmers 91, 71, and 51 separately, using the mitoAssemble assembler and 8 threads by default.
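Since each kmer assembly takes a long time, it can be worth checking the kmer list before starting. The following is a minimal sketch, not a MitoZ feature: `check_kmers` is a hypothetical helper, and the rule that a kmer must be shorter than the read length is an added assumption rather than something stated in the MitoZ help.

```shell
# Hypothetical pre-flight check (not a MitoZ command): every kmer must be an
# odd positive integer. The comparison with the read length is an assumption.
# Usage: check_kmers <read_length> <kmer> [<kmer> ...]
check_kmers() (
    read_length=$1
    shift
    for k in "$@"; do
        case $k in
            ''|*[!0-9]*) echo "not a positive integer: $k" >&2; exit 1 ;;
        esac
        if [ $(( k % 2 )) -eq 0 ]; then
            echo "kmer must be odd: $k" >&2
            exit 1
        fi
        if [ "$k" -ge "$read_length" ]; then
            echo "kmer $k is not shorter than the read length $read_length" >&2
            exit 1
        fi
    done
)

# Example with the values used in this section.
if check_kmers 151 91 65 71 51; then
    echo "kmers ok"
fi
```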
If you do not want to filter your input fastq files, add the --skip_filter option:
source activate mitozEnv
fq1=/path/to/read.1.fq.gz
fq2=/path/to/read.2.fq.gz
out=YourSampleID
mitoz all \
--outprefix $out \
--clade Chordata \
--requiring_taxa Chordata \
--genetic_code 2 \
--fq1 $fq1 \
--fq2 $fq2 \
--assembler mitoassemble \
--fastq_read_length 151 \
--kmers 91 71 51 \
--skip_filter
Please keep in mind that each kmer assembly is quite time-consuming. If an earlier kmer assembly has already produced a circular mitochondrial genome, you do not have to run the remaining kmer assemblies: you can kill the job at this point and then annotate the mitochondrial genome directly.
To check, go to the /home/gmeng/work/sampleID/mt_assembly/ directory and look at the sampleID.mitoAssemble.K*.result directories.
Of course, you can also let the above command run until it finishes, because some kmer assemblies could give better results.
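To see which kmer assemblies have produced a result directory so far, you can list the directories matching the pattern mentioned above. This is a minimal sketch: `list_kmer_results` is a hypothetical helper, the demo directory names are made up to fit the pattern, and which files inside a result directory indicate a circular genome is not covered here.

```shell
# Hypothetical helper (not a MitoZ command): print the per-kmer result
# directories found under an mt_assembly directory.
# Usage: list_kmer_results <mt_assembly_dir>
list_kmer_results() (
    dir=$1
    for d in "$dir"/*.mitoAssemble.K*.result; do
        # An unmatched glob stays literal, so keep only real directories.
        [ -d "$d" ] && echo "$d"
    done
    exit 0
)

# Demo on a throwaway directory tree instead of a real MitoZ run.
demo=$(mktemp -d)
mkdir -p "$demo/mt_assembly/sampleID.mitoAssemble.K91.result" \
         "$demo/mt_assembly/sampleID.mitoAssemble.K71.result"
list_kmer_results "$demo/mt_assembly"
```

In a real run, the argument would be the mt_assembly directory inside your sample's working directory.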