Known issues
How to check:
Have a look at the file *.mitoAssemble.K*.overlap_information; it may contain something like this:
>C3252271 overlap between 5' and 3' are 52bp
TGAACGGAATAGTTGGTAATTAGTTTAATCAAAACAAATGATTTCGACTCA
Then check the header of the file *.mitoAssemble.K*.mitogenome.fa:
$ head -1 *.mitoAssemble.K*.mitogenome.fa
>C3252271 topology=circular
Because this overlapping region is fairly long and is not a simple repeat (say, AAAAAAAA), in most cases we can safely say that we have obtained a circular mitochondrial genome (there are methods to verify this, see XXX).
To do:
- Use another circularity check script (I believe I have written one) that is aware of simple repeats.
Say we have this file *.mitoAssemble.K51.overlap_information:
>C3892882 overlap between 5' and 3' are 9bp
TTTTTTTT
and we have this file *.mitoAssemble.K51.mitogenome.fa:
>C3892882 topology=circular
In this case, it is doubtful that the assembled mt genome is really circular, because the overlap is short and consists of a simple repeat. You can either treat the sequence as linear, or try to assemble the mt genome with different kmers (larger or smaller), which might overcome the simple repeat problem. Which option is better depends on the goal of your study: for example, if you are going to use only the PCGs for subsequent analysis and you already have all PCGs, then a circular (complete) mt genome brings no extra benefit. The breakpoint-changing method described in "3. Breakpoint and incomplete genes" should also help.
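The difference between the two overlap examples above can be expressed as a small heuristic. The following is only a sketch, not the MitoZ circularity check: the function names and the thresholds `min_len` and `max_run_frac` are assumptions chosen for illustration, and only single-base repeats (such as AAAAAAAA) are detected, not di- or trinucleotide repeats.

```python
def longest_homopolymer(seq: str) -> int:
    """Return the length of the longest run of one repeated base in seq."""
    longest = run = 0
    prev = ""
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest


def overlap_looks_reliable(seq: str, min_len: int = 30,
                           max_run_frac: float = 0.5) -> bool:
    """Heuristic: the overlap is long and not dominated by one homopolymer.

    min_len and max_run_frac are illustrative values, not MitoZ defaults.
    """
    if len(seq) < min_len:
        return False
    return longest_homopolymer(seq) / len(seq) <= max_run_frac


# The two overlap sequences shown in the examples above.
long_overlap = "TGAACGGAATAGTTGGTAATTAGTTTAATCAAAACAAATGATTTCGACTCA"
short_repeat = "TTTTTTTT"

print(overlap_looks_reliable(long_overlap))  # True
print(overlap_looks_reliable(short_repeat))  # False
```

A sequence that fails this check is not necessarily linear; it only means that the overlap alone is weak evidence for circularity.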
The summary.txt file:
#Seq_id Length(bp) Circularity Closely_related_species
C3252271 16597 yes Onychomys leucogaster
#Seq_id Start End Length(bp) Direction Type Gene_name Gene_prodcut Total_freq_occurred
--------------------------------------------------------------------------------------------------------------------------
C3252271 14 82 69 + tRNA trnR(ucg) tRNA-Arg 1
C3252271 84 381 298 + CDS ND4L NADH dehydrogenase subunit 4L 1
.
.
.
C3252271 14568 14772 205 + CDS ATP8 ATP synthase F0 subunit 8 1
C3252271 14729 15410 682 + CDS ATP6 ATP synthase F0 subunit 6 1
C3252271 16193 16262 70 + tRNA trnG(ucc) tRNA-Gly 1
C3252271 16262 >16598 337 + CDS ND3 NADH dehydrogenase subunit 3 1
Here, no stop codon is found for ND3 (its end is reported as >16598). Because this sequence is actually circular, the stop codon of ND3 lies at the 5' end of the sequence.
In this case, you can simply change the breakpoint of the mitochondrial genome manually. But be careful: choose a breakpoint where there is no gene, and especially avoid regions where different genes overlap. For example, the sites between 14729 and 14772 (where ATP8 and ATP6 overlap) and the site 16262 (shared by trnG and ND3) are not good positions. Instead, any position between 15410 and 16193 can be chosen.
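As a rough illustration of the breakpoint choice described above, the sketch below lists the gene-free gaps between annotated intervals and rotates a circular sequence to a new start. The function names are hypothetical, and it assumes that coordinates are 1-based and inclusive (as in summary.txt) and that the sequence contains no duplicated overlap at its ends. Only the last four rows of the excerpt are used, with the open-ended ND3 end taken as 16598.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def gene_free_gaps(genes: List[Interval]) -> List[Interval]:
    """Return the 1-based inclusive intervals lying between annotated genes."""
    gaps = []
    covered_end = None  # right-most position covered by any gene so far
    for start, end in sorted(genes):
        if covered_end is not None and start > covered_end + 1:
            gaps.append((covered_end + 1, start - 1))
        covered_end = end if covered_end is None else max(covered_end, end)
    return gaps


def rotate_to(seq: str, new_start: int) -> str:
    """Rotate a circular sequence so that 1-based new_start becomes position 1."""
    if not 1 <= new_start <= len(seq):
        raise ValueError("new_start must lie within the sequence")
    return seq[new_start - 1:] + seq[:new_start - 1]


# ATP8, ATP6, trnG and ND3 from the summary.txt excerpt above.
genes = [(14568, 14772), (14729, 15410), (16193, 16262), (16262, 16598)]
print(gene_free_gaps(genes))       # [(15411, 16192)]
print(rotate_to("ACGTACGGTT", 4))  # TACGGTTACG
```

With the complete gene list from summary.txt, any position inside a reported gap is a candidate breakpoint; the rotated sequence can then be re-annotated.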
After this, you can re-annotate the sequence using the mitoz annotate command. If you also provide the fastq files (--fq1 and/or --fq2 options), MitoZ will calculate the sequencing depth of all sites along the mitogenome, which provides further evidence on whether the mitogenome is really complete (check the abundance track in the circos.svg and circos.png files). If the sequencing depth (abundance) around the original breakpoint is about as high as at other sites of the mitogenome, without a sudden drop or a large increase (which indicates repeats), then the mitogenome is complete.
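The depth comparison described above can also be done numerically rather than by eye. The sketch below is an assumption-laden illustration, not part of MitoZ: it takes a plain list of per-site depths in sequence order (how you extract them from the depth file is left out, since the file layout is not described here), and the function name and the `window` size are hypothetical.

```python
from statistics import mean, median
from typing import Sequence


def breakpoint_depth_ratio(depths: Sequence[float], window: int = 50) -> float:
    """Mean depth near the sequence ends divided by the overall median depth.

    The breakpoint of a linearised circular genome is the junction between
    the last and the first site, so the flank is taken from both ends.
    """
    if window < 1 or 2 * window > len(depths):
        raise ValueError("window must be >= 1 and at most half the length")
    flank = list(depths[-window:]) + list(depths[:window])
    overall = median(depths)
    if overall == 0:
        raise ValueError("overall median depth is zero")
    return mean(flank) / overall


# Toy data: uniform coverage versus a drop at both ends of the sequence.
even = [100] * 1000
dropped = [10] * 50 + [100] * 900 + [10] * 50
print(breakpoint_depth_ratio(even))
print(breakpoint_depth_ratio(dropped))
```

A ratio close to 1 is consistent with a complete mitogenome, while a value far below or far above 1 corresponds to the sudden drop or increase mentioned above. What counts as "far" is a judgment call for your data.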
During testing, when I submitted jobs to SGE, the annotation step sometimes generated a core dump file (e.g. core.81923). I do not know the exact reason yet, but when I re-ran the command locally, the problem went away. Complicated Conda environments can also cause problems: for example, I once used a conda installed by another person to create the mitozEnv environment, and PCGs were always missing in the annotation step.
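For reasons I do not understand, MitoZ sometimes fails to run Circos even though circos --modules shows that every required Perl module is installed. No circos.svg and circos.png files are produced, which results in something like this in the stderr output (the cp messages come from a German locale and mean "cannot stat ...: No such file or directory"):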
2022-06-01 17:18:53,822 - mitoz.utility.utility - INFO -
cp -r /home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.png /home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.svg /home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.depth.txt /home/gmeng/test/sing/mt_annotation/ttt.ttt.megahit.mitogenome.fa.result
cp: der Aufruf von stat für „/home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.png“ ist nicht möglich: Datei oder Verzeichnis nicht gefunden
cp: der Aufruf von stat für „/home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.svg“ ist nicht möglich: Datei oder Verzeichnis nicht gefunden
2022-06-01 17:18:53,842 - mitoz.utility.utility - ERROR -
Error occured when running command:
cp -r /home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.png /home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.svg /home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.depth.txt /home/gmeng/test/sing/mt_annotation/ttt.ttt.megahit.mitogenome.fa.result
Solution:
For each specimen, go to the directory that contains the mt_annotation directory and run:
$ source activate mitozEnv # or "conda activate mitozEnv" / "mamba activate mitozEnv"
# run circos inside each mt_visualization directory; the subshell returns to the starting directory afterwards
$ ls -d mt_annotation/tmp_*_mitoscaf.fa/mt_visualization/ | while read -r f ; do ( cd "$f" && circos ) ; done
# to list the resulting files:
$ ls mt_annotation/tmp_*_mitoscaf.fa/mt_visualization/circos.{svg,png}
If you have multiple specimens within the same directory, for example /my/project/SampleID_1, /my/project/SampleID_2, and so on:
$ cd /my/project/ # go to the project directory containing the assembly directory of each specimen
$ source activate mitozEnv # or "conda activate mitozEnv" / "mamba activate mitozEnv"
$ ls -d */mt_annotation/tmp_*_mitoscaf.fa/mt_visualization/ | while read -r f ; do ( cd "$f" && circos ) ; done
# the '*' here will match your sample IDs
# to list the resulting files:
$ ls */mt_annotation/tmp_*_mitoscaf.fa/mt_visualization/circos.{svg,png}
Updates:
- The Singularity version (MitoZ version 3.3) works fine for me, which suggests that the problem above is probably caused by the somewhat complicated environment variables on my cluster (I do not know the exact reason). If you have the same problem, please try to install mitozEnv into a clean environment, or try the Singularity version.
- Empty depth file: if the XXX.result/XXX.XXX.megahit.mitogenome.fa.result/circos.depth.txt file is empty, it is normal for Circos not to run properly.
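Before re-running Circos for many specimens, it can help to see which ones are actually affected. The sketch below is one possible way to do that, under the assumption that the directory layout matches the glob pattern used in the commands above; the function name and the three status labels are made up for this example.

```python
from pathlib import Path
from typing import Dict


def circos_status(project_dir: str) -> Dict[str, str]:
    """Classify every mt_visualization directory found under project_dir."""
    # Same layout as in the shell commands above: one sub-directory per sample.
    pattern = "*/mt_annotation/tmp_*_mitoscaf.fa/mt_visualization"
    status = {}
    for vis in sorted(Path(project_dir).glob(pattern)):
        if not vis.is_dir():
            continue
        depth = vis / "circos.depth.txt"
        plots = [vis / "circos.svg", vis / "circos.png"]
        if all(p.is_file() for p in plots):
            state = "ok"
        elif not depth.is_file() or depth.stat().st_size == 0:
            # With an empty depth file, Circos is not expected to run properly.
            state = "empty_or_missing_depth"
        else:
            state = "rerun_circos"
        status[str(vis)] = state
    return status


if __name__ == "__main__":
    # Placeholder project path taken from the example above.
    for path, state in circos_status("/my/project").items():
        print(state, path)
```

Only the directories reported as rerun_circos are candidates for the manual circos run shown above.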
To choose a specific assembler, use the --assembler option.
Warning: --assembler megahit only accepts paired-end data, which means that you need to provide both --fq1 and --fq2!
This can happen if:
- There is a typo in the value of the --requiring_taxa option.
- Your etetoolkit database is broken. Please re-install it by following https://github.com/linzhi2013/MitoZ/wiki/Installation#3-the-etetoolkit-database.