Known issues

Guanliang MENG edited this page Jun 8, 2022 · 34 revisions

1. Circularity problem and its solution

How to check:

Look at the file *.mitoAssemble.K*.overlap_information; it may contain something like this:

>C3252271 overlap between 5' and 3' are 52bp
TGAACGGAATAGTTGGTAATTAGTTTAATCAAAACAAATGATTTCGACTCA

and check the file *.mitoAssemble.K*.mitogenome.fa:

$ head -1  *.mitoAssemble.K*.mitogenome.fa
>C3252271 topology=circular

Because this overlapping region is fairly long and is not a simple repeat (e.g. AAAAAAAA), in most cases we can safely conclude that the mitochondrial genome is circular (there are methods to verify this, see XXX).

To do:

  • Use another circularity check script (I remember having written one?) that is aware of simple repeats
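Until such a script exists, a rough check can be done by hand. The bash function below is only a sketch and is not part of MitoZ: the name overlap_complexity, the repeat periods tested (1 to 3 bp) and the 80% threshold are all assumptions chosen for illustration. It reports whether an overlap sequence looks like a short tandem repeat.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of MitoZ.
# Prints "simple_repeat" if at least 80% of the bases equal the base
# 1, 2 or 3 positions before them (i.e. the sequence is mostly a
# mono-, di- or trinucleotide repeat); otherwise prints "complex".
# The 80% threshold and the periods 1-3 are arbitrary assumptions.
overlap_complexity() {
    local seq len period i matches
    seq=$(printf '%s' "$1" | tr 'a-z' 'A-Z')   # ignore case
    len=${#seq}
    for period in 1 2 3; do
        (( len > period )) || continue
        matches=0
        for (( i = period; i < len; i++ )); do
            if [ "${seq:i:1}" = "${seq:i-period:1}" ]; then
                matches=$(( matches + 1 ))
            fi
        done
        if (( matches * 100 >= 80 * (len - period) )); then
            echo "simple_repeat"
            return 0
        fi
    done
    echo "complex"
}
```

For example, overlap_complexity TTTTTTTT prints simple_repeat. A "complex" result only means that no short tandem repeat was detected; it does not prove circularity on its own.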

2. Simple repeats found in the overlapping region

Say we have this file *.mitoAssemble.K51.overlap_information:

>C3892882 overlap between 5' and 3' are 9bp
TTTTTTTT

and we have this file *.mitoAssemble.K51.mitogenome.fa:

>C3892882 topology=circular

In this case, it is doubtful that the assembled mt genome is really circular. You can simply treat the sequence as linear, or try assembling the mt genome with different k-mer sizes (larger or smaller), which might overcome the simple-repeat problem. The right choice depends on the goal of your study: for example, if you only need the PCGs for subsequent analysis and you already have all of them, a circular (complete) mt genome adds nothing. The breakpoint-changing method described in "3. Breakpoint and incomplete genes" should also help.

3. Breakpoint and incomplete genes

The summary.txt file:

#Seq_id        Length(bp)     Circularity    Closely_related_species
C3252271       16597          yes             Onychomys leucogaster

#Seq_id        Start  End    Length(bp) Direction  Type   Gene_name  Gene_prodcut                      Total_freq_occurred
--------------------------------------------------------------------------------------------------------------------------
C3252271       14     82     69         +          tRNA   trnR(ucg)  tRNA-Arg                          1
C3252271       84     381    298        +          CDS    ND4L       NADH dehydrogenase subunit 4L     1
.
.
.
C3252271       14568  14772  205        +          CDS    ATP8       ATP synthase F0 subunit 8         1
C3252271       14729  15410  682        +          CDS    ATP6       ATP synthase F0 subunit 6         1
C3252271       16193  16262  70         +          tRNA   trnG(ucc)  tRNA-Gly                          1
C3252271       16262  >16598 337        +          CDS    ND3        NADH dehydrogenase subunit 3      1

Here, no stop codon is found for ND3. Because this sequence is in fact circular, the stop codon of ND3 lies at the 5' end of the sequence.

In this case, you can simply change the breakpoint of the mitochondrial genome manually. Be careful to choose a breakpoint that does not fall within a gene, and especially not within a region where two genes overlap. Here, for example, the sites between 14729 and 14772 (where ATP8 and ATP6 overlap) and site 16262 (shared by trnG and ND3) are poor choices, whereas any position between 15410 and 16193 can be used.
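Changing the breakpoint amounts to rotating the sequence so that it starts at the chosen position. The function below is a minimal sketch, not a MitoZ command: the name rotate_fasta is hypothetical, and it assumes a FASTA file with a single record in which the 5'/3' overlap is no longer duplicated at both ends.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of MitoZ.
# usage: rotate_fasta <single-record FASTA> <new_start>
# new_start is the 1-based position that becomes the first base.
rotate_fasta() {
    awk -v start="$2" '
        /^>/ { header = $0; next }   # remember the header line
        { seq = seq $0 }             # join wrapped sequence lines
        END {
            n = length(seq)
            if (start < 1 || start > n) {
                print "new_start is outside the sequence" > "/dev/stderr"
                exit 1
            }
            print header
            # bases start..n first, then bases 1..start-1
            print substr(seq, start) substr(seq, 1, start - 1)
        }
    ' "$1"
}
```

With the example above, something like rotate_fasta sample.mitogenome.fa 15800 > sample.rotated.fa would place the new breakpoint in the gene-free region (both file names are placeholders). The output sequence is written on a single line.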

After this, you can re-annotate the sequence with the mitoz annotate command. If you also provide the fastq files (--fq1 and/or --fq2 options), MitoZ calculates the sequencing depth at every site along the mitogenome, which gives further evidence of whether the mitogenome is really complete (check the abundance track in the circos.svg and circos.png files). If the sequencing depth (abundance) around the original breakpoint is similar to that at other sites, with no sudden drop or sharp increase (which would indicate repeats), then the mitogenome is complete.
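The depth comparison can also be put into numbers. The sketch below assumes a plain two-column text file (position, depth), one line per site; this is not necessarily the layout of MitoZ's own circos.depth.txt, so convert as needed. The function name depth_ratio_at and the window size are hypothetical. It prints the mean depth in a window around one position divided by the mean depth over the whole sequence.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of MitoZ.
# usage: depth_ratio_at <depth_file> <position> <half_window>
# depth_file: two whitespace-separated columns, position and depth
# (an assumed format, not necessarily that of circos.depth.txt).
depth_ratio_at() {
    awk -v pos="$2" -v w="$3" '
        { total += $2; n++ }                       # all sites
        $1 >= pos - w && $1 <= pos + w {           # sites in the window
            local_sum += $2; local_n++
        }
        END {
            if (n == 0 || total == 0 || local_n == 0) {
                print "no usable depth data" > "/dev/stderr"
                exit 1
            }
            printf "%.2f\n", (local_sum / local_n) / (total / n)
        }
    ' "$1"
}
```

If a sequence of length n was re-linearised to start at old position s, the original breakpoint (old position 1) lies at position n - s + 2 of the new sequence; for n = 16597 and s = 15800 that is position 799. A ratio close to 1 is consistent with a complete mitogenome, while a value far below or above 1 deserves a closer look.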

4. Core dump error

During testing, jobs submitted to SGE sometimes produced a core dump file (e.g. core.81923) in the annotation step; I do not know the exact reason yet. When I re-ran the same command locally, the problem did not occur. Complicated Conda environments can also cause problems: for example, I once used a conda installation set up by another person to create the mitozEnv environment, and PCGs were always missing in the annotation step.

5. No Circos images generated

For reasons I do not yet understand, MitoZ sometimes fails to run Circos even when circos --modules shows that every required Perl module is installed. No circos.svg or circos.png file is produced, and the stderr output contains something like this:

2022-06-01 17:18:53,822 - mitoz.utility.utility - INFO -
cp -r /home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.png /home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.svg /home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.depth.txt /home/gmeng/test/sing/mt_annotation/ttt.ttt.megahit.mitogenome.fa.result
cp: cannot stat '/home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.png': No such file or directory
cp: cannot stat '/home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.svg': No such file or directory

2022-06-01 17:18:53,842 - mitoz.utility.utility - ERROR -
Error occured when running command:
cp -r /home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.png /home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.svg /home/gmeng/test/sing/mt_annotation/tmp_ttt_ttt.megahit.mitogenome.fa_mitoscaf.fa/mt_visualization/circos.depth.txt /home/gmeng/test/sing/mt_annotation/ttt.ttt.megahit.mitogenome.fa.result

Solution:

For each specimen, find the mt_annotation directory:

$ source activate mitozEnv   # or replace "source" with "conda" or "mamba" in this command

# run circos inside each mt_visualization directory; the subshell keeps the current directory unchanged
$ ls -d mt_annotation/tmp_*_mitoscaf.fa/mt_visualization/ | while read -r f ; do ( cd "$f" && circos ) ; done

# to list the resulting files:
$ ls  mt_annotation/tmp_*_mitoscaf.fa/mt_visualization/circos.{svg,png}

If you have multiple specimens within the same directory, for example, /my/project/SampleID_1 and /my/project/SampleID_2:

$ cd /my/project/ # go to the project directory containing the assembly directory of each specimen

$ source activate mitozEnv   # or replace "source" with "conda" or "mamba" in this command

# run circos inside each mt_visualization directory; the subshell keeps the current directory unchanged
$ ls -d */mt_annotation/tmp_*_mitoscaf.fa/mt_visualization/ | while read -r f ; do ( cd "$f" && circos ) ; done
# the '*' here will match your sample IDs

# to list the resulting files:
$ ls  */mt_annotation/tmp_*_mitoscaf.fa/mt_visualization/circos.{svg,png}

# to copy the SVG/PNG files to the resulting directories
# Warning: The below command assumes there is NO '_' in your sample ID
$ ls  */mt_annotation/tmp_*_mitoscaf.fa/mt_visualization/circos.{png,svg} | perl -a -F'/' -ne 'chomp;  $F[2]=~s/tmp\_//; $F[2]=~s/\_mitoscaf\.fa//; my $sample=(split(/\_/,$F[2]))[0]; $F[2]=~s/$sample\_$sample\.//; my $result_dir="$sample/$sample.result/$sample.$sample.$F[2].result"; `cp $_ $result_dir`; '

Updates:

  1. The Singularity version (MitoZ version 3.3) works fine for me, which suggests that the problem above is caused by the complicated environment variables on my cluster (I do not know the exact reason). If you have the same problem, please try installing mitozEnv into a clean environment, or use the Singularity version.

  2. Empty depth file

If the XXX.result/XXX.XXX.megahit.mitogenome.fa.result/circos.depth.txt file is empty, Circos is expected to fail.
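Before re-running Circos by hand, it may help to find out which specimens have an empty depth file. The function below is a sketch rather than a MitoZ feature: the name find_empty_depth_files is hypothetical, and it assumes the directory layout seen in the log above, with circos.depth.txt inside each mt_visualization directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of MitoZ.
# usage: find_empty_depth_files [project_dir]
# Prints the path of every circos.depth.txt that is missing or empty
# under <project_dir>/<sample>/mt_annotation/tmp_*_mitoscaf.fa/mt_visualization.
find_empty_depth_files() {
    local root=${1:-.} d
    for d in "$root"/*/mt_annotation/tmp_*_mitoscaf.fa/mt_visualization; do
        [ -d "$d" ] || continue                    # glob matched nothing
        [ -s "$d/circos.depth.txt" ] || echo "$d/circos.depth.txt"
    done
}
```

Running find_empty_depth_files /my/project lists the specimens for which re-running Circos is unlikely to help; no output means every depth file found has content.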

6. The option --assembler megahit fails

To specify a specific assembler, use the --assembler option.

Warning: --assembler megahit only accepts paired-end data, which means that you need to provide both --fq1 and --fq2!

7. can not find taxid for XXX, maybe it's a misspelling.

This can happen if:

  1. There is a typo in the value of the --requiring_taxa option.
  2. Your etetoolkit database is broken. Please re-install it by checking https://github.com/linzhi2013/MitoZ/wiki/Installation#3-the-etetoolkit-database.