The mitoz tools group_seq_by_gene command
Guanliang MENG edited this page Jun 22, 2023
This command groups the gene sequences of different samples into separate files, one file per gene.
$ mitoz-tools group_seq_by_gene -h
usage: mitoz-tools group_seq_by_gene [-h] [-r <file>] [-d <str>] [-p <str>] [-clean_header]
To group the gene sequences of different samples into different files by genes.
Please cite:
Guanliang Meng, Yiyuan Li, Chentao Yang, Shanlin Liu,
MitoZ: a toolkit for animal mitochondrial genome assembly, annotation
and visualization, Nucleic Acids Research, https://doi.org/10.1093/nar/gkz173
optional arguments:
-h, --help show this help message and exit
-r <file> the gene file list. Per-line format: Abbreviation geneFilePath. The abbreviation will be added
to the seqid to indicate different samples.
-d <str> the delimiter between the abbreviation and the seqid [;]
-p <str> the prefix of all result files [MitoZ]
-clean_header Only shows the 'Abbreviation' in the sequence header [False]
Prepare a file (e.g. called gene_f_list) whose content looks like this:
DM01 DM01/DM01.result/DM01.DM01.megahit.mitogenome.fa.result/DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.gbf.gene.fasta
DM02 DM02/DM02.result/DM02.DM02.megahit.mitogenome.fa.result/DM02_DM02.megahit.mitogenome.fa_mitoscaf.fa.gbf.gene.fasta
The content format (per line) is:
sampleID /path/to/the/fasta_file
- The sampleID (the first column) will be added to the beginning of each sequence header in the resulting files.
- The second column is the path to a FASTA file, which can be any of the following:
-rw-rw-r-- 1 gmeng 17K Jun 29 05:54 DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.gbf.gene.fasta
-rw-rw-r-- 1 gmeng 12K Jun 29 05:54 DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.gbf.cds.fasta
-rw-rw-r-- 1 gmeng 2.6K Jun 29 05:54 DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.gbf.trna.fasta
-rw-rw-r-- 1 gmeng 2.7K Jun 29 05:54 DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.gbf.rrna.fasta
-rw-rw-r-- 1 gmeng 17K Jun 29 05:54 DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.gbf.fasta
-rw-rw-r-- 1 gmeng 4.3K Jun 29 05:54 DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.gbf.cds_translation.fasta
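With many samples, the list file can be generated rather than typed by hand. The following is a minimal Python sketch, not part of MitoZ: the helper names `build_gene_list` and `write_gene_list` are hypothetical, the path layout is copied from the two example lines above, and a single space is assumed as the column separator.

```python
from typing import Iterable, List

# Path layout copied from the example lines in this page; adjust it if your
# MitoZ output directories are organised differently (assumption).
PATH_TEMPLATE = (
    "{s}/{s}.result/{s}.{s}.megahit.mitogenome.fa.result/"
    "{s}_{s}.megahit.mitogenome.fa_mitoscaf.fa.gbf.{kind}.fasta"
)


def build_gene_list(sample_ids: Iterable[str], kind: str = "gene") -> List[str]:
    """Return one 'sampleID path' line per sample (hypothetical helper)."""
    return [f"{s} {PATH_TEMPLATE.format(s=s, kind=kind)}" for s in sample_ids]


def write_gene_list(sample_ids: Iterable[str], out_path: str = "gene_f_list",
                    kind: str = "gene") -> None:
    """Write the list file that is later passed to -r."""
    with open(out_path, "w") as handle:
        for line in build_gene_list(sample_ids, kind):
            handle.write(line + "\n")
```

Calling `write_gene_list(["DM01", "DM02"])` would write the two lines shown earlier. Passing `kind="cds"`, `"trna"`, `"rrna"` or `"cds_translation"` selects one of the other files listed above; the plain `.gbf.fasta` file does not fit this template and would need its own path.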
Then execute:
$ mitoz-tools group_seq_by_gene -r gene_f_list -d '_' -p MitoZ
This produces:
$ ls -lh
-rw-rw-r-- 1 gmeng gmeng 42 Jul 8 17:24 gene_f_list
-rw-rw-r-- 1 gmeng gmeng 1.5K Jul 8 17:36 MitoZ.gene-ATP6.fa
-rw-rw-r-- 1 gmeng gmeng 406 Jul 8 17:36 MitoZ.gene-ATP8.fa
-rw-rw-r-- 1 gmeng gmeng 3.2K Jul 8 17:36 MitoZ.gene-COX1.fa
-rw-rw-r-- 1 gmeng gmeng 1.5K Jul 8 17:36 MitoZ.gene-COX2.fa
-rw-rw-r-- 1 gmeng gmeng 1.6K Jul 8 17:36 MitoZ.gene-COX3.fa
-rw-rw-r-- 1 gmeng gmeng 2.4K Jul 8 17:36 MitoZ.gene-CYTB.fa
-rw-rw-r-- 1 gmeng gmeng 3.4K Jul 8 17:36 MitoZ.gene-l-rRNA.fa
-rw-rw-r-- 1 gmeng gmeng 2.0K Jul 8 17:36 MitoZ.gene-ND1.fa
-rw-rw-r-- 1 gmeng gmeng 2.2K Jul 8 17:36 MitoZ.gene-ND2.fa
-rw-rw-r-- 1 gmeng gmeng 680 Jul 8 17:36 MitoZ.gene-ND3.fa
-rw-rw-r-- 1 gmeng gmeng 2.8K Jul 8 17:36 MitoZ.gene-ND4.fa
-rw-rw-r-- 1 gmeng gmeng 668 Jul 8 17:36 MitoZ.gene-ND4L.fa
-rw-rw-r-- 1 gmeng gmeng 3.7K Jul 8 17:36 MitoZ.gene-ND5.fa
-rw-rw-r-- 1 gmeng gmeng 1.1K Jul 8 17:36 MitoZ.gene-ND6.fa
-rw-rw-r-- 1 gmeng gmeng 2.0K Jul 8 17:36 MitoZ.gene-s-rRNA.fa
-rw-rw-r-- 1 gmeng gmeng 216 Jul 8 17:36 MitoZ.gene-trnA(ugc).fa
-rw-rw-r-- 1 gmeng gmeng 212 Jul 8 17:36 MitoZ.gene-trnC(gca).fa
-rw-rw-r-- 1 gmeng gmeng 218 Jul 8 17:36 MitoZ.gene-trnD(guc).fa
-rw-rw-r-- 1 gmeng gmeng 218 Jul 8 17:36 MitoZ.gene-trnE(uuc).fa
-rw-rw-r-- 1 gmeng gmeng 216 Jul 8 17:36 MitoZ.gene-trnF(gaa).fa
-rw-rw-r-- 1 gmeng gmeng 214 Jul 8 17:36 MitoZ.gene-trnG(ucc).fa
-rw-rw-r-- 1 gmeng gmeng 222 Jul 8 17:36 MitoZ.gene-trnH(gug).fa
-rw-rw-r-- 1 gmeng gmeng 214 Jul 8 17:36 MitoZ.gene-trnI(gau).fa
-rw-rw-r-- 1 gmeng gmeng 224 Jul 8 17:36 MitoZ.gene-trnK(uuu).fa
-rw-rw-r-- 1 gmeng gmeng 226 Jul 8 17:36 MitoZ.gene-trnL(uaa).fa
-rw-rw-r-- 1 gmeng gmeng 228 Jul 8 17:36 MitoZ.gene-trnL(uag).fa
-rw-rw-r-- 1 gmeng gmeng 218 Jul 8 17:36 MitoZ.gene-trnM(cau).fa
-rw-rw-r-- 1 gmeng gmeng 224 Jul 8 17:36 MitoZ.gene-trnN(guu).fa
-rw-rw-r-- 1 gmeng gmeng 222 Jul 8 17:36 MitoZ.gene-trnP(ugg).fa
-rw-rw-r-- 1 gmeng gmeng 220 Jul 8 17:36 MitoZ.gene-trnQ(uug).fa
-rw-rw-r-- 1 gmeng gmeng 220 Jul 8 17:36 MitoZ.gene-trnR(ucg).fa
-rw-rw-r-- 1 gmeng gmeng 222 Jul 8 17:36 MitoZ.gene-trnS(gcu).fa
-rw-rw-r-- 1 gmeng gmeng 222 Jul 8 17:36 MitoZ.gene-trnS(uga).fa
-rw-rw-r-- 1 gmeng gmeng 226 Jul 8 17:36 MitoZ.gene-trnT(ugu).fa
-rw-rw-r-- 1 gmeng gmeng 222 Jul 8 17:36 MitoZ.gene-trnV(uac).fa
-rw-rw-r-- 1 gmeng gmeng 222 Jul 8 17:36 MitoZ.gene-trnW(uca).fa
-rw-rw-r-- 1 gmeng gmeng 216 Jul 8 17:36 MitoZ.gene-trnY(gua).fa
$ grep '>' MitoZ.gene-COX1.fa
>DM01_COX1;len=1557;[2925:4482](-)
>DM02_COX1;len=1557;[2925:4482](-)
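To make the result above easier to reason about, here is a rough Python sketch of the grouping idea. It is not MitoZ's actual implementation: the function names are hypothetical, and it assumes the gene name is the part of each input sequence ID before the first `;`, as the headers above suggest.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_gene(samples: Dict[str, str], delimiter: str = ";",
                  clean_header: bool = False) -> Dict[str, List[Tuple[str, str]]]:
    """Map gene name -> list of (new_header, sequence) across all samples.

    Assumption: the gene name is the text before the first ';' in the seqid.
    """
    grouped = defaultdict(list)
    for abbreviation, fasta_text in samples.items():
        for seqid, sequence in read_fasta(fasta_text):
            gene = seqid.split(";", 1)[0]
            if clean_header:
                new_header = abbreviation  # mimics -clean_header
            else:
                new_header = f"{abbreviation}{delimiter}{seqid}"  # mimics -d
            grouped[gene].append((new_header, sequence))
    return dict(grouped)
```

Here `samples` maps each abbreviation (the first column of the list file) to the text of its FASTA file. Each key of the returned dictionary corresponds to one output file such as `MitoZ.gene-COX1.fa`.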
You can change the -p value to any other string, say, your project ID.
You can also change the delimiter between the sample abbreviation and the sequence ID with -d. For example, to separate DM01 from COX1 with a space instead of an underscore:
$ mitoz-tools group_seq_by_gene -r gene_f_list -d ' ' -p MitoZ
$ grep '>' MitoZ.gene-COX1.fa
>DM01 COX1;len=1557;[2925:4482](-)
>DM02 COX1;len=1557;[2925:4482](-)
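The choice of delimiter matters if you later need to split the headers back into their parts. The sketch below is a hypothetical helper, not a MitoZ function. It assumes the header layout shown in the examples above: abbreviation, delimiter, gene name, then `;`-separated fields.

```python
from typing import NamedTuple


class GroupedHeader(NamedTuple):
    sample: str
    gene: str
    rest: str  # everything after the gene name, e.g. "len=1557;[2925:4482](-)"


def split_header(header: str, delimiter: str) -> GroupedHeader:
    """Split a grouped header into sample, gene and the remaining fields."""
    text = header.lstrip(">")
    # Split at the FIRST occurrence of the delimiter only.
    sample, _, seqid = text.partition(delimiter)
    gene, _, rest = seqid.partition(";")
    return GroupedHeader(sample, gene, rest)
```

Because the split happens at the first occurrence of the delimiter, a sample ID that itself contains the delimiter (for example `DM_01` with `-d '_'`) would be cut in the wrong place, so it is safer to pick a delimiter that does not occur in any sample ID. Note also that many FASTA tools treat only the text up to the first whitespace as the sequence ID, so with `-d ' '` such tools would see just `DM01` as the ID.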
If you want a clean sequence header:
$ mitoz-tools group_seq_by_gene -r gene_f_list -p MitoZ -clean_header
$ grep '>' MitoZ.gene-COX1.fa
>DM01
>DM02
Now you can use the MitoZ.gene-*.fa files for subsequent analysis, e.g. to perform multiple sequence alignment with the MAFFT program.
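As one possible way to run that alignment step over all the grouped files, here is a small Python sketch. It assumes `mafft` is installed and on your PATH and uses MAFFT's `--auto` option. The `.aln.fa` output naming and the function names are hypothetical choices made for this example.

```python
import glob
import subprocess
from typing import List


def mafft_command(fasta_path: str) -> List[str]:
    """Build the argument list for one alignment ('--auto' lets MAFFT pick a strategy)."""
    return ["mafft", "--auto", fasta_path]


def output_path(fasta_path: str) -> str:
    """Hypothetical naming scheme: MitoZ.gene-COX1.fa -> MitoZ.gene-COX1.aln.fa"""
    stem = fasta_path[:-3] if fasta_path.endswith(".fa") else fasta_path
    return stem + ".aln.fa"


def align_all(pattern: str = "MitoZ.gene-*.fa") -> None:
    """Run MAFFT on every grouped file; MAFFT writes the alignment to stdout."""
    for fasta_path in sorted(glob.glob(pattern)):
        if fasta_path.endswith(".aln.fa"):
            continue  # skip alignments left over from an earlier run
        with open(output_path(fasta_path), "w") as out:
            subprocess.run(mafft_command(fasta_path), stdout=out, check=True)
```

Call `align_all()` in the directory that holds the grouped files. The arguments are passed as a list rather than through a shell, so file names containing parentheses such as `MitoZ.gene-trnA(ugc).fa` need no extra quoting.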