Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Endless read from file #218

Open
HaoqingDu opened this issue Aug 8, 2024 · 11 comments
Open

Endless read from file #218

HaoqingDu opened this issue Aug 8, 2024 · 11 comments

Comments

@HaoqingDu
Copy link

HaoqingDu commented Aug 8, 2024

I installed mitoz v3.6 using conda. The test run was ok. I wanted to generate mitogenome from a SE data. Below is my command,

mitoz all --outprefix PhoSp1_trimmed --clade Arthropoda --genetic_code 5 --species_name "Photinus sp1" --fq1 PhoSp1_trimmed.fastq.gz --fastq_read_length 151 --assembler mitoassemble --kmers 51 --requiring_taxa Arthropoda

In the assemble part, read_from_file seems to never stop unless I kill it. Below is the related log info,

2024-08-08 11:37:24,164 - mitoz.utility.utility - INFO - 
/home/hdu/miniconda3/envs/mitoz_env/lib/python3.8/site-packages/mitoz/assemble/script/mitoAssemble/mitoAssemble all  -K 51 -o PhoSp1_trimmed.mitoAssemble.K51 -s PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib -p 8

mitoAssemble is based on SoapTrans v1.03 and modified to fit mitogenome assembly by BGI-ENVers

pregraph -s PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib -K 51 -p 8 -o PhoSp1_trimmed.mitoAssemble.K51 
In PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib, 1 libs, max seq len 151, max name len 256

8 thread created
read from file:
 /home/hdu/Documents/firefly_genomes/results/mitoz/clean_data/PhoSp1_trimmed.clean_R1.fq.gz
--- 100000000th reads
--- 200000000th reads
--- 300000000th reads
--- 400000000th reads
...

I don't know how to solve it.

@linzhi2013
Copy link
Owner

Hi @HaoqingDu ,

/home/hdu/miniconda3/envs/mitoz_env/lib/python3.8/site-packages/mitoz/assemble/script/mitoAssemble/mitoAssemble all  -K 51 -o PhoSp1_trimmed.mitoAssemble.K51 -s PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib -p 8

Could you please have a look at the file: PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib and check if everything inside is okay?

@HaoqingDu
Copy link
Author

Hi @linzhi2013

Thanks for reply. Here is the lines in the lib file

max_rd_len=151
[LIB]
avg_ins=250
reverse_seq=0
asm_flags=3
map_len=32
q=/home/hdu/Documents/firefly_genomes/results/mitoz/clean_data/PhoSp1_trimmed.clean_R1.fq.gz

I also compared it with the lib file for the test data, below is the test data lib file

max_rd_len=150
[LIB]
avg_ins=250
reverse_seq=0
asm_flags=3
map_len=32
q=/home/hdu/Documents/firefly_genomes/data/mitoz_test/test.R1.fq.gz

They look alike, so I assume this may not be the reason?

And the difference between my run and the successful test run is, at this step, inside the K51 directory /home/hdu/Documents/firefly_genomes/results/mitoz/mt_assembly/mitoAssemble/K51, there is only this lib file, no other files. Is this normal?

@linzhi2013
Copy link
Owner

check the content of /home/hdu/Documents/firefly_genomes/results/mitoz/clean_data/PhoSp1_trimmed.clean_R1.fq.gz

and check if it is broken:

gzip -t /home/hdu/Documents/firefly_genomes/results/mitoz/clean_data/PhoSp1_trimmed.clean_R1.fq.gz

You can also run the following command directly and check what will happen:

/home/hdu/miniconda3/envs/mitoz_env/lib/python3.8/site-packages/mitoz/assemble/script/mitoAssemble/mitoAssemble all  -K 51 -o PhoSp1_trimmed.mitoAssemble.K51 -s PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib -p 8

@HaoqingDu
Copy link
Author

Hi,

I just tried both.

First, gzip -t returns nothing, which means the file is complete, right?

Second, when I run mitoAssemble, same thing happens

(mitoz_env) hdu@wla110:~/Documents/firefly_genomes/results/mitoz/mt_assembly/mitoAssemble/K51> /home/hdu/miniconda3/envs/mitoz_env/lib/python3.8/site-packages/mitoz/assemble/script/mitoAssemble/mitoAssemble all  -K 51 -o PhoSp1_trimmed.mitoAssemble.K51 -s PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib -p 8

mitoAssemble is based on SoapTrans v1.03 and modified to fit mitogenome assembly by BGI-ENVers

pregraph -s PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib -K 51 -p 8 -o PhoSp1_trimmed.mitoAssemble.K51 
In PhoSp1_trimmed.mitoAssemble.K51.mitoAssemble.lib, 1 libs, max seq len 151, max name len 256

8 thread created
read from file:
 /home/hdu/Documents/firefly_genomes/results/mitoz/clean_data/PhoSp1_trimmed.clean_R1.fq.gz
--- 100000000th reads
--- 200000000th reads
--- 300000000th reads
--- 400000000th reads
--- 500000000th reads
--- 600000000th reads
--- 700000000th reads
--- 800000000th reads
--- 900000000th reads
--- 1000000000th reads
--- 1100000000th reads
--- 1200000000th reads
--- 1300000000th reads
--- 1400000000th reads
--- 1500000000th reads
--- 1600000000th reads
--- 1700000000th reads
--- 1800000000th reads

@linzhi2013
Copy link
Owner

1800000000 * 150 = 270,000,000,000 bp

You are using 270 Gbp for mitochondrial genome assembly, which is too much, the program cannot handle such a big data.

Try to use 3-5 Gbp data instead.

@HaoqingDu
Copy link
Author

HaoqingDu commented Aug 12, 2024

The raw reads data (fq.gz format) is 30 Gb. I trimmed it with trimmomatic. So the input file fed to Mitoz is only 13 Mb large and contains around 37k reads.

I also tried to run mitoz all with --data_size_for_mt_assembly 3,0 to subsample the data if it were too large. But the same thing happened in read from file.

@linzhi2013
Copy link
Owner

The raw reads data (fq.gz format) is 30 Gb. I trimmed it with trimmomatic. So the input file fed to Mitoz is only 13 Mb large and contains around 37k reads.

But the log says

read from file:
 /home/hdu/Documents/firefly_genomes/results/mitoz/clean_data/PhoSp1_trimmed.clean_R1.fq.gz
....
--- 1,800,000,000th reads

which is 1,800 M reads.

I also tried to run mitoz all with --data_size_for_mt_assembly 3,0 to subsample the data if it were too large. But the same thing happened in read from file.

I have no idea what happened there ...

@HaoqingDu
Copy link
Author

I tried to run with megahit.

mitoz all \
--outprefix PhoSp1_trimmed \
--clade Arthropoda \
--genetic_code 5 \
--species_name "Photinus sp1" \
--fq1 PhoSp1_trimmed.fastq.gz \
--fastq_read_length 151 \
--data_size_for_mt_assembly 3,0 \
--assembler megahit \
--kmers_megahit 51 \
--requiring_taxa Arthropoda \
--min_abundance 0

The good thing is mitoz generated the mitogenome. So I guess the issue may result from mitoAssembler.

But Another thing is the generated genome is only 2450bp, super short. So my data might be not good enough either. However, it seems to be contradictory, because from the short genome I would say my data doesn't have enough reads, while mitoAssembler was loading infinite number of reads from my data.

@linzhi2013
Copy link
Owner

For megahit, please make use of its multi-kmer feature, i.e., to use several kmers at the same time, which might be helpful.

In your code, you used only a single kmer, --kmers_megahit 51 .

@HaoqingDu
Copy link
Author

Thanks for the advice. Using multiple kmers does help! However, the generated mito-genome is still less than 5k bp. So I really need to improve my data.

@linzhi2013
Copy link
Owner

Perhaps try small kmer sizes like 31, 35, 39 etc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants