Please help me to understand output #135

OrsonMM · 2024-08-12T15:08:27Z

Dear team Genomescope2.

I am using the tools to understand why I have a bad novo assembly, with more than 1M of contigs and small.

I have illumina sequence of Cavia Porcellus (guinea pig), genome size in GENBANK is around 3 GB. is diploid (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_034190915.1/)

I had this output in genomescope2:

GENOME_01 :

Rscript genomescope02.R -i $1.histo -k $2 -p 2 -n $1 -o $1_$2_dir

GenomeScope version 2.0
input file = genome_01.histo
output directory = genome_01_dir
p = 2
k = 21
name prefix = genome_01

property                      min               max
Homozygous (aa)               60.4219%          100%
Heterozygous (ab)             0%                39.5781%
Genome Haploid Length         515,546,895 bp    531,285,372 bp
Genome Repeat Length          254,706,772 bp    262,482,392 bp
Genome Unique Length          260,840,123 bp    268,802,980 bp
Model Fit                     48.8344%          74.6587%
Read Error Rate               2.91981%          2.91981%

I interpret, that Genomescope02 gives me that I have a sequencing that represents 531,285,372 bp of genomic size. In addition, the sequencing is at ~13x. And that I have a high error rate in my reads of almost 3%.

If the total genome size is 3GB in genbank, that means that having 531 Mpb, with 10x coverage, if I would have good information to get a not so bad denovo assembly.

Please, any suggestion.

The text was updated successfully, but these errors were encountered:

mschatz · 2024-08-19T02:29:46Z

Thanks for writing. Your kmer profile shows a bad fit between the kmer counts you have in the reads (in blue) and the GenomeScope model (in black). As such, I wouldnt trust the genome size estimate or the predicted level of heterozygosity. As you mention the major challenge here is the low coverage: focusing on the blue curve (from your reads) you have a peak at about 10x coverage which is too low for a reliable GenomeScope estimate. This will also lead to a poor assembly. My strong recommendation would be to seek out more coverage. I would also strongly recommend PacBio HiFi reads over short reads for a mammalian de novo assembly. Good luck! Mike

…

On Mon, Aug 12, 2024 at 11:08 AM Orson ***@***.***> wrote: Dear team Genomescope2. I am using the tools to understand why I have a bad novo assembly, with more than 1M of contigs and small. I have illumina sequence of Cavia Porcellus (guinea pig), genome size in GENBANK is around 3 GB. ( https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_034190915.1/) I had this output in genomescope2: GENOME_01 : Rscript genomescope02.R -i $1.histo -k $2 -p 2 -n $1 -o $1_$2_dir GenomeScope version 2.0 input file = genome_01.histo output directory = genome_01_dir p = 2 k = 21 name prefix = genome_01 property min max Homozygous (aa) 60.4219% 100% Heterozygous (ab) 0% 39.5781% Genome Haploid Length 515,546,895 bp 531,285,372 bp Genome Repeat Length 254,706,772 bp 262,482,392 bp Genome Unique Length 260,840,123 bp 268,802,980 bp Model Fit 48.8344% 74.6587% Read Error Rate 2.91981% 2.91981% SA42913_transformed_log_plot.png (view on web) <https://github.com/user-attachments/assets/e4a6e36a-530a-4e71-bd73-d8fac6cb1f98> I interpret, that Genomescope02 gives me that I have a sequencing that represents 531,285,372 bp of genomic size. In addition, the sequencing is at ~13x. And that I have a high error rate in my reads of almost 3%. If the total genome size is 3GB in genbank, that means that having 531 Mpb, with 10x coverage, if I would have good information to get a not so bad denovo assembly. Please, any suggestion. — Reply to this email directly, view it on GitHub <#135>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABP342N7UEXWYSVPHFIFGLZRDFYBAVCNFSM6AAAAABMMMCSPOVHI2DSMVQWIX3LMV43ASLTON2WKOZSGQ3DCMRVG43DINQ> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Please help me to understand output #135

Please help me to understand output #135

OrsonMM commented Aug 12, 2024 •

edited

Loading

mschatz commented Aug 19, 2024 via email

Please help me to understand output #135

Please help me to understand output #135

Comments

OrsonMM commented Aug 12, 2024 • edited Loading

mschatz commented Aug 19, 2024 via email

OrsonMM commented Aug 12, 2024 •

edited

Loading