-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Please help me to understand output #135
Comments
Thanks for writing. Your kmer profile shows a bad fit between the kmer
counts you have in the reads (in blue) and the GenomeScope model (in
black). As such, I wouldnt trust the genome size estimate or the predicted
level of heterozygosity. As you mention the major challenge here is the low
coverage: focusing on the blue curve (from your reads) you have a peak at
about 10x coverage which is too low for a reliable GenomeScope estimate.
This will also lead to a poor assembly. My strong recommendation would be
to seek out more coverage. I would also strongly recommend PacBio HiFi
reads over short reads for a mammalian de novo assembly.
Good luck!
Mike
…On Mon, Aug 12, 2024 at 11:08 AM Orson ***@***.***> wrote:
Dear team Genomescope2.
I am using the tools to understand why I have a bad novo assembly, with
more than 1M of contigs and small.
I have illumina sequence of Cavia Porcellus (guinea pig), genome size in
GENBANK is around 3 GB. (
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_034190915.1/)
I had this output in genomescope2:
GENOME_01 :
Rscript genomescope02.R -i $1.histo -k $2 -p 2 -n $1 -o $1_$2_dir
GenomeScope version 2.0
input file = genome_01.histo
output directory = genome_01_dir
p = 2
k = 21
name prefix = genome_01
property min max
Homozygous (aa) 60.4219% 100%
Heterozygous (ab) 0% 39.5781%
Genome Haploid Length 515,546,895 bp 531,285,372 bp
Genome Repeat Length 254,706,772 bp 262,482,392 bp
Genome Unique Length 260,840,123 bp 268,802,980 bp
Model Fit 48.8344% 74.6587%
Read Error Rate 2.91981% 2.91981%
SA42913_transformed_log_plot.png (view on web)
<https://github.com/user-attachments/assets/e4a6e36a-530a-4e71-bd73-d8fac6cb1f98>
I interpret, that Genomescope02 gives me that I have a sequencing that
represents 531,285,372 bp of genomic size. In addition, the sequencing is
at ~13x. And that I have a high error rate in my reads of almost 3%.
If the total genome size is 3GB in genbank, that means that having 531
Mpb, with 10x coverage, if I would have good information to get a not so
bad denovo assembly.
Please, any suggestion.
—
Reply to this email directly, view it on GitHub
<#135>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABP342N7UEXWYSVPHFIFGLZRDFYBAVCNFSM6AAAAABMMMCSPOVHI2DSMVQWIX3LMV43ASLTON2WKOZSGQ3DCMRVG43DINQ>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Dear team Genomescope2.
I am using the tools to understand why I have a bad novo assembly, with more than 1M of contigs and small.
I have illumina sequence of Cavia Porcellus (guinea pig), genome size in GENBANK is around 3 GB. is diploid (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_034190915.1/)
I had this output in genomescope2:
GENOME_01 :
I interpret, that Genomescope02 gives me that I have a sequencing that represents 531,285,372 bp of genomic size. In addition, the sequencing is at ~13x. And that I have a high error rate in my reads of almost 3%.
If the total genome size is 3GB in genbank, that means that having 531 Mpb, with 10x coverage, if I would have good information to get a not so bad denovo assembly.
Please, any suggestion.
The text was updated successfully, but these errors were encountered: