Heterozygous peak identified as errors? #131

Haoran-Xue · 2024-05-10T09:21:53Z

Hello,

I ran kmc and kmc_tools with PacBio HiFi sequences of a diploid plant species:
kmc -m128 -k21 -t40 -ci1 -cs10000 xxx.hifi.fastq.gz xxx xxx_tmp
kmc_tools transform xxx histogram xxx.histo

Then I submitted the histo file to GenomeScope2.0 (http://genomescope.org/genomescope2.0/) with "K-mer length: 21, Ploidy: 2, Max k-mer coverage: -1, Average k-mer coverage for polyploid genome: -1".

This is the linear plot I got:

It seems that the first peak (the heterozygous peak) was identified as errors. Is there any way to avoid this?

Thank you!

fperezcobos · 2024-05-30T10:44:29Z

Hi,

I had the same problem with PacBio HiFi sequences of a diploid plant species, and the plot looks like this:

Any help?

SamCT · 2024-06-21T17:53:42Z

Also seeing this with one of our genomes. Out of a lot of four Revio SMRT cells (all the same species), one plot looks like the above. The SMRT cell that produced this plot has a much higher number of reads than the others, but besides that nothing stands out. The other three plots looked reasonable. I'm curious to know what is causing this.

mschatz · 2024-07-28T22:19:18Z

The automatic model fitting algorithm can get confused if the coverage is too high or if the relationship between the homozygous and heterozygous peaks is ambiguous. The easiest way to address this is to use the "Average k-mer coverage for polyploid genome" parameter, which gives a hint as to where the first peak (the heterozygous peak) is located. For these datasets I would try a value of about 100. If that doesn't work, the next easiest thing to do is to downsample the read dataset to reduce the coverage. From a raw read file, you can just use 'head' to select the first N lines of the file to reduce the number of reads, which serves as a random downsample (assuming the reads have not been aligned or otherwise processed). Good luck! Mike
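The 'head' downsampling suggested above could look roughly like the sketch below. It assumes a gzipped FASTQ file in which every record takes exactly four lines (the usual unwrapped layout), so keeping N reads means keeping 4*N lines. The function name and the file names are placeholders, not part of KMC or GenomeScope.

```shell
# Keep only the first n_reads reads of a gzipped FASTQ file.
# Assumption: each FASTQ record spans exactly 4 lines, so the line
# count passed to head must be a multiple of 4.
downsample_fastq() {
    in_fastq_gz=$1    # input reads, e.g. xxx.hifi.fastq.gz
    n_reads=$2        # number of reads to keep
    out_fastq_gz=$3   # output file for the reduced read set
    gzip -dc "$in_fastq_gz" | head -n "$((n_reads * 4))" | gzip > "$out_fastq_gz"
}

# Example call (placeholder file names and read count):
# downsample_fastq xxx.hifi.fastq.gz 500000 xxx.sub.fastq.gz
```

The reduced file can then be passed through the same kmc and kmc_tools commands as before. How many reads to keep depends on the genome size and read length, so the number above is only illustrative.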

xiaoyu-stars · 2024-12-14T14:09:09Z

I also encountered this problem, and my solution was to use the “-l” parameter to specify the position of my heterozygous peak, for example -l 29 (the value I used). Actually, I’m not sure whether this counts as solving the problem, because I don’t fully understand the underlying principle, but doing this made my results better (the genome size became more accurate, and the heterozygosity rate returned to normal).

mschatz · 2024-12-15T23:09:22Z

Yes, the "-l" parameter for the command line version is the same as the "Average k-mer coverage for polyploid genome" parameter for the web version. Either way, it gives the algorithm a hint as to which peak should be used as the baseline for estimating the genome size and heterozygosity rate. Good luck! Mike
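To choose a value for this hint, it can help to list the peaks in the histogram file itself. The sketch below is only a rough heuristic and is not how GenomeScope fits its model. It assumes the two-column layout written by kmc_tools (k-mer coverage, then number of distinct k-mers) and reports every row whose count is strictly higher than all rows within a small window on either side. The function name and the default window size are made up for illustration.

```shell
# List local maxima in a two-column k-mer histogram (coverage, count).
# A row is reported when its count is strictly greater than every other
# row within +/- window rows. Rows closer than window to either end are
# skipped, which also skips the low-coverage error slope at the start.
list_histo_peaks() {
    histo=$1
    window=${2:-3}   # hypothetical default; widen it for noisy histograms
    awk -v w="$window" '
        { cov[NR] = $1; cnt[NR] = $2 }
        END {
            for (i = w + 1; i <= NR - w; i++) {
                is_peak = 1
                for (j = i - w; j <= i + w; j++) {
                    if (j != i && cnt[j] >= cnt[i]) { is_peak = 0; break }
                }
                if (is_peak) print cov[i] "\t" cnt[i]
            }
        }
    ' "$histo"
}

# Example call (placeholder file name):
# list_histo_peaks xxx.histo 5
```

For a diploid genome the heterozygous peak is expected at roughly half the coverage of the homozygous peak, so the lower of the two main peaks listed is the natural candidate to pass as -l or to enter in the web form. Long, noisy tails can produce extra small maxima, so compare the output against the plot before relying on it.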
