Heterozygous peak identified as errors? #131
Also seeing this with one of our genomes. Of a batch of four Revio SMRT cells (all from the same species), one plot looks like the one in the original post. The SMRT cell that produced this plot has many more reads than the others, but besides that nothing stands out. The other three plots looked reasonable. I'm curious to know what is causing this.
The automatic model-fitting algorithm can get confused if the coverage is too high or if the relationship between the homozygous and heterozygous peaks is ambiguous. The easiest way to address this is to use the "Average k-mer coverage for polyploid genome" parameter, which gives a hint as to where the first peak (the heterozygous peak) is located. For these datasets I would try a value of about 100. If that doesn't work, the next easiest thing to do is to downsample the read dataset to reduce the coverage. From a raw read file, you can just use 'head' to select the first N lines in the file to reduce the number of reads, which serves as a random downsample (assuming the reads have not been aligned and no other processing has happened).
Good luck!
Mike
(In reply to Sam Talbot, Fri, Jun 21, 2024, 1:54 PM.)
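A minimal sketch of the `head`-based downsampling described above, assuming a FASTQ file (plain or gzip-compressed) with the usual four lines per read, so the number of lines kept must be a multiple of 4 to avoid cutting a record in half. The helper name `downsample_fastq` and the file names are hypothetical. As noted in the reply, taking the first reads only approximates a random sample if the file has not been sorted, aligned, or otherwise reordered.

```shell
#!/usr/bin/env bash
# Keep only the first N reads of a FASTQ file ("head"-style downsampling).
# Assumes standard 4-line FASTQ records: header, sequence, '+', qualities.

# downsample_fastq IN N OUT   (hypothetical helper name)
#   IN  : FASTQ, plain or gzip-compressed
#   N   : number of reads to keep
#   OUT : uncompressed FASTQ to write
downsample_fastq() {
  local in="$1" n_reads="$2" out="$3"

  # gzip -cdf decompresses .gz input and copies plain input through unchanged.
  # head takes exactly 4 lines per read, so no record is cut in half.
  # head exits after N reads; gzip then stops on the closed pipe, which is
  # expected and does not change the pipeline's exit status (that of head).
  gzip -cdf -- "$in" | head -n "$(( 4 * n_reads ))" > "$out"
}

# Example with hypothetical file names: keep the first 1,000,000 reads.
# downsample_fastq xxx.hifi.fastq.gz 1000000 xxx.sub.fastq
```

The subsampled FASTQ can then go through the same `kmc` and `kmc_tools transform ... histogram` steps shown in the original post before being resubmitted to GenomeScope.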
I also encountered this problem, and my solution was to use the "-l" parameter to specify the position of my heterozygous peak, e.g. -l 29 (the value I used). I'm not sure whether this counts as solving the problem, because I don't fully understand the underlying principle, but it made my results better: the genome size became more accurate, and the heterozygosity rate returned to normal.
Yes, the "-l" parameter in the command-line version is the same as the "Average k-mer coverage for polyploid genome" parameter in the web version. Either way, it gives the algorithm a hint as to which peak should be used as the baseline for estimating the genome size and heterozygosity rate.
Good luck!
Mike
(In reply to Xiaoyu, Sat, Dec 14, 2024, 9:09 AM.)
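One rough way to pick a starting value for `-l` (the web form's "Average k-mer coverage for polyploid genome") is to read the heterozygous peak position off the histogram itself. The sketch below is a simple heuristic, not what GenomeScope does internally. It assumes the two-column `coverage<TAB>count` output of `kmc_tools transform ... histogram`, sorted by coverage; the function name `first_peak_after_valley` is hypothetical. It walks down the error tail to the first valley, then up to the first peak. For a diploid with visible heterozygosity that peak is usually the heterozygous one; if heterozygosity is so low that only one peak shows, the value returned is the homozygous peak and `-l` should be roughly half of it. Noisy histograms can stop the walk early, so compare the result against the plot.

```shell
#!/usr/bin/env bash
# Heuristic only (not GenomeScope's model fitting): suggest a starting value
# for -l from a kmc_tools histogram with one "coverage<TAB>count" row per
# line, sorted by increasing coverage.
first_peak_after_valley() {
  awk '
    # Store coverage and count for every row, forcing numeric values.
    { cov[NR] = $1 + 0; cnt[NR] = $2 + 0 }
    END {
      i = 1
      # 1. Walk down the error tail until the count starts rising (first valley).
      while (i < NR && cnt[i + 1] <= cnt[i]) i++
      # 2. Walk up from the valley until the count starts falling (first peak).
      while (i < NR && cnt[i + 1] >= cnt[i]) i++
      # Coverage at the first peak: candidate heterozygous peak for -l.
      print cov[i]
    }
  ' "$1"
}

# Example with a hypothetical file name:
# first_peak_after_valley xxx.histo
```

The printed coverage would then be passed as `-l` on the command line (as Xiaoyu did with `-l 29`) or entered in the web form's "Average k-mer coverage for polyploid genome" field.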
Hello,
I ran kmc and kmc_tools with PacBio HiFi sequences of a diploid plant species:
kmc -m128 -k21 -t40 -ci1 -cs10000 xxx.hifi.fastq.gz xxx xxx_tmp
kmc_tools transform xxx histogram xxx.histo
Then I submitted the histo file to GenomeScope 2.0 (http://genomescope.org/genomescope2.0/) with "K-mer length: 21, Ploidy: 2, Max k-mer coverage: -1, Average k-mer coverage for polyploid genome: -1".
This is the linear plot I got:
It seems that the first peak (the heterozygous peak) was classified as errors. Is there any way to avoid this?
Thank you!
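A complementary sanity check for the `-l` value is to estimate from the reads themselves where the heterozygous peak should fall. Under the usual idealized model, a read of length L contributes L − k + 1 k-mers, so the total number of k-mers divided by the haploid genome size G approximates the homozygous peak, and for a diploid the heterozygous peak sits at about half of that. This sketch assumes 4-line FASTQ records and a genome size G that you supply from prior knowledge; the helper name `expected_het_peak` and its genome-size argument are hypothetical. It slightly overestimates, because k-mers containing sequencing errors are counted too and repeats are ignored, so treat the result as a ballpark figure.

```shell
#!/usr/bin/env bash
# Rough expected position of the heterozygous k-mer peak for a diploid.
# expected_het_peak FASTQ K GENOME_SIZE   (hypothetical helper name)
#   FASTQ       : reads, plain or gzip-compressed, 4 lines per record
#   K           : k-mer length used for counting (21 in the original post)
#   GENOME_SIZE : assumed haploid genome size in bp (must be > 0)
expected_het_peak() {
  local fastq="$1" k="$2" genome_size="$3"

  gzip -cdf -- "$fastq" | awk -v k="$k" -v g="$genome_size" '
    # Line 2 of every 4-line record is the sequence.
    NR % 4 == 2 {
      n = length($0) - k + 1      # k-mers contributed by this read
      if (n > 0) kmers += n
    }
    END {
      hom = kmers / g             # approx. homozygous peak coverage
      printf "%.1f\n", hom / 2    # diploid heterozygous peak = half of it
    }
  '
}

# Example with a hypothetical file name and genome size:
# expected_het_peak xxx.hifi.fastq.gz 21 500000000
```

With HiFi reads that are many kilobases long, L − k + 1 is almost equal to L for k = 21, so this estimate is essentially half of the read coverage (total bases divided by genome size). If it lands far from where the fitted model placed its first peak, that is a sign the fit may have mistaken the heterozygous peak for errors.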