
How to interpret results? #11

Open
xiekunwhy opened this issue Jan 28, 2024 · 14 comments

@xiekunwhy

Hi,

For a diploid genome, I ran findGSE(histo="Cfl.histo", sizek=21, outdir="hom_test_21mer62", exp_hom = 62). Which of the following numbers is the haploid genome size?

size_all 2831278311
size_exl 2762932750
size_cat 3063218222
size_fit 2276505453
size_cor2 4239285369
Het_rate 0.00913753 0.00913753
Est. ratio of repeats 0.88225222
Final k-mer cov 36.5624931

Best,
Kun

@HeQSun
Collaborator

HeQSun commented Jan 29, 2024

Hi, can you share the pdf?

@xiekunwhy
Author

Here is the PDF file:
v1.94.est.Cfl.histo.sizek21.curvefitted.pdf

@HeQSun
Collaborator

HeQSun commented Jan 30, 2024

The current result does not seem correct. Can you set exp_hom = 70 and rerun?

You can also share the histo file with me, if that is okay.

@xiekunwhy
Author

Thank you for your reply.

Here are the results with exp_hom = 70:
v1.94.est.Cfl.histo.sizek21.curvefitted.pdf

and here is the histo file:
Cfl.zip

Best,
Kun

@HeQSun
Collaborator

HeQSun commented Jan 31, 2024

The histogram looks a bit "weird".

Do you know whether the species is diploid or polyploid? I am asking because the histogram has one peak at 15x and another at 56x, and its tail also has high y-values, which could come from repeats or from higher ploidy.

To me, it does not look like a diploid, but more likely a tetraploid.
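The ploidy argument above rests on where the peaks sit. A minimal Python sketch of that check, assuming the histogram has already been parsed into (multiplicity, count) pairs; `find_peaks` and its `window`/`min_mult` defaults are hypothetical choices for illustration, not part of findGSE:

```python
import math


def find_peaks(histo, window=5, min_mult=5):
    """Return multiplicities whose count is the highest within +/- window.

    histo: list of (multiplicity, count) pairs, e.g. parsed from a .histo file.
    min_mult: skip the low-multiplicity region dominated by sequencing errors.
    """
    counts = dict(histo)
    peaks = []
    for mult, count in histo:
        if mult < min_mult or count <= 0:
            continue
        # Missing multiplicities outside the histogram are treated as count 0.
        neighbours = [counts.get(x, 0) for x in range(mult - window, mult + window + 1)]
        if count == max(neighbours):
            peaks.append(mult)
    return peaks


# Synthetic histogram with peaks at 15x and 56x (illustration only, not the Cfl data).
synthetic = [
    (m, 1000 * math.exp(-((m - 15) ** 2) / 20) + 3000 * math.exp(-((m - 56) ** 2) / 50))
    for m in range(1, 101)
]
peaks = find_peaks(synthetic)
print(peaks)                          # [15, 56]
print(round(peaks[1] / peaks[0], 2))  # 3.73
```

With peaks at 15x and 56x the ratio is about 3.7. Reading that as roughly 4 (15x as single-copy and 56x as four-copy k-mers) is one heuristic interpretation consistent with a tetraploid, not a proof. Real histograms are noisy, so the window may need widening, and flat tops can report adjacent multiplicities as separate peaks.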

I can only tell that the full genome size is around 10 Gb. The haploid genome size would be 10 Gb / n, where n is the ploidy, which you need to determine.
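The arithmetic behind the "10 Gb / n" statement can be sketched with the classic estimate: total k-mers divided by the coverage of a single genome copy, then divided by the ploidy. This is a simplification, not findGSE's curve-fitting model; the function names, the `min_mult` error cutoff and the toy numbers are hypothetical, and the histogram is assumed to be in the two-column `<multiplicity> <count>` format written by tools such as Jellyfish:

```python
def parse_histo(lines):
    """Parse histogram lines of the form '<multiplicity> <count>'."""
    histo = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            histo.append((int(fields[0]), int(fields[1])))
    return histo


def total_size(histo, single_copy_cov, min_mult=1):
    """Total k-mers (ignoring multiplicities below min_mult) / single-copy coverage."""
    total_kmers = sum(mult * count for mult, count in histo if mult >= min_mult)
    return total_kmers / single_copy_cov


def haploid_size(full_size, ploidy):
    """Split the all-copies size evenly across the assumed ploidy."""
    return full_size / ploidy


toy = parse_histo(["1 100", "2 50", "3 0", "4 25"])
print(total_size(toy, single_copy_cov=2, min_mult=2))  # 100.0
print(haploid_size(10e9, 4))                           # 2500000000.0
```

If the 15x peak is taken as the single-copy coverage, `total_size` gives the size of all copies together, and comparing `haploid_size(total, 2)` with `haploid_size(total, 4)` shows how strongly the answer depends on the assumed ploidy.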

cfl_raw.pdf

Another explanation could be that this is a mixture of different DNA material; maybe the DNA used for sequencing was contaminated.

@xiekunwhy
Author

Thank you for your help.

There are two states of this species: diploid and tetraploid.

Smudgeplot and karyotype analysis both told me that the sample we are analyzing is diploid. Maybe the DNA used for sequencing was contaminated.

Here are the Smudgeplot results:
smudgeplot_smudgeplot
smudgeplot_smudgeplot_log10
smudgeplot_verbose_summary.txt

Best,
Kun

@simleopold

Hi,

I also wanted to know how to interpret the results and which number is the "real" genome size. Here is the PDF file:
findGSE-PSR.pdf

Thank you for your help.

@HeQSun
Collaborator

HeQSun commented Feb 6, 2024

In this particular case I would not trust the k-mer-based ploidy estimate, because the peak at 15x has been treated as errors; I do not know which method underlies this determination.

You can try:

  1. using a wet-lab method to check the genome size again
  2. BLASTing some of the k-mers from the 15x peak, to see whether you can figure out which species they come from.
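For the second suggestion, one way to pick k-mers from the 15x peak is to dump the counts in column format (for example with `jellyfish dump -c`, if the histogram came from Jellyfish) and convert a subset to FASTA for BLAST. A minimal Python sketch of the conversion step, assuming one `<kmer> <count>` pair per line; `kmers_to_fasta` and the 12 to 18 count window are hypothetical:

```python
def kmers_to_fasta(lines, low, high, max_records=100):
    """Return FASTA text for k-mers whose count lies in [low, high]."""
    out = []
    selected = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        kmer, count = fields[0], int(fields[1])
        if low <= count <= high:
            selected += 1
            out.append(f">kmer{selected}_count{count}\n{kmer}\n")
            if selected >= max_records:
                break
    return "".join(out)


# Toy dump lines (hypothetical 21-mers, not from the Cfl data).
dump = [
    "ACGTACGTACGTACGTACGTA 3",
    "TTTTGGGGCCCCAAAATTTTG 15",
    "GGGGCCCCAAAATTTTGGGGC 16",
    "CCCCAAAATTTTGGGGCCCCA 56",
]
print(kmers_to_fasta(dump, 12, 18), end="")
```

The sketch keeps the first matching k-mers; a random sample would be more representative. Note also that 21-mers are short queries, so BLAST hits may be sparse or ambiguous.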

@HeQSun
Collaborator

HeQSun commented Feb 6, 2024

This is a homozygous species, so you do not need to set exp_hom. The last row gives the haploid genome size.

@simleopold

Hi,

I ran this command on a genome whose size and ploidy level I don't know: findGSE(histo = "/Users/icesim/Downloads/21mer_no_cut-2.histo", sizek=21, outdir="/Users/icesim/Desktop/findGSE-teleau", exp_hom = 100)
I expected a result of around 8 Mb. Does findGSE give an estimate of the whole genome size or of the haploid genome size?

Thanks for your help,
findGSE.pdf

@HeQSun
Collaborator

HeQSun commented Mar 16, 2024 via email

@simleopold

Thank you for your quick answer,

I tried to run it in homozygous mode, but I get the following error in R: "Error in singlestart:singleend : NA/NaN argument"

Does this mean I have no choice but to run it in heterozygous mode?

@HeQSun
Collaborator

HeQSun commented Mar 16, 2024 via email

@simleopold

I ran this command: findGSE(histo = "/Users/icesim/Downloads/21mer_no_cut-2.histo", sizek=21, outdir="/Users/icesim/Desktop/findGSE-teleau")
