Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kriging_rss plot looks very different despite same processing #207

Open
nickhir opened this issue Dec 4, 2023 · 7 comments
Open

kriging_rss plot looks very different despite same processing #207

nickhir opened this issue Dec 4, 2023 · 7 comments

Comments

@nickhir
Copy link

nickhir commented Dec 4, 2023

Hi all,
I have downloaded GWAS data from two traits for one region that I would like to finemap.
Because we do not have access to the individual level genotype data, I am using the 1000g reference panel to calculate LD between my SNP markers. I am using a function to align/harmonize my alleles (and betas) so that they match the Ref/Alt allele from the 1000g panel. To make sure that this works correctly, I am using the kriging_rss function provided by susieR. For the first trait, this looks good, and it seems like there are no allele switch issues except for one SNP (see plot below, trait1). However, for the second trait, which was processed with exactly the same pipeline, the plot looks very "off" (see plot below, trait2).
kriging_rss

I am particularly surprised that I am not observing any "expected values" > |3| (which was the case for trait 1). Also, despite obvisouly looking worse, trait2 does not contain any "red" points which I thought would indicate outliers. I have attached a picture of the LD matrix and distribution of Zscores for the SNPs below, and just by visual inspection they do not seem extremely different. Also the sample size is very similar, so I am confused why I am observing such different plots.
LD_matrix
zscores

Furthermore, I noticed that I get the following warning when running kriging_rss:

WARNING: The matrix R is not positive semidefinite. Negative eigenvalues are set to zero.
WARNING: The matrix R is not positive semidefinite. Negative eigenvalues are set to zero

This surprised me a little, because I am generating my LD matrix using plink --r, which I thought was the intended way to create the signed LD matrix.

I would very much appreciate any insights!
Cheers!

@pcarbo
Copy link
Member

pcarbo commented Dec 4, 2023

If the GWAS summary statistics are from the same source, then I agree these results are surprising.

Could you please explain what are the differences between the two LD matrices? Are they for different SNPs? In principle one should be able to do use the same LD matrix.

Also please see the PLoS Genetics paper about interpreting the results; a point far away suggests a genotyping inconsistency, not necessarily an allele flip.

@nickhir
Copy link
Author

nickhir commented Dec 5, 2023

Thank you very much for your quick answer! I realized that I made a mistake when converting Z scores to betas and standard errors for the second dataset. Essentially, for some SNPs I was using the "wrong" MAF. Once I fixed this, I got a very good looking kriging_rss$plot, that did not show any obvious outliers. However, now when I run susie, I get 10 credible sets, which seems quite a lot for a region which is only 250kb and only contains ~1k SNPs. Below is a locus zoom plot where I have highlighted the identified CS. Do you think that 10 credible sets are reasonable given the comparetively low LD this region shows or do you think something might have gone wrong? I am particularly concerend with credible set 5, which is made up of only 1 SNP that has a p value much greater than 10^-8.
I read in the PLoS Genetic paper, that outliers in the kriging_rss$plot might lead to an unexpectedly high number of credible sets, but I am quite certain that I have removed all of them.

LZ

I very much appreciate any insights or pointers!

@pcarbo
Copy link
Member

pcarbo commented Dec 5, 2023

If you have many SNPs with very small p-values that are not in strong LD with each other, then this is expected. (And in fact there may be more than 10 CSs — try setting L to a larger value.)

CS 5 does seem a little odd — I would check it carefully. Perhaps running with refine = TRUE will improve the result.

@nickhir
Copy link
Author

nickhir commented Dec 5, 2023

Thank you for your answer! I already tried including refine=TRUE, but this did not change the results.
Can you elaborate what you mean by "carefully checking" CS 5? Because other than checking the genomic features that this SNP overlaps with, I dont have any more ideas..

@pcarbo
Copy link
Member

pcarbo commented Dec 6, 2023

It may depend on the data that are available to you, but I think the larger point is that because you are working with summary statistics from a separate study, and you are using LD estimates that approximate the "true" LD, there are several possible sources of error in your fine-mapping analyses, so one should treat all fine-mapping results with some level of caution. (This is a caveat that applies to many/most fine-mapping analyses that use summary data.)

@koujiaodahan
Copy link

Thank you very much for your quick answer! I realized that I made a mistake when converting Z scores to betas and standard errors for the second dataset. Essentially, for some SNPs I was using the "wrong" MAF. Once I fixed this, I got a very good looking kriging_rss$plot, that did not show any obvious outliers. However, now when I run susie, I get 10 credible sets, which seems quite a lot for a region which is only 250kb and only contains ~1k SNPs. Below is a locus zoom plot where I have highlighted the identified CS. Do you think that 10 credible sets are reasonable given the comparetively low LD this region shows or do you think something might have gone wrong? I am particularly concerend with credible set 5, which is made up of only 1 SNP that has a p value much greater than 10^-8. I read in the PLoS Genetic paper, that outliers in the kriging_rss$plot might lead to an unexpectedly high number of credible sets, but I am quite certain that I have removed all of them.

LZ

I very much appreciate any insights or pointers!

I met a bad consistency too, could you tell me what the "wrong maf" means?

@nickhir
Copy link
Author

nickhir commented Dec 13, 2023

The dataset I used provided a seperate file with MAF for each of the SNPs it contained. However, as part of my preprocessing, I aligned all my alleles according to the 1000g VCF file. This meant that for some of the SNPs REF and ALT alleles got flipped. I then made the mistake of using the MAF file which was provided by the author (and corresponded to the "unflipped" dataset). Essentially, this caused the MAF to be wrong for all the alleles that got flipped during my preprocessing (if allele gets flipped the correct MAF would be "1-reported_MAF"). I hope that makes sense!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants