Also encountered "The estimated prior variance is unreasonably large" using ieugwasr ld_matrix_local function to calculate the ld matrix #223

laleoarrow · 2024-04-11T02:35:02Z

Also encountered "The estimated prior variance is unreasonably large" using ieugwasr ld_matrix_local function to calculate the ld matrix. As I have checked the original code of ld_matrix_local function in ieugwasr package in GitHub, when calculating the ld matrix ld_matrix_local does have "--keep-allele-order" argument in the plink command. But such error still exists. Please advice.

laleoarrow · 2024-04-11T02:37:09Z

FYI my session info:
`> sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.4.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Asia/Shanghai
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] LDlinkR_1.3.0 plinkbinr_0.0.0.9000 ieugwasr_0.2.2-9000 TwoSampleMR_0.5.10 coloc_5.2.3
[6] vroom_1.6.5 data.table_1.15.4 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[11] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
[16] ggplot2_3.5.0 tidyverse_2.0.0

loaded via a namespace (and not attached):
[1] gtable_0.3.4 htmlwidgets_1.6.4 ggrepel_0.9.5 lattice_0.22-5 tzdb_0.4.0
[6] vctrs_0.6.5 tools_4.3.3 generics_0.1.3 parallel_4.3.3 curl_5.2.1
[11] fansi_1.0.6 pkgconfig_2.0.3 Matrix_1.6-5 ggnewscale_0.4.10 RcppParallel_5.1.7
[16] uuid_1.2-0 lifecycle_1.0.4 compiler_4.3.3 munsell_0.5.0 htmltools_0.5.7
[21] pillar_1.9.0 crayon_1.5.2 viridis_0.6.5 tidyselect_1.2.0 digest_0.6.34
[26] stringi_1.8.3 fastmap_1.1.1 grid_4.3.3 colorspace_2.1-0 cli_3.6.2
[31] magrittr_2.0.3 patchwork_1.2.0 Rfast_2.1.0 utf8_1.2.4 withr_3.0.0
[36] scales_1.3.0 rsthemes_0.4.0 RcppZiggurat_0.1.6 bit64_4.0.5 timechange_0.3.0
[41] httr_1.4.7 matrixStats_1.2.0 bit_4.0.5 gridExtra_2.3 hms_1.1.3
[46] plink2R_1.1 irlba_2.3.5.1 viridisLite_0.4.2 geni.plots_0.0.9 rlang_1.1.3
[51] ggiraph_0.8.9 Rcpp_1.0.12 mixsqp_0.3-54 glue_1.7.0 susieR_0.12.40
[56] rstudioapi_0.15.0 reshape_0.8.9 jsonlite_1.8.8 R6_2.5.1 plyr_1.8.9
[61] systemfonts_1.0.6 `

pcarbo · 2024-04-11T14:12:26Z

@leoarrow1 If you haven't already done so, I would recommend checking for inconsistencies between the LD matrix and the matrix z-scores following this vignette.

laleoarrow · 2024-04-11T16:06:21Z

@leoarrow1 If you haven't already done so, I would recommend checking for inconsistencies between the LD matrix and the matrix z-scores following this vignette.

Thanks for your prompt response. Yes I have check this vignette as well as the former similar post on this issue. The solution was to add the "Adding the flag --keep-allele-order" in plink. But I have checked the ieugwasr code, it does have this flag in it. So this problem might be cause due to something else? Could you perhaps spot anything unusual in my code? dataset is a standard format that is suitable for coloc analysis. (i.e., Only missing the LD matrix for color-susie)

Coloc_to_SuSiE <- function( # my self-made function to change coloc data into coloc_susie data
    dataset, 
    # ld_method = "plink", 
    bfile = "/Users/xxx/project/ref/1kg.v3/EUR"
    ) {
dataset$LD <- ieugwasr::ld_matrix_local(
                           dataset$snp,
                           bfile = bfile,
                           plink_bin = plinkbinr::get_plink_exe(),
                           with_alleles = FALSE
                           )

ld_rows <- rownames(dataset[["LD"]])
snps <- dataset[["snp"]]
index <- match(ld_rows, snps)

for (i in names(dataset)) {
    if (i %in% c("LD", "type", "s", "N")) next
    if (length(dataset[[i]]) > ld_rows_number & length(dataset[[i]]) > 1) {
      dataset[[i]] <- dataset[[i]][index]
      if (length(dataset[[i]]) == ld_rows_number) {leo_message("91", paste0("The SNP in element <", i, "> is now matched with the LD matrix"))}
    }
dataset$N <- as.integer(dataset$N)[1]
  }
  return(dataset)
}

# the error occurs here
dat = Coloc_to_SuSiE(dataset)
S3 = runsusie(dat) # here.

Any thought is highly appreciated.

pcarbo · 2024-04-12T02:41:14Z

@leoarrow1 To better pinpoint the cause of the error, I would try to run susie_rss directly instead of through coloc.

laleoarrow · 2024-04-12T04:56:31Z

@leoarrow1 To better pinpoint the cause of the error, I would try to run susie_rss directly instead of through coloc.

Thanks! I will try so as soon as possible and let you know the result.

laleoarrow · 2024-04-15T02:50:19Z

Hi! I have tried running susie_rss directly which did yielded a result/ But the it did not converge as well, and this outcome is evidently incorrect (there is obviously too many cs?).

Upon reviewing the vignette, I identified inconsistencies in my susie_rss input data but, regrettably, I cannot locate their source.

I have checked the order of z and ld (rowname of ld) provided for susie_rss is in the same order though. Here is the code I used for susie_rss. Can you spot anything incorrectly? Any help is appreciated!

variants = dataset$snp 
# > head(variants)
# [1] "rs112438356" "rs112327275" "rs9270475"   "rs77122291"  "rs9270487"   "rs9270493" 
bfile = "/Users/leoarrow/project/ref/1kg.v3/EUR" # downloaded from https://mrcieu.github.io/ieugwasr/articles/local_ld.html
plink_bin = plinkbinr::get_plink_exe()

dataset$LD  <- ieugwasr::ld_matrix_local(
                           dataset$snp,
                           bfile = bfile,
                           plink_bin = plinkbinr::get_plink_exe(),
                           with_alleles = FALSE
                           )

ld_rows <- rownames(dataset[["LD"]]); ld_rows_number <- length(ld_rows)
snps <- dataset[["snp"]]; input_snps_number <- length(snps)
index <- match(ld_rows, snps)

susie_fit = susie_rss(dataset$z, dataset$LD, n=n)
susie_plot(susie_fit, y='PIP')
# lambda = estimate_s_rss(dataset$z, dataset$LD, n=n)
# lambda
condz = kriging_rss(dataset$z, dataset$LD, n=n)
condz$plot

laleoarrow · 2024-04-15T11:01:30Z

FYI, the source code of ieugwasr::ld_matrix_local is as below for ld matrix calculation. It use plink for calculation but it does have the --keep-allele-order tag in it: (Ref:https://github.com/MRCIEU/ieugwasr/blob/master/R/ld_matrix.R)

ld_matrix_local <- function(variants, bfile, plink_bin, with_alleles=TRUE) {
	# Make textfile
	shell <- ifelse(Sys.info()['sysname'] == "Windows", "cmd", "sh")
	fn <- tempfile()
	write.table(data.frame(variants), file=fn, row.names=FALSE, col.names=FALSE, quote=FALSE)

	
	fun1 <- paste0(
		shQuote(plink_bin, type=shell),
		" --bfile ", shQuote(bfile, type=shell),
		" --extract ", shQuote(fn, type=shell), 
		" --make-just-bim ", 
		" --keep-allele-order ",
		" --out ", shQuote(fn, type=shell)
	)
	system(fun1)

	bim <- read.table(paste0(fn, ".bim"), stringsAsFactors=FALSE)

	fun2 <- paste0(
		shQuote(plink_bin, type=shell),
		" --bfile ", shQuote(bfile, type=shell),
		" --extract ", shQuote(fn, type=shell), 
		" --r square ", 
		" --keep-allele-order ",
		" --out ", shQuote(fn, type=shell)
	)
	system(fun2)
	res <- read.table(paste0(fn, ".ld"), header=FALSE) %>% as.matrix
	if(with_alleles)
	{
		rownames(res) <- colnames(res) <- paste(bim$V2, bim$V5, bim$V6, sep="_")
	} else {
		rownames(res) <- colnames(res) <- bim$V2
	}
	return(res)
}

It looks all right to me. Or is there any other option to do LD matrix calculation?
PS. I'm using gwas summary data and I do not have the genotype data. Also coloc recommend to use 1000 or above snps in the coloc loci, so LD matrix can only be calculated using local method?

pcarbo · 2024-04-15T19:46:03Z

@leoarrow1 I'm not familiar with the ieugwasr R package (I have not used it before), but if you compute the LD matrix from the same genotype data used in computing the association statistics (e.g., z-scores), there should be no inconsistencies in your kriging_rss plot. On the other hand, if the LD matrix is computed using different genotype data, then it is quite possible to obtain large inconsistencies. Removing the largest inconsistencies may involve removing some SNPs, but we have not (yet) developed a systematic procedure for removing large inconsistencies. Hope this helps (a bit).

laleoarrow · 2024-04-16T01:12:03Z

Is there a recommended method to compute LD matrix in ref panel for summary statistics? Thanks ----------

…

--------------原始邮件-------------- 发件人："Peter Carbonetto ***@***.***>; 发送时间：2024年4月16日(星期二) 凌晨3:46 收件人："stephenslab/susieR" ***@***.***>; 抄送："leoarrow1 ***@***.***>;"Mention ***@***.***>; 主题：Re: [stephenslab/susieR] Also encountered "The estimated prior variance is unreasonably large" using ieugwasr ld_matrix_local function to calculate the ld matrix (Issue #223) ----------------------------------- @leoarrow1 I'm not familiar with the ieugwasr R package (I have not used it before), but if you compute the LD matrix from the same genotype data used in computing the association statistics (e.g., z-scores), there should be no inconsistencies in your kriging_rss plot. On the other hand, if the LD matrix is computed using different genotype data, then it is quite possible to obtain large inconsistencies. Removing the largest inconsistencies may involve removing some SNPs, but we have not (yet) developed a systematic procedure for removing large inconsistencies. Hope this helps (a bit). — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: ***@***.***>

pcarbo · 2024-04-16T19:20:50Z

@leoarrow1 It would depend whether you have access to the same data that was used to generate the association statistics. If the answer is no, then we do not have precise recommendations except that you should work with genotype data that most closely resembles the data used to generate the association statistics. Perhaps @gaow can provide more detailed recommendations; I believe he is working on bioinformatics pipelines to help with this.

laleoarrow · 2024-04-17T00:48:04Z

Thanks for reply we do not have Original genome type individual data, but I guess we can use the reference individual ref panel from 1 kg, which is frequently used as reference data in post gwas analysis?  Message ID: ***@***.***>

akhilpampana · 2024-04-19T17:59:33Z

Hello, I am also facing a similar issue with ld matrix inconsistency. I have used PLINK to generate LD matrices and kept --keep-allele-order. I am using the meta effect of gwas analysis and using 1000g as a reference for LD. My correlation plot is off. Can you help me out with this issue? Here is the plot for the reference. I have used a 1000g complete dataset as the gwas results are from multi-ancestry datasets.

I have also noticed a thing like variant which is shown to have significant tophit for the loci is not available in the credible set. Is it the opposite that we should have that top variant in the credible set?

Regards
Akhil

stephens999 · 2024-04-19T22:46:29Z

To debug your pipeline i would suggest hand checking some example snp pairs. Eg find some snp pairs that have high ld in your computer matrix but very different z scores. Such pairs are obviously problematic. Maybe finding them will help you identify your problem. Sorry, but i hope you understand we don't have time to help debug all user pipelines. Matthew

…

On Fri, Apr 19, 2024, 12:59 AkhilP ***@***.***> wrote: Hello, I am also facing a similar issue with ld matrix inconsistency. I have used PLINK to generate LD matrices and kept --keep-allele-order. I am using the meta effect of gwas analysis and using 1000g as a reference for LD. My correlation plot is off. Can you help me out with this issue? Here is the plot for the reference. I have used a 1000g complete dataset as the gwas results are from multi-ancestry datasets. image.png (view on web) <https://github.com/stephenslab/susieR/assets/19439352/261d6b70-ef29-4e76-9bbc-af1123afade1> I have also noticed a thing like variant which is shown to have significant tophit for the loci is not available in the credible set. Is it the opposite that we should have that top variant in the credible set? Regards Akhil — Reply to this email directly, view it on GitHub <#223 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AANXRRLAFQN7QVY3MJIMKZTY6FLRXAVCNFSM6AAAAABGBPRATGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANRXGA2DSMBSGA> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

akhilpampana · 2024-04-19T23:48:15Z

Thank you so much for the response. I understand its very difficult to focus on all pipelines given the tool usage and appreciate your kind response.

I just want to know if the significant variant in the loci shouldl be present in the credible set or not.

Regards
Akhil

laleoarrow · 2024-04-20T11:44:56Z

Hi, quick update for all. I have solved my issue (partially at least). I used ieugwasr package to calculate my ld matrix for a GWAS summary data using 1kg individual data as a reference. ieugwasr incorporated the plink 1.9 when calculating ld matrix and it does have the -keep-allele-order flag in it, however, the allele order would still possible be inconsistent with allele order in your own summary data. So the solution is to flip the allele order when needed. This should at least get you to converge in 100 iteration and a better z-score plot.

Nonetheless, I identified 9 cs in a loci with only 200+ snp. Is this possible? Here is the cs plot and z-score plot. Can anyone spot anything unusual in it? Thanks!

laleoarrow · 2024-04-20T11:49:21Z

Thank you so much for the response. I understand its very difficult to focus on all pipelines given the tool usage and appreciate your kind response.

I just want to know if the significant variant in the loci shouldl be present in the credible set or not.

Regards Akhil

The top snp should be in the strongest cs from my perspective.

gaow · 2024-04-21T14:06:03Z

@leoarrow1 @akhilpampana these problems typically has to do with problematic bioinformatics steps to unify the alleles between summary statistics and LD. The discrepancy between z-scores and LD reference is also a secondary concern.

We do have some pipelines that are still work in progress, to try cope with these issues. It contains two parts:

we unify or harmonize the alleles between data-sets to the best of our knowledge.
we apply some approaches to detect and remove variants where LD and z-scores have large discrepancies.

the 2nd item is still work in progress not ready for public use, although we have a separate branch of susieR to implement the prototype. The entire procedure is outlined here althought at this point it might be the most helpful to compare notes of your procedure with our basic QC for summary statistics.

akhilpampana · 2024-04-22T18:04:33Z

I think I have figured out my issue regarding the discrepancy of the plot but I will checkthrough the tutorial mentioned above just to be sure and test the analysis in the updated version of the ld matrix.

Thank you so much @gaow for the pipeline and @leoarrow1 for the suggestion

Regards
Akhil

pcarbo · 2024-04-22T19:35:06Z

Thank you for sharing @akhilpampana. If you could share in more detail what was the issue and how you solved it, this might be helpful for us (and others) to know.

pcarbo · 2024-04-22T21:09:14Z

The top snp should be in the strongest cs from my perspective.

This will often be true, but sometimes not. See Fig. 1 of the susie paper for an illustration of this.

akhilpampana mentioned this issue May 2, 2024

Model is not converging evena fter running for 10k iterations #226

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Also encountered "The estimated prior variance is unreasonably large" using ieugwasr ld_matrix_local function to calculate the ld matrix #223

Also encountered "The estimated prior variance is unreasonably large" using ieugwasr ld_matrix_local function to calculate the ld matrix #223

laleoarrow commented Apr 11, 2024

laleoarrow commented Apr 11, 2024

pcarbo commented Apr 11, 2024

laleoarrow commented Apr 11, 2024 •

edited

Loading

pcarbo commented Apr 12, 2024

laleoarrow commented Apr 12, 2024

laleoarrow commented Apr 15, 2024 •

edited

Loading

laleoarrow commented Apr 15, 2024 •

edited

Loading

pcarbo commented Apr 15, 2024

laleoarrow commented Apr 16, 2024 via email

pcarbo commented Apr 16, 2024

laleoarrow commented Apr 17, 2024 via email

akhilpampana commented Apr 19, 2024

stephens999 commented Apr 19, 2024 via email

akhilpampana commented Apr 19, 2024

laleoarrow commented Apr 20, 2024

laleoarrow commented Apr 20, 2024

gaow commented Apr 21, 2024

akhilpampana commented Apr 22, 2024

pcarbo commented Apr 22, 2024

pcarbo commented Apr 22, 2024 •

edited

Loading

Also encountered "The estimated prior variance is unreasonably large" using ieugwasr ld_matrix_local function to calculate the ld matrix #223

Also encountered "The estimated prior variance is unreasonably large" using ieugwasr ld_matrix_local function to calculate the ld matrix #223

Comments

laleoarrow commented Apr 11, 2024

laleoarrow commented Apr 11, 2024

pcarbo commented Apr 11, 2024

laleoarrow commented Apr 11, 2024 • edited Loading

pcarbo commented Apr 12, 2024

laleoarrow commented Apr 12, 2024

laleoarrow commented Apr 15, 2024 • edited Loading

laleoarrow commented Apr 15, 2024 • edited Loading

pcarbo commented Apr 15, 2024

laleoarrow commented Apr 16, 2024 via email

pcarbo commented Apr 16, 2024

laleoarrow commented Apr 17, 2024 via email

akhilpampana commented Apr 19, 2024

stephens999 commented Apr 19, 2024 via email

akhilpampana commented Apr 19, 2024

laleoarrow commented Apr 20, 2024

laleoarrow commented Apr 20, 2024

gaow commented Apr 21, 2024

akhilpampana commented Apr 22, 2024

pcarbo commented Apr 22, 2024

pcarbo commented Apr 22, 2024 • edited Loading

laleoarrow commented Apr 11, 2024 •

edited

Loading

laleoarrow commented Apr 15, 2024 •

edited

Loading

laleoarrow commented Apr 15, 2024 •

edited

Loading

pcarbo commented Apr 22, 2024 •

edited

Loading