Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in dimnames(covmat.unscaled) <- list(xnames, xnames) : length of 'dimnames' [1] not equal to array extent #9

Open
charlesgwellem opened this issue Aug 6, 2019 · 0 comments

Comments

@charlesgwellem
Copy link

charlesgwellem commented Aug 6, 2019

Hi. I will be very grateful for any hint on how to overcome the error. I wish to deconvolve my bulk RNA seq data from the lungs of mice using single cell RNA seq data. For practice, I am following your tutorial using my own bulk seq data and the single cell rna seq data you have provided. All seems to go well until when I try to obtain the Cell type-specific differential expression by running the following code. From the beginning I thought the presence of factors(e.g. for th treatment conditions or phenoData) in my data set was the cause but after getting rid of the factors I still get the same error:

csfit <- bseqsc_csdiff(the_best_ilc_bulk_expr_set[genes, ] ~ phenoData| alpha + beta + ductal + acinar,  
                       verbose = 2, nperms = 5000, .rng = 12345)`

Then I get this error:

Groups: bleo=5L | bleo_ko_i=5L | bleo_wt_i=5L | Nacl=5L | NA0L
Cell type(s): 'alpha', 'beta', ..., 'acinar' (4 total)
Fitting mode: auto
Data (filtered): 1202 features x 20 samples
Model has factor effect with more than 2 levels: fitting lm interaction model
Fitting model with nonnegative effects
Model with more than 2 groups: switching to version 2
 Fitting linear interaction model ... Error in dimnames(covmat.unscaled) <- list(xnames, xnames) : 
  length of 'dimnames' [1] not equal to array extent
In addition: Warning messages:
1: In lsfit(D, G, intercept = FALSE) : 'X' matrix was collinear
2: In sqrt(((n - p) * stddevmat^2 - resids^2/(1 - hatdiag[good]))/(n -  :
  NaNs produced

Here are the details of my code:

# inspired from https://github.com/cozygene/bisque/issues/4

# creating expression set from my bulk rna seq data

library(openxlsx)
library(Biobase)
library(bseqsc)
library(tidyverse)
bseqsc_config('CIBERSORT.R')
data(pancreasMarkers)


# read in the raw data

bulk <- read.xlsx("all_samples.xlsx",
                  sheet = "rawCounts_unrepeated_genes")
head(bulk)

bulk$Gene <- NULL
rownames(bulk) <- toupper(bulk$external_gene_name)
bulk$external_gene_name <- NULL
bulk <- as.matrix(bulk)
head(bulk)


# create expression matrix 

featureData <- as.character(rownames(bulk))
featureData <- as(data.frame(featureData, 
                             stringsAsFactors = FALSE), "AnnotatedDataFrame")
rownames(featureData) <- rownames(bulk)
phenoData <- c(rep("bleo", 5), rep("Nacl", 5), rep("bleo_wt_i", 5), rep("bleo_ko_i", 5))
phenoData <- as(as.data.frame(phenoData), "AnnotatedDataFrame")
rownames(phenoData) <- colnames(bulk)
the_best_ilc_bulk_expr_set <- ExpressionSet(assayData = bulk, 
                                            phenoData = phenoData, featureData = featureData)

# read in the single cell rna seq data and fit the model

eislet <- readRDS('islet-eset.rds')
B <- bseqsc_basis(eislet, pancreasMarkers, 
                  clusters = 'cellType', samples = 'sampleID', ct.scale = TRUE)
fit <- bseqsc_proportions(the_best_ilc_bulk_expr_set, B, verbose = TRUE)


pData(the_best_ilc_bulk_expr_set) <- cbind(pData(the_best_ilc_bulk_expr_set), t(coef(fit)))

fit_edger<- fitEdgeR(the_best_ilc_bulk_expr_set, ~phenoData, 
                     coef = c("phenoDatableo_ko_i", "phenoDatableo_wt_i", "phenoDataNacl"))

# extended

fit_edger_ext <- fitEdgeR(the_best_ilc_bulk_expr_set, ~ phenoData + beta + ductal + acinar +gamma,
                          coef = c("phenoDatableo_ko_i", "phenoDatableo_wt_i",
                                   "phenoDataNacl", "beta", "ductal", 
                                   "acinar", "gamma"))


fit_edger_ext$Symbol <- rownames(fit_edger_ext)

# gather P-values from both models

df_fit_edger <- as.data.frame(fit_edger, stringsAsFactors = FALSE)
df_fit_edger$Symbol <- rownames(df_fit_edger)

req_df_fit_edger <- df_fit_edger["PValue"]
colnames(req_df_fit_edger) <- "Base"
head(req_df_fit_edger)
req_df_fit_edger$Symbol <- rownames(req_df_fit_edger)
df_fit_edger_ext <- as.data.frame(fit_edger_ext, stringsAsFactors = FALSE)
df_fit_edger_ext$Symbol <- rownames(df_fit_edger_ext)

head(df_fit_edger_ext)

req_df_fit_edger_ext <- df_fit_edger_ext["PValue"]
colnames(req_df_fit_edger_ext) <- "Adjusted"
req_df_fit_edger_ext$Symbol <- rownames(req_df_fit_edger_ext)


edger_pvals <- req_df_fit_edger_ext %>%
  inner_join(req_df_fit_edger)
head(edger_pvals)#

rownames(edger_pvals) <- edger_pvals$Symbol



edger_pvals <- mutate(edger_pvals, Regulated = Adjusted <= 0.01 & Adjusted <= Base)

# plot Base vs Adjusted
pvalueScatter(Base ~ Adjusted, edger_pvals, pval.th = 0.01, label.th = 3.5)

# ER genes
genes_ER <- c('HSPA5', 'MAFA', 'HERPUD1', 'DDIT3', 'UCN3', 'NEUROD1')
# Fit on ER stress genes and genes regulated beyond cell type proportion differences
genes <- union(genes_ER, subset(edger_pvals, Regulated)$Symbol)

csfit <- bseqsc_csdiff(the_best_ilc_bulk_expr_set[genes, ] ~ phenoData| alpha + beta + ductal + acinar,  
                       verbose = 2, nperms = 5000, .rng = 12345)

Then I have the error:

Groups: bleo=5L | bleo_ko_i=5L | bleo_wt_i=5L | Nacl=5L | NA0L
Cell type(s): 'alpha', 'beta', ..., 'acinar' (4 total)
Fitting mode: auto
Data (filtered): 1202 features x 20 samples
Model has factor effect with more than 2 levels: fitting lm interaction model
Fitting model with nonnegative effects
Model with more than 2 groups: switching to version 2
 Fitting linear interaction model ... Error in dimnames(covmat.unscaled) <- list(xnames, xnames) : 
  length of 'dimnames' [1] not equal to array extent
In addition: Warning messages:
1: In lsfit(D, G, intercept = FALSE) : 'X' matrix was collinear
2: In sqrt(((n - p) * stddevmat^2 - resids^2/(1 - hatdiag[good]))/(n -  :
  NaNs produced


> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] edgeR_3.26.6          limma_3.40.6          preprocessCore_1.46.0 e1071_1.7-2           forcats_0.4.0         stringr_1.4.0        
 [7] dplyr_0.8.3           purrr_0.3.2           readr_1.3.1           tidyr_0.8.3           tibble_2.1.3          ggplot2_3.2.0        
[13] tidyverse_1.2.1       bseqsc_1.0            csSAM_1.4             Rcpp_1.0.2            Biobase_2.44.0        BiocGenerics_0.30.0  
[19] openxlsx_4.1.0.1     

loaded via a namespace (and not attached):
 [1] nlme_3.1-140         lubridate_1.7.4      bit64_0.9-7          doParallel_1.0.14    RColorBrewer_1.1-2   httr_1.4.0          
 [7] tools_3.6.0          backports_1.1.4      R6_2.4.0             DBI_1.0.0            lazyeval_0.2.2       colorspace_1.4-1    
[13] withr_2.1.2          tidyselect_0.2.5     gridExtra_2.3        xbioc_0.1.17         bit_1.1-14           compiler_3.6.0      
[19] cli_1.1.0            rvest_0.3.4          xml2_1.2.1           pkgmaker_0.28        scales_1.0.0         NMF_0.22            
[25] digest_0.6.20        pkgconfig_2.0.2      bibtex_0.4.2         rlang_0.4.0          readxl_1.3.1         rstudioapi_0.10     
[31] RSQLite_2.1.2        generics_0.0.2       jsonlite_1.6         dendextend_1.12.0    zip_2.0.3            magrittr_1.5        
[37] Formula_1.2-3        munsell_0.5.0        S4Vectors_0.22.0     viridis_0.5.1        stringi_1.4.3        yaml_2.2.0          
[43] plyr_1.8.4           grid_3.6.0           blob_1.2.0           crayon_1.3.4         lattice_0.20-38      haven_2.1.1         
[49] splines_3.6.0        hms_0.5.0            locfit_1.5-9.1       zeallot_0.1.0        pillar_1.4.2         rngtools_1.4        
[55] reshape2_1.4.3       codetools_0.2-16     stats4_3.6.0         glue_1.3.1           BiocManager_1.30.4   modelr_0.1.4        
[61] vctrs_0.2.0          foreach_1.4.7        cellranger_1.1.0     gtable_0.3.0         assertthat_0.2.1     gridBase_0.4-7      
[67] xtable_1.8-4         broom_0.5.2          class_7.3-15         viridisLite_0.3.0    iterators_1.0.12     AnnotationDbi_1.46.0
[73] registry_0.5-1       memoise_1.1.0        IRanges_2.18.1       cluster_2.1.0   

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant