pathway_heatmap can't extract columns past the end #118

Open
luigallucci opened this issue Sep 18, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@luigallucci

Describe the Bug
Hi, I'm trying to make the heatmap directly from the PICRUSt2 output file. I tried renaming the sample column to sample_name and several other variants, but nothing worked.
Error in pull():
! Can't extract columns past the end.
ℹ Location 1 doesn't exist.
ℹ There are only 0 columns.
Reproducible Example

annotated_kegg <- pathway_annotation(file = abundance_file, pathway = "KO", ko_to_kegg = TRUE)

heat <- pathway_heatmap(annotated_kegg, metadata, "Type")

Environment Information:

  • Operating System: macOS (osx-arm64)
  • R Version: 4.4.0
  • Package Version: latest
@luigallucci luigallucci added the bug Something isn't working label Sep 18, 2024
@cafferychen777
Owner

Dear l.gallucci,

Thank you for reporting this issue with the pathway_heatmap function in the ggpicrust2 package. To better assist you, I'll need some additional information:

  1. Could you please share the first few lines of your abundance_file and metadata file? This will help me understand the structure of your data.

  2. What are the dimensions (number of rows and columns) of your annotated_kegg and metadata dataframes?

  3. Can you provide the full error message and traceback you're receiving?

  4. To facilitate debugging, it would be extremely helpful if you could send your abundance_file and metadata file to [email protected]. Please remove any sensitive information before sharing.

  5. Could you also share the output of sessionInfo() to provide more details about your R environment?

Once I have this information, I'll be able to reproduce the issue and work on a solution more effectively.

Thank you for your patience and cooperation in resolving this issue.

Best regards,
Chen Yang

@luigallucci
Author

Sure, here is the full error and traceback:

<error/vctrs_error_subscript_oob>
Error in `pull()`:
! Can't extract columns past the end.
ℹ Location 1 doesn't exist.
ℹ There are only 0 columns.
---
Backtrace:
     ▆
  1. ├─ggpicrust2::pathway_heatmap(annotated_kegg, metadata, "Type")
  2. │ └─metadata %>% select(all_of(c(sample_name_col))) %>% pull()
  3. ├─dplyr::pull(.)
  4. ├─dplyr:::pull.data.frame(.)
  5. │ └─tidyselect::vars_pull(names(.data), !!enquo(var))
  6. │   └─tidyselect:::pull_as_location2(...)
  7. │     ├─tidyselect:::with_subscript_errors(...)
  8. │     │ └─base::withCallingHandlers(...)
  9. │     └─vctrs::num_as_location2(...)
 10. │       ├─vctrs:::result_get(...)
 11. │       └─vctrs:::vec_as_location2_result(...)
 12. │         ├─base::tryCatch(...)
 13. │         │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
 14. │         │   └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 15. │         │     └─base (local) doTryCatch(return(expr), name, parentenv, handler)
 16. │         └─vctrs::vec_as_location(i, n, names = names, arg = arg, call = call)
 17. └─vctrs (local) `<fn>`()
 18.   └─vctrs:::stop_subscript_oob(...)
 19.     └─vctrs:::stop_subscript(...)
 20.       └─rlang::abort(...)

Dimensions: annotated_kegg has 2,173 rows and 41 columns; metadata has 39 rows and 19 columns.


Output of sessionInfo():

R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ALDEx2_1.28.0         zCompositions_1.5.0-4 truncnorm_1.0-9       NADA_1.6-1.1          survival_3.7-0       
 [6] MASS_7.3-60.0.1       patchwork_1.3.0       ggprism_1.0.5         lubridate_1.9.3       forcats_1.0.0        
[11] stringr_1.5.1         dplyr_1.1.4           purrr_1.0.2           tidyr_1.3.1           tidyverse_2.0.0      
[16] tibble_3.2.1          readr_2.1.5           ggpicrust2_1.7.3      ggthemes_5.1.0        ggplot2_3.5.1        

loaded via a namespace (and not attached):
  [1] splines_4.3.2               later_1.3.2                 bitops_1.0-8                lifecycle_1.0.4            
  [5] edgeR_4.0.16                doParallel_1.0.17           vroom_1.6.5                 lattice_0.22-6             
  [9] magrittr_2.0.3              limma_3.58.1                remotes_2.5.0               httpuv_1.6.15              
 [13] Wrench_1.20.0               sessioninfo_1.2.2           pkgbuild_1.4.4              metagenomeSeq_1.43.0       
 [17] DBI_1.2.3                   RColorBrewer_1.1-3          ade4_1.7-22                 multcomp_1.4-26            
 [21] abind_1.4-8                 pkgload_1.4.0               zlibbioc_1.48.2             quadprog_1.5-8             
 [25] GenomicRanges_1.54.1        BiocGenerics_0.48.1         RCurl_1.98-1.16             TH.data_1.1-2              
 [29] phyloseq_1.48.0             sandwich_3.1-1              circlize_0.4.16             GenomeInfoDbData_1.2.11    
 [33] IRanges_2.36.0              S4Vectors_0.40.2            vegan_2.6-8                 permute_0.9-7              
 [37] codetools_0.2-20            getopt_1.20.4               coin_1.4-3                  DelayedArray_0.28.0        
 [41] tidyselect_1.2.1            shape_1.4.6.1               farver_2.1.2                matrixStats_1.4.1          
 [45] stats4_4.3.2                jsonlite_1.8.8              GetoptLong_1.0.5            multtest_2.58.0            
 [49] ellipsis_0.3.2              iterators_1.0.14            foreach_1.5.2               tools_4.3.2                
 [53] Rcpp_1.0.13                 glue_1.7.0                  SparseArray_1.2.4           DESeq2_1.42.1              
 [57] mgcv_1.9-1                  MatrixGenerics_1.14.0       usethis_3.0.0               GenomeInfoDb_1.38.8        
 [61] withr_3.0.1                 BiocManager_1.30.25         fastmap_1.2.0               GGally_2.2.1               
 [65] latticeExtra_0.6-30         rhdf5filters_1.14.1         fansi_1.0.6                 Maaslin2_1.16.0            
 [69] caTools_1.18.3              digest_0.6.37               timechange_0.3.0            R6_2.5.1                   
 [73] mime_0.12                   colorspace_2.1-1            gtools_3.9.5                jpeg_0.1-10                
 [77] utf8_1.2.4                  generics_0.1.3              data.table_1.16.0           robustbase_0.99-4          
 [81] httr_1.4.7                  htmlwidgets_1.6.4           S4Arrays_1.2.1              ggstats_0.6.0              
 [85] pkgconfig_2.0.3             gtable_0.3.5                modeltools_0.2-23           ComplexHeatmap_2.18.0      
 [89] XVector_0.42.0              pcaPP_2.0-5                 htmltools_0.5.8.1           profvis_0.3.8              
 [93] biomformat_1.30.0           clue_0.3-65                 scales_1.3.0                Biobase_2.62.0             
 [97] png_0.1-8                   optparse_1.7.5              rstudioapi_0.16.0           tzdb_0.4.0                 
[101] reshape2_1.4.4              rjson_0.2.23                curl_5.2.2                  nlme_3.1-166               
[105] zoo_1.8-12                  cachem_1.1.0                rhdf5_2.46.1                GlobalOptions_0.1.2        
[109] KernSmooth_2.23-24          parallel_4.3.2              miniUI_0.1.1.1              libcoin_1.0-10             
[113] RcppZiggurat_0.1.6          pillar_1.9.0                grid_4.3.2                  vctrs_0.6.5                
[117] gplots_3.1.3.1              urlchecker_1.0.1            promises_1.3.0              xtable_1.8-4               
[121] cluster_2.1.6               mvtnorm_1.3-1               cli_3.6.3                   locfit_1.5-9.10            
[125] compiler_4.3.2              rlang_1.1.4                 crayon_1.5.3                lefser_1.12.1              
[129] labeling_0.4.3              interp_1.1-6                plyr_1.8.9                  fs_1.6.4                   
[133] stringi_1.8.4               deldir_2.0-4                BiocParallel_1.36.0         munsell_0.5.1              
[137] Biostrings_2.70.3           devtools_2.4.5              glmnet_4.1-8                Matrix_1.6-5               
[141] hms_1.1.3                   bit64_4.0.5                 Rhdf5lib_1.24.2             KEGGREST_1.42.0            
[145] statmod_1.5.0               shiny_1.9.1                 SummarizedExperiment_1.32.0 Rfast_2.1.0                
[149] igraph_2.0.3                memoise_2.0.1               RcppParallel_5.1.9          biglm_0.9-3                
[153] bit_4.0.5                   DEoptimR_1.1-3              directlabels_2024.1.21      ape_5.8 

@cafferychen777
Owner

Dear l.gallucci,

Thank you for reporting this issue with the pathway_heatmap function in the ggpicrust2 package. I believe I understand the problem now:

The column names in your abundance_file don't match the sample IDs in your metadata file. Specifically:

  1. Your metadata file has sample IDs like "sample_id", "Ex2", "Ex4", "Ex_6", "Ex_7", etc.
  2. Your abundance_file has column names like "1", "10", "11", "12", "13", "15", "16", "17", etc.

This mismatch is likely causing the error you're seeing. To resolve this, you need to modify the column names in your abundance_file to match the sample IDs in your metadata file.
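A quick way to see exactly which names fail to match is to compare the two sets in both directions. This is a minimal base-R sketch; `report_id_mismatch`, `abundance_df`, `metadata_df`, and the toy values are hypothetical, and it assumes the first abundance column holds feature IDs while every other column is a sample.

```r
# Report sample names that appear on only one side of the comparison.
# Assumes column 1 of abundance_df holds feature IDs (e.g. KO numbers)
# and every other column is a sample.
report_id_mismatch <- function(abundance_df, metadata_df, id_col = "sample_id") {
  abundance_ids <- colnames(abundance_df)[-1]
  metadata_ids  <- as.character(metadata_df[[id_col]])
  list(
    only_in_abundance = setdiff(abundance_ids, metadata_ids),
    only_in_metadata  = setdiff(metadata_ids, abundance_ids)
  )
}

# Hypothetical toy data mirroring the mismatch described above
abundance_df <- data.frame(feature = c("K00001", "K00002"),
                           `1` = c(5, 3), `10` = c(0, 2),
                           check.names = FALSE)
metadata_df <- data.frame(sample_id = c("Ex2", "Ex4"),
                          Type = c("control", "treatment"))
report_id_mismatch(abundance_df, metadata_df)
```

Both vectors should be empty before calling `pathway_heatmap`. If your IDs live in another column, pass it explicitly, e.g. `id_col = "dada_id"`.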

Here's a suggested solution:

  1. First, check your metadata file to confirm the exact sample IDs.
  2. Then, modify your abundance_file column names to match these sample IDs.

You can do this using the colnames() function in R. Here's an example of how you might do this:

# Assuming your abundance_file is loaded into a dataframe called 'abundance_df'
# and your metadata is loaded into a dataframe called 'metadata_df'

# Get the sample IDs from your metadata
sample_ids <- metadata_df$sample_id  # or whatever column contains your sample IDs

# Make sure the number of samples matches
if(length(sample_ids) == ncol(abundance_df) - 1) {  # -1 because the first column is likely feature IDs
  # Set the column names of abundance_df
  colnames(abundance_df)[-1] <- sample_ids
} else {
  stop("The number of samples in metadata doesn't match the number of columns in abundance file")
}
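Renaming by position is only safe if the metadata rows are already in the same order as the abundance columns; otherwise samples get silently mislabelled. A lookup table that pairs each current column name with its metadata ID avoids that. This is a base-R sketch; `rename_by_map`, `id_map`, `old_name`, and `new_name` are hypothetical names you would build from your own sample records.

```r
# Rename sample columns using an explicit old -> new mapping instead of position.
rename_by_map <- function(abundance_df, id_map) {
  sample_cols <- colnames(abundance_df)[-1]   # column 1 assumed to be feature IDs
  idx <- match(sample_cols, id_map$old_name)  # NA where no mapping exists
  if (anyNA(idx)) {
    stop("No mapping for: ", paste(sample_cols[is.na(idx)], collapse = ", "))
  }
  colnames(abundance_df)[-1] <- id_map$new_name[idx]
  abundance_df
}

# Columns deliberately out of order: "10" comes before "1"
abundance_df <- data.frame(feature = "K00001", `10` = 2, `1` = 5,
                           check.names = FALSE)
id_map <- data.frame(old_name = c("1", "10"), new_name = c("Ex2", "Ex4"))
colnames(rename_by_map(abundance_df, id_map))
```

Because each column is looked up by name, the result is correct regardless of column order, and any column without a mapping stops with an explicit message instead of being renamed wrongly.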

After making this change, try running your original code again:

annotated_kegg <- pathway_annotation(file = abundance_df, pathway = "KO", ko_to_kegg = TRUE)
heat <- pathway_heatmap(annotated_kegg, metadata_df, "Type")

If you're still encountering issues after making these changes, please let me know and provide:

  1. The first few lines of your abundance_file and metadata file (after making the changes).
  2. The dimensions of your annotated_kegg and metadata_df dataframes.
  3. Any error messages you're still seeing.

This should help resolve the "Can't extract columns past the end" error you were experiencing. Let me know if you need any further assistance!

Best regards,
Chen Yang

@luigallucci
Author

luigallucci commented Sep 18, 2024

Dear @cafferychen777, thank you for the reply.

This is what I did. Sorry, I forgot to mention that I'm using the dada_id column for the sample IDs.

Unfortunately, even after this change the result is the same.

The problem seems to be related to this line:

metadata %>% select(all_of(c(sample_name_col))) %>% pull()
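The backtrace narrows this down. The error reports "There are only 0 columns", so `select(all_of(c(sample_name_col)))` returned an empty table, which indicates that `sample_name_col` was itself empty: no metadata column was recognized as holding the sample IDs. How the function picks that column is not shown in this thread. The base-R sketch below assumes one plausible rule (take the metadata column whose values include every abundance column name) purely to show how a single unexpected column name can leave the result empty; `find_sample_col` and the `description` column are hypothetical, and this is not ggpicrust2's code.

```r
# Hypothetical lookup rule (NOT ggpicrust2's code): return the names of
# metadata columns whose values cover every abundance column name.
find_sample_col <- function(abundance, metadata) {
  hits <- vapply(names(metadata),
                 function(col) all(colnames(abundance) %in% metadata[[col]]),
                 logical(1))
  names(metadata)[hits]
}

metadata <- data.frame(dada_id = c("Ex2", "Ex4"), Type = c("A", "B"))

# Only sample columns: the lookup finds "dada_id"
abund_ok <- data.frame(Ex2 = c(1, 0), Ex4 = c(3, 2))
find_sample_col(abund_ok, metadata)

# One extra, non-sample column: no metadata column qualifies
abund_extra <- cbind(abund_ok, description = c("pathway a", "pathway b"))
find_sample_col(abund_extra, metadata)
```

Under that assumption, the lookup fails whenever the table passed as abundance contains any column that is not a sample ID, so it is worth comparing `colnames(annotated_kegg)` against the metadata IDs directly.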

@cafferychen777
Owner

Hi @luigallucci,

Could you send the data files to [email protected]?

Best,

@Niyuh04

Niyuh04 commented Nov 13, 2024

I have the same problem. I made sure that the values in the sample.id column of my metadata match the column names in the abundance file, but I'm still getting the same error. Sorry if something isn't clear; I'm not very fluent in English, and I'm a bioinformatics enthusiast. Thanks, and great work!

Backtrace:

  1. ├─ggpicrust2::pathway_heatmap(...)
  2. │ └─metadata %>% select(all_of(c(sample_name_col))) %>% pull()


@cafferychen777
Owner

Hi @luigallucci and @Niyuh04,

Thank you for reporting this issue. Based on the error messages and screenshots you've provided, I can help resolve the sample name matching problem in the pathway_heatmap function.

The error occurs because the function cannot find matching sample names between your abundance data and metadata. Here's how to fix it:

  1. First, please check that your sample names match exactly between your abundance data and metadata:
# Check your data
head(colnames(annotated_kegg))  # These should match your metadata sample IDs
head(metadata$sample_id)        # Or whatever column contains your sample IDs
  2. Make sure your metadata has one of these column names for sample IDs:
  • sample_id
  • SampleID
  • Sample_ID
  • sample_name
  • Sample
  • dada_id
  3. Example of correct format:
# Metadata format
metadata <- data.frame(
  sample_id = c("sample1", "sample2", "sample3"),  # Must match abundance colnames
  Type = c("control", "treatment", "treatment")
)

# Then call the function
pathway_heatmap(
  abundance = annotated_kegg,
  metadata = metadata,
  group = "Type"
)
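Once the IDs agree, a small helper can put the metadata into the shape the example above uses: rename the ID column (here `dada_id`, the column mentioned earlier in the thread) to `sample_id`, and order the rows to match the abundance columns, dropping rows for samples that are not in the abundance table. This is a base-R sketch; `align_metadata` is hypothetical, it assumes `abundance` contains only sample columns, and whether `pathway_heatmap` needs this row order is not confirmed here. Aligning mainly makes any remaining mismatch fail with a clear message.

```r
# Rename the ID column to "sample_id" and order metadata rows to match the
# abundance columns; stop if any abundance sample has no metadata row.
align_metadata <- function(abundance, metadata, id_col) {
  names(metadata)[names(metadata) == id_col] <- "sample_id"
  idx <- match(colnames(abundance), metadata$sample_id)
  if (anyNA(idx)) {
    stop("Samples missing from metadata: ",
         paste(colnames(abundance)[is.na(idx)], collapse = ", "))
  }
  metadata[idx, , drop = FALSE]  # rows for samples absent from abundance are dropped
}

abundance <- data.frame(Ex4 = c(3, 2), Ex2 = c(1, 0))
metadata  <- data.frame(dada_id = c("Ex2", "Ex4", "Ex6"),
                        Type = c("control", "treatment", "treatment"))
align_metadata(abundance, metadata, id_col = "dada_id")
```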

If you're still experiencing issues, please share:

  1. The output of head(colnames(annotated_kegg))
  2. The output of head(metadata)
  3. The exact column name in your metadata that contains sample IDs

I'll be implementing a fix in the next package update to make the sample name matching more robust and provide clearer error messages.

Best regards,
Caffery

P.S. @Niyuh04 - Your English is perfectly clear, no worries! Thank you for providing the detailed error information.
