getBM() attribute header/value mismatch #108

Open
karlmakepeace opened this issue Aug 13, 2024 · 0 comments

I am trying to retrieve various attributes from the Ensembl "sequences" attribute page and noticed that the returned values do not line up with their column headers. For example, in gene_sequences_attributes_subset the "3utr" column is filled with "TP53" (which belongs in the "external_gene_name" column). Likewise, the "external_gene_name" column is filled with what appear to be "ensembl_gene_id" values (an attribute that was not even requested in the attributes argument of biomaRt::getBM()).

Similarly, the attribute headers and values in gene_sequences_attributes_all appear to be scrambled as well.

# {biomaRt} bug attributes header/value mismatch examples #---------------------
# install.packages("tibble")

mart <- biomaRt::useEnsembl(
  biomart = "genes",
  version = "112", # latest as of 2024-08-13
  dataset = "hsapiens_gene_ensembl")

attributes <- biomaRt::listAttributes(
  mart = mart,
  page = "sequences",
  what = "name")

gene_sequences_attributes_all <- biomaRt::getBM(
  mart = mart,
  attributes = attributes,
  filters = c("external_gene_name"),
  values = list(c("TP53")))

gene_sequences_attributes_subset <- biomaRt::getBM(
  mart = mart,
  attributes = c("external_gene_name", "5utr","3utr"),
  filters = c("external_gene_name"),
  values = list(c("TP53")))

# Inspect in console #----------------------------------------------------------
gene_sequences_attributes_all |> tibble::as_tibble()
# # A tibble: 30 × 60
#    transcript_exon_intron gene_exon_intron transcript_flank   gene_flank      
#    <chr>                  <chr>            <chr>              <chr>           
#  1 Sequence unavailable   ENSG00000141510  ENSG00000141510.19 tumor protein p…
#  2 Sequence unavailable   ENSG00000141510  ENSG00000141510.19 tumor protein p…
#  3 Sequence unavailable   ENSG00000141510  ENSG00000141510.19 tumor protein p…
#  4 Sequence unavailable   ENSG00000141510  ENSG00000141510.19 tumor protein p…
#  5 Sequence unavailable   ENSG00000141510  ENSG00000141510.19 tumor protein p…
#  6 Sequence unavailable   ENSG00000141510  ENSG00000141510.19 tumor protein p…
#  7 Sequence unavailable   ENSG00000141510  ENSG00000141510.19 tumor protein p…
#  8 Sequence unavailable   ENSG00000141510  ENSG00000141510.19 tumor protein p…
#  9 Sequence unavailable   ENSG00000141510  ENSG00000141510.19 tumor protein p…
# 10 Sequence unavailable   ENSG00000141510  ENSG00000141510.19 tumor protein p…
# # ℹ 20 more rows
# # ℹ 56 more variables: coding_transcript_flank <chr>,
# #   coding_gene_flank <chr>, `5utr` <chr>, `3utr` <chr>, gene_exon <chr>,
# #   cdna <chr>, coding <chr>, peptide <chr>, upstream_flank <chr>,
# #   downstream_flank <chr>, ensembl_gene_id <chr>,
# #   ensembl_gene_id_version <chr>, description <chr>,
# #   external_gene_name <chr>, external_gene_source <chr>, …
# # ℹ Use `print(n = ...)` to see more rows

gene_sequences_attributes_subset |> tibble::as_tibble()
# # A tibble: 20 × 3
#    `3utr` external_gene_name `5utr`                                           
#    <chr>  <chr>              <chr>                                            
#  1 TP53   ENSG00000141510    CCCCATGTTCCTGGCTAGCCAAGGAACCACCAGTTGATTAGCAGAGAA…
#  2 TP53   ENSG00000141510    GGCGCTAAAAGTTTTGAGCTTCTCAAAAGTCTAGAGCCACCGTCCAGG…
#  3 TP53   ENSG00000141510    CTAGAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCTCCGGGGACAC…
#  4 TP53   ENSG00000141510    TGAGGCCAGGAGATGGAGGCTGCAGTGAGCTGTGATCACACCACTGTG…
#  5 TP53   ENSG00000141510    Sequence unavailable                             
#  6 TP53   ENSG00000141510    AAAAGTCTAGAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCTCCGG…
#  7 TP53   ENSG00000141510    TGAGGCCAGGAGATGGAGGCTGCAGTGAGCTGTGATCACACCACTGTG…
#  8 TP53   ENSG00000141510    CTCAAAAGTCTAGAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCTC…
#  9 TP53   ENSG00000141510    AAAAGTCTAGAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCTCCGG…
# 10 TP53   ENSG00000141510    AAAAGTCTAGAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCTCCGG…
# 11 TP53   ENSG00000141510    CTAGAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCTCCGGGGACAC…
# 12 TP53   ENSG00000141510    CTAGAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCTCCGGGGACAC…
# 13 TP53   ENSG00000141510    TTCGGGCTGGGAGCGTGCTTTCCACGACGGTGACACGCTTCCCTGGAT…
# 14 TP53   ENSG00000141510    TTCGGGCTGGGAGCGTGCTTTCCACGACGGTGACACGCTTCCCTGGAT…
# 15 TP53   ENSG00000141510    TCTCAAAAGTCTAGAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCT…
# 16 TP53   ENSG00000141510    GTTTTCCCCTCCCATGTGCTCAAGACTGGCGCTAAAAGTTTTGAGCTT…
# 17 TP53   ENSG00000141510    TTTGTAATGCAGGGCTGAGGAGTGTCCGAAGAGAATGGGCAGCAGCCA…
# 18 TP53   ENSG00000141510    GGATTGGGGTTTTCCCCTCCCATGTGCTCAAGACTGGCGCTAAAAGTT…
# 19 TP53   ENSG00000141510    AAAAGTCTAGAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCTCCGG…
# 20 TP53   ENSG00000141510    CTAGAGCTTTTGGGGAAGAGGGAGTGGTTGTTAAGAGATGAGATTAAA…
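
The mismatch above can also be detected programmatically rather than by eye. The following is a minimal sketch, not part of biomaRt: `check_columns()` and the regular expressions are hypothetical, and they assume that UTR columns should contain only nucleotide strings or the "Sequence unavailable" placeholder. The toy data frame mimics the first rows of gene_sequences_attributes_subset (sequences shortened) so the check runs without a BioMart connection.

```r
# Hypothetical helper (not a biomaRt function): for each column that has an
# expected pattern, report whether every value in that column matches it.
check_columns <- function(df, patterns) {
  cols <- intersect(names(patterns), names(df))
  vapply(cols, function(col) {
    all(grepl(patterns[[col]], as.character(df[[col]])))
  }, logical(1))
}

# Assumed patterns: sequence columns hold nucleotides or the placeholder text;
# human Ensembl gene IDs look like "ENSG" followed by 11 digits.
seq_pattern <- "^([ACGTN]+|Sequence unavailable)$"
patterns <- c(
  "5utr" = seq_pattern,
  "3utr" = seq_pattern,
  "ensembl_gene_id" = "^ENSG[0-9]{11}$"
)

# Toy data mimicking the subset output shown above (sequences shortened).
observed <- data.frame(
  `3utr` = c("TP53", "TP53"),
  external_gene_name = c("ENSG00000141510", "ENSG00000141510"),
  `5utr` = c("CCCCATGTTCC", "Sequence unavailable"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# TRUE means the column's values are consistent with its header.
result <- check_columns(observed, patterns)
print(result)
```

On this toy data the `5utr` column passes and the `3utr` column fails, which matches what is visible in the console output. Passing the real gene_sequences_attributes_subset object instead of `observed` should flag the same column, assuming the result looks like the output pasted above.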