[infra] [dados] bdplyr and bd_collect do not download datasets #1483

Closed
gustavoalcantara opened this issue Dec 12, 2022 · 9 comments
Labels
bug Something isn't working R

Comments

@gustavoalcantara
Contributor

gustavoalcantara commented Dec 12, 2022

Description
The bdplyr command, when used as follows:
query <- bdplyr("br_ibge_populacao.municipio")
and then
bd_collect(query)

does not return the data. Only the columns come back, with no observations.

Selecting one or more columns works fine:
query <- bdplyr('br_ibge_populacao.municipio') %>% dplyr::select(id_municipio) %>% dplyr::glimpse()

These tests are part of adding a limit to the package examples on the website, as is already done for SQL. It is a good data practice toward our users.
Some of them have reported that, following the documentation on the BD website, they end up downloading the entire data frame. This happened with RAIS last week.

It would be nice to settle on a standard query pattern for data frames, like this:
query <- bdplyr('br_ibge_populacao.municipio') %>% head(100)

I am using Rcloud for these tests, and @arthurfg was using the package on IOS.

@arthurfg
Collaborator

Reproducible example with reprex and dbplyr == 2.1.1:

Returns an error in UseMethod("op_can_download")

library(basedosdados)
#> 
#> ── Base dos Dados ──────────────────────────────────────────────────────────────
#> • Website: https://basedosdados.org/
#> • Docs: https://basedosdados.github.io/mais/
#> 
#> ── Contatos ──
#> 
#> • Discord: discord.gg/tuaFbAPeq5
#> • Twitter: @basedosdados
#> • Email: [email protected]
#> • Telegram: t.me/joinchat/OKWc3RnClXnq2hq-8o0h_w
#> • Github: github.com/basedosdados
#> • LinkedIn: linkedin.com/company/base-dos-dados
#> • Newsletter:
#> https://basedosdados.hubspotpagebuilder.com/assine-a-newsletter-da-base-dos-dados
#> • WhatsApp: https://chat.whatsapp.com/HXWgdFc1RmwCoblly5KPBZ
#> • YouTube: https://www.youtube.com/c/BasedosDados
#> 
#> ── Somos um projeto open-source e gratuito. Para nos mantermos e crescermos precisamos do seu apoio. Quanto valem as horas do seu trabalho que poupamos?
#> 
#> Nos apoie em https://apoia.se/basedosdados.
#> 
#> ── Usando o pacote para produzir um trabalho? Para nos citar basta rodar citation('basedosdados').

set_billing_id("casebd")
#> Project keys set successfully

query <- bdplyr("br_ibge_populacao.uf")
#> ℹ Suitable tokens found in the cache, associated with these emails:
#> • '[email protected]'
#> • '[email protected]'
#>   Defaulting to the first email.
#> ! Using an auto-discovered, cached token.
#>   To suppress this message, modify your code or options to clearly consent to
#>   the use of a cached token.
#>   See gargle's "Non-interactive auth" vignette for more details:
#>   <https://gargle.r-lib.org/articles/non-interactive-auth.html>
#> ℹ The bigrquery package is using a cached token for
#>   '[email protected]'.
#> Successfully connected to table `basedosdados.br_ibge_populacao.uf`.
df <- bd_collect(query, show_query = TRUE)  
#> The following query will be executed:
#> <SQL>
#> SELECT `sigla_uf`, `ano`, `populacao`, `populacao_economicamente_ativa`
#> FROM `basedosdados.br_ibge_populacao.uf`
#> Error in UseMethod("op_can_download"): método não aplicável para 'op_can_download' aplicado a um objeto de classe "NULL"

Created on 2022-12-12 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       macOS Big Sur ... 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  pt_BR.UTF-8
#>  ctype    pt_BR.UTF-8
#>  tz       America/Sao_Paulo
#>  date     2022-12-12
#>  pandoc   2.18 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date (UTC) lib source
#>  askpass        1.1        2019-01-13 [1] CRAN (R 4.2.0)
#>  assertthat     0.2.1      2019-03-21 [1] CRAN (R 4.2.0)
#>  basedosdados * 0.2.1      2022-03-15 [1] CRAN (R 4.2.0)
#>  bigrquery      1.4.0.9000 2022-10-05 [1] Github (r-dbi/bigrquery@73781ba)
#>  bit            4.0.4      2020-08-04 [1] CRAN (R 4.2.0)
#>  bit64          4.0.5      2020-08-30 [1] CRAN (R 4.2.0)
#>  cli            3.4.1      2022-09-23 [1] CRAN (R 4.2.0)
#>  colorspace     2.0-3      2022-02-21 [1] CRAN (R 4.2.0)
#>  curl           4.3.2      2021-06-23 [1] CRAN (R 4.2.0)
#>  DBI            1.1.3      2022-06-18 [1] CRAN (R 4.2.0)
#>  dbplyr         2.1.1      2021-04-06 [1] CRAN (R 4.2.1)
#>  digest         0.6.29     2021-12-01 [1] CRAN (R 4.2.0)
#>  dplyr          1.0.10     2022-09-01 [1] CRAN (R 4.2.0)
#>  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate       0.15       2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi          1.0.3      2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap        1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  fs             1.5.2      2021-12-08 [1] CRAN (R 4.2.0)
#>  gargle         1.2.1      2022-09-08 [1] CRAN (R 4.2.0)
#>  generics       0.1.3      2022-07-05 [1] CRAN (R 4.2.0)
#>  glue           1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  highr          0.9        2021-04-16 [1] CRAN (R 4.2.0)
#>  hms            1.1.2      2022-08-19 [1] CRAN (R 4.2.0)
#>  htmltools      0.5.2      2021-08-25 [1] CRAN (R 4.2.0)
#>  httr           1.4.4      2022-08-17 [1] CRAN (R 4.2.0)
#>  jsonlite       1.8.2      2022-10-02 [1] CRAN (R 4.2.0)
#>  knitr          1.39       2022-04-26 [1] CRAN (R 4.2.0)
#>  lifecycle      1.0.3      2022-10-07 [1] CRAN (R 4.2.0)
#>  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  munsell        0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
#>  openssl        2.0.3      2022-09-14 [1] CRAN (R 4.2.0)
#>  pillar         1.8.1      2022-08-19 [1] CRAN (R 4.2.0)
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr          0.3.5      2022-10-06 [1] CRAN (R 4.2.1)
#>  R.cache        0.16.0     2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3    1.8.2      2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo           1.25.0     2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils        2.12.0     2022-06-28 [1] CRAN (R 4.2.0)
#>  R6             2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  rappdirs       0.3.3      2021-01-31 [1] CRAN (R 4.2.0)
#>  Rcpp           1.0.9      2022-07-08 [1] CRAN (R 4.2.0)
#>  readr          2.1.2      2022-01-30 [1] CRAN (R 4.2.0)
#>  reprex         2.0.1      2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang          1.0.6      2022-09-24 [1] CRAN (R 4.2.0)
#>  rmarkdown      2.14       2022-04-25 [1] CRAN (R 4.2.0)
#>  rstudioapi     0.14       2022-08-22 [1] CRAN (R 4.2.0)
#>  scales         1.2.0      2022-04-13 [1] CRAN (R 4.2.0)
#>  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi        1.7.8      2022-07-11 [1] CRAN (R 4.2.0)
#>  stringr        1.4.0      2019-02-10 [1] CRAN (R 4.2.0)
#>  styler         1.7.0      2022-03-13 [1] CRAN (R 4.2.0)
#>  tibble         3.1.8      2022-07-22 [1] CRAN (R 4.2.0)
#>  tidyselect     1.2.0      2022-10-10 [1] CRAN (R 4.2.0)
#>  typed          0.0.1      2021-03-19 [1] CRAN (R 4.2.0)
#>  tzdb           0.3.0      2022-03-28 [1] CRAN (R 4.2.0)
#>  utf8           1.2.2      2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs          0.5.1      2022-11-16 [1] CRAN (R 4.2.1)
#>  withr          2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun           0.31       2022-05-10 [1] CRAN (R 4.2.0)
#>  yaml           2.3.5      2022-02-21 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Reproducible example with reprex and dbplyr == 2.2.1:

Returns an empty tibble, with only the columns.

library(basedosdados)
#> 
#> ── Base dos Dados ──────────────────────────────────────────────────────────────
#> • Website: https://basedosdados.org/
#> • Docs: https://basedosdados.github.io/mais/
#> 
#> ── Contatos ──
#> 
#> • Discord: discord.gg/tuaFbAPeq5
#> • Twitter: @basedosdados
#> • Email: [email protected]
#> • Telegram: t.me/joinchat/OKWc3RnClXnq2hq-8o0h_w
#> • Github: github.com/basedosdados
#> • LinkedIn: linkedin.com/company/base-dos-dados
#> • Newsletter:
#> https://basedosdados.hubspotpagebuilder.com/assine-a-newsletter-da-base-dos-dados
#> • WhatsApp: https://chat.whatsapp.com/HXWgdFc1RmwCoblly5KPBZ
#> • YouTube: https://www.youtube.com/c/BasedosDados
#> 
#> ── Somos um projeto open-source e gratuito. Para nos mantermos e crescermos precisamos do seu apoio. Quanto valem as horas do seu trabalho que poupamos?
#> 
#> Nos apoie em https://apoia.se/basedosdados.
#> 
#> ── Usando o pacote para produzir um trabalho? Para nos citar basta rodar citation('basedosdados').

set_billing_id("casebd")
#> Project keys set successfully

query <- bdplyr("br_ibge_populacao.uf")
#> ℹ Suitable tokens found in the cache, associated with these emails:
#> • '[email protected]'
#> • '[email protected]'
#>   Defaulting to the first email.
#> ! Using an auto-discovered, cached token.
#>   To suppress this message, modify your code or options to clearly consent to
#>   the use of a cached token.
#>   See gargle's "Non-interactive auth" vignette for more details:
#>   <https://gargle.r-lib.org/articles/non-interactive-auth.html>
#> ℹ The bigrquery package is using a cached token for
#>   '[email protected]'.
#> Warning: <BigQueryConnection> uses an old dbplyr interface
#> ℹ Please install a newer version of the package or contact the maintainer
#> This warning is displayed once every 8 hours.
#> Successfully connected to table `basedosdados.br_ibge_populacao.uf`.
df <- bd_collect(query, show_query = TRUE)  
#> The following query will be executed:
#> <SQL>
#> SELECT *
#> FROM `basedosdados.br_ibge_populacao.uf`

df
#> # A tibble: 0 × 4
#> # … with 4 variables: sigla_uf <chr>, ano <int64>, populacao <int64>,
#> #   populacao_economicamente_ativa <int64>

Created on 2022-12-12 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       macOS Big Sur ... 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  pt_BR.UTF-8
#>  ctype    pt_BR.UTF-8
#>  tz       America/Sao_Paulo
#>  date     2022-12-12
#>  pandoc   2.18 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date (UTC) lib source
#>  askpass        1.1        2019-01-13 [1] CRAN (R 4.2.0)
#>  assertthat     0.2.1      2019-03-21 [1] CRAN (R 4.2.0)
#>  basedosdados * 0.2.1      2022-03-15 [1] CRAN (R 4.2.0)
#>  bigrquery      1.4.0.9000 2022-10-05 [1] Github (r-dbi/bigrquery@73781ba)
#>  bit            4.0.4      2020-08-04 [1] CRAN (R 4.2.0)
#>  bit64          4.0.5      2020-08-30 [1] CRAN (R 4.2.0)
#>  cli            3.4.1      2022-09-23 [1] CRAN (R 4.2.0)
#>  colorspace     2.0-3      2022-02-21 [1] CRAN (R 4.2.0)
#>  curl           4.3.2      2021-06-23 [1] CRAN (R 4.2.0)
#>  DBI            1.1.3      2022-06-18 [1] CRAN (R 4.2.0)
#>  dbplyr         2.2.1      2022-06-27 [1] CRAN (R 4.2.0)
#>  digest         0.6.29     2021-12-01 [1] CRAN (R 4.2.0)
#>  dplyr          1.0.10     2022-09-01 [1] CRAN (R 4.2.0)
#>  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate       0.15       2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi          1.0.3      2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap        1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  fs             1.5.2      2021-12-08 [1] CRAN (R 4.2.0)
#>  gargle         1.2.1      2022-09-08 [1] CRAN (R 4.2.0)
#>  generics       0.1.3      2022-07-05 [1] CRAN (R 4.2.0)
#>  glue           1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  highr          0.9        2021-04-16 [1] CRAN (R 4.2.0)
#>  hms            1.1.2      2022-08-19 [1] CRAN (R 4.2.0)
#>  htmltools      0.5.2      2021-08-25 [1] CRAN (R 4.2.0)
#>  httr           1.4.4      2022-08-17 [1] CRAN (R 4.2.0)
#>  jsonlite       1.8.2      2022-10-02 [1] CRAN (R 4.2.0)
#>  knitr          1.39       2022-04-26 [1] CRAN (R 4.2.0)
#>  lifecycle      1.0.3      2022-10-07 [1] CRAN (R 4.2.0)
#>  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  munsell        0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
#>  openssl        2.0.3      2022-09-14 [1] CRAN (R 4.2.0)
#>  pillar         1.8.1      2022-08-19 [1] CRAN (R 4.2.0)
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr          0.3.5      2022-10-06 [1] CRAN (R 4.2.1)
#>  R.cache        0.16.0     2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3    1.8.2      2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo           1.25.0     2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils        2.12.0     2022-06-28 [1] CRAN (R 4.2.0)
#>  R6             2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  rappdirs       0.3.3      2021-01-31 [1] CRAN (R 4.2.0)
#>  Rcpp           1.0.9      2022-07-08 [1] CRAN (R 4.2.0)
#>  readr          2.1.2      2022-01-30 [1] CRAN (R 4.2.0)
#>  reprex         2.0.1      2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang          1.0.6      2022-09-24 [1] CRAN (R 4.2.0)
#>  rmarkdown      2.14       2022-04-25 [1] CRAN (R 4.2.0)
#>  rstudioapi     0.14       2022-08-22 [1] CRAN (R 4.2.0)
#>  scales         1.2.0      2022-04-13 [1] CRAN (R 4.2.0)
#>  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi        1.7.8      2022-07-11 [1] CRAN (R 4.2.0)
#>  stringr        1.4.0      2019-02-10 [1] CRAN (R 4.2.0)
#>  styler         1.7.0      2022-03-13 [1] CRAN (R 4.2.0)
#>  tibble         3.1.8      2022-07-22 [1] CRAN (R 4.2.0)
#>  tidyselect     1.2.0      2022-10-10 [1] CRAN (R 4.2.0)
#>  typed          0.0.1      2021-03-19 [1] CRAN (R 4.2.0)
#>  tzdb           0.3.0      2022-03-28 [1] CRAN (R 4.2.0)
#>  utf8           1.2.2      2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs          0.5.1      2022-11-16 [1] CRAN (R 4.2.1)
#>  withr          2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun           0.31       2022-05-10 [1] CRAN (R 4.2.0)
#>  yaml           2.3.5      2022-02-21 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

@arthurfg
Collaborator

With dbplyr == 2.2.1 and no filter or select, bd_collect() returns an empty tibble. If any filter() or select() is applied, bd_collect() returns the expected tibble.
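
As a rough illustration of the workaround described above, the sketch below selects explicit columns before collecting. The helper name `collect_columns` is hypothetical and not part of basedosdados. It assumes a billing project was already set with set_billing_id(), and it relies only on bdplyr(), dplyr::select() and bd_collect() as used in this thread. It is a sketch of the reported behavior, not a confirmed fix.

```r
# Sketch only: `collect_columns` is a hypothetical helper, not part of basedosdados.
# It assumes a billing project was already configured with set_billing_id().
collect_columns <- function(table, columns) {
  # Build the lazy query, then select explicit columns before collecting,
  # because the unfiltered, unselected query is the one reported as empty here.
  query <- basedosdados::bdplyr(table)
  query <- dplyr::select(query, dplyr::all_of(columns))
  basedosdados::bd_collect(query)
}

# Example call (needs BigQuery credentials, so it is left commented out):
# df <- collect_columns("br_ibge_populacao.uf", c("sigla_uf", "ano", "populacao"))
```

Packages are referenced with `::` inside the function so that defining the helper does not require attaching them.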

Applying head() after bdplyr() returns an error in strsplit():

query <- basedosdados::bdplyr("br_ibge_populacao.uf") |>
  head()
#> 
#> ── Base dos Dados ──────────────────────────────────────────────────────────────
#> • Website: https://basedosdados.org/
#> • Docs: https://basedosdados.github.io/mais/
#> 
#> ── Contatos ──
#> 
#> • Discord: discord.gg/tuaFbAPeq5
#> • Twitter: @basedosdados
#> • Email: [email protected]
#> • Telegram: t.me/joinchat/OKWc3RnClXnq2hq-8o0h_w
#> • Github: github.com/basedosdados
#> • LinkedIn: linkedin.com/company/base-dos-dados
#> • Newsletter:
#> https://basedosdados.hubspotpagebuilder.com/assine-a-newsletter-da-base-dos-dados
#> • WhatsApp: https://chat.whatsapp.com/HXWgdFc1RmwCoblly5KPBZ
#> • YouTube: https://www.youtube.com/c/BasedosDados
#> 
#> ── Somos um projeto open-source e gratuito. Para nos mantermos e crescermos precisamos do seu apoio. Quanto valem as horas do seu trabalho que poupamos?
#> 
#> Nos apoie em https://apoia.se/basedosdados.
#> 
#> ── Usando o pacote para produzir um trabalho? Para nos citar basta rodar citation('basedosdados').
#> Fecthing Billing Project Id from enviroment variables defined by user.
#> ℹ Suitable tokens found in the cache, associated with these emails:
#> 
#> • '[email protected]'
#> 
#> • '[email protected]'
#> 
#>   Defaulting to the first email.
#> 
#> ! Using an auto-discovered, cached token.
#> 
#>   To suppress this message, modify your code or options to clearly consent to
#>   the use of a cached token.
#> 
#>   See gargle's "Non-interactive auth" vignette for more details:
#> 
#>   <https://gargle.r-lib.org/articles/non-interactive-auth.html>
#> 
#> ℹ The bigrquery package is using a cached token for
#>   '[email protected]'.
#> Warning: <BigQueryConnection> uses an old dbplyr interface
#> ℹ Please install a newer version of the package or contact the maintainer
#> This warning is displayed once every 8 hours.
#> Successfully connected to table `basedosdados.br_ibge_populacao.uf`.

df <- basedosdados::bd_collect(query, show_query = TRUE)  
#> Fecthing Billing Project Id from enviroment variables defined by user.
#> The following query will be executed:
#> <SQL>
#> SELECT *
#> FROM `basedosdados.br_ibge_populacao.uf`
#> LIMIT 6
#> Error in strsplit(name, ".", fixed = TRUE): argumento modo não caractere

Created on 2022-12-12 by the reprex package (v2.0.1)

@pedrocava
Collaborator

This issue contains at least two different errors, so it would be better to close it and file each one as a separate issue. That said, the problem description is quite confusing (I did not understand most of it :( ), and a call to sessioninfo::session_info() is sorely missing to better understand your environment.

@arthurfg
Collaborator

OK, let's do that. I ended up noticing that the error returned by the code I posted is the same as in issue #1470. I think I should leave it as a comment there, what do you think? I exported the reprex with the session info; it is in the first comment. The session for the second one is the same, so I did not include it again to avoid cluttering the issue.

@pedrocava
Collaborator

I checked, and you are on the development version of bigrquery. The fix of using dbplyr version 2.1.1 works because a recent version of bigrquery did not work with dbplyr version 2.2.1, so I think there was a mix-up there.

This has been causing a lot of trouble. I will try to find a stable combination of versions and release a new version of basedosdados by Christmas.
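
When comparing environments, it can help to print just the versions of the packages discussed here. The sketch below is one way to do that with base R. The helper name `installed_versions` is hypothetical, and the check against dbplyr 2.2.1 only reflects what was reported in this thread, not a verified compatibility matrix.

```r
# Hypothetical helper: reports installed versions of the packages discussed here.
installed_versions <- function(pkgs = c("basedosdados", "bigrquery", "dbplyr")) {
  vapply(
    pkgs,
    function(pkg) {
      # NA marks a package that is not installed in the current library.
      if (requireNamespace(pkg, quietly = TRUE)) {
        as.character(utils::packageVersion(pkg))
      } else {
        NA_character_
      }
    },
    character(1)
  )
}

versions <- installed_versions()
print(versions)

# Flag the dbplyr release that produced the empty tibble in this thread.
if (!is.na(versions[["dbplyr"]]) &&
    utils::compareVersion(versions[["dbplyr"]], "2.2.1") == 0) {
  message("dbplyr 2.2.1 is installed; this thread reports empty results with it.")
}
```

The full sessioninfo::session_info() output remains the better attachment for a bug report; this only gives a quick view of the three packages involved.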

@arthurfg
Collaborator

Hmm, makes sense.
OK, when you get to that, could you call me? I would like to watch you work on the package.

@pedrocava
Collaborator

Before touching anything, I would like to schedule a time with you and @gustavoalcantara to establish a few things about how the pieces fit together. Can you find a slot here?

@arthurfg
Collaborator

We scheduled it there!

@pedrocava changed the title from "[infra] [dados] <bdplyr and bd_collect do not download datasets" to "[infra] [dados] bdplyr and bd_collect do not download datasets" on Dec 18, 2022
@laura-l-amaral
Contributor

#1563
