Float precision in relationship table causes issues in cohort diagnostics unit tests #65

Open
azimov opened this issue Aug 28, 2024 · 1 comment


azimov commented Aug 28, 2024

Description

When inserting relationships into Postgres tables in CohortDiagnostics, I get an error that the 'is_hierarchical' and 'defines_ancestry' columns overflow. Note that they are defined as varchar(1) here, but as unbounded TEXT in the SQLite Common Data Model DDL.

When checking in SQLite, the values all appear to be '0.0'. This causes the CohortDiagnostics unit tests to fail, because they take the 0.0 values from SQLite and try to insert them into Postgres:

https://github.com/OHDSI/CohortDiagnostics/actions/runs/10579576275/job/29312378749#step:8:5416

I think this may be a bug I have seen before, caused by the SQLite DBI driver, cropping up in DatabaseConnector:

OHDSI/DatabaseConnector#280

Essentially, the value is written to a CSV as 0; when the CSV is loaded, it is parsed as a numeric. When the DBI SQLite driver sees a numeric, it automatically adds the floating-point precision (0 becomes 0.0). This value then finds its way into the CohortDiagnostics export (which I can work around).
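
A minimal sketch of that mechanism, assuming only DBI and RSQLite against an in-memory database (the table name `relationship_demo` is made up for illustration; this is not the Eunomia loading code). It binds the same value as an R double, an R integer, and a string into a TEXT column, then reads back what SQLite stored. The double is the case that should come back with the trailing ".0".

```r
library(DBI)

# In-memory SQLite database; nothing is written to disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical one-column table with TEXT affinity, like the SQLite CDM DDL.
dbExecute(con, "CREATE TABLE relationship_demo (is_hierarchical TEXT)")

# In R, 0 is a double, 0L is an integer, and "0" is a string.
dbExecute(con, "INSERT INTO relationship_demo VALUES (?)", params = list(0))
dbExecute(con, "INSERT INTO relationship_demo VALUES (?)", params = list(0L))
dbExecute(con, "INSERT INTO relationship_demo VALUES (?)", params = list("0"))

# typeof() reports the storage class SQLite actually used for each row.
res <- dbGetQuery(
  con,
  "SELECT is_hierarchical, typeof(is_hierarchical) AS storage
   FROM relationship_demo
   ORDER BY rowid"
)
print(res)

dbDisconnect(con)
```

If the first row shows 0.0 and the other two show 0, the conversion is happening at insert time, which points at the R-side column type rather than at the CSV contents.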

Collaborator

ablack3 commented Sep 18, 2024

Here is a reprex for this issue. "0.0" is being stored in the text column where "0" should be. As far as I know, SQLite does not have a varchar(1) datatype.

remotes::install_github("ohdsi/Eunomia")
#> Using github PAT from envvar GITHUB_PAT. Use `gitcreds::gitcreds_set()` and unset GITHUB_PAT in .Renviron (or elsewhere) if you want to use the more secure git credential store instead.
#> Skipping install of 'Eunomia' from a github remote, the SHA1 (79c89443) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(Eunomia)
library(DatabaseConnector)
cd <- createConnectionDetails("sqlite", server = getDatabaseFile("GiBleed", overwrite = T))
con <- connect(cd)
#> attempting to download GiBleed
#> attempting to extract and load: /Users/ablack/eunomia_data/GiBleed_5.3.zip to: /Users/ablack/eunomia_data/GiBleed_5.3.sqlite
#> Connecting using SQLite driver
#> attempting to download GiBleed
#> 
#> attempting to extract and load: /Users/ablack/eunomia_data/GiBleed_5.3.zip to: /Users/ablack/eunomia_data/GiBleed_5.3.sqlite
querySql(con, "select is_hierarchical from main.relationship") |> dplyr::tibble()
#> # A tibble: 480 × 1
#>    IS_HIERARCHICAL
#>    <chr>          
#>  1 0.0            
#>  2 0.0            
#>  3 0.0            
#>  4 0.0            
#>  5 0.0            
#>  6 0.0            
#>  7 0.0            
#>  8 0.0            
#>  9 0.0            
#> 10 0.0            
#> # ℹ 470 more rows
disconnect(con)

Created on 2024-09-18 with reprex v2.1.1

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       macOS Sonoma 14.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2024-09-18
#>  pandoc   3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package           * version date (UTC) lib source
#>  backports           1.5.0   2024-05-23 [1] CRAN (R 4.3.3)
#>  bit                 4.0.5   2022-11-15 [1] CRAN (R 4.3.0)
#>  bit64               4.0.5   2020-08-30 [1] CRAN (R 4.3.0)
#>  blob                1.2.4   2023-03-17 [1] CRAN (R 4.3.0)
#>  cachem              1.1.0   2024-05-16 [1] CRAN (R 4.3.3)
#>  checkmate           2.3.2   2024-07-29 [1] CRAN (R 4.3.3)
#>  cli                 3.6.3   2024-06-21 [1] CRAN (R 4.3.3)
#>  CommonDataModel     0.2.0   2024-02-07 [1] CRAN (R 4.3.1)
#>  crayon              1.5.3   2024-06-20 [1] CRAN (R 4.3.3)
#>  curl                5.2.2   2024-08-26 [1] CRAN (R 4.3.3)
#>  DatabaseConnector * 6.3.2   2023-12-11 [1] CRAN (R 4.3.1)
#>  DBI                 1.2.3   2024-06-02 [1] CRAN (R 4.3.3)
#>  digest              0.6.37  2024-08-19 [1] CRAN (R 4.3.3)
#>  dplyr               1.1.4   2023-11-17 [1] CRAN (R 4.3.1)
#>  Eunomia           * 2.0.0   2024-09-18 [1] Github (ohdsi/Eunomia@79c8944)
#>  evaluate            0.24.0  2024-06-10 [1] CRAN (R 4.3.3)
#>  fansi               1.0.6   2023-12-08 [1] CRAN (R 4.3.1)
#>  fastmap             1.2.0   2024-05-15 [1] CRAN (R 4.3.3)
#>  fs                  1.6.4   2024-04-25 [1] CRAN (R 4.3.1)
#>  generics            0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
#>  glue                1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
#>  hms                 1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
#>  htmltools           0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
#>  knitr               1.48    2024-07-07 [1] CRAN (R 4.3.3)
#>  lifecycle           1.0.4   2023-11-07 [1] CRAN (R 4.3.1)
#>  magrittr            2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  memoise             2.0.1   2021-11-26 [1] CRAN (R 4.3.0)
#>  pillar              1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig           2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
#>  R6                  2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
#>  readr               2.1.5   2024-01-10 [1] CRAN (R 4.3.1)
#>  remotes             2.5.0   2024-03-17 [1] CRAN (R 4.3.1)
#>  reprex              2.1.1   2024-07-06 [1] CRAN (R 4.3.3)
#>  rJava               1.0-11  2024-01-26 [1] CRAN (R 4.3.1)
#>  rlang               1.1.4   2024-06-04 [1] CRAN (R 4.3.3)
#>  rmarkdown           2.28    2024-08-17 [1] CRAN (R 4.3.3)
#>  RSQLite             2.3.7   2024-05-27 [1] CRAN (R 4.3.3)
#>  rstudioapi          0.16.0  2024-03-24 [1] CRAN (R 4.3.1)
#>  sessioninfo         1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  SqlRender           1.18.1  2024-08-21 [1] CRAN (R 4.3.3)
#>  tibble              3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyselect          1.2.1   2024-03-11 [1] CRAN (R 4.3.1)
#>  tzdb                0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
#>  utf8                1.2.4   2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs               0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
#>  vroom               1.6.5   2023-12-05 [1] CRAN (R 4.3.1)
#>  withr               3.0.1   2024-07-31 [1] CRAN (R 4.3.3)
#>  xfun                0.47    2024-08-17 [1] CRAN (R 4.3.3)
#>  yaml                2.3.10  2024-07-26 [1] CRAN (R 4.3.3)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

And here is the reprex using the issue65 branch.

remotes::install_github("ohdsi/Eunomia", ref = "issue65")
#> Using github PAT from envvar GITHUB_PAT. Use `gitcreds::gitcreds_set()` and unset GITHUB_PAT in .Renviron (or elsewhere) if you want to use the more secure git credential store instead.
#> Skipping install of 'Eunomia' from a github remote, the SHA1 (29dc6dd6) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(Eunomia)
library(DatabaseConnector)
cd <- createConnectionDetails("sqlite", server = getDatabaseFile("GiBleed", overwrite = T))
con <- connect(cd)
#> attempting to download GiBleed
#> attempting to extract and load: /Users/ablack/eunomia_data/GiBleed_5.3.zip to: /Users/ablack/eunomia_data/GiBleed_5.3.sqlite
#> Connecting using SQLite driver
#> attempting to download GiBleed
#> 
#> attempting to extract and load: /Users/ablack/eunomia_data/GiBleed_5.3.zip to: /Users/ablack/eunomia_data/GiBleed_5.3.sqlite
querySql(con, "select is_hierarchical from main.relationship") |> dplyr::tibble()
#> # A tibble: 480 × 1
#>    IS_HIERARCHICAL
#>    <chr>          
#>  1 0              
#>  2 0              
#>  3 0              
#>  4 0              
#>  5 0              
#>  6 0              
#>  7 0              
#>  8 0              
#>  9 0              
#> 10 0              
#> # ℹ 470 more rows
disconnect(con)

Created on 2024-09-18 with reprex v2.1.1

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       macOS Sonoma 14.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2024-09-18
#>  pandoc   3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package           * version date (UTC) lib source
#>  backports           1.5.0   2024-05-23 [1] CRAN (R 4.3.3)
#>  bit                 4.0.5   2022-11-15 [1] CRAN (R 4.3.0)
#>  bit64               4.0.5   2020-08-30 [1] CRAN (R 4.3.0)
#>  blob                1.2.4   2023-03-17 [1] CRAN (R 4.3.0)
#>  cachem              1.1.0   2024-05-16 [1] CRAN (R 4.3.3)
#>  checkmate           2.3.2   2024-07-29 [1] CRAN (R 4.3.3)
#>  cli                 3.6.3   2024-06-21 [1] CRAN (R 4.3.3)
#>  CommonDataModel     0.2.0   2024-02-07 [1] CRAN (R 4.3.1)
#>  crayon              1.5.3   2024-06-20 [1] CRAN (R 4.3.3)
#>  curl                5.2.2   2024-08-26 [1] CRAN (R 4.3.3)
#>  DatabaseConnector * 6.3.2   2023-12-11 [1] CRAN (R 4.3.1)
#>  DBI                 1.2.3   2024-06-02 [1] CRAN (R 4.3.3)
#>  digest              0.6.37  2024-08-19 [1] CRAN (R 4.3.3)
#>  dplyr               1.1.4   2023-11-17 [1] CRAN (R 4.3.1)
#>  Eunomia           * 2.0.0   2024-09-18 [1] Github (ohdsi/Eunomia@29dc6dd)
#>  evaluate            0.24.0  2024-06-10 [1] CRAN (R 4.3.3)
#>  fansi               1.0.6   2023-12-08 [1] CRAN (R 4.3.1)
#>  fastmap             1.2.0   2024-05-15 [1] CRAN (R 4.3.3)
#>  fs                  1.6.4   2024-04-25 [1] CRAN (R 4.3.1)
#>  generics            0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
#>  glue                1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
#>  hms                 1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
#>  htmltools           0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
#>  knitr               1.48    2024-07-07 [1] CRAN (R 4.3.3)
#>  lifecycle           1.0.4   2023-11-07 [1] CRAN (R 4.3.1)
#>  magrittr            2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  memoise             2.0.1   2021-11-26 [1] CRAN (R 4.3.0)
#>  pillar              1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig           2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
#>  R6                  2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
#>  readr               2.1.5   2024-01-10 [1] CRAN (R 4.3.1)
#>  remotes             2.5.0   2024-03-17 [1] CRAN (R 4.3.1)
#>  reprex              2.1.1   2024-07-06 [1] CRAN (R 4.3.3)
#>  rJava               1.0-11  2024-01-26 [1] CRAN (R 4.3.1)
#>  rlang               1.1.4   2024-06-04 [1] CRAN (R 4.3.3)
#>  rmarkdown           2.28    2024-08-17 [1] CRAN (R 4.3.3)
#>  RSQLite             2.3.7   2024-05-27 [1] CRAN (R 4.3.3)
#>  rstudioapi          0.16.0  2024-03-24 [1] CRAN (R 4.3.1)
#>  sessioninfo         1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  SqlRender           1.18.1  2024-08-21 [1] CRAN (R 4.3.3)
#>  tibble              3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyselect          1.2.1   2024-03-11 [1] CRAN (R 4.3.1)
#>  tzdb                0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
#>  utf8                1.2.4   2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs               0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
#>  vroom               1.6.5   2023-12-05 [1] CRAN (R 4.3.1)
#>  withr               3.0.1   2024-07-31 [1] CRAN (R 4.3.3)
#>  xfun                0.47    2024-08-17 [1] CRAN (R 4.3.3)
#>  yaml                2.3.10  2024-07-26 [1] CRAN (R 4.3.3)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

The change I made is to be explicit about all column types when we read them into R from CSV. However, I set the types based on column order in the CSV file, which doesn't always match the CDM specification, so I'd like some advice on how to handle that situation.
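
One possible way around the ordering mismatch is to key the types by column name instead of position. The sketch below uses base R only and is not what the issue65 branch does; the `spec` lookup is a hypothetical, partial stand-in for whatever the CDM specification provides, and the CSV content is invented sample data.

```r
# Hypothetical type lookup keyed by column name (illustrative subset only).
spec <- c(
  relationship_id         = "character",
  is_hierarchical         = "character",
  defines_ancestry        = "character",
  relationship_concept_id = "integer"
)

# Sample CSV whose column order differs from the order used in `spec`.
csv_text <- "relationship_concept_id,defines_ancestry,relationship_id,is_hierarchical
44818790,0,Is a,1
44818791,1,Subsumes,0"

# Read just the header line so types can be matched by name.
header <- scan(text = csv_text, what = character(), sep = ",",
               nlines = 1, quiet = TRUE)

# Fail fast if the file has a column the lookup does not cover.
stopifnot(all(header %in% names(spec)))

# spec[header] reorders the types to follow the file's own column order.
rel <- read.csv(text = csv_text, colClasses = spec[header])
str(rel)
```

Because the lookup is indexed by the header that is actually in the file, the flag columns stay character regardless of where they appear, and a column missing from the lookup stops the load instead of being silently guessed.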
