Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eunomia 2.0 Unit Test Notes #42

Open
anthonysena opened this issue Apr 7, 2023 · 11 comments
Open

Eunomia 2.0 Unit Test Notes #42

anthonysena opened this issue Apr 7, 2023 · 11 comments
Milestone

Comments

@anthonysena
Copy link
Collaborator

anthonysena commented Apr 7, 2023

Just making an issue here to note any changes made while attempting to update unit tests to use Eunomia 2.0 (currently on the develop branch).

  1. Unit tests will need to set up the EUNOMIA_DATA_FOLDER environment variable so GitHub Actions (GHA) has a place to store the data. Here is some boiler plate code that could go into the tests/testthat/setup.R file for reference:
dataFolder <- tempfile()
dir.create(dataFolder)
oldEunomiaFolder <- Sys.getenv("EUNOMIA_DATA_FOLDER")
Sys.setenv("EUNOMIA_DATA_FOLDER" = dataFolder)
withr::defer(
  {
    unlink(dataFolder)
    Sys.setenv("EUNOMIA_DATA_FOLDER" = oldEunomiaFolder)
  },
  testthat::teardown_env()
)
  1. Vignettes that use Eunomia will also need to set the EUNOMIA_DATA_FOLDER environment variable in order for the R CMD check to work properly on GHA.

  2. The signature of the function Eunomia::getConnectionDetails() has changed. v1.x supported the following signature:

getEunomiaConnectionDetails <- function(databaseFile = tempfile(fileext = ".sqlite"))

v2.x removes the databaseFile parameter when calling Eunomia::getConnectionDetails(). The control over the databaseFile parameter is governed by the EUNOMIA_DATA_FOLDER environment variable. Alternatively, you can make use of the Eunomia::getConnectionDetails function for more fine-grained control over the dataset used, etc.

@anthonysena
Copy link
Collaborator Author

Relevant changes are on this branch for now: https://github.com/OHDSI/CohortGenerator/tree/test-eunomia-2

@fdefalco fdefalco added this to the v2.0.0 milestone May 23, 2023
@anthonysena
Copy link
Collaborator Author

An update here - I've bumped up the reference to Eunomia on the test-eunomia-2 branch and the unit tests have passed again so I think we're at a point now where we could consider a release. Tagging @schuemie for awareness as we'll need to coordinate the release of Eunomia so that we do not break the other HADES packages that use it for unit testing.

@schuemie
Copy link
Member

schuemie commented Jun 9, 2023

Did you run check on all reverse dependencies? Here's the code I use to do that for major DatabaseConnector changes: https://github.com/OHDSI/DatabaseConnector/blob/main/extras/PackageMaintenance.R#L119

@anthonysena
Copy link
Collaborator Author

@schuemie - I have not run this reverse dependency check. I must admit I'm unsure what this is doing and how it is useful for testing this type of change? In my mind, I was envisioning that we'd need to set up a pull request per HADES repo that updates the unit tests to make the changes for Eunomia that I described at the start of this issue. I'm curious how the reverse dependency check can help us in this case?

@ablack3
Copy link
Collaborator

ablack3 commented Jun 21, 2023

4. The signature of the function Eunomia::getConnectionDetails() has changed. v1.x supported the following signature:

getEunomiaConnectionDetails <- function(databaseFile = tempfile(fileext = ".sqlite"))

v2.x removes the databaseFile parameter when calling Eunomia::getConnectionDetails(). The control over the databaseFile parameter is governed by the EUNOMIA_DATA_FOLDER environment variable. Alternatively, you can make use of the Eunomia::getConnectionDetails function for more fine-grained control over the dataset used, etc.

Why don't we keep the databaseFile argument of getEunomiaConnectionDetails so we don't break this function? EUNOMIA_DATA_FOLDER specifies where the package data is stored (i.e. the zip file that used to be in the inst/ folder). databaseFile specifies where the new copy of the eunomia dataset will be extracted to. They are different paths and serve different purposes.

If there was a way to provide a default location for the zip file when EUNOMIA_DATA_FOLDER is not set and automatically download the eunomia data if it is not already downloaded then I think v2 could be non-breaking.

@anthonysena
Copy link
Collaborator Author

  1. The signature of the function Eunomia::getConnectionDetails() has changed. v1.x supported the following signature:
getEunomiaConnectionDetails <- function(databaseFile = tempfile(fileext = ".sqlite"))

v2.x removes the databaseFile parameter when calling Eunomia::getConnectionDetails(). The control over the databaseFile parameter is governed by the EUNOMIA_DATA_FOLDER environment variable. Alternatively, you can make use of the Eunomia::getConnectionDetails function for more fine-grained control over the dataset used, etc.

Why don't we keep the databaseFile argument of getEunomiaConnectionDetails so we don't break this function? EUNOMIA_DATA_FOLDER specifies where the package data is stored (i.e. the zip file that used to be in the inst/ folder). databaseFile specifies where the new copy of the eunomia dataset will be extracted to. They are different paths and serve different purposes.

If there was a way to provide a default location for the zip file when EUNOMIA_DATA_FOLDER is not set and automatically download the eunomia data if it is not already downloaded then I think v2 could be non-breaking.

@ablack3 - going through this issue with the hopes that we can make Eunomia 2.x backwards compatible. Upon review, I had a few thoughts:

  • We'd want to have the databaseFile and dbms parameters added to the getEunomiaConnectionDetails function to make it backwards compatible as you suggested
  • In the case that Sys.getenv("EUNOMIA_DATA_FOLDER") is not specified, should we default this to a temp folder? I think this would then allow for backwards compatibility and allow for those packages that want to set the Sys.getenv("EUNOMIA_DATA_FOLDER") explicitly to save time to avoid downloading the same data each time. We could add a warning to inform the user that we're using a temp location vs. stopping the process from downloading the Eunomia data.

Tagging @ginberg @fdefalco for input and awareness

@ginberg
Copy link
Collaborator

ginberg commented Jul 21, 2023

@anthonysena his suggestion seems like a good idea to me. I have added a PR to show how does can look like. What do you think about this @ablack3 @fdefalco ?

@ginberg
Copy link
Collaborator

ginberg commented Jul 27, 2023

@ablack3 thanks for merging the PR.

I did the reverse dependency check like Martijn suggested and I found an issue. The first package that I tried, CohortMethod, has failing tests with Eunomia v2 but they do work with Eunomia v1. After digging into it, it seems to me that some dates are not correctly in the sqllite database. I made the reprex below, it fails with v2 but works with v1. Any idea what it could be?

#remotes::install_github("ohdsi/Eunomia") #v1
#remotes::install_github("ohdsi/Eunomia", ref = "develop") #v2
#sessionInfo()

library(Eunomia)
library(testthat)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:testthat':
#> 
#>     matches
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
connectionDetails <- getEunomiaConnectionDetails()
connection <- DatabaseConnector::connect(connectionDetails)
#> Connecting using SQLite driver

dataSetNameVersion <- "GiBleed_5.3"
eunomiaDataFolder  <- Sys.getenv("EUNOMIA_DATA_FOLDER")
drugEraFileName    <- "DRUG_ERA.csv"
drugEraId          <- 2707

# might be needed to manually unzip the dataset zip in your eunomia folder
csvData <- read.csv(file.path(eunomiaDataFolder, dataSetNameVersion, drugEraFileName)) %>%
  dplyr::filter(DRUG_ERA_ID == drugEraId)

dbData <- DatabaseConnector::renderTranslateQuerySql(connection, "SELECT * FROM main.drug_era WHERE DRUG_ERA_ID = @drugEraId;",
                                                     drugEraId = drugEraId)

testthat::expect_equal(as.character(csvData$DRUG_ERA_START_DATE), as.character(dbData$DRUG_ERA_START_DATE))
#> Error: as.character(csvData$DRUG_ERA_START_DATE) not equal to as.character(dbData$DRUG_ERA_START_DATE).
#> 1/1 mismatches
#> x[1]: "1984-09-19"
#> y[1]: "1970-01-01"
testthat::expect_equal(as.character(csvData$DRUG_ERA_END_DATE), as.character(dbData$DRUG_ERA_END_DATE))
#> Error: as.character(csvData$DRUG_ERA_END_DATE) not equal to as.character(dbData$DRUG_ERA_END_DATE).
#> 1/1 mismatches
#> x[1]: "1984-10-03"
#> y[1]: "1970-01-01" 

These are the reverse dependencies btw

> reverseDependencies$name
 [1] "CohortMethod"                   "SelfControlledCaseSeries"       "SelfControlledCohort"           "PatientLevelPrediction"        
 [5] "DeepPatientLevelPrediction"     "EnsemblePatientLevelPrediction" "Characterization"               "CohortGenerator"               
 [9] "CohortDiagnostics"              "PheValuator"                    "DataQualityDashboard"           "MethodEvaluation"              
[13] "FeatureExtraction"              "ShinyAppBuilder"

@ablack3
Copy link
Collaborator

ablack3 commented Jul 27, 2023

Hi @ginberg,

The issue is that loading dates into SQLite is not working as expected. V2 of Eunomia dynamically loads the data into a sqlite database when the user requests the data which is different than V1 which simply included a static sqlite file.

However when we load dates into SQLite we have to be careful because SQLite does not have a date type. Here is a screenshot showing the moment right before and right after the data is loaded from R to sqlite. As you can see the dates are correct in R but get turned into numbers in sqlite.

image

I've made a possible fix on a new branch call "sqlite_dates". Will you try with that branch?

@ginberg
Copy link
Collaborator

ginberg commented Jul 28, 2023

@ablack3 your fix looks good to me. Not ideal we have to do this for sqlite, but at least now the dates are the same as previously.

I did run the R check for all reverse dependencies and found no problems related to Eunomia using this version. So, it looks to me like we are backwards compatible with this PR.

@fdefalco can you have a look at the PR?

@fdefalco
Copy link
Collaborator

fdefalco commented Sep 7, 2023

I've been working my way back to Eunomia, we had issues in the CommonDataModel package that I'm working on resolving. I have a branch of Eunomia that resolves a number of other issues as my test chain before reverse dependency checking was to test Achilles and DataQualityDashboard which also both failed when using the new Eunomia.

Hope to get to this soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants