Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

insertTable for Sqlite adds decimal precision to data inserted into character field #280

Closed
anthonysena opened this issue Jul 26, 2024 · 9 comments

Comments

@anthonysena
Copy link
Contributor

I'm not sure if this is an RMM issue or not but starting by putting it here. This issue is based on the work happening in OHDSI/Strategus#147 (comment).

Here is some code that can be used to reproduce the example between creating the table & uploading data in SQLite vs PostgreSQL. The files references in the code below is attached to the issue at the bottom. The issue is that when database_id is inserted into SQLite, it is formatted as 85642205.0 while when it is inserted into PostgreSQL, it is 85642205

# Create empty SQLite database -------------------------
library(RSQLite)
if (file.exists("D:/TEMP/temp.sqlite")) {
  unlink("D:/TEMP/temp.sqlite")
}
mydb <- dbConnect(RSQLite::SQLite(), "D:/TEMP/temp.sqlite")
dbDisconnect(mydb)

resultsConnectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "sqlite",
  server = "D:/TEMP/temp.sqlite"
)

# Create the table
connection <- DatabaseConnector::connect(resultsConnectionDetails)

# Create the SQL from the resultsDataModelSpecification.csv
sql <- ResultModelManager::generateSqlSchema(
  csvFilepath = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData/resultsDataModelSpecification.csv"
)
[database_meta_data.zip](https://github.com/user-attachments/files/16395694/database_meta_data.zip)
[database_meta_data.zip](https://github.com/user-attachments/files/16395708/database_meta_data.zip)

sql <- SqlRender::render(
  sql = sql,
  database_schema = "main"
)
DatabaseConnector::executeSql(connection = connection, sql = sql)

# NOTE: database_id is character
specification <- CohortGenerator::readCsv(file = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData/resultsDataModelSpecification.csv")
ResultModelManager::uploadResults(
  connection = connection,
  schema = "main",
  resultsFolder = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData",
  purgeSiteDataBeforeUploading = TRUE,
  databaseIdentifierFile = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData/database_meta_data.csv",
  specifications = specification
)

#NOTE: database_id = 85642205.0
DatabaseConnector::disconnect(connection)

# Do the same but on PG
resultsConnectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  connectionString = keyring::key_get("resultsConnectionString", keyring = "ohda"),
  user = keyring::key_get("resultsAdmin", keyring = "ohda"),
  password = keyring::key_get("resultsAdminPassword", keyring = "ohda")
)

# Create the table
connection <- DatabaseConnector::connect(resultsConnectionDetails)

# Create the SQL from the resultsDataModelSpecification.csv
sql <- ResultModelManager::generateSqlSchema(
  csvFilepath = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData/resultsDataModelSpecification.csv"
)
sql <- SqlRender::render(
  sql = sql,
  database_schema = "sena_test"
)
DatabaseConnector::executeSql(connection = connection, sql = sql)

# NOTE: database_id is character
specification <- CohortGenerator::readCsv(file = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData/resultsDataModelSpecification.csv")
ResultModelManager::uploadResults(
  connection = connection,
  schema = "sena_test",
  resultsFolder = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData",
  purgeSiteDataBeforeUploading = TRUE,
  databaseIdentifierFile = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData/database_meta_data.csv",
  specifications = specification
)

#NOTE: database_id = 85642205
DatabaseConnector::disconnect(connection)

database_meta_data.zip

@azimov
Copy link
Contributor

azimov commented Jul 29, 2024

By default runCheckAndFixCommands is not executed for uploads. Setting this to TRUE will modify the data to being the expected type prior to insert (so it will be consistent).

Beyond this, the insert is completed with DatabaseConnector::insertTable and the particular dbms has no impact on how the data is read into memory so I suspect the bug is in the use of DBI vs jdbc drivers.

@azimov
Copy link
Contributor

azimov commented Jul 29, 2024

Tracing through live code in the example you sent and adding a debug(DatabaseConnector:::insertTable.DatabaseConnectorDbiConnection)

I confirm the issue is with the DBI::dbWriteTable

https://github.com/OHDSI/DatabaseConnector/blob/bee58a314996123b59e08bd489bf712e14391c4f/R/InsertTable.R#L463

Which is adding the decimal precision internally - likely inside RSQLite https://github.com/r-dbi/DBI/blob/main/R/dbWriteTable_DBIConnection_Id_ANY.R

However, reproducing this example is trickier than I'd like.

Creating a reproducible example is challenging but the only way to resolve this in this package would be to force the type conversions after reading the csv. However, this feels like a hack.

@azimov
Copy link
Contributor

azimov commented Jul 29, 2024

The following is a reproducible example that uses only DatabaseConnector so I will move the issue there:

library(DatabaseConnector)

unlink("test.sqlite")

sqliteConnectionDetails <- createConnectionDetails(
  dbms = "sqlite",
  server = "test.sqlite"
)

sqliteConnection <- connect(sqliteConnectionDetails)


sql <- "
CREATE TABLE @schema.test_table (
  database_id VARCHAR primary key
);
"

renderTranslateExecuteSql(sqliteConnection, sql, schema = "main")


data <- data.frame(
  database_id = 12345689878
)


insertTable(connection = sqliteConnection,
            data = data,
            createTable = FALSE,
            dropTableIfExists = FALSE,
            
            tableName = "test_table", 
            databaseSchema = "main")

print(renderTranslateQuerySql(sqliteConnection, "SELECT database_id FROM test_table"))

# Prints
#  DATABASE_ID
# 12345689878.0

disconnect(sqliteConnection)

@azimov azimov transferred this issue from OHDSI/ResultModelManager Jul 29, 2024
@azimov azimov changed the title Data formatting difference between DBs insertTable for Sqlite adds decimal precision to data inserted into character field Jul 29, 2024
@schuemie
Copy link
Member

schuemie commented Aug 5, 2024

Would it be possible to convert the numeric value to a character string in R, before sending it to insertTable()? That way we're not dependent on how each database driver handles this.

@azimov
Copy link
Contributor

azimov commented Aug 14, 2024

Would it be possible to convert the numeric value to a character string in R, before sending it to insertTable()? That way we're not dependent on how each database driver handles this.

This will have a workaround in the next ResultModelManager release that fixes the types. However, the inconsistency between JDBC and DBI/Sqlite drivers could create issues in the future given that this will not only impact result sets.

Perhaps the data types of the table could be read prior to insert and then this could be cast in R accordingly?

@azimov
Copy link
Contributor

azimov commented Aug 28, 2024

This appears to be creating problems downstream when sqlite is used see OHDSI/Eunomia#65

@schuemie
Copy link
Member

I really think consistently casting numerics to string is out of scope for DatabaseConnector. This should be handled prior to handing it over to DatabaseConnector.

If not, then we'd need to parse every SQL statement to see if it contains an INSERT, then try to derive where it is inserting to, get the schema of that, and perform the correct casting.

@azimov
Copy link
Contributor

azimov commented Sep 6, 2024

I would argue that the conversion of data to use decimal precision is occurring after the user passes the data to insertTable. In every other driver case, the insert does not add the decimal value. Perhaps a more consistent approach would be to use the jdbc driver?

@schuemie
Copy link
Member

schuemie commented Sep 9, 2024

Sorry, this is incredibly easy to fix (and not unreasonable to ask) before talking to DatabaseConnector, and really hard to fix with DatabaseConnector. I'm going to declare this a 'won't fix'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants