Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Placeholder syntax in parametrized queries #1397

Open
nbenn opened this issue Dec 22, 2023 · 12 comments
Open

Placeholder syntax in parametrized queries #1397

nbenn opened this issue Dec 22, 2023 · 12 comments

Comments

@nbenn
Copy link
Collaborator

nbenn commented Dec 22, 2023

Currently there is no way to determine what types of placeholders a backend supports for parametrized queries.

SQLite for example is quite flexible, supporting all of "?", "$i", "$name", ":name" (where "i" represents a parameter position and "name" a parameter name), while Postgres (I believe) only supports "?" and "$i" and cannot handle named parameters.

In what way this information could be communicated and whether it is worthwhile to do so is currently unclear to me. I'm hoping that this could be a place to have such a discussion.

Maybe I can start out explaining a bit why this became relevant tom me and then we see where this goes. It might be that we can sort out my specific issues without introducing such infrastructure at all. Of course this might be interesting to others nonetheless.

For context: I'm writing a high level wrapper in R that aligns the DBI interface with functionality exposed by adbcdrivermanager. As example, consider the following query:

library(adbcdrivermanager)

db <- adbc_database_init(adbcsqlite::adbcsqlite(), uri = ":memory:")
con <- adbc_connection_init(db)

write_adbc(datasets::swiss, con, "swiss", temporary = TRUE)

stmt <- adbc_statement_init(con)
adbc_statement_set_sql_query(
  stmt,
  "SELECT * from swiss WHERE Agriculture < $agr AND Education > $edu"
)
adbc_statement_prepare(stmt)

We have a "parameter schema" with parameter names

names(adbc_statement_get_parameter_schema(stmt)$children)
#> [1] "$agr" "$edu"

At the same time, I want my users to be able to pass parameter values by referring to more natural names such as "agr" and "edu". Something like

execute_query <- function(...) {

  params <- data.frame(...)
  adbc_statement_bind_stream(stmt, params)

  res <- nanoarrow::nanoarrow_allocate_array_stream()
  adbc_statement_execute_query(stmt, res)

  as.data.frame(res)
}

nrow(execute_query(agr = 30, edu = 10))
#> [1] 9

As things stand, it seems that for named parameters, names are not considered and only positions are relevant. This makes the current implementation of parameter binding (at least for SQLite) quite lenient.

nrow(execute_query(agr = 30, typo = 10))
#> [1] 9
nrow(execute_query(edu = 10, agr = 30))
#> [1] 1
nrow(execute_query(agr = 30, edu = "typo"))
#> [1] 0

If I want to properly support named parameters, I have to sort out ordering on my end, as well as potentially raising errors for name/type mismatches. To do this, I need to match names in the parameter schema with my "natural" names. And this is where having information on what kinds of placeholder patterns are supported by the back-end becomes relevant to me.

Cc @krlmlr @paleolimbot

@paleolimbot
Copy link
Member

Rather than try to specify some regex or "bind pattern", I wonder if we just want to push the "bind by name" operation into the driver. For example:

adbc_statement_bind_by_name <- function(statement, stream) {
  UseMethod("adbc_statement_bind_by_name")
}

adbc_statement_bind_by_name.default <- function(statement, stream) {
  stop("bind by name not supported by this driver")  # maybe throw a classed error that you can catch
}

The implementation for SQLite, for example, could use the parameter schema method (probably stealing the implementation from RSQLite).

@krlmlr
Copy link
Collaborator

krlmlr commented Dec 22, 2023

RPostgres tests only positional:

https://github.com/r-dbi/RPostgres/blob/c17e5bf728ccb4bb07084d0f882972429e8b0ccc/tests/testthat/helper-DBItest.R#L23

The challenge for DBItest is that it must generate the queries to test against the driver. How would the new generic help?

@paleolimbot
Copy link
Member

If it's DBI test that needs this, maybe what we need is:

adbc_driver_dbitest_tweaks <- function(driver) {
  UseMethod("adbc_driver_dbitest_tweaks")
}

adbc_driver_dbitest_tweaks.default <- function(driver) {
  stop("dbitest tweaks not supported by this driver")
}

...since there are likely other driver-specific options that would be better set by the driver package.

@krlmlr
Copy link
Collaborator

krlmlr commented Dec 22, 2023

Do you really want tight coupling between ADBC drivers and DBItest?

@paleolimbot
Copy link
Member

It could be specified somewhere else, too (e.g., an adbctest package that is GitHub only), but I don't mind putting DBItest in Suggests and adding the generic. I would eventually like to run DBItest in ADBC's CI (maybe not CRAN's, though)...it makes sense to me that the test configuration lives somewhere in the ADBC repo that is as close to the individual driver as possible.

@nbenn
Copy link
Collaborator Author

nbenn commented Dec 22, 2023

As things stand currently, DBItest does not need this. Setting up the DBItest context has to be done per back end anyways and it is there, where the placeholders can be defined explicitly.

It is some intermediary such as adbi, that needs this, as it has to be agnostic of the backend specifics yet might still want to add some smarts such as making sure parameters are passed in the right order (as long as this remains necessary) to isolate users from lower-level specifics.

@paleolimbot
Copy link
Member

I agree that it would be very confusing if a query containing $bread, $cheese were passed data.frame(cheese = 1, bread = 2) and the parameters were bound by position. It sounds like the idiomatic way to handle this in DBI is to bind by name if possible.

Testing aside, a simpler method would also be to add a bind_by_position = NA argument to the relevant bind method and warn if NA and error if FALSE (i.e., make somebody write DBIStatmentBind(stmt, some_data_frame, bind_by_position = TRUE) if they don't want a warning.

@nbenn
Copy link
Collaborator Author

nbenn commented Dec 22, 2023

@paleolimbot from where I stand, what you proposed with adbc_statement_bind_by_name() would make many of my problems go away here.

Trying to solve my problem specifically, we might also re-consider what names the parameter schema should have exactly. Does it make sense that parameter names come with "placeholder indicators"?

If the schema were to change, the only extra information I'd need is whether named parameter are supported by the back-end or not.

@lidavidm
Copy link
Member

For the format/driver side I filed #1398

@paleolimbot
Copy link
Member

I think my preference for this specific issue would be:

  • Implement a adbc_statement_bind_by_name() generic + helper in adbcdrivermanager with implementation in adbcsqlite and duckdb.
  • Implement dbitest in the driver package's tests so that they run on our CI. This would mean each driver package can choose the context/tweaks it needs, which it sounds like would include the ability to not generate queries with named parameters.

Alternatively, you could continue to hard-code the string you need in whatever format you need it until the C driver API can provide that information. You would only need it for SQLite, Postgres, Snowflake, and DuckDB.

@krlmlr
Copy link
Collaborator

krlmlr commented Dec 22, 2023

I like the idea of moving the DBItest tests to the respective drivers. @paleolimbot: what kind of support do you need?

@paleolimbot
Copy link
Member

See #1401 . I still get quite a few failures but I imagine I'm not setting it up quite right!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants