Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect query is parsed without any errors #261

Open
VaibhaveS opened this issue Sep 23, 2024 · 3 comments
Open

Incorrect query is parsed without any errors #261

VaibhaveS opened this issue Sep 23, 2024 · 3 comments

Comments

@VaibhaveS
Copy link

VaibhaveS commented Sep 23, 2024

Query

The query is incorrect, because there is no = between WHERE customer_id $1 however pg_query_free_plpgsql_parse_result(query) returns correctly.

#include <pg_query.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
  PgQueryPlpgsqlParseResult result;

  result = pg_query_parse_plpgsql(" \
CREATE OR REPLACE FUNCTION update_customer_total_sales(customer_id INT) \
RETURNS VOID AS $$ \
DECLARE \
    total NUMERIC; \
BEGIN \
    SELECT COALESCE(SUM(amount), 0) INTO total \
    FROM orders \
    WHERE customer_id $1; \
    UPDATE customers \
    SET total_sales = total \
    WHERE customer_id; \
    RAISE NOTICE 'Total sales for customer ID %: %', customer_id, total; \
END; \
$$ LANGUAGE plpgsql;");

  if (result.error) {
    printf("error: %s at %d\n", result.error->message, result.error->cursorpos);
  } else {
    printf("%s\n", result.plpgsql_funcs);
  }

  pg_query_free_plpgsql_parse_result(result);

  // Optional, this ensures all memory is freed upon program exit (useful when running Valgrind)
  pg_query_exit();

  return 0;
}
@lfittl
Copy link
Member

lfittl commented Sep 24, 2024

Interesting example!

So the reason this differs from what Postgres itself regards as valid syntax is that we have a parser patch in libpg_query that adds support for this:

https://github.com/pganalyze/libpg_query/blob/16-latest/patches/01_parser_additional_param_ref_support.patch#L147

The intent is to deal with normalized queries in pg_stat_statements that result from type casts, like SELECT float '1.5' being turned into SELECT float $1 (or the more common interval '1 day' syntax, where we've seen this be necessary in practice). Despite this not being valid in Postgres, its something we see in practice when working with normalized queries, and so the libpg_query parser needs to support it.

It may be possible to address this in libpg_query by either:

  1. Making the additional parameter reference support optional through a parser option
  2. Restricting the allowed type names to those in core Postgres (in your example "customer_id" is considered a type name)

Given that Postgres is extensible with custom types, I'd be inclined to focus on (1) as the right way to resolve this.

To help assess what makes sense here, can you share more on your use case, where this becomes a problem?

@VaibhaveS
Copy link
Author

Thanks @lfittl. Regarding my use-case, I am generating PostgreSQL queries with a LLM. I want to ensure its syntactically correct.

@polobo
Copy link

polobo commented Sep 24, 2024

I feel like you shouldn’t ever find such replacements within a text query literal in a DDL query. Such modifications only happen to DML queries. Can you condition the acceptance of the hack on the state of the parse?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants