Translation of impure functions can result in wrong queries #33791
Comments
See a somewhat similar problem with SQL Server's COALESCE, in #32519. Yeah, we traditionally haven't given enough thought when duplicating SQL expressions in translation; in addition to the problem of impure functions, expression duplication can create serious performance issues as e.g. a heavy, complex subquery may end up getting evaluated more than once. In any case, EF doesn't currently know which functions are pure and which aren't; this is something we could add to SqlFunctionExpression, and then possibly fail translation if it requires duplication. Of course, if we can avoid duplicating altogether that's ideal, but there are definitely cases (such as the null semantics duplication above) where it's difficult to imagine things otherwise (we may be able to do something with CTEs once we support them). Any other thoughts on this @ranma42? |
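As a rough illustration of the idea floated above (a purity flag on function expressions, with translation refusing to duplicate impure ones), here is a minimal language-agnostic sketch in Python. The `Expr` class, its `pure` field, `duplicate` and `ImpureDuplicationError` are all hypothetical names invented for this sketch; none of them is an existing EF Core API, and the real design could look quite different.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expr:
    """Hypothetical SQL expression node: a column, a constant or a function call."""
    sql_name: str
    args: Tuple["Expr", ...] = ()
    pure: bool = True  # hypothetical purity flag; not an existing EF Core property

    def is_pure(self) -> bool:
        # An expression is pure only if the node and all of its arguments are pure.
        return self.pure and all(arg.is_pure() for arg in self.args)


class ImpureDuplicationError(Exception):
    """Raised when a rewrite would evaluate an impure expression more than once."""


def duplicate(expr: Expr, copies: int = 2) -> Tuple[Expr, ...]:
    """Return `copies` references to expr, refusing when that could change semantics."""
    if copies > 1 and not expr.is_pure():
        raise ImpureDuplicationError(
            f"cannot duplicate impure expression {expr.sql_name}"
        )
    return (expr,) * copies


url = Expr('"b"."Url"')
rand = Expr("random", pure=False)
nested = Expr("abs", args=(rand,))  # pure function wrapping an impure call
```

In this sketch `duplicate(url)` succeeds, while `duplicate(nested)` raises because impurity propagates upwards from `random` through `abs`. Failing loudly is only one option; a translator could instead fall back to a rewrite that does not need the duplicate.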
AFAICT the translation for the example query should be:
SELECT instr(CASE
    WHEN abs(random() / 9.2233720368547799E+18) > 0.5 THEN NULL
    ELSE "b"."Url"
END, '5') > 0
FROM "Blogs" AS "b"
I should check, but I believe that #33814 (function nullability) + #33757 (comparison nullability) already take care of this specific case.
I will try to investigate the instances where the duplication can be avoided; I believe it is actually needed in very few cases. |
If we can remove duplication, that would absolutely be very welcome - though I very much suspect that some cases will still require it, barring more fancy translations involving CTE etc. In any case, your work and valuable thoughts on this are very much appreciated! |
I just found out that apparently the issue is not just with duplicating, but also with checking whether they are identical:
var urls = db.Blogs
    .Select(x => EF.Functions.Random() == EF.Functions.Random())
    .ToList();
is translated to
SELECT 1
FROM "Blogs" AS "b"
while
var urls = db.Blogs
    .Select(x => EF.Functions.Random() == 0 + EF.Functions.Random())
    .ToList();
is translated to
SELECT abs(random() / 9.2233720368547799E+18) = 0.0 + abs(random() / 9.2233720368547799E+18)
FROM "Blogs" AS "b"
If the assumption is that each invocation of |
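The SQLite side of this can be checked without EF Core. The sketch below uses Python's standard `sqlite3` module and the `abs(random() / 9.2233720368547799E+18)` expression quoted in the translations above; the table contents, the `count_rows` helper and the `maybe_url` predicate are made up for the demonstration and are not the queries EF Core generates.

```python
import sqlite3

# SQL that the thread shows for EF.Functions.Random() on SQLite.
RANDOM_SQL = "abs(random() / 9.2233720368547799E+18)"


def count_rows(conn, predicate_sql):
    """Count the rows of Blogs for which predicate_sql evaluates to true."""
    sql = f'SELECT count(*) FROM "Blogs" AS "b" WHERE {predicate_sql}'
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Blogs" ("Id" INTEGER PRIMARY KEY, "Url" TEXT NOT NULL)')
conn.executemany(
    'INSERT INTO "Blogs" ("Url") VALUES (?)',
    [(f"https://example.com/{i}",) for i in range(1000)],  # illustrative data
)

# 1. Each textual occurrence of random() is evaluated separately, so folding
#    "x = x" to a constant TRUE changes the result for this expression.
equal_rows = count_rows(conn, f"{RANDOM_SQL} = {RANDOM_SQL}")

# 2. Duplicating a nullable expression built on random() lets the two copies
#    disagree: the first copy is non-NULL while the second one is NULL.
maybe_url = f'CASE WHEN {RANDOM_SQL} > 0.5 THEN NULL ELSE "b"."Url" END'
inconsistent_rows = count_rows(
    conn, f"({maybe_url}) IS NOT NULL AND ({maybe_url}) IS NULL"
)

print("rows where the two random values are equal:", equal_rows)
print("rows where the duplicated expression disagrees with itself:", inconsistent_rows)
```

A `SELECT 1` translation would correspond to all 1000 rows comparing equal, whereas the first count is expected to be zero or very close to it. The second count shows how a null check on one copy of an expression says nothing about the value of another copy.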
@ranma42 yes, this is true. The general issue is that EF has no knowledge of which functions (or more generally, nodes) are pure/stable; and this sort of optimization can be very valuable in the general case, while functions like Random() are quite rare. So although it's definitely possible to produce contrived cases where this results
Just to have fun, I'd argue that EF's optimization here isn't technically incorrect - two invocations of Random() may happen to yield the same result, it's just quite improbable for that to happen ;)
And one final related thought... PostgreSQL has three categories of "get the current timestamp" functions: those that return the timestamp at transaction start, at statement start and at function invocation; the logic (as explained in the docs) is that it's very useful to have a concept of the "transaction time", as if the transaction occurred in a single instant (e.g. so that multiple modifications within the same transaction bear the same timestamp). For those functions (as well as the statement start ones), optimizations such as equality elimination are valid, whereas for the invocation-time ones they aren't. But again, specifically for this problem, although we probably have incorrect behavior here, it seems quite edge-casey/contrived. |
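To make the three PostgreSQL categories concrete, here is a small Python sketch that tags some of the PostgreSQL timestamp functions with the point at which their value is fixed and uses that to decide whether `f() = f()` within one statement may be folded. The function names are PostgreSQL's; the `Stability` enum, the lookup table and `can_fold_self_equality` are hypothetical and only illustrate the reasoning, not any EF Core or Npgsql mechanism.

```python
from enum import Enum


class Stability(Enum):
    TRANSACTION = "fixed for the whole transaction"
    STATEMENT = "fixed for the whole statement"
    CALL = "re-evaluated on every call"


# Classification of a few PostgreSQL functions by when their value is fixed.
PG_TIMESTAMP_FUNCTIONS = {
    "now": Stability.TRANSACTION,
    "transaction_timestamp": Stability.TRANSACTION,
    "statement_timestamp": Stability.STATEMENT,
    "clock_timestamp": Stability.CALL,
}


def can_fold_self_equality(function_name: str) -> bool:
    """True when `f() = f()` inside one statement may be simplified to TRUE."""
    # Assumption of this sketch: functions missing from the table are treated
    # conservatively, as if every call could return a different value.
    stability = PG_TIMESTAMP_FUNCTIONS.get(function_name, Stability.CALL)
    return stability is not Stability.CALL
```

With this table, `now` and `statement_timestamp` allow the fold while `clock_timestamp` does not. Defaulting unknown functions to `CALL` is the cautious choice; as noted above, EF currently makes the opposite assumption because the optimization is valuable in the common case.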
That's true, just like multiple invocations of
AFAICT PostgreSQL provides quite a few ways to play around with impure functions:
(
This is definitely a corner case in general and even more so in the context of EFCore. From all of the examples I listed, I believe that only the random generation functions are somewhat relevant, but not really something to focus an effort on, even if the current translation might cause incorrect results when dealing with them. |
This is a special case of |
The translation pipeline assumes that SQL expressions are pure, which is not true in general (for example EF.Functions.Random()).
Under this assumption, it sometimes duplicates sub-expressions, which can lead to inconsistent results, exceptions and (a minor issue, but in some cases still relevant) degraded performance.
See #32519 for a related issue that is specific to the translation performed by the SQL Server provider.
An example program that showcases the bug is:
Exception
The program terminates with the following exception:
because it is running the query
The two random() calls are evaluated independently, hence it is possible for this query to return NULL values (which IIUC the shaper/materializer rightly does not expect).
Include provider and version information
EF Core version: 8.0.5
Database provider: Microsoft.EntityFrameworkCore.Sqlite
Target framework: .NET 8.0
Operating system: Linux (/WSL)
IDE: Visual Studio Code 1.89.1