Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why DefaultIfEmpty inside double SelectMany is wrong working? #33343

Open
KvaKoZyaBBrr opened this issue Mar 18, 2024 · 22 comments · May be fixed by #34110
Open

Why DefaultIfEmpty inside double SelectMany is wrong working? #33343

KvaKoZyaBBrr opened this issue Mar 18, 2024 · 22 comments · May be fixed by #34110

Comments

@KvaKoZyaBBrr
Copy link

I have entities:

class MyRecord<T>
{
    public T Item {get;set;}
}
class JoinResult<T1, T2>
{
    public T1 Left { get; set; }
    public T2 Right { get; set; }
}
class  TestItem
{
    public int EntityId {get;set;}
    public string Name {get;set;}
    public int ParentId {get;set;} // link to TestItem.EntityId
}
class  TestQuery
{
    public int QueryId {get;set;}
}

And when Im try get all childs for requested items ordered by desc which has linked query I use next linq:

DbSet<MyRecord<long>>().FromSql(select * from unnest({0}), object[] { NpgsqlParameter<MyRecord<long>[]> })
.SelectMany(pk => dataContext.TestItems
    .Where(item => pk.Item == item.ParentId)
    .SelectMany(
        collectionSelector: entity => dataContext.TestQueries
            .Where(a => a.QueryId == entity.EntityId)
            .DefaultIfEmpty(), 
        resultSelector: (left, right) => new JoinResult<TestItem, TestQuery>{ 
            Left = left, 
            Right = right 
        }
    )
    .OrderByDescending(result => result.Right.ObjectId)
    .Select(result => result.Left))

I expect DefaultIfEmpty change inner SelectMany to leftJoin inside join by outer SelectMany smth

SELECT t0.EntityId, t0.Name, t0.ParentId
FROM (
    select * from unnest(ARRAY[101,102])
) AS e
JOIN LATERAL (
    SELECT c.EntityId, c.Name, c.ParentId
    FROM items AS c
    LEFT JOIN LATERAL (
        SELECT s.QueryId
        FROM queries AS q
        WHERE q.QueryId = c.EntityId
    ) AS t ON TRUE
    WHERE e.unnest = c.ParentId
    ORDER BY t.QueryId DESC
) AS t0 ON TRUE

But I get

SELECT t0.EntityId, t0.Name, t0.ParentId
FROM (
    select * from unnest(ARRAY[101,102])
) AS e
LEFT JOIN LATERAL (
    SELECT c.EntityId, c.Name, c.ParentId
    FROM items AS c
    JOIN LATERAL (
        SELECT s.QueryId
        FROM queries AS q
        WHERE q.QueryId = c.EntityId
    ) AS t ON TRUE
    WHERE e.unnest = c.ParentId
    ORDER BY t.QueryId DESC
) AS t0 ON TRUE

How can I change this behaviour?

@roji
Copy link
Member

roji commented Mar 22, 2024

@KvaKoZyaBBrr can you please submit a runnable sample, as a console program I can actually run and see the results? Your query code there doesn't seem to compile.

@KvaKoZyaBBrr
Copy link
Author

KvaKoZyaBBrr commented Mar 22, 2024

Of course.
TestEF_selectMany.zip
Im work on postgres so assert write on postgreSQL. But in SQL I expect same behaviour in query

@roji
Copy link
Member

roji commented Mar 23, 2024

Thanks for the repro. I've removed the irrelevant parts of the query (that's always a good idea), here's a distlilled comparison of with and without DefaultIsEmpty for better understanding what's going on:

var withDefaultIfEmpty = context.TestEntities
    .Select(x => x.Id)
    .SelectMany(pk => context.TestEntities
        .Where(entity => pk == entity.Id)
        .SelectMany(
            entity => context.TestEntities
                .Where(a => a.Name.Contains(entity.Name))
                .DefaultIfEmpty()));
Console.WriteLine(withDefaultIfEmpty.ToQueryString());

var withoutDefaultIfEmpty = context.TestEntities
    .Select(x => x.Id)
    .SelectMany(pk => context.TestEntities
        .Where(entity => pk == entity.Id)
        .SelectMany(
            entity => context.TestEntities
                .Where(a => a.Name.Contains(entity.Name))));
Console.WriteLine(withoutDefaultIfEmpty.ToQueryString());

SQLs:

SELECT [s].[Id], [s].[Name], [s].[Parent]
FROM [TestEntities] AS [t]
LEFT JOIN (
    SELECT [t2].[Id], [t2].[Name], [t2].[Parent], [t0].[Id] AS [Id0]
    FROM [TestEntities] AS [t0]
    CROSS APPLY (
        SELECT [t1].[Id], [t1].[Name], [t1].[Parent]
        FROM [TestEntities] AS [t1]
        WHERE [t1].[Name] IS NOT NULL AND [t0].[Name] IS NOT NULL AND (CHARINDEX([t0].[Name], [t1].[Name]) > 0 OR [t0].[Name] LIKE N'')
    ) AS [t2]
) AS [s] ON [t].[Id] = [s].[Id0]

SELECT [s].[Id], [s].[Name], [s].[Parent]
FROM [TestEntities] AS [t]
INNER JOIN (
    SELECT [t2].[Id], [t2].[Name], [t2].[Parent], [t0].[Id] AS [Id0]
    FROM [TestEntities] AS [t0]
    CROSS APPLY (
        SELECT [t1].[Id], [t1].[Name], [t1].[Parent]
        FROM [TestEntities] AS [t1]
        WHERE [t1].[Name] IS NOT NULL AND [t0].[Name] IS NOT NULL AND (CHARINDEX([t0].[Name], [t1].[Name]) > 0 OR [t0].[Name] LIKE N'')
    ) AS [t2]
) AS [s] ON [t].[Id] = [s].[Id0]

First, regarding why the outer join is a LEFT JOIN... The fragment inside the outer SelectMany() ends with a DefaultIsEmpty(); this means that for every outer ID, if there's no matching TestEntity, the query should return a null. If the outer query used a non-LEFT INNER JOIN, it would instead filter out any IDs with non-matching TestEntities, which would return wrong result (remember, EF always tries to produce the same results that a regular LINQ-to-Object query would produce).

But more generally, your usage of SelectMany() is slightly odd, and likely inefficient... Taking your original fragmentary query above:

DbSet<MyRecord<long>>().FromSql(select * from unnest({0}), object[] { NpgsqlParameter<MyRecord<long>[]> })
.SelectMany(pk => dataContext.TestItems
    .Where(item => pk.Item == item.ParentId)

What you're looking for here seems to be a simple Contains query:

var ids = new[] { 1, 2, 3 };
_ = context.TestEntities.Where(e => ids.Contains(e.Id))...

This should make the EF provider produce better SQL, using the PG ANY operator, rather than using custom SQL with unnest and SelectMany(), which translates to costly extra joins.

@dombaigabor
Copy link

We have the same issue with nested SelectMany (MSSQL, EF Core 7.0.5.).

Thanks for the repro. I've removed the irrelevant parts of the query (that's always a good idea), here's a distlilled comparison of with and without DefaultIsEmpty for better understanding what's going on:

var withDefaultIfEmpty = context.TestEntities
    .Select(x => x.Id)
    .SelectMany(pk => context.TestEntities
        .Where(entity => pk == entity.Id)
        .SelectMany(
            entity => context.TestEntities
                .Where(a => a.Name.Contains(entity.Name))
                .DefaultIfEmpty()));
Console.WriteLine(withDefaultIfEmpty.ToQueryString());

var withoutDefaultIfEmpty = context.TestEntities
    .Select(x => x.Id)
    .SelectMany(pk => context.TestEntities
        .Where(entity => pk == entity.Id)
        .SelectMany(
            entity => context.TestEntities
                .Where(a => a.Name.Contains(entity.Name))));
Console.WriteLine(withoutDefaultIfEmpty.ToQueryString());

SQLs:

SELECT [s].[Id], [s].[Name], [s].[Parent]
FROM [TestEntities] AS [t]
LEFT JOIN (
    SELECT [t2].[Id], [t2].[Name], [t2].[Parent], [t0].[Id] AS [Id0]
    FROM [TestEntities] AS [t0]
    CROSS APPLY (
        SELECT [t1].[Id], [t1].[Name], [t1].[Parent]
        FROM [TestEntities] AS [t1]
        WHERE [t1].[Name] IS NOT NULL AND [t0].[Name] IS NOT NULL AND (CHARINDEX([t0].[Name], [t1].[Name]) > 0 OR [t0].[Name] LIKE N'')
    ) AS [t2]
) AS [s] ON [t].[Id] = [s].[Id0]

SELECT [s].[Id], [s].[Name], [s].[Parent]
FROM [TestEntities] AS [t]
INNER JOIN (
    SELECT [t2].[Id], [t2].[Name], [t2].[Parent], [t0].[Id] AS [Id0]
    FROM [TestEntities] AS [t0]
    CROSS APPLY (
        SELECT [t1].[Id], [t1].[Name], [t1].[Parent]
        FROM [TestEntities] AS [t1]
        WHERE [t1].[Name] IS NOT NULL AND [t0].[Name] IS NOT NULL AND (CHARINDEX([t0].[Name], [t1].[Name]) > 0 OR [t0].[Name] LIKE N'')
    ) AS [t2]
) AS [s] ON [t].[Id] = [s].[Id0]

First, regarding why the outer join is a LEFT JOIN... The fragment inside the outer SelectMany() ends with a DefaultIsEmpty(); this means that for every outer ID, if there's no matching TestEntity, the query should return a null. If the outer query used a non-LEFT INNER JOIN, it would instead filter out any IDs with non-matching TestEntities, which would return wrong result (remember, EF always tries to produce the same results that a regular LINQ-to-Object query would produce).

But more generally, your usage of SelectMany() is slightly odd, and likely inefficient... Taking your original fragmentary query above:

DbSet<MyRecord<long>>().FromSql(select * from unnest({0}), object[] { NpgsqlParameter<MyRecord<long>[]> })
.SelectMany(pk => dataContext.TestItems
    .Where(item => pk.Item == item.ParentId)

What you're looking for here seems to be a simple Contains query:

var ids = new[] { 1, 2, 3 };
_ = context.TestEntities.Where(e => ids.Contains(e.Id))...

This should make the EF provider produce better SQL, using the PG ANY operator, rather than using custom SQL with unnest and SelectMany(), which translates to costly extra joins.

Why is the inner select using cross apply in the first version?
How should we write nested select many, if we need LEFT JOIN for the first level, and LEFT JOIN for the second level?

@KvaKoZyaBBrr
Copy link
Author

KvaKoZyaBBrr commented Mar 28, 2024

TestEF_selectMany.zip
@roji my bad( I builded wrong repo. Fixed this.
In this examples more true issue.
By actual queries I get
image
but I need
image

@KvaKoZyaBBrr
Copy link
Author

@roji
Can you change to change our linq query to get required sql-query and expected records?

@roji
Copy link
Member

roji commented Apr 8, 2024

@KvaKoZyaBBrr I haven't had time to reexamine this. If your repo shows the actual problems this time, I'll try to find some time in the next few weeks and will report back.

@KvaKoZyaBBrr KvaKoZyaBBrr linked a pull request Jun 28, 2024 that will close this issue
6 tasks
@KvaKoZyaBBrr
Copy link
Author

@roji can I get feedback about this problem or about my PR for current issue please?

@roji
Copy link
Member

roji commented Jul 29, 2024

@KvaKoZyaBBrr I'm sorry I haven't been able to look at this - it's a very busy time and it may take a while longer unfortunately. You'll have to be patient a bit more.

@dombaigabor
Copy link

@KvaKoZyaBBrr I'm sorry I haven't been able to look at this - it's a very busy time and it may take a while longer unfortunately. You'll have to be patient a bit more.

@roji systems like ours, has a lot of data tables on a frontend, powered by SQL views which we would to rewrite into EF Core queries. Now, we have to change the original query syntax if there is a nested join (LEFT then INNER), becasue EF translates it wrong. The problem is the same if there is a LEFT first, than an INNER join.

Please check @KvaKoZyaBBrr's PR if you can.

@dombaigabor
Copy link

@roji is there any update about this problem? We are waiting for this fix. :(

@roji
Copy link
Member

roji commented Oct 7, 2024

I've looked at this again and I'm still not seeing a problem with the SQL that EF is producing - I suspect there are some incorrect expectations here.

@KvaKoZyaBBrr your 2nd code sample is essentially the same as the 1st. I stripped down your query a bit (please do that yourself when submitting code samples, removing anything non-essential):

var selectMany = context.TestEntities
    .Select(x => x.Id)
    .SelectMany(pk => context.TestEntities
        .Where(entity => pk == entity.Parent)
        .SelectMany(
            entity => context.TestQueries
                .Where(a => a.Query.Contains(entity.Name))
                .DefaultIfEmpty(),
            (left, right) => new JoinResult<TestEntity, TestQuery>() { Left = left, Right = right }
        ));

As I wrote above, the DefaultIfEmpty() means that a default/empty result needs to be returned for each outer TestEntity; that is why there's a LEFT JOIN and not an INNER JOIN - an INNER JOIN would filter out rows without matches.

I unfortunately can't start explaining how LINQ operators or why EF chooses a specific translation in each specific case. I really, really advise trying to run your LINQ queries in-memory - without EF; look at the results you get, and make sure you understand why those results are returned. If you then see EF returning different results, that would likely be a bug in EF, since EF strives to match the .NET in-memory LINQ evaluation. If you see that happening, please submit a code sample clearly showing the same LINQ query behaving differently - i.e. returning different results - in EF and with in-memory evaluation.

Also, I suggest trying to think about your queries in LINQ, rather than thinking about the SQL you want, and then trying to figure out how to get that exact SQL generated by EF. If you're comfortable with SQL and want to execute a very specific SQL query, there's nothing wrong with just writing that SQL rather than using LINQ. EF really shines when you think in LINQ and let translate to SQL.

I'll go ahead and close this issue for now, so I'm not seeing a problem - yet. If you can submit a LINQ query which behaves differently in EF as it does in-memory (as explained above), I'm happy to reopen and investigate that as a bug.

@roji roji closed this as not planned Won't fix, can't repro, duplicate, stale Oct 7, 2024
@roji roji added the closed-no-further-action The issue is closed and no further action is planned. label Oct 7, 2024
@KvaKoZyaBBrr
Copy link
Author

@roji ok let's do it again :)
I have that storage:
image
DefaultIfEmpty() inside SelectMany() together translate on SQL to LeftJoin. I expect that behavior. And that true.

var firstPart = context.TestEntities
    .SelectMany(x => context.TestQueries
        .Where(a => a.ObjectName.Contains(x.Name))
        .DefaultIfEmpty());

will be translated

SELECT t1."Id", t1."ObjectName"
FROM "TestEntities" AS t
LEFT JOIN LATERAL (
    SELECT t0."Id", t0."ObjectName"
    FROM "TestQueries" AS t0
    WHERE t0."ObjectName" IS NOT NULL AND t."Name" IS NOT NULL AND strpos(t0."ObjectName", t."Name") > 0
) AS t1 ON TRUE

And I get right results. I get some testQueries for any testEntity which contain entityName

image
image

Next. I wanna use that case for any child of any testEntity (I will use your version of code)

var selectMany = context.TestEntities
    .Select(x => x.Id)
    .SelectMany(pk => context.TestEntities
        .Where(entity => pk == entity.Parent)
        .SelectMany(
            entity => context.TestQueries
                .Where(a => a.ObjectName.Contains(entity.Name))
                .DefaultIfEmpty(),
            (left, right) => new JoinResult<TestEntity, TestQuery>() { Left = left, Right = right }
        ));

Ant this translated to

SELECT s."Id", s."Name", s."Parent", s."Id0", s."ObjectName"
FROM "TestEntities" AS t
LEFT JOIN (
    SELECT t0."Id", t0."Name", t0."Parent", t2."Id" AS "Id0", t2."ObjectName"
    FROM "TestEntities" AS t0
    JOIN LATERAL (
        SELECT t1."Id", t1."ObjectName"
        FROM "TestQueries" AS t1
        WHERE t1."ObjectName" IS NOT NULL AND t0."Name" IS NOT NULL AND strpos(t1."ObjectName", t0."Name") > 0
    ) AS t2 ON TRUE
) AS s ON t."Id" = s."Parent"

And I get

image

And more funny case if not create testQuery

image

But it is not what I expect.
I expected
for TestEntity with id=1: 2 childs: id=2 and id=3. For childwith id=2 get 2 query with same name. For child id=3 no query
So for TestEntity with id=1 - 3 rows: 2 filled query and 1 null row (because DefaultIfEmpty is inside inner selectMany!)
for TestEntity woth id=2: 1 child: id=4 witch have 2 query with same name
So for TestEntity with id=2 - 2 row: 2 filled query
for TestEntity with id=3 and id=4 no rows because in that level not contains DefaultIfEmpty
So I expected

image

because I just create query with selectMany under existed selectMany with defaultIfEmpty.
So how I can get interesting result? I try create sql raw query and I build this (in real I create same query and just formatting to EF style)

image

So I see what inner leftJoin disapear and transform to innerJoin - that is wrong because inner SelectMany must operate with null. Also I see what outer selectMany which haven`t defaultIfEmpty translated to leftJoin - thats wrong again because outer query does not expect null.
I check EF code and I found problem:
Tranlator visit SelectMany -> deep search DefaultIfEmpty inside arguments -> If DefaultIfEmpty is founded, SelectMany translate to LeftJoin and founded DefaultIfEmpty is disapear.
So I modify EF code in my PR to check DefaultIfEmpty on current level on SelectMany args. I think this is true issue.
Sorry what lot of text, but I work on this problem more 2 weeks and wait answers several months. I really wanna complete this issue :)

attached test project
ClearTest.zip

@roji
Copy link
Member

roji commented Oct 7, 2024

@KvaKoZyaBBrr have you tried running your LINQ queries against in-memory collections that contain the same data as in the database, as I suggested above?

@roji
Copy link
Member

roji commented Oct 7, 2024

It can really be quite hard to a maintainer like me to understand what exactly you're saying with all of the text and snapshots. It's really easiest if you post a simple console program with a LINQ query that runs once in-memory (no EF at all), and once against a database (with EF). If the results differ between these two executions, then you have a clear bug report with a repro.

I promise that if you post something like this, I'll reopen and we'll investigate.

@KvaKoZyaBBrr
Copy link
Author

@roji Wow! It`s really very simple to catch problem! Nice!
ClearTest.zip

var firstPart = context.TestEntities
    .SelectMany(x => context.TestQueries
        .Where(a => a.ObjectName.Contains(x.Name))
        .DefaultIfEmpty());
            
var firstPartInMemory = entitiesInMemory
    .SelectMany(x => queriesInMemory
        .Where(a => a.ObjectName!.Contains(x.Name!))
        .DefaultIfEmpty());

thats result are equal!

var selectMany = context.TestEntities
    .Select(x => x.Id)
    .SelectMany(pk => context.TestEntities
        .Where(entity => pk == entity.Parent)
        .SelectMany(
            entity => context.TestQueries
                .Where(a => a.ObjectName.Contains(entity.Name))
                .DefaultIfEmpty(),
            (left, right) => new JoinResult<TestEntity, TestQuery>() { Left = left, Right = right }
        ));

var selectManyInMemory = entitiesInMemory
    .Select(x => x.Id)
    .SelectMany(pk => entitiesInMemory
        .Where(entity => pk == entity.Parent)
        .SelectMany(
            entity => queriesInMemory
                .Where(a => a.ObjectName!.Contains(entity.Name!))
                .DefaultIfEmpty(),
            (left, right) => new JoinResult<TestEntity, TestQuery>() { Left = left, Right = right }
        ));

thats is not equal

image
vs
image

@KvaKoZyaBBrr
Copy link
Author

In-memory result is that I expect on EF

@roji
Copy link
Member

roji commented Oct 7, 2024

Confirmed, full minimal cleaned-up repro below. @KvaKoZyaBBrr thanks for doing the work here - in the future a console program like the below is the quickest way of reporting an issue.

var actual = await context.TestEntities
    .Select(x => x.Id)
    .SelectMany(pk => context.TestEntities
        .Where(entity => pk == entity.Parent)
        .SelectMany(
            entity => context.TestQueries
                .Where(a => a.ObjectName.Contains(entity.Name))
                .DefaultIfEmpty(),
            (left, right) => new { Left = left, Right = right }
        ))
    .ToListAsync();
Full minimal repro
await using var context = new BlogContext();
await context.Database.EnsureDeletedAsync();
await context.Database.EnsureCreatedAsync();

List<TestEntity> entitiesInMemory =
[
    new() { Id = 1, Name = "first" },
    new() { Id = 2, Name = "second", Parent = 1 },
    new() { Id = 3, Name = "third", Parent = 1 },
    new() { Id = 4, Name = "fourth", Parent = 2 }
];

context.TestEntities.AddRange(
    new() { Name = "first" },
    new() { Name = "second", Parent = 1 },
    new() { Name = "third", Parent = 1 },
    new() { Name = "fourth", Parent = 2 });

List<TestQuery> queriesInMemory =
[
    new() { Id = 1, ObjectName = "first" },
    new() { Id = 2, ObjectName = "first" },
    new() { Id = 3, ObjectName = "second" },
    new() { Id = 4, ObjectName = "second" },
    new() { Id = 5, ObjectName = "fourth" },
    new() { Id = 6, ObjectName = "fourth" }
];

context.TestQueries.AddRange(
    new() { ObjectName = "first" },
    new() { ObjectName = "first" },
    new() { ObjectName = "second" },
    new() { ObjectName = "second" },
    new() { ObjectName = "fourth" },
    new() { ObjectName = "fourth" });

await context.SaveChangesAsync();

var actual = await context.TestEntities
    .Select(x => x.Id)
    .SelectMany(pk => context.TestEntities
        .Where(entity => pk == entity.Parent)
        .SelectMany(
            entity => context.TestQueries
                .Where(a => a.ObjectName.Contains(entity.Name))
                .DefaultIfEmpty(),
            (left, right) => new { Left = left, Right = right }
        ))
    .ToListAsync();

var expected = entitiesInMemory
    .Select(x => x.Id)
    .SelectMany(pk => entitiesInMemory
        .Where(entity => pk == entity.Parent)
        .SelectMany(
            entity => queriesInMemory
                .Where(a => a.ObjectName!.Contains(entity.Name!))
                .DefaultIfEmpty(),
            (left, right) => new { Left = left, Right = right }
        ))
    .ToList();

Console.WriteLine($"Expected: {expected.Count}, actual: {actual.Count}");

public class BlogContext : DbContext
{
    public DbSet<TestEntity> TestEntities { get; set; }
    public DbSet<TestQuery> TestQueries { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false")
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();
}

public class TestEntity
{
    public int? Id { get; set; }
    public string? Name { get; set; }
    public int? Parent { get; set; }
}

public class TestQuery
{
    public int? Id { get; set; }
    public string? ObjectName { get; set; }
}

@maumar does this ring any bells? Interested in looking into it?

@roji roji reopened this Oct 7, 2024
@roji roji added type-bug and removed closed-no-further-action The issue is closed and no further action is planned. labels Oct 7, 2024
@KvaKoZyaBBrr
Copy link
Author

@roji I tryed resolve this in PR
#34110
Is this may be usefull?

@roji
Copy link
Member

roji commented Oct 7, 2024

Thanks again @KvaKoZyaBBrr, we'll take a look! The PR definitely looks interesting.

@maumar maumar self-assigned this Oct 8, 2024
@dombaigabor

This comment has been minimized.

@roji

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants