Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incorrect count distinct value with join #49854

Closed
kaijianding opened this issue Aug 15, 2024 · 4 comments
Closed

incorrect count distinct value with join #49854

kaijianding opened this issue Aug 15, 2024 · 4 comments
Assignees
Labels
type/bug Something isn't working

Comments

@kaijianding
Copy link
Contributor

kaijianding commented Aug 15, 2024

Steps to reproduce the behavior (Required)

pt is partition key, type is date
spu_id type is bigint

Correct reuslt is: 5651249

SELECT 
    count(distinct spu_id)
FROM 
    t1 a
WHERE a.pt = '2024-08-08'
	AND a.pt >= '2023-12-16'
+------------------------+
| count(DISTINCT spu_id) |
+------------------------+
|                5651249 |
+------------------------+

SELECT 
    count(distinct spu_id)
FROM 
   t1 a
    cross JOIN (
		SELECT date('2024-08-08') as dim_date1
	) d
where a.pt = d.dim_date1
	AND a.pt >= '2023-12-16'
+------------------------+
| count(DISTINCT spu_id) |
+------------------------+
|                3840110 |
+------------------------+

SELECT 
    count(distinct spu_id)
FROM 
    t1 a
    inner JOIN (
		SELECT date('2024-08-08') as dim_date1
	) d
where a.pt = d.dim_date1
	AND a.pt >= '2023-12-16'
+------------------------+
| count(DISTINCT spu_id) |
+------------------------+
|                3840110 |
+------------------------+

Expected behavior (Required)

the result in consistent

Real behavior (Required)

Result is incorrect with join

StarRocks version (Required)

  • You can get the StarRocks version by executing SQL select current_version()
    3.3
@kaijianding kaijianding added the type/bug Something isn't working label Aug 15, 2024
@packy92
Copy link
Contributor

packy92 commented Aug 15, 2024

please provide the explain info of these sqls

@chenminghua8
Copy link
Contributor

chenminghua8 commented Aug 16, 2024

The issue seems to be related to distributed distinct, but it is not easy to reproduce in a single-machine environment.

@chenminghua8
Copy link
Contributor

@kaijianding Execute the following statement and see what the execution plan is:
explain SELECT
count(distinct spu_id)
FROM
t1 a
inner JOIN (
SELECT date('2024-08-08') as dim_date1
) d
where a.pt = d.dim_date1
AND a.pt >= '2023-12-16'

@packy92 packy92 self-assigned this Aug 16, 2024
@stdpain
Copy link
Contributor

stdpain commented Sep 20, 2024

seems closed by #51150

@stdpain stdpain closed this as completed Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants