Bug with cache when aggregate #4378

Closed
antoine-conserto opened this issue Sep 5, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@antoine-conserto

Hi,

Describe the bug
When using the cache on a table that calls an API with field selection and a filter, running an aggregation such as SUM or COUNT(DISTINCT) sometimes returns inconsistent data.

Steampipe version (steampipe -v)
Steampipe v0.23.5

To reproduce

Use this sample with chaos plugin and table chaos_bug_cache_sum : https://github.com/antoine-conserto/steampipe-plugin-chaos/tree/feature/bug-cache-sum

Run steampipe service
Run the query:
SELECT SUM(c.amount) AS "sum", COUNT(distinct id) AS "count" FROM chaos.chaos_bug_cache_sum c

Same problem with steampipe query directly.
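
Because the failure is intermittent, it can help to automate the check. The following is a minimal Python sketch and not part of the original report: `run_query` and `reset_cache` are hypothetical callables that you would wire up yourself (for example, one that executes the query above and returns the two values, and one that restarts the service or clears the cache). The expected pair is the one stated under "Expected behavior".

```python
from collections import Counter
from typing import Callable, Tuple

Result = Tuple[int, int]  # (sum, distinct count)

# Expected values as stated in this report.
EXPECTED: Result = (200_000, 10_000)


def tally_results(run_query: Callable[[], Result],
                  reset_cache: Callable[[], None],
                  attempts: int = 20) -> Counter:
    """Run the query `attempts` times on a cold cache and count each result."""
    seen: Counter = Counter()
    for _ in range(attempts):
        reset_cache()            # hypothetical: restart the service or clear the cache
        seen[run_query()] += 1   # hypothetical: execute the SQL, return (sum, count)
    return seen


def is_consistent(seen: Counter, expected: Result = EXPECTED) -> bool:
    """True only if every attempt returned the expected pair."""
    return set(seen) == {expected}
```

The cache is reset before every attempt because, as noted under "Additional context", only the first query against an empty cache appears to be affected.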

Expected behavior

This query should always return 200,000 in sum and 10,000 in distinct count.

expected
database-2024-09-05_ok.log
plugin-2024-09-05_ok.log

Additional context

It is not systematic. Sometimes I get the right result; when that happens, you have to restart the service or clear the cache before trying to reproduce again.

Sometimes only the sum was wrong while the distinct count was correct.
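
One pattern that would produce this symptom, assuming the wrong sum was higher than expected, is rows being returned more than once: duplicates inflate SUM but leave COUNT(DISTINCT id) unchanged. This is only a hypothesis; the thread never identifies a root cause. The Python sketch below uses made-up rows (10,000 ids with an amount of 20 each, chosen only so the totals match the expected values) to illustrate the arithmetic.

```python
def aggregate(rows):
    """Return (SUM(amount), COUNT(DISTINCT id)) for a list of (id, amount) rows."""
    total = sum(amount for _, amount in rows)
    distinct_ids = len({row_id for row_id, _ in rows})
    return total, distinct_ids


# Hypothetical data, not the actual contents of chaos_bug_cache_sum.
rows = [(i, 20) for i in range(10_000)]

clean = aggregate(rows)
# Simulate 500 rows being served twice.
with_duplicates = aggregate(rows + rows[:500])

print(clean)            # (200000, 10000)
print(with_duplicates)  # (210000, 10000)
```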

I don't have the problem if I disable the cache on the table.

Only the first query, the one that populates the cache, seems to be affected. If I rerun it once the data is cached, I get the correct result and cannot reproduce the problem.

Example KO 1
ko_1
database-2024-09-05_ko_1.log
plugin-2024-09-05_ko_1.log

Example KO 2
ko_2
database-2024-09-05_ko_2.log
plugin-2024-09-05_ko_2.log

@antoine-conserto antoine-conserto added the bug Something isn't working label Sep 5, 2024
@pskrbasu
Contributor

pskrbasu commented Sep 6, 2024

@antoine-conserto Thanks for raising this detailed issue. I'll take a look into it and try to reproduce the issue.

@antoine-conserto
Author

Any update?

@vbatychko-modeln

I can confirm this with Steampipe v0.24.2 and AWS plugin 0.147.0. The following select does not give a consistent result: sometimes 0, sometimes 2 or more.
Disabling the cache fixes the bug. Note that I'm using aggregation across multiple AWS accounts.

steampipe=> select count(*) from (select count(*), subnet_arn from aws_vpc_subnet group by subnet_arn having count(*)>1) t;
 count
-------
     2
(1 row)

@pskrbasu
Contributor

@antoine-conserto @vbatychko-modeln Apologies for the delay. I'll try this out today and update this thread.

@pskrbasu
Contributor

@antoine-conserto A small update on this. I was unable to reproduce the bug using your chaos table. I always get the expected result (200,000 for the sum and 10,000 for the distinct count). I'll keep digging.

Meanwhile, I'll also try using the AWS query @vbatychko-modeln provided.

@antoine-conserto
Author

I also retested.

In steampipe v0.24.2
Image

And with v1.0.0
Image

I did not change the SDK version (v5.10.4) of the chaos plugin. Maybe your refactoring of the FDW fixed the problem; in any case, the issue no longer seems relevant to me.

Thanks for looking into this, @pskrbasu

@e-gineer
Contributor

Nice test results @antoine-conserto ... I think we can close this @pskrbasu?

@vbatychko-modeln

I can confirm as well: with v1.0.0 of Steampipe and the AWS plugin, I can no longer reproduce this bug. Thanks.

@pskrbasu
Contributor

Thanks @vbatychko-modeln @antoine-conserto for confirming.
