Feature request: volume anomalies pre-aggregate data before computing statistics #1278
Closed
garfieldthesam started this conversation in Product and features
Replies: 1 comment
Closing and turning this into an issue instead.
I need to run volume anomaly tests on very large tables. However, I can't do so performantly, because the compiled query does not pre-aggregate the row-count data. For example, this is what the first CTE looks like for a test run on my Databricks cluster:
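The first CTE of the compiled test is shaped roughly like the following. This is an illustrative reconstruction, not the exact compiled SQL, and the table name is a placeholder:

```sql
-- Illustrative sketch: the compiled test's first CTE selects every column
-- of the monitored table with no aggregation, so all row-level data flows
-- into the later statistics CTEs. Table name is a placeholder.
with monitored_table as (
    select * from my_catalog.my_schema.my_large_table
)
select * from monitored_table
```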
For large tables (especially very wide ones), this is prohibitively expensive for two reasons:

- `select *` fetches every column of the table, which isn't necessary in principle for building a row-count time series; this is especially costly on columnar databases.
- The statistics are computed over row-level data rather than pre-aggregated counts, so the full table is scanned on every test run.

As a result, we've had to build a cumbersome workaround: we maintain derived data-quality metrics tables that summarize the large table's metrics each day, and then run elementary `column` tests on those.

I'd like to request a rearchitecture of how the volume anomalies code works to improve performance on large tables.
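For comparison, a pre-aggregating first CTE could collapse the table to one row per time bucket inside the warehouse before any statistics are computed. This is a sketch under assumed names: the table is a placeholder and `event_ts` stands in for whatever timestamp column the test is configured on:

```sql
-- Sketch: reduce the table to (bucket, row_count) pairs up front, so only
-- a small time series feeds the anomaly statistics instead of every row
-- and column. Table and column names are placeholders.
with daily_row_counts as (
    select
        date_trunc('day', event_ts) as bucket_start,
        count(*) as row_count
    from my_catalog.my_schema.my_large_table
    group by 1
)
select bucket_start, row_count
from daily_row_counts
```

On a columnar engine this touches only the timestamp column and returns one row per day, rather than materializing the whole table.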