-
Notifications
You must be signed in to change notification settings - Fork 241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Apply row limit transform to query in backend #1461
base: master
Are you sure you want to change the base?
Conversation
rowLimit, | ||
engine | ||
); | ||
await checkUnlimitedQuery(sampledQuery, rowLimit, engine); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about still keeping the old logic here, but only replace the transformLimitedQuery
with the backend api, like sampledQuery does? you probably thought about this, what is the concern?
btw, I think we should keep the logic consistent for both the sampling and limited query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had two thoughts:
- There would be three API calls required to prep a query before sending it to be executed, seemed like too much. I was thinking about creating a single unified endpoint that combined templating/sampling/limiting, but that was even more changes
- Scheduled DataDocs don't go through that flow, so it doesn't make it simpler; it just adds a new endpoint purely for adhoc executions
That said, I'm fine with refactoring to use a new endpoint for this. If it returned a flag for unlimited queries, then we can remove even more frontend code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you raised a good point. I was also thinking of merging all the transforms into one single API endpoint. maybe we could try that route?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I combined sampling/limiting into POST /query/transform/
, but left the individual endpoints—sampling preview uses /query/transform/sampling/
, and there may be some use for a standalone limited transform as well.
} | ||
|
||
if (rowLimit != null && rowLimit >= 0) { | ||
return getLimitedQuery(query, rowLimit, engine.language); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may delete the function getLimitedQuery
, which is not needed
f3fcd1f
to
b8a33e0
Compare
b8a33e0
to
1164dde
Compare
@baumandm thanks for the PR and all the updates! lgtm. Just one more question: with this change, will all the scheduled docs automatically have the limit enabled as most of the query cells have the default limit? Dont want to surprise the users |
Good question, that was something I thought about as well. Previously, DataDoc cells didn't store the This PR includes a change so that new DataDoc cells will be initialized with the default row limit from the frontend, to avoid this kind of situation in the future. If this change is something you are worried about, the only thing I can think of would be to run a SQL command to unset every DataCell's For us, since it only applies to |
how about we add a flag in the doc schedule modal, people can explicitly choose to enable or disable query limiting or sampling? by default it is disabled. |
Does this sound correct:
|
that sounds perfectly right! for the 3rd one, I dont have a strong opinion, may start with disabled by default |
Any chance this will be merged soon? |
I believe we ran into an issue where in some scenarios, running a query through Sqlglot made breaking changes in the query, instead of just adding a limit. So we put it on the back burner and got distracted with some other priorities. Our plan is to return to this feature and make it optional as discussed above. That would provide a workaround for any of queries that Sqlglot mangles. I don't have a timeline for this but I'll try to refactor the PR and push an update. |
This PR addresses the issue whereby users will create and test a DataDoc manually with automatic row limits applied, but then schedule the DataDoc and it runs without the automatic limits.
If the Query Engine has the
(Experimental) Enable Row Limit
feature enabled, all row limit transforms are now performed in the backend, for both adhoc and scheduled queries. If this feature is disabled, nothing changes.Summary of changes:
POST /query_execution
now accepts an optionalrow_limit
param and applies a limit transform if configured; this is provided for adhoc executions from either the adhoc editor or a DataDoc cellrun_datadoc
now looks up the DataDoc cell metadata, and applies a limit transform if configuredrun_datadoc
task, so it behaves the same as scheduled DataDocsYour SELECT query is unbounded
popup; this feature continues to work as before, allowing the user to confirm or cancel the execution. The original query is sent to/query_execution
along with therow_limit
admin_logic.get_engine_feature_param()
method to make it easier to get a single paramI considered a number of different approaches for this PR, and picked this approach because it required the fewest, simplest changes, but I can refactor if needed. Happy to discuss!