Suggestion: Could ingest_* methods return something #191
Hi @thenickg, first, thanks for looking into this. As a Big Data service that allows ingestion of massive data loads, we have several considerations here:
Having said that, I agree that the ingestion tags monitoring feature is a nice trick. It too has performance implications: if you dig deeper into how indexing works in Azure Data Explorer, you will see that tags prevent extents from being merged, which reduces the gain from indexing large files. Now, considering all the information above, I can think of several ideas that might help:
What do you think?
All super important considerations to balance :) Nice to see the monitoring feature is coming up to keep increasing Kusto's GA value. With respect to the suggestions:
This would help users manage monitoring in a targeted way, allowing programmers to submit multiple ingestions, do other work in the meantime, and then await the results of all those ingestions with detailed response objects. Combined with an optional ingest tag, that would make for a detailed API for monitoring statuses. Even if this doesn't get implemented, just having an ingest_obj would be nice, but a programmer can dream :)
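A rough sketch of the submit-then-await pattern described above. `IngestionResult`, `ingest_all`, and `ingest_one` are hypothetical names, not part of the azure-kusto-ingest API; `ingest_one` stands in for whatever would perform one ingestion and wait for its final status, and `fake_ingest` only simulates outcomes for the demo.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class IngestionResult:
    """Hypothetical detailed response object (not part of azure-kusto-ingest)."""
    source: str
    succeeded: bool
    details: str = ""


def ingest_all(
    sources: List[str],
    ingest_one: Callable[[str], IngestionResult],
    max_workers: int = 4,
) -> Dict[str, IngestionResult]:
    """Submit every source at once, then wait for all of them to finish.

    `ingest_one` is an assumed callable that ingests one source and returns
    its final status; it is not an existing library function.
    """
    results: Dict[str, IngestionResult] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ingest_one, src): src for src in sources}
        # Other work could run at this point, before blocking on the results below.
        for future in as_completed(futures):
            src = futures[future]
            try:
                results[src] = future.result()
            except Exception as exc:
                # Record the error instead of letting the failure go unnoticed.
                results[src] = IngestionResult(src, False, repr(exc))
    return results


def fake_ingest(source: str) -> IngestionResult:
    # Demo stand-in: sources ending in ".bad" simulate a failed ingestion.
    if source.endswith(".bad"):
        raise ValueError("schema mismatch")
    return IngestionResult(source, True)


results = ingest_all(["events.csv", "broken.bad"], fake_ingest)
for src, res in sorted(results.items()):
    print(src, res.succeeded, res.details)
```

Catching exceptions per future keeps one failed ingestion from hiding the others, which is the "no silent failures" property asked for in this issue.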
So all those suggestions sound great! In the meantime, for anyone else interested in solving this problem: I'm going to use ingest_from_blob, keep the URL of the blob in memory, and use it as the ID when subscribing to the StatusQueue.
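The blob-URL workaround above could look roughly like the following. This is a hedged sketch: `BlobIngestionTracker` is a hypothetical helper, the status messages are modeled as plain dicts, and the key `"IngestionSourcePath"` is an assumption about where the source blob URL appears in a status message. The query string is stripped on the assumption that a SAS token on the submitted URL may not be echoed back.

```python
from typing import Dict, Iterable, Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def normalize_blob_url(url: str) -> str:
    """Drop the query string (e.g. a SAS token) so the key stays stable."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


class BlobIngestionTracker:
    """Remember submitted blob URLs and match status messages back to them."""

    def __init__(self) -> None:
        self.pending: Dict[str, str] = {}   # normalized URL -> URL as submitted
        self.outcomes: Dict[str, str] = {}  # URL as submitted -> "success" / "failure"

    def submitted(self, blob_url: str) -> None:
        # Call right after handing the same URL to ingest_from_blob.
        self.pending[normalize_blob_url(blob_url)] = blob_url

    def handle(self, outcome: str, message: dict) -> Optional[str]:
        # ASSUMPTION: each message carries the source blob URL under the
        # hypothetical key "IngestionSourcePath"; adjust to the real message shape.
        key = normalize_blob_url(message.get("IngestionSourcePath", ""))
        original = self.pending.pop(key, None)
        if original is not None:
            self.outcomes[original] = outcome
        return original

    def handle_all(self, messages: Iterable[Tuple[str, dict]]) -> None:
        for outcome, message in messages:
            self.handle(outcome, message)


# Demo with made-up URLs and messages.
tracker = BlobIngestionTracker()
tracker.submitted("https://myaccount.blob.core.windows.net/data/a.csv?sv=token")
tracker.submitted("https://myaccount.blob.core.windows.net/data/b.csv?sv=token")
tracker.handle_all([
    ("failure", {"IngestionSourcePath": "https://myaccount.blob.core.windows.net/data/b.csv"}),
])
print(tracker.outcomes)
print(list(tracker.pending.values()))
```

Anything left in `pending` has not been seen in a status message yet, so a caller would still need a timeout to decide when "no news" should count as a problem.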
@thenickg thanks for the input, and for adding your solution for future reference. I'm going to leave this open for now so that we can consider internally when and what we want to invest in. If you are interested, you can always submit a PR 😄.
Is your feature request related to a problem? Please describe.
The ingest_* methods make it difficult for developers to write robust code. They currently fail silently even when their ingestion ends up in ".show ingestion failures". Users have no way of monitoring this, because all the underlying IDs are hidden from them, so they can't even check against that table.
Describe the solution you'd like
Have the ingest_* methods return something useful for handling failures programmatically, ideally mimicking the robust C# API.
Describe alternatives you've considered
I guess KIT?
edit1: So it appears our friend the KIT library has a pattern for ingestion tag monitoring: https://github.com/Azure/azure-kusto-ingestion-tools/blob/a2a256a09a66aacfe9c4756b2b0b457014013c4a/kit/kit/backends/kusto.py#L183
However, the problem with ingestion tag monitoring is that it can only identify successful rows, at least as it's currently implemented in KIT. Still, the API that KIT offers seems much more full-featured. Why aren't some of KIT's API features available in the ingestion library? Most of KIT's methods seem to lend themselves well to it. KIT makes sense standalone for the CLI and for data schema inference, but it has some nice repeated patterns that anyone using the ingestion library would eventually need to write.
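The limitation described above shows up clearly in a sketch of tag-based monitoring. This is not KIT's code: `new_ingest_by_tag`, `extents_with_tag_command`, and `tag_status` are hypothetical helpers, attaching the tag to the ingestion request is not shown, and the KQL command syntax is an assumption to check against the Kusto documentation.

```python
import uuid
from typing import Optional


def new_ingest_by_tag(prefix: str = "job") -> str:
    """Create a unique ingest-by tag; the "job" prefix is purely illustrative."""
    return f"ingest-by:{prefix}-{uuid.uuid4()}"


def extents_with_tag_command(table: str, tag: str) -> str:
    """Build a control command listing the table's extents that carry `tag`.

    Syntax follows the `.show table ... extents where tags has ...` form;
    verify it against your cluster. Only pass trusted table names and tags,
    since nothing here is escaped.
    """
    return f".show table {table} extents where tags has '{tag}'"


def tag_status(extent_count: Optional[int]) -> str:
    """Interpret how many extents the command above returned.

    Extents exist only for data that actually landed, so an empty result
    cannot tell a failed ingestion apart from one that is still pending.
    """
    if extent_count:
        return "succeeded"
    return "unknown (failed or still pending)"


tag = new_ingest_by_tag()
print(extents_with_tag_command("MyTable", tag))
print(tag_status(0), "|", tag_status(2))
```

Because a failed ingestion produces no tagged extent, the "unknown" branch is unavoidable here; distinguishing failure from delay needs a second signal such as the failure status queue or ".show ingestion failures".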
edit2: Ehhh, KIT is pretty tangled up with this manifest business. Possibly have ingest return the URL?
Referenced code: azure-kusto-python/azure-kusto-ingest/azure/kusto/ingest/_ingest_client.py, line 64 at commit d81a549