-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[wip] add support for Azure appendable blobs #45
Conversation
Locally I'm getting a Bad Request back from Azurite, sadly with no error message 😕
Maybe it's missing a request header like |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #45 +/- ##
==========================================
+ Coverage 83.13% 83.27% +0.14%
==========================================
Files 7 7
Lines 587 592 +5
==========================================
+ Hits 488 493 +5
Misses 99 99
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Pete, I like the idea! A couple of questions:
- What should happen if you
put
without appending to appendable blob? Would be nice to mention in a comment / docstring. - How can the user know that a blob is appendable? Is that something a
head
would tell you?
# https://learn.microsoft.com/en-us/rest/api/storageservices/append-block | ||
function append_block(c::Container, key::String, data::AbstractVector{UInt8}; kw...) | ||
url = API.makeURL(c, key) | ||
Azure.put(url, [], data; query=Dict("comp" => "appendblock"), kw...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So ideally, we'd utilize the parallel multipart upload for larger files, as we do in API.putObjectImpl
. This would need some tweaking because currently, API.uploadPart
sets "comp" => "block"
(couple of lines above in this file), so we'd probably want to propagate the query to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main use case for appendable blobs vs block blobs (and certainly my use case) is to essentially use blobs as log storage for an ongoing process — I.e. you're appending events every couple seconds, over the course of minutes or hours, while being able to read back the whole thing at any time.
This could be used to build log storage for e.g. a cloud CI system, or (in our case) a cloud database system 😛
So anyway, I don't think we'd really benefit from parallelism here — since if you already have all the data up front, you can just use parallel multipart upload for a normal block blob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if you need to append every couple of seconds or minutes say 32MiB chunks of data at a time, then you'd still benefit from multipart uploads, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose there could be big spikes in the log stream rate, but in my use case I haven't seen spikes big enough to necessitate uploading in parallel.
Looking at the Azure docs, I'm not sure append blobs support parallel appending: https://learn.microsoft.com/en-us/rest/api/storageservices/append-block?tabs=azure-ad#remarks As far as I can tell, once you make an append blob you can only append one at a time. Otherwise there'd have to be some scheme to tell it what block id you're appending; I don't see that in these docs.
@@ -72,6 +72,16 @@ end | |||
delete(x::Container, key::String; kw...) = Azure.delete(API.makeURL(x, key); kw...) | |||
delete(x::Object; kw...) = delete(x.store, x.key; credentials=x.credentials, kw...) | |||
|
|||
function create_append_blob(c::Container, key::String; kw...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we make this functionality available via API.putObject(...; append=true)
instead of create_append_blob
/ API.put(...; append=true)
instead of append_block
? It would keep similar functionality under similar API, but I'm open to alternatives, especially if there is a precedent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I basically mimicked the Python SDK
- see usage here: https://github.com/Azure/azure-sdk-for-python/blob/91f5a8bc5fb41273b348ee45023a715f26fb79ac/sdk/storage/azure-storage-blob/tests/test_append_blob.py#L63-L65
- docs here: https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.appendblobservice.appendblobservice?view=azure-python-previous#azure-storage-blob-appendblobservice-appendblobservice-append-block
I kind of prefer the explicit names, but idk, it's a matter of taste 🤷♂️ The style of the repo does skew toward kwargs
No longer planning on using this; will resurrect if needed |
See https://learn.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs#about-append-blobs