
[wip] add support for Azure appendable blobs #45

Closed


@vilterp
Author

vilterp commented Sep 12, 2023

Locally, I'm getting a 400 Bad Request back from Azurite, unfortunately with no error message 😕

 caused by: HTTP.Exceptions.StatusError(400, "PUT", "/devstoreaccount1/jl-azurite-9952/my_append_blob?comp=block", HTTP.Messages.Response:
  """
  HTTP/1.1 400 Bad Request
  Server: Azurite-Blob/3.24.0
  Date: Tue, 12 Sep 2023 22:42:54 GMT
  Connection: keep-alive
  Keep-Alive: timeout=5
  Content-Length: 0
  
  """)
  Stacktrace:
   [1] (::HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{CloudBase.var"#cloudsign#64"{CloudBase.var"#cloudsign#63#65"{typeof(HTTP.StreamRequest.streamlayer)}}}})(stream::HTTP.Streams.Stream{HTTP.Messages.Response, HTTP.Connections.Connection{OpenSSL.SSLStream}}; status_exception::Bool, timedout::ConcurrentUtilities.TimedOut{HTTP.Messages.Response}, logerrors::Bool, logtag::Nothing, kw::Base.Pairs{Symbol, Any, NTuple{8, Symbol}, NamedTuple{(:iofunction, :decompress, :azure, :aws, :awsv2, :require_ssl_verification, :verbose, :credentials), Tuple{Nothing, Nothing, Bool, Bool, Bool, Bool, Int64, CloudBase.AzureCredentials}}})
     @ HTTP.ExceptionRequest ~/.julia/packages/HTTP/Y2JKB/src/clientlayers/ExceptionRequest.jl:19

Maybe it's missing a request header like Content-Length or something? I was hoping Azure.put would take care of that… 🤔
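For comparison, here is a sketch of the two raw REST calls an append blob needs, per the Azure Blob Storage REST docs. Put Blob with an x-ms-blob-type: AppendBlob header and an empty body creates the blob. Append Block (PUT with ?comp=appendblock and a Content-Length for the data) then adds bytes at its end. The failing request above used ?comp=block, which is the block-blob Put Block operation. The helper name append_blob_requests is hypothetical: it only builds the method, URL, and headers without sending anything. It also leaves out the x-ms-date, x-ms-version, and Authorization headers, which a request-signing layer is assumed to add.

```julia
# Hypothetical helper (not part of CloudStore.jl): describe, without sending,
# the two REST calls behind an append blob. Date/version/auth headers are
# assumed to be added later by a request-signing layer, so they are omitted.
function append_blob_requests(blob_url::AbstractString, data::AbstractVector{UInt8})
    # 1. Put Blob: create an empty append blob (the request body must be empty).
    create_req = (method = "PUT",
                  url = String(blob_url),
                  headers = ["x-ms-blob-type" => "AppendBlob", "Content-Length" => "0"])
    # 2. Append Block: add `data` at the current end of the blob.
    append_req = (method = "PUT",
                  url = string(blob_url, "?comp=appendblock"),
                  headers = ["Content-Length" => string(length(data))])
    return create_req, append_req
end

create_req, append_req = append_blob_requests(
    "http://127.0.0.1:10000/devstoreaccount1/jl-azurite-9952/my_append_blob",
    Vector{UInt8}("hello\n"))
println(append_req.url)
```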

@vilterp vilterp marked this pull request as ready for review September 13, 2023 14:55
@codecov

codecov bot commented Sep 13, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.14% 🎉

Comparison is base (328b427) 83.13% compared to head (18d0dc3) 83.27%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #45      +/-   ##
==========================================
+ Coverage   83.13%   83.27%   +0.14%     
==========================================
  Files           7        7              
  Lines         587      592       +5     
==========================================
+ Hits          488      493       +5     
  Misses         99       99              
Files Changed Coverage Δ
src/blobs.jl 90.19% <100.00%> (+1.06%) ⬆️


Member

@Drvi Drvi left a comment


Thanks Pete, I like the idea! A couple of questions:

  • What should happen if you put (rather than append) to an appendable blob? It would be nice to mention that in a comment or docstring.
  • How can the user know that a blob is appendable? Is that something a HEAD request would tell you?
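On the second question: Azure's Get Blob Properties operation (a HEAD request on the blob URL) returns an x-ms-blob-type response header whose value is BlockBlob, PageBlob, or AppendBlob, so a HEAD should be enough. Below is a minimal sketch of that check. It assumes the response headers arrive as name => value pairs, as HTTP.jl exposes them. is_append_blob is a hypothetical name, not an existing CloudStore.jl function.

```julia
# Hypothetical helper (not part of CloudStore.jl): given the response headers of
# a HEAD request on a blob (Get Blob Properties), report whether the blob is an
# append blob. Headers are assumed to be an iterable of name => value pairs.
function is_append_blob(headers)
    for (name, value) in headers
        # HTTP header names are case-insensitive.
        if lowercase(name) == "x-ms-blob-type"
            # The service reports BlockBlob, PageBlob, or AppendBlob here.
            return value == "AppendBlob"
        end
    end
    return false  # header absent: no evidence this is an append blob
end

is_append_blob(["Content-Length" => "42", "x-ms-blob-type" => "AppendBlob"])  # true
is_append_blob(["x-ms-blob-type" => "BlockBlob"])                            # false
```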

# https://learn.microsoft.com/en-us/rest/api/storageservices/append-block
function append_block(c::Container, key::String, data::AbstractVector{UInt8}; kw...)
    url = API.makeURL(c, key)
    Azure.put(url, [], data; query=Dict("comp" => "appendblock"), kw...)
end
Member


Ideally, we'd use the parallel multipart upload for larger files, as API.putObjectImpl does. That would need some tweaking, because API.uploadPart currently hard-codes "comp" => "block" (a couple of lines above in this file), so we'd probably want to propagate the query through to it.

Author


I think the main use case for appendable blobs over block blobs (and certainly my use case) is using blobs as log storage for an ongoing process: you append events every couple of seconds, over the course of minutes or hours, while being able to read back the whole thing at any time.

This could be used to build log storage for e.g. a cloud CI system, or (in our case) a cloud database system 😛

So I don't think we'd really benefit from parallelism here: if you already have all the data up front, you can just use a parallel multipart upload to a normal block blob.

Member


But if you need to append, say, 32 MiB chunks of data every couple of seconds or minutes, you'd still benefit from multipart uploads, no?

Author


I suppose there could be big spikes in the log stream rate, but in my use case I haven't seen spikes big enough to necessitate uploading in parallel.

Looking at the Azure docs, I'm not sure append blobs support parallel appending: https://learn.microsoft.com/en-us/rest/api/storageservices/append-block?tabs=azure-ad#remarks. As far as I can tell, once you create an append blob you can only append one block at a time. Otherwise there would have to be some way to specify which block ID you're appending, and I don't see that in these docs.
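To illustrate the point above: even a large payload (such as the 32 MiB chunk mentioned earlier) would go to an append blob as a series of Append Block calls issued one after another. Each call must stay within the service's per-block limit, which is 4 MiB in older service versions; newer versions allow larger blocks, so check the limit for the x-ms-version in use. Below is a minimal sketch. append_in_chunks is a hypothetical helper. It takes the single-block upload function as an argument, for example the append_block proposed in this PR, and that function is assumed to return only after its append completes.

```julia
# Hypothetical helper (not part of CloudStore.jl): upload `data` to an append
# blob by calling `append_fn` once per chunk of at most `max_chunk` bytes.
# Calls are made strictly in order: each Append Block commits at the current
# end of the blob, so issuing them concurrently could reorder the chunks.
function append_in_chunks(append_fn, data::AbstractVector{UInt8};
                          max_chunk::Integer = 4 * 1024 * 1024)  # assumed 4 MiB limit
    max_chunk > 0 || throw(ArgumentError("max_chunk must be positive"))
    start = firstindex(data)
    while start <= lastindex(data)
        stop = min(start + max_chunk - 1, lastindex(data))
        append_fn(view(data, start:stop))  # assumed synchronous: returns when committed
        start = stop + 1
    end
    return nothing
end

# Usage with a stand-in for a real upload: record the chunks instead of sending them.
chunks = Vector{Vector{UInt8}}()
append_in_chunks(chunk -> push!(chunks, collect(chunk)), UInt8.(1:10); max_chunk = 4)
length.(chunks)  # [4, 4, 2]
```

The Append Block docs also describe an x-ms-blob-condition-appendpos header, which makes an append fail unless the blob currently has the expected length; that can guard against another writer appending in between.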

@@ -72,6 +72,16 @@ end
delete(x::Container, key::String; kw...) = Azure.delete(API.makeURL(x, key); kw...)
delete(x::Object; kw...) = delete(x.store, x.key; credentials=x.credentials, kw...)

function create_append_blob(c::Container, key::String; kw...)
Member


How about we expose this functionality via API.putObject(...; append=true) instead of create_append_blob, and API.put(...; append=true) instead of append_block? That would keep similar functionality under a similar API, but I'm open to alternatives, especially if there's a precedent.


@vilterp
Author

vilterp commented Sep 26, 2023

No longer planning on using this; will resurrect if needed

@vilterp vilterp closed this Sep 26, 2023
@vilterp vilterp deleted the add-azure-appendable-blobs branch September 26, 2023 04:34
Labels
None yet
Projects
None yet

2 participants