Unable to List Blobs in Azure via S3Proxy #717
I believe that your S3 client is using a marker that was not supplied by the object store. S3 supports any arbitrary string as a marker. For example, if S3 has objects [a, b, d, f], you can list with any marker string and listing simply resumes after it lexicographically. However, Azure has opaque markers which are not simple strings, so if you issue the same set of operations against Azure with objects [a, b, d, f], a marker that Azure did not itself issue is rejected.

How have you configured your client such that it is using an unexpected marker? Sometimes clients use the last key instead of the marker while listing, but S3Proxy has some fixup logic for this. Also, some clients try to list large buckets in parallel using random keys, but S3Proxy cannot support this in either case.

Please open a separate issue for the azureblob error. While it will likely have the same marker limitation, I am more likely to fix something if data is not being written properly. Note that many changes have landed in it in the last few weeks, so please ensure you use the latest version.
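The distinction above can be sketched in a few lines. This is a toy simulation, not S3Proxy or SDK code: the `s3_list` and `AzureContainer` names, the page size, and the token format are all made up for illustration; only the contrast in marker semantics reflects the real services.

```python
# Toy contrast between the two marker models (not real SDK code).
# S3-style: any string is a valid marker; listing resumes after it lexicographically.
# Azure-style: only a continuation token previously issued by the service is valid.

def s3_list(keys, marker=None):
    """S3 semantics: return keys sorted, skipping everything <= marker."""
    return [k for k in sorted(keys) if marker is None or k > marker]

class AzureContainer:
    """Azure-style listing: markers are opaque tokens the service hands out."""
    def __init__(self, keys, page_size=2):
        self.keys = sorted(keys)
        self.page_size = page_size
        self._tokens = {}  # tokens this simulated service has issued

    def list_page(self, token=None):
        if token is not None and token not in self._tokens:
            # Real ABS rejects markers it never issued with an
            # invalid-marker style error; we simulate that here.
            raise ValueError("marker was not issued by the service")
        start = self._tokens.get(token, 0)
        page = self.keys[start:start + self.page_size]
        next_token = None
        if start + self.page_size < len(self.keys):
            next_token = f"opaque-{start + self.page_size}"
            self._tokens[next_token] = start + self.page_size
        return page, next_token

keys = ["a", "b", "d", "f"]

# S3 accepts an arbitrary marker the server never produced:
print(s3_list(keys, marker="c"))  # ['d', 'f']

# Azure only accepts tokens it previously returned:
container = AzureContainer(keys)
page, token = container.list_page()       # first page, no token needed
page, token = container.list_page(token)  # fine: token was issued above
try:
    container.list_page("c")              # arbitrary string -> rejected
except ValueError as e:
    print("rejected:", e)
```

An S3 client that synthesizes its own marker therefore works against S3 but fails through S3Proxy against Azure.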
Thank you for the explanation. In your example above about opaque markers in Azure, this particular case stands out.

Deltalake is attempting to determine what number to supply to the log before it writes data. I believe it is using 0000000000000 so it can get a list and then use the highest number. However, in this case, on the first write operation there is nothing in the directory at all yet, so an invalid marker error is thrown. Does this line up with your explanation?

As for the separate issue, I'll write it up using the azureblob provider with Azurite as an emulator (it works perfectly there, since Azurite treats all invalid markers as an empty string). I'll get the file sizes that were written, compare them to the azureblob-sdk provider with the same emulator, and then show the difference when pointing at the real Azure Blob Storage.
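The first-write scenario described above can be sketched under the same toy semantics. The zero-padded marker value is taken from the comment; exactly what string deltalake supplies is an assumption, and `s3_list` is an illustrative stand-in, not library code.

```python
# Sketch of the first-write case: listing an empty log with a synthetic marker.

def s3_list(keys, marker=None):
    """S3 semantics: any marker is valid, even against an empty bucket;
    it simply yields an empty result."""
    return [k for k in sorted(keys) if marker is None or k > marker]

empty_log = []  # the _delta_log directory before the first commit

# Against S3 this is harmless -- an empty page comes back:
print(s3_list(empty_log, marker="0000000000000"))  # []

# Against Azure a marker is an opaque continuation token. On an empty
# container no token has ever been issued, so any supplied marker is
# unknown to the service and it returns an error instead of an empty page.
```

This matches the observed behavior: the very first write fails with an invalid-marker error before any data exists.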
Lastly, I have no control over the underlying S3 requests that are formulated by deltalake in my Python code, as they are encapsulated by the library itself. This library works perfectly when targeting either S3 storage or MinIO, so I feel like the requests that are being made are correct, but I am honestly unsure.
I don't mean that your code is incorrect, but that the underlying S3 library may be doing an operation that S3Proxy using Azure cannot support. Can you share specifically which library you use and how you call it?
I'm having a similar issue using this for Azure Blob Store; my client isn't using the
I found my issue fixed by #569. Hope we can get this merged.
Problem:
Utilizing S3Proxy configured to talk to ABS, I cannot use the `marker` parameter when I attempt to ListBlobs. I am using an underlying Python library in my codebase called `deltalake`, where the `deltalake` lib is calling S3Proxy.

The incoming request to S3Proxy from my code is:

The transformed request that is sent to JClouds and then on to ABS in the cloud is:

Finally, the exception caused by the marker parameter is:
You can see above that it basically boils down to the marker parameter being rejected.
I've looked through both the S3Proxy and the jclouds code, and because the error is so abstract and doesn't tell me why this marker parameter is invalid, I'm reaching out for some help.
I'm more than happy to do a PR if you can point to where this can be resolved.
I should also say, I attempted using the `azureblob-sdk` provider, and while it DOES write data and gets past this error, the data that is written cannot be read correctly afterwards: `deltalake` basically reports that the file sizes are not what they should be (smaller). This implies to me that the write operation isn't working correctly even with `azureblob-sdk`.
Environment where failure is seen:
- Azure Blob Storage account configured with `azureSharedKey` in S3Proxy
- `provider: azureblob`
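For context, a minimal S3Proxy properties file for this setup might look roughly like the sketch below. This is an illustrative fragment, not the reporter's actual configuration: the endpoint, account name, and key placeholders are assumptions, and the exact property names should be checked against the S3Proxy documentation for the version in use.

```properties
# Illustrative S3Proxy -> Azure Blob Storage configuration (values are placeholders)
s3proxy.endpoint=http://127.0.0.1:8080
s3proxy.authorization=aws-v2-or-v4
s3proxy.identity=local-access-key
s3proxy.credential=local-secret-key

jclouds.provider=azureblob
jclouds.identity=mystorageaccount
jclouds.credential=<shared-key>
jclouds.endpoint=https://mystorageaccount.blob.core.windows.net
```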