Read Parquet metadata via suffix requests #5979
Comments
It seems to me that |
I think the API of If it does need a breaking API change, does that mean we shouldn't start a PR until 52.2.0 is tagged and released? |
It is up to you, but I think it would be better to get the PR up and ready (and we can merge it when |
Potentially related: #6002
We now have a 53.0.0-dev branch where we are merging PRs with breaking API changes, in case you are still interested in submitting a PR |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In high-latency environments, like in WebAssembly in users' browsers, minimizing the number of sequential requests can be a significant performance improvement. Now that #5222 has been merged, `object_store` now has the ability to fetch a suffix byte range of files. It would be great to be able to integrate this with `parquet` to reduce the number of individual requests required.

Describe the solution you'd like
It seems that `parquet::arrow::async_reader::MetadataLoader::load` can be refactored to use an initial suffix request instead of needing to know the file size.

Or, perhaps, there should be a new `MetadataLoader::load_suffix` method, so that implementations can choose to use `load` if they already know the file size, and use `load_suffix` if they don't.

Related to this, the `ParquetObjectReader::new` API requires an `ObjectMeta`, which requires knowing the file size. It would be great to be able to construct `ParquetObjectReader` with only the `store` and the `object_store::path::Path`. (I'm trying to construct `ParquetObjectReader` with a fake file length, passing my own `ArrowReaderMetadata` #5583, but I haven't figured out if that works yet)

Describe alternatives you've considered
Make an extra `HEAD` request instead of using suffix requests.

Additional context
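As a rough illustration of the suffix-request idea, here is a minimal sketch that uses only the standard library. It relies on one fact about the format: a Parquet file ends with a 4-byte little-endian metadata length followed by the `PAR1` magic. Everything else is hypothetical and exists only for this example: `FakeObject`, `get_suffix`, and `load_metadata_bytes` are not part of `object_store` or `parquet`, and the in-memory byte vector merely stands in for a remote object. The point is that the file size is never needed; a second request is made only if the prefetched tail turns out to be too small.

```rust
use std::cell::Cell;

/// Hypothetical in-memory stand-in for a remote object (NOT the object_store API).
struct FakeObject {
    bytes: Vec<u8>,
    /// Counts simulated round trips so the effect of the prefetch size is visible.
    requests: Cell<usize>,
}

impl FakeObject {
    fn new(bytes: Vec<u8>) -> Self {
        FakeObject { bytes, requests: Cell::new(0) }
    }

    /// Returns the last `n` bytes (or the whole object if it is shorter),
    /// mimicking an HTTP `Range: bytes=-n` suffix request.
    fn get_suffix(&self, n: usize) -> &[u8] {
        self.requests.set(self.requests.get() + 1);
        let start = self.bytes.len().saturating_sub(n);
        &self.bytes[start..]
    }
}

/// Parquet footer: 4-byte little-endian metadata length + 4-byte magic.
const FOOTER_LEN: usize = 8;
const MAGIC: &[u8] = b"PAR1";

/// Hypothetical helper: returns the raw (still Thrift-encoded) metadata bytes
/// without ever asking for the object's size.
fn load_metadata_bytes(obj: &FakeObject, prefetch: usize) -> Result<Vec<u8>, String> {
    // First (and ideally only) request: a speculative suffix read.
    let tail = obj.get_suffix(prefetch.max(FOOTER_LEN));
    if tail.len() < FOOTER_LEN {
        return Err("object is too small to be a Parquet file".to_string());
    }

    let footer = &tail[tail.len() - FOOTER_LEN..];
    if &footer[4..] != MAGIC {
        return Err("missing PAR1 magic at end of object".to_string());
    }
    let metadata_len =
        u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]) as usize;

    // If the prefetch did not cover the metadata, issue one larger suffix request.
    let needed = metadata_len + FOOTER_LEN;
    let tail = if needed > tail.len() {
        let bigger = obj.get_suffix(needed);
        if bigger.len() < needed {
            return Err("metadata length exceeds object size".to_string());
        }
        bigger
    } else {
        tail
    };

    let end = tail.len() - FOOTER_LEN;
    Ok(tail[end - metadata_len..end].to_vec())
}
```

With a prefetch large enough to cover the metadata this takes one round trip; otherwise it takes two. Either way it avoids the extra `HEAD` request that would otherwise be needed to learn the file size.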