Make blockSize configurable for Symlink Tables Code Path
Expected Behavior or Use Case
Split generation for Symlink tables is handled via the Hadoop library here.

Currently, the S3 default block size is 32MB (here) and is not configurable, while for HDFS the default block size is 128MB, as mentioned here.

In FileInputFormat, the split size is calculated from these values as computeSplitSize(goalSize, minSize, blockSize), so it would be better to make this split size configurable from Presto as well.
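For reference, computeSplitSize in Hadoop's org.apache.hadoop.mapred.FileInputFormat reduces to a max/min over those three values, which is why the 32MB block size ends up capping the split size. The sketch below only mirrors that formula with illustrative numbers; it is not the actual Hadoop class:

```java
// Illustrative sketch: mirrors the formula used by
// org.apache.hadoop.mapred.FileInputFormat#computeSplitSize.
public class SplitSizeExample
{
    static long computeSplitSize(long goalSize, long minSize, long blockSize)
    {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args)
    {
        long blockSize = 32L << 20;        // 32MB S3 default block size
        long goalSize = 10L * (1L << 30);  // hypothetical totalSize / numSplits

        // With a tiny minSize, splits are capped at the 32MB block size.
        System.out.println(computeSplitSize(goalSize, 1, blockSize));          // 33554432 (32MB)

        // Raising minSize (SPLIT_MINSIZE) lifts the split size past the block size.
        System.out.println(computeSplitSize(goalSize, 256L << 20, blockSize)); // 268435456 (256MB)
    }
}
```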
Setting org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MINSIZE to the value of the Presto property, getMaxSplitSize(session).toBytes(), in the Symlink table configuration block (see the sketch under Possible Implementation below).

Presto Component, Service, or Connector
presto-hive
Possible Implementation
Setting org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MINSIZE to the value of the Presto property getMaxSplitSize(session).toBytes() in the Symlink table configuration block, as sketched below.
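A minimal sketch of the proposed change. The wrapper method name withSplitMinSize is hypothetical; the assumption is that the symlink split-generation code in presto-hive already has the target FileSystem and the ConnectorSession in scope, and that getMaxSplitSize comes from HiveSessionProperties:

```java
import com.facebook.presto.spi.ConnectorSession;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

import static com.facebook.presto.hive.HiveSessionProperties.getMaxSplitSize;
import static org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MINSIZE;

// Sketch only: intended to run in the symlink-table split generation code path
// before FileInputFormat.getSplits() is invoked, so that Presto's max split size
// overrides the file system's default block size (32MB on S3).
private static Configuration withSplitMinSize(FileSystem targetFilesystem, ConnectorSession session)
{
    Configuration configuration = targetFilesystem.getConf();
    configuration.set(SPLIT_MINSIZE, Long.toString(getMaxSplitSize(session).toBytes()));
    return configuration;
}
```

With something like this applied, the symlink split size could be tuned from Presto through the existing hive.max-split-size configuration (max_split_size session property), which is what getMaxSplitSize(session) reads, instead of being pinned to the block size.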
Example Screenshots (if appropriate):
Attaching sample results for a TPC-DS query on sf1k data on S3, where the S3 default block size is 32MB (here).

The base run uses the S3 default block size of 32MB; the target run uses a block size of 256MB.
Context