Set splitsize for hadoop InputFormat to Presto max_split_size #23635

agrawalreetika · 2024-09-12T12:54:56Z

Description

Set the split size used by the Hadoop InputFormat to Presto's max_split_size.
Details in #23608

Motivation and Context

Make the split size configurable where the Hadoop InputFormat library is used for split generation.
Resolves #23608
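For context on why a minimum split size changes Hadoop's splits, here is a minimal sketch of the split-size arithmetic in Hadoop's FileInputFormat. Both the `mapred` and `mapreduce` variants compute `max(minSize, min(upperBound, blockSize))`. In `mapreduce` the upper bound is the configured max split size; in `mapred` it is a goal size derived from the requested split count. `minSize` is read from `mapreduce.input.fileinputformat.split.minsize` (default 1), so raising it lets splits grow past the block size. The class name and the sizes below are hypothetical and are not taken from the PR.

```java
// Hypothetical illustration of Hadoop FileInputFormat's split-size arithmetic.
public final class SplitSizeSketch {
    private SplitSizeSketch() {}

    // upperBound is maxSize (mapreduce API) or goalSize (mapred API).
    // The block size caps the split size unless minSize is larger.
    public static long computeSplitSize(long blockSize, long minSize, long upperBound) {
        return Math.max(minSize, Math.min(upperBound, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;
        // Defaults (minSize = 1, no upper bound): split size equals the block size.
        System.out.println(computeSplitSize(blockSize, 1L, Long.MAX_VALUE) / mb); // 128
        // minSize raised to a hypothetical 256MB max_split_size: 256MB splits.
        System.out.println(computeSplitSize(blockSize, 256 * mb, Long.MAX_VALUE) / mb); // 256
    }
}
```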

Impact

Make the split size configurable where the Hadoop InputFormat library is used for split generation.
Resolves #23608

Test Plan

Contributor checklist

Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.

Release Notes

== NO RELEASE NOTE ==

tdcmeehan · 2024-09-12T12:57:55Z

Any way to add a test case for this?

elharo

Tests?

elharo · 2024-09-12T20:08:04Z

presto-hive/src/main/java/com/facebook/presto/hive/StoragePartitionLoader.java

@@ -93,6 +94,7 @@
 import static java.lang.Math.max;
 import static java.lang.String.format;
 import static java.util.Objects.requireNonNull;
+import static org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MINSIZE;


This feels like a large dependency to add just for a constant string. Consider using the literal value instead.
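A minimal sketch of the reviewer's suggestion, assuming the code only needs the property name: declare the key locally instead of importing `FileInputFormat`. The literal is the value of `FileInputFormat.SPLIT_MINSIZE` in Hadoop. The class name, the `applyMaxSplitSize` helper, and the `Map` standing in for a Hadoop `Configuration`/`JobConf` are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: avoids importing
// org.apache.hadoop.mapreduce.lib.input.FileInputFormat just for one constant.
public final class SplitSizeConfigSketch {
    // Same string as FileInputFormat.SPLIT_MINSIZE.
    public static final String SPLIT_MINSIZE = "mapreduce.input.fileinputformat.split.minsize";

    private SplitSizeConfigSketch() {}

    // A Map stands in for a Hadoop Configuration/JobConf in this sketch.
    public static void applyMaxSplitSize(Map<String, String> conf, long maxSplitSizeBytes) {
        conf.put(SPLIT_MINSIZE, Long.toString(maxSplitSizeBytes));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        applyMaxSplitSize(conf, 64L * 1024 * 1024); // e.g. a 64MB max_split_size
        System.out.println(conf.get(SPLIT_MINSIZE)); // 67108864
    }
}
```

With a real `JobConf`, the equivalent call would be `jobConf.setLong(SPLIT_MINSIZE, maxSplitSizeBytes)`.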

agrawalreetika · 2024-09-13T19:06:24Z

Sure, I will check on how we can add a test for this.

agrawalreetika · 2024-09-19T15:19:55Z

Sure, I will check on how we can add a test for this.

The tests added so far mainly check how split generation is affected when the Hadoop library is used directly.
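A hedged sketch of the kind of split-count check such a test could make, written without Hadoop on the classpath. The loop mirrors how Hadoop's FileInputFormat cuts a file. It emits full-size splits while the remaining bytes exceed 1.1 times the split size (`SPLIT_SLOP`), and the remainder becomes the last split. The class name, method name, and file sizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of FileInputFormat-style split enumeration.
public final class SplitCountSketch {
    // Hadoop's FileInputFormat uses the same 10% slop factor.
    private static final double SPLIT_SLOP = 1.1;

    private SplitCountSketch() {}

    // Returns the lengths of the splits a file of fileLength bytes is cut into.
    public static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> lengths = new ArrayList<>();
        long bytesRemaining = fileLength;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            lengths.add(bytesRemaining); // tail split, up to 1.1x splitSize
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(splitLengths(1024 * mb, 128 * mb).size()); // 8
        System.out.println(splitLengths(1024 * mb, 256 * mb).size()); // 4
        // 140MB / 128MB is about 1.09, within the slop, so one split.
        System.out.println(splitLengths(140 * mb, 128 * mb).size()); // 1
    }
}
```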

agrawalreetika requested a review from a team as a code owner September 12, 2024 12:54

agrawalreetika requested a review from presto-oss September 12, 2024 12:54

elharo reviewed Sep 12, 2024

agrawalreetika force-pushed the config-split-size branch from 6dfa658 to c81d8f7 Compare September 19, 2024 14:08

Set splitsize for hadoop InputFormat to Presto max_split_size

ebb04e8

agrawalreetika force-pushed the config-split-size branch from c81d8f7 to ebb04e8 Compare September 19, 2024 15:16
