[Do Not Review] Set splitsize for hadoop InputFormat to Presto max_split_size #23635

agrawalreetika · 2024-09-12T12:54:56Z

Description

Set splitsize for hadoop InputFormat to Presto max_split_size
Details in #23608

Motivation and Context

Make the split size configurable where the Hadoop InputFormat library is used for split generation.
Resolves #23608
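
For context, here is a minimal sketch of why the minimum split size matters. It assumes the Hadoop `mapred` `FileInputFormat` picks a split size as `max(minSize, min(goalSize, blockSize))`. The class name `SplitSizeSketch` and the sample file and block sizes are illustrative only and are not taken from this PR.

```java
public class SplitSizeSketch {
    // Assumed to mirror Hadoop's FileInputFormat.computeSplitSize:
    // the block size caps the goal size, and the minimum size overrides both.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 1024 * mb;   // hypothetical 1 GB file, used as the goal size
        long blockSize = 128 * mb;   // hypothetical 128 MB block size

        // With the minimum left at 1 byte, the block size decides the split size.
        System.out.println(computeSplitSize(fileSize, 1, blockSize) / mb);

        // Raising the minimum (for example to a 256 MB max_split_size) overrides the block size.
        System.out.println(computeSplitSize(fileSize, 256 * mb, blockSize) / mb);
    }
}
```

Under these assumptions, the first call yields 128 MB splits and the second yields 256 MB splits, which is the effect the minimum split size setting is meant to expose.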

Impact

Make the split size configurable where the Hadoop InputFormat library is used for split generation.
Resolves #23608

Test Plan

Contributor checklist

Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.

Release Notes

== NO RELEASE NOTE ==

tdcmeehan · 2024-09-12T12:57:55Z

Any way to add a test case for this?

elharo

Tests?

elharo · 2024-09-12T20:08:04Z

presto-hive/src/main/java/com/facebook/presto/hive/StoragePartitionLoader.java

@@ -93,6 +94,7 @@
 import static java.lang.Math.max;
 import static java.lang.String.format;
 import static java.util.Objects.requireNonNull;
+import static org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MINSIZE;


This feels like a large dependency to add just for a constant string. Consider using the literal value instead.
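
A minimal sketch of what using the literal could look like. The key string is the value of Hadoop's `FileInputFormat.SPLIT_MINSIZE`. The class `SplitMinSizeKey`, the helper `setMinSplitSize`, and the use of a plain `Map` in place of a Hadoop `Configuration` are assumptions for illustration, not code from this PR.

```java
import java.util.HashMap;
import java.util.Map;

public class SplitMinSizeKey {
    // Literal value of org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MINSIZE,
    // declared locally so the static import is not needed.
    static final String SPLIT_MINSIZE = "mapreduce.input.fileinputformat.split.minsize";

    // Hypothetical helper: a Map stands in for the Hadoop job configuration.
    static void setMinSplitSize(Map<String, String> conf, long bytes) {
        conf.put(SPLIT_MINSIZE, Long.toString(bytes));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setMinSplitSize(conf, 64L * 1024 * 1024);
        System.out.println(conf);
    }
}
```

The trade-off is that a local literal can drift if the upstream key ever changes, so a comment naming the Hadoop constant it mirrors helps.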

agrawalreetika · 2024-09-13T19:06:24Z

Sure, I will check on how we can add a test for this.

agrawalreetika · 2024-09-19T15:19:55Z

Sure, I will check on how we can add a test for this.

Currently added tests are mainly checking how split generation is affected if we use the Hadoop library directly.

agrawalreetika requested a review from a team as a code owner September 12, 2024 12:54

agrawalreetika requested a review from presto-oss September 12, 2024 12:54

elharo reviewed Sep 12, 2024

agrawalreetika force-pushed the config-split-size branch 2 times, most recently from c81d8f7 to ebb04e8 Compare September 19, 2024 15:16

agrawalreetika force-pushed the config-split-size branch 3 times, most recently from 79e6452 to a7bc015 Compare September 24, 2024 12:06

elharo previously approved these changes Sep 24, 2024

agrawalreetika dismissed elharo’s stale review via cabf1a9 September 25, 2024 07:08

agrawalreetika force-pushed the config-split-size branch 7 times, most recently from 02acbb6 to 9f646e1 Compare September 27, 2024 13:45

agrawalreetika changed the title ~~Set splitsize for hadoop InputFormat to Presto max_split_size~~ [Do Not Review] Set splitsize for hadoop InputFormat to Presto max_split_size Sep 27, 2024

agrawalreetika force-pushed the config-split-size branch 9 times, most recently from 19bef1c to 22dc0d8 Compare September 28, 2024 08:55

agrawalreetika force-pushed the config-split-size branch from 22dc0d8 to 530dba4 Compare September 28, 2024 11:24

Set splitsize for hadoop InputFormat to Presto max_split_size

2334207

agrawalreetika force-pushed the config-split-size branch from 530dba4 to 2334207 Compare September 28, 2024 19:30
