
[Do Not Review] Set splitsize for hadoop InputFormat to Presto max_split_size #23635

Open: wants to merge 1 commit into master


agrawalreetika (Member)

Description

Set the split size for the Hadoop InputFormat to Presto's max_split_size.
Details in #23608

Motivation and Context

Make the split size configurable where the Hadoop InputFormat library is used for split generation.
Resolves #23608
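For context, Hadoop's new-API `FileInputFormat` picks a split size by clamping the file's block size between a minimum and a maximum split size. The older `mapred` API uses the same clamp with a goal size (total input size divided by the requested number of splits) in place of the maximum. In both, the minimum split size acts as a floor. The sketch below re-implements that clamp locally (it does not call Hadoop). It shows why a small block size yields small splits, and how raising the minimum split size to a value such as Presto's `max_split_size` would override it. The 32 MB block size and 64 MB `max_split_size` are illustrative assumptions, not values taken from this PR.

```java
// Local re-implementation of Hadoop's split-size clamp, so it runs without Hadoop on the classpath.
final class SplitSizeSketch {
    private SplitSizeSketch() {}

    // Same formula as org.apache.hadoop.mapreduce.lib.input.FileInputFormat#computeSplitSize:
    // the block size, clamped to the range [minSize, maxSize].
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 32 * mb;          // hypothetical block size reported by the file system
        long prestoMaxSplitSize = 64 * mb; // hypothetical max_split_size value

        // Hadoop defaults: minSize = 1 and maxSize = Long.MAX_VALUE, so the split size is the block size.
        System.out.println(computeSplitSize(blockSize, 1L, Long.MAX_VALUE) / mb); // prints 32
        // Raising the minimum split size to max_split_size overrides the smaller block size.
        System.out.println(computeSplitSize(blockSize, prestoMaxSplitSize, Long.MAX_VALUE) / mb); // prints 64
    }
}
```

Because the minimum acts as a floor, setting it changes split sizes only when it exceeds the block size; larger blocks are still capped by the maximum.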

Impact

Make the split size configurable where the Hadoop InputFormat library is used for split generation.
Resolves #23608

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

@tdcmeehan (Contributor)

Any way to add a test case for this?

@elharo (Contributor) left a comment


Tests?

@@ -93,6 +94,7 @@
import static java.lang.Math.max;
import static java.lang.String.format;
import static java.util.Objects.requireNonNull;
import static org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MINSIZE;
Contributor


This feels like a large dependency to add just for a constant string. Consider using the literal value instead.
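A minimal sketch of the reviewer's suggestion, assuming the change writes Presto's `max_split_size` into Hadoop's minimum-split-size key. The key is declared as a local string constant instead of importing `FileInputFormat`; its value, `mapreduce.input.fileinputformat.split.minsize`, is the same string as `FileInputFormat.SPLIT_MINSIZE`. `java.util.Properties` stands in for Hadoop's `JobConf` so the example compiles on its own, and `applyMaxSplitSize` is a hypothetical helper name, not code from this PR.

```java
import java.util.Properties;

// Hypothetical stand-in for the code under review: Properties replaces Hadoop's JobConf.
final class SplitMinSizeConfigSketch {
    // Same string as org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MINSIZE,
    // declared locally so this class does not need to import FileInputFormat.
    static final String SPLIT_MINSIZE = "mapreduce.input.fileinputformat.split.minsize";

    private SplitMinSizeConfigSketch() {}

    // Hypothetical helper: copy Presto's max_split_size (in bytes) into the minimum split size.
    static void applyMaxSplitSize(Properties conf, long maxSplitSizeBytes) {
        conf.setProperty(SPLIT_MINSIZE, Long.toString(maxSplitSizeBytes));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        applyMaxSplitSize(conf, 64L * 1024 * 1024);
        System.out.println(conf.getProperty(SPLIT_MINSIZE)); // prints 67108864
    }
}
```

With a real `JobConf`, the equivalent call would be `jobConf.setLong(SPLIT_MINSIZE, maxSplitSizeBytes)`. The trade-off of copying the literal is that it no longer tracks the Hadoop constant automatically.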

@agrawalreetika (Member, Author)

Sure, I will check on how we can add a test for this.

agrawalreetika force-pushed the config-split-size branch 2 times, most recently from c81d8f7 to ebb04e8, on September 19, 2024 15:16
@agrawalreetika (Member, Author)

The tests added so far mainly check how split generation is affected when the Hadoop library is used directly.
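One way to test this without a real file system is to check split counts directly. The sketch below is a local re-implementation of the per-file loop in Hadoop's new-API `FileInputFormat.getSplits`, including the 10% slop that lets the last split run slightly over the split size. It is not the test added in this PR. The file and block sizes are hypothetical, and the sketch assumes a non-empty file (Hadoop emits a single empty split for zero-length files, which is not modeled here).

```java
import java.util.ArrayList;
import java.util.List;

// Local re-implementation of Hadoop's per-file split loop, for reasoning about split counts.
final class SplitCountSketch {
    // Hadoop lets the last split be up to 10% larger than splitSize rather than creating a tiny split.
    static final double SPLIT_SLOP = 1.1;

    private SplitCountSketch() {}

    // Returns the length of each split for one non-empty file.
    static List<Long> splitLengths(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
        List<Long> lengths = new ArrayList<>();
        long bytesRemaining = fileLength;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            lengths.add(bytesRemaining); // tail split, at most 1.1 x splitSize
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 256 * mb;
        long blockSize = 32 * mb;
        // Default minimum split size: one split per 32 MB block.
        System.out.println(splitLengths(fileLength, blockSize, 1L, Long.MAX_VALUE).size()); // prints 8
        // Minimum raised to a hypothetical 64 MB max_split_size: half as many splits.
        System.out.println(splitLengths(fileLength, blockSize, 64 * mb, Long.MAX_VALUE).size()); // prints 4
    }
}
```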

agrawalreetika force-pushed the config-split-size branch 3 times, most recently from 79e6452 to a7bc015, on September 24, 2024 12:06
elharo previously approved these changes on Sep 24, 2024
agrawalreetika force-pushed the config-split-size branch 7 times, most recently from 02acbb6 to 9f646e1, on September 27, 2024 13:45
agrawalreetika changed the title from "Set splitsize for hadoop InputFormat to Presto max_split_size" to "[Do Not Review] Set splitsize for hadoop InputFormat to Presto max_split_size" on Sep 27, 2024
agrawalreetika force-pushed the config-split-size branch 9 times, most recently from 19bef1c to 22dc0d8, on September 28, 2024 08:55
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make blockSize configurable for Symlink Tables Code Path
3 participants