Set splitsize for hadoop InputFormat to Presto max_split_size #23635

Open: wants to merge 1 commit into base: master

Conversation

agrawalreetika
Member

Description

Set the split size used by the Hadoop InputFormat to Presto's max_split_size.
Details in #23608

Motivation and Context

Make the split size configurable wherever the Hadoop InputFormat library is used for split generation.
Resolves #23608
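
For context, here is a minimal sketch of why the minimum split size matters. It assumes the split-size rule of Hadoop's older `mapred` `FileInputFormat`, which is `max(minSize, min(goalSize, blockSize))`. The class name `SplitSizeSketch` and the sizes are illustrative only and are not taken from this PR. The sketch also assumes the change sets the minimum split size to the value of max_split_size.

```java
// Illustrative sketch only: mirrors the split-size rule of Hadoop's
// mapred FileInputFormat without depending on Hadoop itself.
final class SplitSizeSketch {
    private SplitSizeSketch() {}

    // splitSize = max(minSize, min(goalSize, blockSize))
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long goalSize = 1024 * mib;   // total input size when one split is requested
        long blockSize = 32 * mib;    // assumed block size, for illustration

        // With the default minimum of 1 byte, the block size caps the split size.
        System.out.println(computeSplitSize(goalSize, 1L, blockSize) / mib + " MiB");

        // Raising the minimum (for example to a 64 MiB max_split_size) lifts the cap.
        System.out.println(computeSplitSize(goalSize, 64 * mib, blockSize) / mib + " MiB");
    }
}
```

Under these assumptions, the split size stays at the block size until the configured minimum exceeds it. After that the minimum decides the split size.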

Impact

Make the split size configurable wherever the Hadoop InputFormat library is used for split generation.
Resolves #23608

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

@tdcmeehan
Contributor

Any way to add a test case for this?

Contributor

@elharo elharo left a comment

Tests?

@@ -93,6 +94,7 @@
import static java.lang.Math.max;
import static java.lang.String.format;
import static java.util.Objects.requireNonNull;
import static org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MINSIZE;
Contributor

This feels like a large dependency to add just for a constant string. Consider using the literal value instead.
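
One way to follow this suggestion, sketched under assumptions: declare the property key locally instead of importing it. The string is the value of Hadoop's `FileInputFormat.SPLIT_MINSIZE`. The class `SplitConfigKeys` and the `Map` standing in for a Hadoop configuration object are hypothetical and are not part of this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the property key, so the mapreduce
// FileInputFormat class does not need to be imported for one string.
final class SplitConfigKeys {
    // Literal value of org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MINSIZE
    static final String SPLIT_MINSIZE = "mapreduce.input.fileinputformat.split.minsize";

    private SplitConfigKeys() {}

    // Returns a copy of the settings with the minimum split size set, in bytes.
    // A plain Map stands in for a Hadoop configuration object here.
    static Map<String, String> withMinSplitSize(Map<String, String> settings, long bytes) {
        Map<String, String> copy = new HashMap<>(settings);
        copy.put(SPLIT_MINSIZE, Long.toString(bytes));
        return copy;
    }
}
```

The trade-off is that a local literal can drift if the upstream key ever changes, so a comment naming the source constant is worth keeping.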

@agrawalreetika
Member Author

Sure, I will check on how we can add a test for this.

@agrawalreetika
Member Author

Sure, I will check on how we can add a test for this.

The tests added so far mainly check how split generation is affected when the Hadoop library is used directly.

Development

Successfully merging this pull request may close these issues.

Make blockSize configurable for Symlink Tables Code Path
3 participants