Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem #341

Merged
merged 6 commits into from
Apr 2, 2024

Conversation

abstractdog
Copy link
Contributor

@abstractdog abstractdog commented Mar 30, 2024

This patch implements the tez side of the scenario described in HIVE-28165 + TEZ-4548.

tez-api changes:

  • extend InputDataInformationEvent with serialized path
  • test coverage for API methods

tez-runtime-internals:

  • adjust proto converters to the new field

tez-mapreduce changes:

  • MRInput to handle serialized path if found in the event
  • basic test coverage for the implementation methods in MRInputHelpers

testing: tested on cluster (together with HIVE-28165), solved an OOM issue, also tez unit tests added

@tez-yetus

This comment was marked as outdated.

@abstractdog abstractdog force-pushed the TEZ-4548 branch 2 times, most recently from c583b70 to bca30d1 Compare March 30, 2024 11:15
@tez-yetus

This comment was marked as outdated.

@tez-yetus

This comment was marked as outdated.

@abstractdog
Copy link
Contributor Author

@ayushtkn : can you please take a look at this one?

@tez-yetus

This comment was marked as outdated.

@tez-yetus

This comment was marked as outdated.

@tez-yetus

This comment was marked as outdated.

@tez-yetus

This comment was marked as outdated.

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few comments

proto.hasUserPayload() ? proto.getUserPayload().asReadOnlyByteBuffer() : null);
ByteBuffer payload = proto.hasUserPayload() ? proto.getUserPayload().asReadOnlyByteBuffer() : null;
InputDataInformationEvent diEvent = null;
if (!proto.getSerializedPath().isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, there maybe a possibility that payload wasn't null as well, but we ignored that because there was a serializedPath?
Should we throw out in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idea was: if there is serializedPath, we just use it and don't care about payload, we should have one or another, not both
I think the static InputDataInformationEvent creator methods take care of not having both
is it fine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be ok if it can never reach & there is no public method which can reach here, else maybe we can have Preconditions or Assert to see we don't land up having two source of truth accidentally even in future, but I am ok

@abstractdog
Copy link
Contributor Author

thanks for the comments @ayushtkn, addressed them in 5649494

@tez-yetus

This comment was marked as outdated.

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanx @abstractdog for the update. There are some checkstyle warnings can you take care of them, rest things LGTM

@abstractdog
Copy link
Contributor Author

abstractdog commented Apr 2, 2024

Thanx @abstractdog for the update. There are some checkstyle warnings can you take care of them, rest things LGTM

thanks!
I addressed what I could have (renaming some fields)
8b5a97c

I cannot address [HiddenField], I consider it as noise now

let me know if it's fine

@ayushtkn
Copy link
Member

ayushtkn commented Apr 2, 2024

I cannot address [HiddenField], I consider it as noise now

@abstractdog that HiddenField should be fixable, it is telling you have localFs already defined above & you are redefining the same variable name in the test method, rather than using the class variable

diff --git a/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java b/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java
index c97158d14..a7501e8ae 100644
--- a/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java
+++ b/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java
@@ -217,7 +217,6 @@ public class TestMRInputHelpers {
   public void testInputEventSerializedPath() throws IOException {
     MRSplitProto proto = MRSplitProto.newBuilder().setSplitBytes(ByteString.copyFrom("splits".getBytes())).build();
 
-    FileSystem localFs = FileSystem.getLocal(conf);
     Path splitsDir = localFs.resolvePath(localTestRootDir);
 
     Path serializedPath = new Path(splitsDir + Path.SEPARATOR + "splitpayload");
@@ -282,7 +281,6 @@ public class TestMRInputHelpers {
 
   @Test(timeout = 5000)
   public void testInputSplitLocalResourceCreationWithDifferentFS() throws Exception {
-    FileSystem localFs = FileSystem.getLocal(conf);
     Path splitsDir = localFs.resolvePath(localTestRootDir);
 
     DataSourceDescriptor dataSource = generateDataSourceDescriptorMapRed(splitsDir);

Just remove that line FileSystem localFs = FileSystem.getLocal(conf); line and it should work

@abstractdog

This comment was marked as outdated.

@abstractdog
Copy link
Contributor Author

abstractdog commented Apr 2, 2024

I cannot address [HiddenField], I consider it as noise now

@abstractdog that HiddenField should be fixable, it is telling you have localFs already defined above & you are redefining the same variable name in the test method, rather than using the class variable

diff --git a/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java b/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java
index c97158d14..a7501e8ae 100644
--- a/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java
+++ b/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java
@@ -217,7 +217,6 @@ public class TestMRInputHelpers {
   public void testInputEventSerializedPath() throws IOException {
     MRSplitProto proto = MRSplitProto.newBuilder().setSplitBytes(ByteString.copyFrom("splits".getBytes())).build();
 
-    FileSystem localFs = FileSystem.getLocal(conf);
     Path splitsDir = localFs.resolvePath(localTestRootDir);
 
     Path serializedPath = new Path(splitsDir + Path.SEPARATOR + "splitpayload");
@@ -282,7 +281,6 @@ public class TestMRInputHelpers {
 
   @Test(timeout = 5000)
   public void testInputSplitLocalResourceCreationWithDifferentFS() throws Exception {
-    FileSystem localFs = FileSystem.getLocal(conf);
     Path splitsDir = localFs.resolvePath(localTestRootDir);
 
     DataSourceDescriptor dataSource = generateDataSourceDescriptorMapRed(splitsDir);

Just remove that line FileSystem localFs = FileSystem.getLocal(conf); line and it should work

LOL, right, I haven't checked it in detail, I thought it was the another checkstyle warning which is usually a noise :) fixed in 9dce61e

@tez-yetus

This comment was marked as outdated.

@tez-yetus

This comment was marked as outdated.

@tez-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 26m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 prototool 0m 0s prototool was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 5 new or modified test files.
_ master Compile Tests _
+0 🆗 mvndep 6m 13s Maven dependency ordering for branch
+1 💚 mvninstall 12m 44s master passed
+1 💚 compile 2m 21s master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1
+1 💚 compile 2m 6s master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
+1 💚 checkstyle 2m 9s master passed
+1 💚 javadoc 2m 2s master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1
+1 💚 javadoc 1m 46s master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
+0 🆗 spotbugs 1m 18s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 4m 26s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for patch
+1 💚 mvninstall 1m 24s the patch passed
+1 💚 compile 1m 33s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1
+1 💚 cc 1m 33s the patch passed
+1 💚 javac 1m 33s the patch passed
+1 💚 compile 1m 23s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
+1 💚 cc 1m 23s the patch passed
+1 💚 javac 1m 23s the patch passed
+1 💚 checkstyle 0m 12s The patch passed checkstyle in tez-api
+1 💚 checkstyle 0m 10s The patch passed checkstyle in tez-runtime-internals
+1 💚 checkstyle 0m 11s tez-mapreduce: The patch generated 0 new + 43 unchanged - 2 fixed = 43 total (was 45)
+1 💚 checkstyle 0m 20s The patch passed checkstyle in tez-dag
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 javadoc 1m 0s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1
+1 💚 javadoc 0m 59s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
+1 💚 findbugs 3m 45s the patch passed
_ Other Tests _
+1 💚 unit 2m 17s tez-api in the patch passed.
+1 💚 unit 0m 37s tez-runtime-internals in the patch passed.
+1 💚 unit 1m 22s tez-mapreduce in the patch passed.
+1 💚 unit 4m 56s tez-dag in the patch passed.
+1 💚 asflicense 0m 40s The patch does not generate ASF License warnings.
83m 13s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/11/artifact/out/Dockerfile
GITHUB PR #341
JIRA Issue TEZ-4548
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile cc prototool
uname Linux 14c63bf90eaa 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / 34bb628
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/11/testReport/
Max. process+thread count 490 (vs. ulimit of 5500)
modules C: tez-api tez-runtime-internals tez-mapreduce tez-dag U: .
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/11/console
versions git=2.34.1 maven=3.6.3 findbugs=3.0.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@abstractdog
Copy link
Contributor Author

tests passed green
considering @ayushtkn 's LGTM an approval, will merge this soon in case of no further objections

@abstractdog abstractdog merged commit f080031 into apache:master Apr 2, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants