[#2039] fix(docker): Make docker build script work for Hadoop3.2 #2040

Open
wants to merge 1 commit into
base: master

EnricoMi
Contributor

@EnricoMi EnricoMi commented Aug 13, 2024

What changes were proposed in this pull request?

  • Fixes Docker build script option --hadoop-version
  • Allows building the Docker image without Hadoop binaries
  • Fixes building the Docker image for Hadoop3.2 to work with Hadoop binaries and Epoll.
  • Adds Maven profile netty-4.1.68.Final to downgrade the Netty version

Why are the changes needed?

Currently, building with option --hadoop-version 3.2.4 does not work, because HADOOP_SHORT_VERSION is not updated when --hadoop-version is set. Furthermore, even when the image is built with Hadoop 3.2.4, using Netty with Epoll currently fails with:

Exception in thread "main" java.lang.UnsatisfiedLinkError: failed to load the required native library
	at io.netty.channel.epoll.Epoll.ensureAvailability(Epoll.java:81)
	at io.netty.channel.epoll.EpollEventLoop.<clinit>(EpollEventLoop.java:57)
	at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:189)
	at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:37)
	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:84)
	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:60)
	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:49)
	at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:59)
	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:117)
	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:104)
	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:81)
	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:57)
	at org.apache.uniffle.server.netty.StreamServer.<init>(StreamServer.java:88)
	at org.apache.uniffle.server.ShuffleServer.initialization(ShuffleServer.java:300)
	at org.apache.uniffle.server.ShuffleServer.<init>(ShuffleServer.java:113)
	at org.apache.uniffle.server.ShuffleServer.main(ShuffleServer.java:131)
Caused by: java.lang.ExceptionInInitializerError
	at io.netty.channel.epoll.Epoll.<clinit>(Epoll.java:40)
	... 15 more
Caused by: java.lang.IllegalStateException: Multiple resources found for 'META-INF/native/libnetty_transport_native_epoll_x86_64.so' with different content: [jar:file:/data/rssadmin/rss/jars/server/netty-transport-native-epoll-4.1.109.Final-linux-x86_64.jar!/META-INF/native/libnetty_transport_native_epoll_x86_64.so, jar:file:/data/rssadmin/hadoop/share/hadoop/hdfs/lib/netty-all-4.1.68.Final.jar!/META-INF/native/libnetty_transport_native_epoll_x86_64.so]
	at io.netty.util.internal.NativeLibraryLoader.getResource(NativeLibraryLoader.java:301)
	at io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:173)
	at io.netty.channel.epoll.Native.loadNativeLibrary(Native.java:334)
	at io.netty.channel.epoll.Native.<clinit>(Native.java:96)
	... 16 more

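The reason is that the Hadoop binaries contain a different Netty version (netty-all 4.1.68.Final) than the RSS shuffle server (netty-transport-native-epoll 4.1.109.Final), so two differing copies of the native Epoll library end up on the classpath.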

This can be fixed by

  • either not including the Hadoop binaries in the Docker image
  • or downgrading the Netty version to match the version contained in the Hadoop binaries
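
To spot this kind of conflict before starting the server, the versions embedded in the jar file names can be compared. The following is a minimal sketch, not part of this PR: the helper names netty_version_of and netty_versions_match are hypothetical, and it only looks at file names, whereas Netty itself compares the content of the native library.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the Netty version from a jar file name,
# e.g. netty-all-4.1.68.Final.jar -> 4.1.68.Final
netty_version_of() {
  local name
  name="$(basename "$1" .jar)"
  # Drop an optional platform classifier such as -linux-x86_64.
  name="${name%-linux-*}"
  # Keep only what follows the last remaining '-', i.e. the version.
  echo "${name##*-}"
}

# Hypothetical helper: succeeds only if both jars carry the same version.
netty_versions_match() {
  [ "$(netty_version_of "$1")" = "$(netty_version_of "$2")" ]
}

# The two jars named in the stack trace above.
if ! netty_versions_match \
    /data/rssadmin/rss/jars/server/netty-transport-native-epoll-4.1.109.Final-linux-x86_64.jar \
    /data/rssadmin/hadoop/share/hadoop/hdfs/lib/netty-all-4.1.68.Final.jar; then
  echo "Netty version mismatch on the classpath"
fi
```

The file-name heuristic is cheap and needs no tools beyond the shell, but it would miss jars that were renamed or shaded.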

Fix: #2039

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually running:

./build.sh --hadoop-version 2.8.5 --hadoop-provided true --push-image false
./build.sh --hadoop-version 2.8.5 --hadoop-provided false --push-image false
./build.sh --hadoop-version 3.2.4 --hadoop-provided true --push-image false
./build.sh --hadoop-version 3.2.4 --hadoop-provided false --push-image false

Then testing that the coordinator and the shuffle server manage to start in each of the images:

docker run --rm -it rss-server:0.10.0-SNAPSHOT-release.0.4.0.1030.g7af971ef.dirty /bin/bash -c "rss/bin/start-coordinator.sh && sleep 3 && XMX_SIZE=1g rss/bin/start-shuffle-server.sh && sleep 10"

Note you have to run this before each ./build.sh:

rm rss.tgz rss-0.10.0-SNAPSHOT-hadoop3.2.tgz ../../../rss-0.10.0-SNAPSHOT-hadoop3.2.tgz
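
The four builds and the cleanup step above could be combined in one loop. This is a rough sketch under assumptions: hadoop_short_version, run and build_matrix are hypothetical helpers, the hadoop2.8 tarball name is assumed by analogy with the hadoop3.2 name shown above, and rm -f is used instead of rm so that missing files are not an error. By default it only prints the commands; set DRY_RUN=false to execute them.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands unless DRY_RUN=false is set.
RSS_VERSION="0.10.0-SNAPSHOT"   # taken from the tarball names above

# Hypothetical helper: reduce a full Hadoop version to major.minor,
# e.g. 3.2.4 -> 3.2 and 2.8.5 -> 2.8
hadoop_short_version() {
  echo "$1" | cut -d. -f1,2
}

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "${DRY_RUN:-true}" = "true" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_matrix() {
  local version provided short tarball
  for version in 2.8.5 3.2.4; do
    short="$(hadoop_short_version "$version")"
    # Assumed naming scheme: rss-<version>-hadoop<major.minor>.tgz
    tarball="rss-${RSS_VERSION}-hadoop${short}.tgz"
    for provided in true false; do
      # Remove stale tarballs before each build.
      run rm -f rss.tgz "$tarball" "../../../$tarball"
      run ./build.sh --hadoop-version "$version" --hadoop-provided "$provided" --push-image false
    done
  done
}

build_matrix
```

Deriving the short version from the full version in one place is also the idea behind the --hadoop-version fix: the two values cannot drift apart.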

Use this conf/server.conf:

rss.rpc.server.port 29999
rss.jetty.http.port 29998
rss.storage.basePath /tmp/rss
rss.storage.type MEMORY_LOCALFILE_HDFS
rss.coordinator.quorum localhost:19999
rss.server.buffer.capacity 40gb
rss.server.read.buffer.capacity 20gb
rss.server.flush.thread.alive 5
rss.server.flush.localfile.threadPool.size 10
rss.server.flush.hadoop.threadPool.size 60
rss.server.disk.capacity 100m
rss.server.single.buffer.flush.enabled true
rss.server.single.buffer.flush.threshold 128m

# Enable Netty mode
rss.rpc.server.type GRPC_NETTY
rss.server.netty.epoll.enable true
rss.server.netty.port 17000
rss.server.netty.connect.backlog 128
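
To confirm that an image really runs with the Netty and Epoll settings above, the values can be read back from the file. A minimal sketch, assuming the whitespace-separated "key value" layout shown above; conf_get is a hypothetical helper, not part of the RSS scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a key in a "key value" conf file.
# $1 = path to the conf file, $2 = key. If the key occurs more than once,
# the last occurrence wins. Comment lines never match because their
# first field is "#".
conf_get() {
  awk -v key="$2" '$1 == key { val = $2 } END { if (val != "") print val }' "$1"
}
```

For example, conf_get conf/server.conf rss.server.netty.epoll.enable should print true for the configuration above.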

@codecov-commenter

codecov-commenter commented Aug 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.22%. Comparing base (5ddcc28) to head (906c551).
Report is 45 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2040      +/-   ##
============================================
+ Coverage     52.77%   53.22%   +0.45%     
- Complexity     2498     3008     +510     
============================================
  Files           398      455      +57     
  Lines         18135    24276    +6141     
  Branches       1660     2280     +620     
============================================
+ Hits           9570    12921    +3351     
- Misses         7981    10511    +2530     
- Partials        584      844     +260     

☔ View full report in Codecov by Sentry.


Test Results

 2 792 files  +1   2 792 suites  +1   5h 51m 45s ⏱️ +12s
   988 tests ±0     987 ✅ +1   1 💤 ±0  0 ❌  - 1 
12 403 runs  +1  12 388 ✅ +2  15 💤 ±0  0 ❌  - 1 

Results for commit 906c551. ± Comparison against base commit 2b95936.

@qijiale76
Contributor

@EnricoMi, thank you for reporting this. We've encountered the same problem.
Providing an option for building the Docker image without Hadoop binaries is very useful.
As for the Netty version issue, I think shading the native Netty like #1409 could potentially be a better long-term solution. It might allow us to use the desired Netty version, possibly maintaining performance and avoiding bugs. I'm in the process of implementing and testing this solution.

Successfully merging this pull request may close these issues.

[Bug] Docker image build script with Hadoop3.2 does not work with Netty and Epoll
3 participants