Add support for Hive webhdfs table #957

Closed
ebyhr opened this issue Jun 11, 2019 · 7 comments
ebyhr (Member) commented Jun 11, 2019

Currently, we need to add jars manually to access a table using WebHDFS. When I tested this in the past, I added javax.ws.rs-api and jersey-common. I also confirmed that jersey-bundle resolved the dependency error.
cc: @wyukawa

Related prestodb/presto#6697
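The manual workaround described above can be sketched as Maven dependency declarations. This is an illustrative fragment only: the thread names the artifacts (javax.ws.rs-api, jersey-common) but not the versions, so the versions below are assumptions, not values confirmed by the discussion.

```xml
<!-- Hypothetical sketch of the JARs added manually, expressed as Maven
     dependencies. Versions are illustrative assumptions. -->
<dependency>
    <groupId>javax.ws.rs</groupId>
    <artifactId>javax.ws.rs-api</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.glassfish.jersey.core</groupId>
    <artifactId>jersey-common</artifactId>
    <version>2.28</version>
</dependency>
```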

electrum (Member) commented

@ebyhr It looks like the JAX-RS API is the only thing missing (I couldn't find any usages of Jersey). Can you try with version 3.2.0-3-SNAPSHOT of hadoop-apache? If that doesn't work, please send me a stack trace so I can see where the reference is coming from.

ebyhr (Member, Author) commented Jun 11, 2019

This is the stack trace with trinodb/trino-hadoop-apache#8. I think we also need to upgrade the Hadoop library, in addition to the missing dependency, after this JIRA ticket is resolved: https://issues.apache.org/jira/browse/HDFS-14466. It's related to #620.

First attempt

2019-06-11T16:54:07.103+0900	WARN	hive-hive-3	io.prestosql.plugin.hive.util.ResumableTasks	ResumableTask completed exceptionally
java.lang.ExceptionInInitializerError
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.jsonParse(WebHdfsFileSystem.java:469)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:494)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:135)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:745)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:820)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:648)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:686)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:682)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1633)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910)
	at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:2072)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2071)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2054)
	at org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:283)
	at io.prestosql.plugin.hive.CachingDirectoryLister.list(CachingDirectoryLister.java:81)
	at io.prestosql.plugin.hive.util.HiveFileIterator$FileStatusIterator.<init>(HiveFileIterator.java:134)
	at io.prestosql.plugin.hive.util.HiveFileIterator$FileStatusIterator.<init>(HiveFileIterator.java:122)
	at io.prestosql.plugin.hive.util.HiveFileIterator.getLocatedFileStatusRemoteIterator(HiveFileIterator.java:111)
	at io.prestosql.plugin.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:104)
	at io.prestosql.plugin.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:38)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
	at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811)
	at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
	at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
	at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:274)
	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:100)
	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:200)
	at io.prestosql.plugin.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)
	at io.prestosql.plugin.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)
	at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)
	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.sun.ws.rs.ext.RuntimeDelegateImpl
	at io.prestosql.hadoop.$internal.javax.ws.rs.ext.RuntimeDelegate.findDelegate(RuntimeDelegate.java:122)
	at io.prestosql.hadoop.$internal.javax.ws.rs.ext.RuntimeDelegate.getInstance(RuntimeDelegate.java:91)
	at io.prestosql.hadoop.$internal.javax.ws.rs.core.MediaType.<clinit>(MediaType.java:44)
	... 42 more
Caused by: java.lang.ClassNotFoundException: com.sun.ws.rs.ext.RuntimeDelegateImpl
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at io.prestosql.server.PluginClassLoader.loadClass(PluginClassLoader.java:80)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at io.prestosql.hadoop.$internal.javax.ws.rs.ext.FactoryFinder.newInstance(FactoryFinder.java:62)
	at io.prestosql.hadoop.$internal.javax.ws.rs.ext.FactoryFinder.find(FactoryFinder.java:155)
	at io.prestosql.hadoop.$internal.javax.ws.rs.ext.RuntimeDelegate.findDelegate(RuntimeDelegate.java:105)
	... 44 more

Second attempt

2019-06-11T16:54:36.219+0900	WARN	hive-hive-4	io.prestosql.plugin.hive.util.ResumableTasks	ResumableTask completed exceptionally
java.lang.NoClassDefFoundError: Could not initialize class io.prestosql.hadoop.$internal.javax.ws.rs.core.MediaType
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.jsonParse(WebHdfsFileSystem.java:469)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:494)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:135)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:745)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:820)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:648)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:686)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:682)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1633)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910)
	at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:2072)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2071)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2054)
	at org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:283)
	at io.prestosql.plugin.hive.CachingDirectoryLister.list(CachingDirectoryLister.java:81)
	at io.prestosql.plugin.hive.util.HiveFileIterator$FileStatusIterator.<init>(HiveFileIterator.java:134)
	at io.prestosql.plugin.hive.util.HiveFileIterator$FileStatusIterator.<init>(HiveFileIterator.java:122)
	at io.prestosql.plugin.hive.util.HiveFileIterator.getLocatedFileStatusRemoteIterator(HiveFileIterator.java:111)
	at io.prestosql.plugin.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:104)
	at io.prestosql.plugin.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:38)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
	at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811)
	at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
	at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
	at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:274)
	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:100)
	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:200)
	at io.prestosql.plugin.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)
	at io.prestosql.plugin.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)
	at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)
	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

findepi (Member) commented Jun 11, 2019

Besides trinodb/trino-hadoop-apache#8, we should add product tests as part of this issue.
It's too easy to break an integration like this without test coverage.

electrum (Member) commented

Thanks, I see what is going on. That API JAR actually does need Jersey, which provides the implementation (which is why I didn't see it when looking at the code).
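The failure mechanism can be illustrated with a minimal, stdlib-only sketch. This is a hypothetical simplification of the JAX-RS FactoryFinder fallback, not the real source: the shaded javax.ws.rs API reflectively loads a default RuntimeDelegate implementation class, and that class is absent from the plugin classpath unless an implementation JAR (Jersey) is bundled, producing exactly the ClassNotFoundException in the stack traces above.

```java
// Hypothetical simplification of the JAX-RS RuntimeDelegate lookup.
public class DelegateLookup {
    // Same fallback class name that appears in the ClassNotFoundException above.
    static final String FALLBACK = "com.sun.ws.rs.ext.RuntimeDelegateImpl";

    static Object findDelegate() {
        try {
            // FactoryFinder's last resort is effectively this reflective load.
            return Class.forName(FALLBACK).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // The wrapped RuntimeException later surfaces as
            // ExceptionInInitializerError when it escapes MediaType.<clinit>.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        try {
            findDelegate();
        } catch (RuntimeException e) {
            System.out.println("cause: " + e.getCause().getClass().getName());
        }
    }
}
```

Running this without the implementation class on the classpath prints `cause: java.lang.ClassNotFoundException`, mirroring the `Caused by` chain in the traces.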

electrum (Member) commented

@ebyhr I just published another snapshot; can you try again?

@findepi I completely agree we should add product tests. Since the Docker images already use WebHDFS, hopefully that is easy. Does someone have time to work on that?

ebyhr (Member, Author) commented Jun 12, 2019

Hmm... I still get the error below:

2019-06-12T20:15:06.907+0900	WARN	hive-hive-1	io.prestosql.plugin.hive.util.ResumableTasks	ResumableTask completed exceptionally
java.lang.ExceptionInInitializerError
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.jsonParse(WebHdfsFileSystem.java:469)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathResponseRunner.getResponse(WebHdfsFileSystem.java:958)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:821)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:648)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:686)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:682)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1633)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910)
	at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:2072)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2071)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2054)
	at org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:283)
	at io.prestosql.plugin.hive.CachingDirectoryLister.list(CachingDirectoryLister.java:81)
	at io.prestosql.plugin.hive.util.HiveFileIterator$FileStatusIterator.<init>(HiveFileIterator.java:134)
	at io.prestosql.plugin.hive.util.HiveFileIterator$FileStatusIterator.<init>(HiveFileIterator.java:122)
	at io.prestosql.plugin.hive.util.HiveFileIterator.getLocatedFileStatusRemoteIterator(HiveFileIterator.java:111)
	at io.prestosql.plugin.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:104)
	at io.prestosql.plugin.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:38)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
	at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811)
	at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
	at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
	at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:274)
	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:100)
	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:200)
	at io.prestosql.plugin.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)
	at io.prestosql.plugin.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)
	at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)
	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.sun.ws.rs.ext.RuntimeDelegateImpl
	at io.prestosql.hadoop.$internal.javax.ws.rs.ext.RuntimeDelegate.findDelegate(RuntimeDelegate.java:122)
	at io.prestosql.hadoop.$internal.javax.ws.rs.ext.RuntimeDelegate.getInstance(RuntimeDelegate.java:91)
	at io.prestosql.hadoop.$internal.javax.ws.rs.core.MediaType.<clinit>(MediaType.java:44)
	... 40 more
Caused by: java.lang.ClassNotFoundException: com.sun.ws.rs.ext.RuntimeDelegateImpl
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at io.prestosql.server.PluginClassLoader.loadClass(PluginClassLoader.java:80)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at io.prestosql.hadoop.$internal.javax.ws.rs.ext.FactoryFinder.newInstance(FactoryFinder.java:62)
	at io.prestosql.hadoop.$internal.javax.ws.rs.ext.FactoryFinder.find(FactoryFinder.java:155)
	at io.prestosql.hadoop.$internal.javax.ws.rs.ext.RuntimeDelegate.findDelegate(RuntimeDelegate.java:105)
	... 42 more

@ebyhr closed this as not planned on Jun 13, 2023.