Support for Hadoop 3? #657

theyaa · 2020-02-17T17:39:35Z

Does Dr. Elephant provide support for Hadoop 3 with Yarn ATS V2 please?

ShubhamGupta29 · 2020-03-02T09:18:15Z

@theyaa, no, Dr. Elephant currently doesn't support Hadoop 3 with ATS v2. But you can use Dr. Elephant with Hadoop 3 in prod, provided that your YARN REST APIs and history servers are in sync with what Dr. Elephant is expecting.
Kindly try this if you can and let us know the result and reach out in case you need any help.

theyaa · 2020-03-04T02:44:47Z

Hi @ShubhamGupta29, in HDP3 (Hadoop 3), all Hive queries run on the Tez engine, and Tez is built to send query updates/progress to YARN ATSv2. Using the YARN timeline server v1 REST API, we cannot get Tez query progress information anymore. We have to use YARN ATSv2, or read from the query_data and dag_data tables in Hive's sys db.
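
For anyone who wants to confirm this on their own cluster, here is a minimal sketch of the ATS v1 request for Tez DAG entities. The host name in the usage note is a placeholder and the helper names are made up for illustration; 8188 is only the usual default port for the timeline server web app, so adjust it to your setup. If the count comes back as zero while Tez jobs are running, Tez is most likely not publishing to ATS v1.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def tez_dag_url(base_url, limit=5):
    """Build the ATS v1 URL that lists Tez DAG entities (hypothetical helper)."""
    query = urlencode({"limit": limit})
    return f"{base_url.rstrip('/')}/ws/v1/timeline/TEZ_DAG_ID?{query}"


def count_tez_dags(base_url, limit=5, timeout=10):
    """Return how many Tez DAG entities the timeline server reports.

    This performs a real HTTP request, so it only works against a reachable
    timeline server.
    """
    with urlopen(tez_dag_url(base_url, limit), timeout=timeout) as resp:
        payload = json.load(resp)
    # ATS v1 wraps results in an "entities" list; treat a missing key as empty.
    return len(payload.get("entities", []))
```

Usage would look like `count_tez_dags("http://ats.example.com:8188")`, where the URL is a placeholder for your timeline server address.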

ShubhamGupta29 · 2020-03-05T15:33:07Z

@theyaa, I understand the need for ATSv2. I will look at all the requirements and changes this involves and prioritize accordingly.

theyaa · 2020-03-06T21:15:32Z

@ShubhamGupta29 thank you very much. Please let me know when you have a working version so I can download and try it out.

shkhrgpt · 2020-03-17T20:43:48Z

@theyaa Is the Tez UI working in your HDP 3 install?
Can you also provide the value of the property, tez.history.logging.service.class, which should be present in tez-site.xml.
Thank you.

theyaa · 2020-03-18T17:52:44Z

Hi @shkhrgpt the value is: org.apache.tez.dag.history.logging.proto.ProtoHistoryLoggingService

shkhrgpt · 2020-03-18T18:35:41Z

@theyaa That may be why the timeline server is not returning data for Tez. org.apache.tez.dag.history.logging.proto.ProtoHistoryLoggingService doesn't allow data to go to the timeline server, and therefore the timeline API used by the Tez fetcher is not working.

Changing the value of tez.history.logging.service.class to org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService might work, as described here:

https://tez.apache.org/tez-ui.html

I haven't tested it yet so I don't know if it causes any problem. But maybe you can try?
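
Before changing anything, it may help to check which logging service a given tez-site.xml actually configures. The sketch below is illustrative only: `read_hadoop_property` is a hypothetical helper, and the sample XML is inlined so the snippet runs without a cluster. In practice you would read the file from your Tez configuration directory.

```python
import xml.etree.ElementTree as ET

PROPERTY = "tez.history.logging.service.class"
ATS_V1_SERVICE = "org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService"


def read_hadoop_property(xml_text, name):
    """Return the value of a Hadoop-style <property> entry, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


# Inlined sample using the value reported earlier in this thread.
SAMPLE = """<configuration>
  <property>
    <name>tez.history.logging.service.class</name>
    <value>org.apache.tez.dag.history.logging.proto.ProtoHistoryLoggingService</value>
  </property>
</configuration>"""

service = read_hadoop_property(SAMPLE, PROPERTY)
# True only when Tez is configured to publish history to the ATS v1 timeline server.
writes_to_ats_v1 = service == ATS_V1_SERVICE
```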

shkhrgpt · 2020-03-18T20:58:40Z

@theyaa Did you look at the solution described here:

#529

theyaa · 2020-03-18T21:28:10Z

Hi @shkhrgpt This will cause issues with YARN and Hive logging, since YARN on Hadoop 3 and HDP3 logs to YARN ATSv2, which uses Protobuf and writes to HBase. If I switch to the old class for Tez, I will lose that logging and cause issues in YARN. That is why I was asking if there is a way to modify Dr. Elephant to read from YARN ATSv2.

shkhrgpt · 2020-03-18T22:27:33Z

Okay @theyaa .
Do you know whether the ATSv2 REST API provides the Tez data that the older ATS REST API provided?

shkhrgpt · 2020-03-20T00:38:07Z

@theyaa
I wrote a logging service that writes Tez events to both ATSv1 and protobuf. Please check the following if you want to try it:

https://github.com/shkhrgpt/tez-logging

The goal is that Dr. Elephant should be able to get data from the ATSv1 REST API, while the data is also written to protobuf so that nothing else is affected.
If you can, please try this and let me know if it works for you.

theyaa · 2020-03-25T15:17:11Z

Hi @shkhrgpt Tez+Hive in Hive 3 log all query/DAG events to a Hive database called sys. Under the sys db, the two main tables are query_data and dag_data. If you can get Dr. Elephant to read from those two tables, it will be able to process Hive queries the same way as before.

Cloudera has a tool called "Data Analytics Studio". It does exactly this and presents the query in a web user interface. I believe that if Dr. Elephant can parse the two tables below from Hive's sys db, it will be able to work in exactly the same way.

query_data
dag_data
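
As a rough sketch of what reading those tables could look like, the snippet below builds a preview query for either table and runs it through any DB-API 2.0 cursor. The function names are hypothetical, the Hive client that provides the cursor is not shown and is left to the reader, and `SELECT *` is used on purpose because no column names are assumed here.

```python
SYS_TABLES = ("query_data", "dag_data")


def preview_sql(table, limit=10):
    """Build a preview query for one of the two sys tables named above."""
    if table not in SYS_TABLES:
        raise ValueError(f"unexpected table: {table!r}")
    # int() guards against anything other than a plain number ending up in the SQL.
    return f"SELECT * FROM sys.{table} LIMIT {int(limit)}"


def fetch_preview(cursor, table, limit=10):
    """Run the preview query on a DB-API 2.0 cursor and return all rows."""
    cursor.execute(preview_sql(table, limit))
    return cursor.fetchall()
```

With a cursor from your Hive client of choice, `fetch_preview(cursor, "dag_data", 5)` would return up to five rows to inspect before deciding which columns a fetcher would need.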

ShubhamGupta29 self-assigned this Mar 2, 2020

ShubhamGupta29 added the question label Mar 2, 2020

mareksimunek mentioned this issue Apr 16, 2020

Dr Elephant on Cloudera #678

Open
