Build instructions for Tez
For instructions on how to contribute to Tez, refer to:
https://cwiki.apache.org/confluence/display/TEZ
----------------------------------------------------------------------------------
Requirements:
* JDK 1.8+
* Maven 3.1 or later
* Findbugs 2.0.2 or later (if running findbugs)
* ProtocolBuffer 2.5.0
* Internet connection for first build (to fetch all dependencies)
* Hadoop version should be 2.7.0 or higher.
----------------------------------------------------------------------------------
Maven main modules:
tez................................(Main Tez project)
- tez-api .....................(Tez api)
- tez-common ..................(Tez common)
- tez-runtime-internals .......(Tez runtime internals)
- tez-runtime-library .........(Tez runtime library)
- tez-mapreduce ...............(Tez mapreduce)
- tez-dag .....................(Tez dag)
- tez-examples ................(Tez examples)
- tez-plugins .................(Tez plugins)
- tez-tests ...................(Tez tests and additional test examples)
- tez-dist ....................(Tez dist)
- tez-ui ......................(Tez web user interface)
----------------------------------------------------------------------------------
Maven build goals:
* Clean : mvn clean
* Compile : mvn compile
* Run tests : mvn test
* Create JAR : mvn package
* Run findbugs : mvn compile findbugs:findbugs
* Run checkstyle : mvn compile checkstyle:checkstyle
* Install JAR in M2 cache : mvn install
* Deploy JAR to Maven repo : mvn deploy
* Run clover : mvn test -Pclover [-Dclover.license=${user.home}/clover.license]
* Run Rat : mvn apache-rat:check
* Build javadocs : mvn javadoc:javadoc
* Build distribution : mvn package [-Dhadoop.version=2.7.0]
* Visualize state machines : mvn compile -Pvisualize -DskipTests=true
Build options:
* Use -Dpackage.format to create distributions with a format other than .tar.gz (mvn-assembly-plugin formats).
* Use -Dclover.license to specify the path to the clover license file
* Use -Dhadoop.version to specify the version of hadoop to build tez against
* Use -Dprotoc.path to specify the path to protoc
Test options:
* Use -DskipTests to skip tests when running the following Maven goals:
'package', 'install', 'deploy' or 'verify'
* -Dtest=<TESTCLASSNAME>,<TESTCLASSNAME#METHODNAME>,....
* -Dtest.exclude=<TESTCLASSNAME>
* -Dtest.exclude.pattern=**/<TESTCLASSNAME1>.java,**/<TESTCLASSNAME2>.java
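The -Dtest.exclude.pattern value above is a comma-separated list of **/<CLASS>.java
entries. A minimal sketch of composing it in a shell, assuming a hypothetical helper
named tez_exclude_pattern (not part of Tez) and made-up test class names:

```shell
# Hypothetical helper: turn test class names into the comma-separated
# **/<CLASS>.java form expected by -Dtest.exclude.pattern.
tez_exclude_pattern() {
  pattern=
  for cls in "$@"; do
    # Prefix a comma only when the pattern already has an entry.
    pattern="${pattern:+$pattern,}**/$cls.java"
  done
  printf '%s\n' "$pattern"
}
```

It could then be used as follows (TestFoo and TestBar are placeholders):
$ mvn test -Dtest.exclude.pattern="$(tez_exclude_pattern TestFoo TestBar)"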
----------------------------------------------------------------------------------
Building against a specific version of hadoop:
Tez runs on top of Apache Hadoop YARN and requires Hadoop version 2.7.0 or higher.
It can be compiled against other compatible Hadoop versions by specifying
hadoop.version. For example, to build Tez against Hadoop 3.0.0-SNAPSHOT:
$ mvn package -Dhadoop.version=3.0.0-SNAPSHOT
To skip tests and javadocs:
$ mvn package -Dhadoop.version=3.0.0-SNAPSHOT -DskipTests -Dmaven.javadoc.skip=true
However, to build against Hadoop 2.8.0 or higher, you also need to switch the
build profiles as follows:
For Hadoop version X where X >= 2.8.0
$ mvn package -Dhadoop.version=${X} -Phadoop28 -P\!hadoop27
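The rule above (X >= 2.8.0 needs -Phadoop28 and -P!hadoop27) can be sketched as a
small shell helper. The name tez_hadoop_profile_flags is hypothetical, and the sketch
assumes the version string starts with numeric <major>.<minor>:

```shell
# Hypothetical helper: print the extra Maven profile flags for a Hadoop version.
# Prints nothing for versions below 2.8.0 (the default profile applies).
tez_hadoop_profile_flags() {
  version=$1
  major=${version%%.*}   # text before the first dot
  rest=${version#*.}     # text after the first dot
  minor=${rest%%.*}      # second numeric component
  if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 8 ]; }; then
    printf '%s\n' '-Phadoop28 -P!hadoop27'
  fi
}
```

The result is left unquoted so that it splits into separate arguments:
$ mvn package -Dhadoop.version=${X} $(tez_hadoop_profile_flags ${X})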
For recent versions of Hadoop (which do not bundle aws and azure by default),
you can bundle AWS-S3 (2.7.0+) or Azure (2.7.0+) support:
$ mvn package -Dhadoop.version=${X} -Paws -Pazure
Tez also has some shims to provide version-specific implementations for various APIs.
For more details, please refer to https://cwiki.apache.org/confluence/display/TEZ/HadoopShims
----------------------------------------------------------------------------------
UI build issues:
If the UI build fails, clean the UI cache:
$ mvn clean -PcleanUICache
Issue with PhantomJS when building on PowerPC:
Official PhantomJS binaries were not available for the Power platform. If the build
fails on PPC, install PhantomJS manually and globally, then rerun the build. Refer to
https://github.com/ibmsoe/phantomjs-1/blob/v2.1.1-ppc64/README.md
----------------------------------------------------------------------------------
Protocol Buffer compiler:
The version of Protocol Buffer compiler, protoc, must be 2.5.0 and match the
version of the protobuf JAR.
If you have multiple versions of protoc in your system, you can set in your
build shell the PROTOC_PATH environment variable to point to the one you
want to use for the Tez build. If you don't define this environment variable,
protoc is looked up in the PATH.
You can also specify the path to protoc while building using -Dprotoc.path
$ mvn package -DskipTests -Dprotoc.path=/usr/local/bin/protoc
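Before building, it may help to confirm which protoc will be picked up. The sketch
below assumes PROTOC_PATH holds the path to the protoc executable itself and that
protoc 2.5.0 reports its version as "libprotoc 2.5.0"; is_supported_protoc is a
hypothetical helper, not part of Tez:

```shell
# Hypothetical helper: succeed only if the given `protoc --version` output
# reports the 2.5.0 release required by the Tez build.
is_supported_protoc() {
  [ "$1" = "libprotoc 2.5.0" ]
}

# Use PROTOC_PATH when set, otherwise fall back to protoc on the PATH.
protoc_bin=${PROTOC_PATH:-protoc}

if command -v "$protoc_bin" >/dev/null 2>&1; then
  found=$("$protoc_bin" --version)
  if is_supported_protoc "$found"; then
    echo "protoc OK: $found"
  else
    echo "unsupported protoc: $found (2.5.0 required)"
  fi
else
  echo "protoc not found: $protoc_bin"
fi
```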
----------------------------------------------------------------------------------
Building the docs:
The following command builds a local copy of the Apache Tez website under docs:
$ cd docs; mvn site
----------------------------------------------------------------------------------
Building components separately:
If you are building a submodule directory, all the Tez dependencies this
submodule has will be resolved like any other 3rd party dependency. That is,
from the Maven cache or from a Maven repository (if not available in the cache
or the SNAPSHOT 'timed out').
An alternative is to run 'mvn install -DskipTests' from the Tez source top
level once, and then work from the submodule. Keep in mind that SNAPSHOTs
time out after a while; using the Maven '-nsu' option will stop Maven from
trying to update SNAPSHOTs from external repos.
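The workflow above can be sketched as a dry-run script. The run wrapper and the
DRY_RUN variable are illustrative, and tez-dag is only an example submodule; by
default the script prints the commands instead of invoking Maven:

```shell
# Illustrative wrapper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# 1. Once, from the Tez source top level: install all modules into the
#    local Maven cache.
run mvn install -DskipTests

# 2. Afterwards, build a single submodule. -nsu keeps Maven from trying to
#    update SNAPSHOTs; -f points at the submodule's pom instead of using cd.
run mvn -nsu test -f tez-dag/pom.xml
```

Set DRY_RUN=0 to execute the commands for real.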
----------------------------------------------------------------------------------
Visualize the State Machines used in Tez internals:
Use -Pvisualize to generate a graphviz file named Tez.gv which can then be
converted into a diagram that represents the state transitions of the state
machines for the classes provided.
Optional parameters:
* -Dtez.dag.state.classes=<comma-separated list of classes>
- By default, all 4 state machines - DAG, Vertex, Task and TaskAttempt are generated.
* -Dtez.graphviz.title
- Title for the Graph ( Default is Tez )
* -Dtez.graphviz.output.file
- Output file to be generated with the state machines ( Default is Tez.gv )
For example, to generate the state machine graphviz file for DAGImpl, run:
$ mvn compile -Pvisualize -Dtez.dag.state.classes=org.apache.tez.dag.app.dag.impl.DAGImpl -DskipTests=true
To generate the diagram, you can use a Graphviz application or something like:
$ dot -Tpng -o Tez.png Tez.gv
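If the output file name was changed with -Dtez.graphviz.output.file, the PNG name
can be derived from it. A minimal sketch, assuming the Graphviz dot tool is installed
and using a hypothetical helper png_name_for and an illustrative GV_FILE variable:

```shell
# Hypothetical helper: map a Graphviz file name to a PNG name by swapping
# the .gv suffix for .png.
png_name_for() {
  printf '%s\n' "${1%.gv}.png"
}

# File produced by -Pvisualize; Tez.gv is the documented default.
gv_file=${GV_FILE:-Tez.gv}

# Render only when both the dot tool and the input file are present.
if command -v dot >/dev/null 2>&1 && [ -f "$gv_file" ]; then
  dot -Tpng -o "$(png_name_for "$gv_file")" "$gv_file"
else
  echo "skipping: dot or $gv_file not available"
fi
```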
----------------------------------------------------------------------------------
Building contrib tools under tez-tools:
Use -Ptools to build various contrib tools present under tez-tools. For example, run:
$ mvn package -Ptools
----------------------------------------------------------------------------------