
Add support to run Spark interpreter on a Kubernetes cluster #2637

Closed
wants to merge 1 commit into from

Conversation

matyix
Member

@matyix matyix commented Oct 31, 2017

What is this PR for?

The goal of this PR is to be able to execute Spark notebooks on Kubernetes in cluster mode, so that the Spark driver runs inside the Kubernetes cluster - based on https://github.com/apache-spark-on-k8s/spark. Zeppelin uses spark-submit to start RemoteInterpreterServer, which is able to execute notebooks on Spark. Kubernetes-specific spark-submit parameters, like the driver, executor, init container and shuffle images, should be set in the SPARK_SUBMIT_OPTIONS environment variable. In case the Spark interpreter is configured with a K8S Spark-specific master URL (k8s://https....), RemoteInterpreterServer is launched inside a Spark driver pod on Kubernetes, thus the Zeppelin server has to be able to connect to the remote server. In a Kubernetes cluster the best solution for this is creating a K8S service for RemoteInterpreterServer. This is the reason for having SparkK8RemoteInterpreterManagedProcess - extending the functionality of RemoteInterpreterManagedProcess - which creates the Kubernetes service mapping the port of RemoteInterpreterServer in the driver pod, and connects to this service once the Spark driver pod is in the Running state.
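
As an illustration only (the image names and staging server address below are placeholders, not part of this PR; the supported options are listed in the apache-spark-on-k8s user docs):

```
export SPARK_SUBMIT_OPTIONS="--deploy-mode cluster \
  --kubernetes-namespace default \
  --conf spark.kubernetes.driver.docker.image=<driver-image> \
  --conf spark.kubernetes.executor.docker.image=<executor-image> \
  --conf spark.kubernetes.initcontainer.docker.image=<init-image> \
  --conf spark.kubernetes.resourceStagingServer.uri=http://<RSS_ADDRESS>:10000"
```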

Design considerations: As described in spark-interpreter-k8s.md, the Zeppelin server runs inside the Kubernetes cluster - we can choose where to run the Zeppelin server, but the benefit of running it inside K8S is that we don't have to deal with authentication. However, it is not enough to start only the Zeppelin server inside the Kubernetes cluster, as by default Zeppelin will start spark-submit in the same pod and will run every Spark job locally. The scope of this PR is to run spark-submit (the apache-spark-on-k8s version) properly configured with Docker images etc., so that the Spark driver is started in a separate pod in the cluster, also starting separate pods for the Spark executors. Thus we can benefit from dynamic scaling of executors inside the Kubernetes cluster, while all the scheduling, pod allocation and resource management is done by the Kubernetes scheduler.

Please see below how this is run and used:

The cluster:

[image: Spark/Zeppelin cluster]

The flow:

[image: Zeppelin Flow]

What type of PR is it?

Feature

What is the Jira issue?

How should this be tested?

Unit and functional tests - running notebooks on Spark on K8S.

Questions:

  • Do the license files need updating?
  • Are there breaking changes for older versions?
  • Does this need documentation?

@zjffdu
Contributor

zjffdu commented Nov 1, 2017

Thanks @matyix for this contribution, could you add some docs to illustrate how to use this feature in Zeppelin?

Member

@felixcheung felixcheung left a comment

a couple of comments, could you elaborate on the design choices you have considered?

in this implementation it looks like it's running Zeppelin outside of k8s and connecting to the driver pod running on k8s. I wonder if Zeppelin use cases will match more closely the in-cluster client feature in discussion on apache-spark-on-k8s PR #456 (not linking directly here) - because, in general, it seems to me it makes more sense to run everything on top of k8s.

@@ -205,7 +214,9 @@ if [[ ! -z "$ZEPPELIN_IMPERSONATE_USER" ]]; then
fi

if [[ -n "${SPARK_SUBMIT}" ]]; then
if [[ -n "$ZEPPELIN_IMPERSONATE_USER" ]] && [[ "$ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER" != "false" ]]; then
if [[ -n "${RUN_SPARK_ON_K8}" ]]; then
INTERPRETER_RUN_COMMAND+=' '` echo ${SPARK_SUBMIT} --class ${ZEPPELIN_SERVER} ${SPARK_SUBMIT_OPTIONS} --conf spark.app.name=zri-${INTERPRETER_GROUP_NAME} --conf spark.kubernetes.driver.label.interpreter-processId=${INTERPRETER_PROCESS_ID} --conf spark.metrics.namespace=zeppelin_${INTERPRETER_GROUP_NAME} ${SPARK_APP_JAR} ${PORT}`
Member

I think we need to match the other mode and include --driver-class-path \"${ZEPPELIN_INTP_CLASSPATH_OVERRIDES}:${ZEPPELIN_INTP_CLASSPATH}\" --driver-java-options \"${JAVA_INTP_OPTS}\" ${SPARK_SUBMIT_OPTIONS} ${ZEPPELIN_SPARK_CONF}

Member Author

The issue is that these don't make sense for the apache-spark-on-k8s version of spark-submit, because it launches the Spark driver in a separate pod, so any dependency should be passed via the ResourceStagingServer or built into the Spark Docker images.

Currently you can't pass additional Java options to the Spark driver; the supported options (https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html) can be passed in SPARK_SUBMIT_OPTIONS.

# limitations under the License.
#

FROM shipyardlabs/spark-base:zeppelin
Member

I'm not sure what the ASF policy is on basing a Docker image on a 3rd party one?

# command should be invoked from the top level directory of the Spark distribution. E.g.:
# docker build -t spark-driver:latest -f dockerfiles/driver/Dockerfile .

COPY zeppelin-distribution/target/zeppelin-0.8.0-SNAPSHOT/zeppelin-0.8.0-SNAPSHOT /opt/zeppelin
Member

I think we should parameterize this to not hardcode a release version number


COPY zeppelin-distribution/target/zeppelin-0.8.0-SNAPSHOT/zeppelin-0.8.0-SNAPSHOT /opt/zeppelin

ADD https://storage.googleapis.com/kubernetes-release/release/v1.7.4/bin/linux/amd64/kubectl /usr/local/bin

&& properties.getProperty("master").startsWith("k8s")) {
this.launcher = new SparkK8InterpreterLauncher(this.conf);
} else {
this.launcher = new SparkInterpreterLauncher(this.conf);
Member

@zjffdu - I think we need a fast way to abstract out the different launchers, instead of bringing everything in here (plus artifacts like kubernetes-client). Do you have any thoughts on that?

Member Author

Eventually I was thinking about this as a long-term solution - applicable not just to this launcher. If there were something like a pluggable/configurable launcher, we wouldn't have to include the k8s client and SparkK8Launcher in core Zeppelin code; we could just implement it as a pluggable launcher added to the Zeppelin classpath and configure Zeppelin to use it.

if (group.equals("spark")) {
this.launcher = new SparkInterpreterLauncher(this.conf);
if (properties.getProperty("master") != null
&& properties.getProperty("master").startsWith("k8s")) {

Member Author

This can be changed to properties.getProperty("master").startsWith("k8s://")

@matyix
Member Author

matyix commented Nov 1, 2017

Hello @zjffdu @felixcheung - I have added docs about how to use/build this PR as well. Also, the Dockerfile has been removed; the documentation contains details on using a third-party image (mine) or building your own Dockerfile. It was added to the repo as a convenience, but due to licensing concerns I removed it. There was no particular reason why it was 1.7 (at the time I started to work on this, 1.8 was not released yet - and I usually wait for the first patch version before I switch).

@matyix
Member Author

matyix commented Nov 2, 2017

@zjffdu @felixcheung I have updated the original PR description with design considerations and a typical K8S cluster and Zeppelin flow showing how I am using this feature/PR currently on the https://github.com/apache-spark-on-k8s/spark Spark version.

@echarles
Member

@matyix I have a local spark-k8s setup and have this branch running (without success so far, so I am debugging SparkK8RemoteInterpreterManagedProcess to track down the issue). A few questions:

  • I see that in case of a 'k8s://...' master, the SparkK8RemoteInterpreterManagedProcess first searches for a driver pod whose name starts with the prefix zri- - does that mean I have to manually instantiate that pod?
  • If I create one such pod and, for now, the pod is found, the SparkK8RemoteInterpreterManagedProcess considers it as the "remote process" - so you don't actually create a process via the classical Zeppelin interpreter.sh command, right?
  • Then, I receive java.net.ConnectException: Connection refused.

Would the classical spark-submit via interpreter.sh not be another option? Did you try it? Here, using that approach, the driver is created but exits directly. I guess apache-spark-on-k8s/spark#402 would help.

@echarles
Member

Just tried the vanilla (so without this #2637 PR) zeppelin on spark-k8s with the in-cluster client mode branch (apache-spark-on-k8s/spark#456).

It works fine out-of-the-box (with the ad-hoc spark interpreter parameters).

@matyix Do you see any reason to further work on this PR? Maybe you want to address additional goals with this?

@matyix
Member Author

matyix commented Nov 12, 2017

@echarles @zjffdu @felixcheung

It absolutely makes sense to keep this PR and continue working on it. Just to re-emphasize: the goal is to enable Zeppelin to submit notebooks to a Kubernetes cluster by invoking spark-submit in cluster deploy mode.

Please find below a couple of advantages cluster mode has compared to client mode:

  • currently the in-cluster client mode appears to have some problems - I faced exactly the same problems you have described in the PR when running multiple interpreters, and I'm not sure whether these problems will be resolved and client mode will be supported (I have some PR's on the Spark-k8s fork and will catch up with the folks regarding this topic)
  • in cluster mode, the Zeppelin server and each RemoteInterpreterServer process (Spark driver) run in separate pods, which fits Kubernetes best practices/patterns better (instead of having one monolithic RIS)
  • the latest Spark driver creates a separate K8S service to handle executor --> driver connections in cluster mode, which again fits Kubernetes best practices/patterns better
  • this solution works regardless of whether the Zeppelin server runs inside or outside the cluster, if we add the option to set up authentication info for Zeppelin
  • it uses spark-submit and interpreter.sh, and simplifies the spark-submit command for K8S a bit. Beyond this, the PR adds SparkK8RemoteInterpreterManagedProcess to simplify the connection to RemoteInterpreterServer in K8S clusters: we use the K8S client to look up the driver pod, then create a K8S service bound to the RemoteInterpreterServer running inside the driver pod
  • overall this may seem a bit more complicated than client mode, however it works better and fits Kubernetes cluster best practices/patterns better
  • if you have ideas about a better way to place this functionality in Zeppelin, please let me know

Overall this is a way better and cleaner approach which fits the K8S ecosystem, and at the same time it has no side effects for those not willing to use K8S.

I will update the PR regardless, to fix the merge conflicts and add some minor changes/improvements - I am using this PR extensively on a few large K8S clusters; it works for us, fits our needs on K8S and complies with our K8S cluster standards/best practices.

@echarles
Member

@matyix I have tested your last commits and was able to make it work in my env (with Zeppelin both inside and outside the k8s cluster).

You implement a new (spark-k8s-specific) launcher and remote executor. In another local branch, I have tried to stick as much as possible to the current Zeppelin paradigm (thrift servers on both sides of the interpreter processes with CallbackInfo) and 2 parameters (host, port) for interpreter.sh - I still have issues with the callback, so I finally think the approach you propose is good and does the job.

My feedback:

  • The branch as such needs basic updates: I had to fix compilation issues with the new classes (SparkK8sInterpreterLauncher and SparkK8sRemoteInterpreterManagedProcess) and had to add ${ZEPPELIN_SPARK_CONF} to the interpreter.sh script.
  • To find the running driver pod, you currently poll on a regular basis. The ideal would be to be notified when the pod is ready (not sure if the k8s client supports this). That would closely map the current mechanism of thrift notification via the CallbackInfo, but with a pure k8s mechanism. This could also be extended to other interpreters we would want to see on k8s.
  • We need to set spark.app.name to a value starting with zri- - if you don't set this in the interpreter settings, the k8s client will not find the driver pod. I wonder if we can make this more configurable, let's say using metadata, or simply using the InterpreterContext which contains a properties attribute with all the given props - the launcher could retrieve this and search for a pod starting with a dynamic prefix rather than with this hardcoded one.
  • The current vanilla Zeppelin supports the spark-k8s client mode out of the box (assuming you are using #2637). The condition to use the SparkK8sInterpreterLauncher needs to check for spark.submit.deployMode being cluster, and continue to use the normal ManagedProcess for client.
  • On the documentation level, certainly mention that the app name must start with zri-. Also, relying on the kubespark Docker image would be better, to ensure nothing special is added in the Docker image.

WDYT?

Do you prefer me to submit a PR on your PR and will you make another push?

@echarles
Member

PS1: I have pushed my fixes to the zeppelin-k8s/tree/spark-interpreter-k8s-fixes branch (which has a merge conflict with master due to the latest commit 3b1a03f that touches the launchers and remote executors)

PS2: I have opened a PR to document this on the spark-k8s docs repo: apache-spark-on-k8s/userdocs/pull/21

@matyix
Member Author

matyix commented Nov 15, 2017

Hello @echarles, thanks for the feedback. I followed up on it and made the suggested changes, please see below:

  • use a watcher instead of polling (see the sketch below)
  • set the driver prefix from the config property spark.app.name
  • use the SparkK8Launcher only if deploy-mode=cluster
  • no updates needed to interpreter.sh
  • added groupId to InterpreterLaunchContext

The branch can be merged as of now - unluckily master is moving fast. I saw your comment/branch a bit late, but made pretty much the same changes (like fixing the build).
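
For reference, the watcher keys on the driver pod's interpreter-processId label (set by interpreter.sh via spark.kubernetes.driver.label.interpreter-processId); a rough CLI equivalent of the watch, with an illustrative process id, is:

```
kubectl get pods -l interpreter-processId=spark-shared-process-1526377697510 --watch
```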

@zjffdu
Contributor

zjffdu commented Nov 16, 2017

@matyix I tried to run Spark Pi on spark-k8s, but hit the following error. Am I missing anything? Thanks

========================================
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Local JARs were provided, however no resource staging server URI was found.
	at scala.Predef$.require(Predef.scala:224)
	at org.apache.spark.deploy.k8s.OptionRequirements$$anonfun$requireSecondIfFirstIsDefined$1.apply(OptionRequirements.scala:33)
	at org.apache.spark.deploy.k8s.OptionRequirements$$anonfun$requireSecondIfFirstIsDefined$1.apply(OptionRequirements.scala:32)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.deploy.k8s.OptionRequirements$.requireSecondIfFirstIsDefined(OptionRequirements.scala:32)
	at org.apache.spark.deploy.k8s.submit.submitsteps.initcontainer.InitContainerConfigurationStepsOrchestrator.<init>(InitContainerConfigurationStepsOrchestrator.scala:66)
	at org.apache.spark.deploy.k8s.submit.DriverConfigurationStepsOrchestrator.getAllConfigurationSteps(DriverConfigurationStepsOrchestrator.scala:154)
	at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$5.apply(Client.scala:186)
	at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$5.apply(Client.scala:184)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2551)
	at org.apache.spark.deploy.k8s.submit.Client$.run(Client.scala:184)
	at org.apache.spark.deploy.k8s.submit.Client$.main(Client.scala:204)
	at org.apache.spark.deploy.k8s.submit.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:786)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

@matyix
Member Author

matyix commented Nov 16, 2017

@zjffdu spark-submit needs a resource staging server (RSS) to be specified as a parameter: you should start the RSS and get its address - as described in the documentation that is part of the PR, https://github.com/banzaicloud/zeppelin/blob/spark-interpreter-k8s/docs/interpreter/spark-interpreter-k8s.md - and specify these:

--deploy-mode cluster --kubernetes-namespace default 
--conf spark.kubernetes.resourceStagingServer.uri=http://{RESOURCE_STAGING_SERVER_ADDRESS}:10000 
--conf spark.kubernetes.resourceStagingServer.internal.uri=http://{RESOURCE_STAGING_SERVER_ADDRESS}:10000

@zjffdu
Contributor

zjffdu commented Nov 16, 2017

Thanks @matyix. Looks like the doc here needs to be updated: https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html

@matyix
Member Author

matyix commented Nov 16, 2017

@zjffdu Not sure, I never followed that one, but I'll check with the folks over there, as I have a few PR's on that repo as well. I've added Spark on K8S docs in the PR just to be on the safe side, so that people can start quickly with the confidence of using the Zeppelin docs. We can always modify that and link to the Spark on K8S docs once they are fixed.

@echarles
Member

Tested the latest commit on an AWS K8S cluster and it works great in both client and cluster modes - kudos @matyix

@matyix
Member Author

matyix commented Nov 19, 2017

Thanks @echarles. Let me know if anything else is needed to get this merged.

@echarles
Member

@matyix If my tests are correct, for now we cannot set spark.app.name and spark.kubernetes.driver.pod.name (if you set them, the driver will not be found by Zeppelin).

On top of not being user-friendly, the side effect is that the driver and executor pod names start with null-....

IMHO it is a matter of passing the correct parameters to spark-submit depending on whether or not those properties are present.

@matyix
Member Author

matyix commented Nov 21, 2017

Hello @echarles

You can set both spark.app.name and spark.kubernetes.driver.pod.name from the interpreter settings and they will be set for spark-submit. However, you're right regarding the driver pod name check: I'm expecting a specific prefix. This can be changed to check whether there's a specific pod name set for the interpreter in the settings. Let me know which is preferred and I can go ahead with the change.

@echarles
Member

echarles commented Nov 21, 2017

Hi @matyix

I am playing in cluster mode, setting/removing spark.app.name and spark.kubernetes.driver.pod.name via the interpreter page, and so far the only way to make it work (= the remote executor finding the Spark driver) is really the absence of those props on the interpreter page (maybe I am fooled by spark.app.name).

Yes, checking for a specific spark.kubernetes.driver.pod.name prop and using it would be an expected enhancement, and it may even solve the potential spark.app.name issue.

It also seems to me that Zeppelin adds the spark.app.name and master props if they are not present.

I guess we should rely on that behavior for now and ensure in this PR that enough intuitive, documented material is available for the end user.

@echarles
Member

@matyix I made a few more tests, and now spark.app.name and spark.kubernetes.driver.pod.name can be set without problems... (to be further confirmed).

@matyix
Member Author

matyix commented Nov 22, 2017

@echarles Added some nice-to-have features, like a separate log4j config for k8s (same as for YARN), updated the doc, and now I'm only checking the processId label on the driver pod, so users can freely change the driver pod name.

@echarles
Member

Thx @matyix. I will test and give feedback.

Did you try adding external dependencies (via the interpreter page)? It works on my setup in client mode but fails (ClassNotFound) in cluster mode. I can debug, but your tests on this would help.

@matyix
Member Author

matyix commented Nov 23, 2017

@echarles Currently there are two ways to add external dependencies. The first is to add a new paragraph to the notebook using the spark.dep interpreter and z.load(); this works because it downloads the dependencies inside the driver (there is an issue with this on Spark 2.2 and Scala 2.11.8, similar to https://issues.apache.org/jira/browse/ZEPPELIN-2475). The second, adding artifacts to the interpreter settings, doesn't work in Spark cluster mode, since dependencies are downloaded locally. Maybe we can think about something like: in case of deployMode=cluster, add the artifact to --packages automatically and don't download it locally.
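
A sketch of what that could look like (the Maven coordinate is only an example):

```
# in cluster mode, pass interpreter-setting artifacts as Maven coordinates so
# they are resolved inside the driver/executor pods, not downloaded locally
spark-submit --deploy-mode cluster \
  --packages org.apache.commons:commons-math3:3.6.1 \
  ...
```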

@echarles
Member

@matyix There is a long history in Zeppelin of spark.dep vs. external dependencies in the interpreter settings. I am a fan of the latter (interpreter settings), so if the --packages flag can make it work, that would be wonderful.

I don't see in the Spark docs that --packages adds the jars to the executor classpath.

The spark.jars property (a comma-separated list of local jars to include on the driver and executor classpaths) may be an alternative.

@matyix
Member Author

matyix commented Nov 23, 2017

@echarles The packages option for spark-submit is described here and it seems to work. Using this option seems to be a better alternative than spark.jars, because it makes more sense to download the jars right inside the driver and executor pods, where they will be used. I think we may address this in a separate PR, because it could be useful for YARN cluster mode as well, not just for K8S.

@echarles
Member

echarles commented Nov 23, 2017

@matyix sure, we can address the deps in a separate PR, especially if it is beneficial for all deploy modes. The downside is that we will have to keep everybody happy with any change to deps management (if you search the mailing list, you will see a lot of questions and discussions around this).

An additional consideration is the definition of "local" dependencies that don't come from any Maven repo on the Internet. I regularly have to define local jars on the local disk file system (or even resource files that are not jars), and with the current yarn-client behavior of Zeppelin (probably not for yarn-cluster, I didn't test), those jars/resources are available on the classpath (of the driver and executor).

If we could inject this now, while this branch is being discussed, that would really be a good thing.

@matyix
Member Author

matyix commented Nov 27, 2017

@echarles jars from the Zeppelin local-repo are set for spark-submit in the spark.jars parameter

@echarles
Member

@matyix I've given your last commit a try and cannot get the additional deps (in the settings page) working.

I don't see the spark.jars property in the command generated by interpreter.sh:

/opt/spark/bin/spark-submit --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer --driver-class-path ":/opt/zeppelin/interpreter/spark/*:/opt/zeppelin/lib/interpreter/*::/opt/zeppelin/interpreter/spark/zeppelin-spark_2.11-0.8.0-SNAPSHOT.jar:/etc/hdfs-k8s/conf" --driver-java-options " -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///opt/zeppelin/conf/log4j.properties -Dzeppelin.log.file=/opt/zeppelin/logs/zeppelin-interpreter---zeppelin-k8s-hdfs-locality-zeppelin-7cd554b49d-dpq2k.log" --master k8s://https://kubernetes:443 --conf spark.cores.max='1' --conf spark.shuffle.service.enabled='false' --conf spark.yarn.dist.archives=/opt/spark/R/lib/sparkr.zip --conf spark.executor.instances='3' --conf spark.sql.catalogImplementation='in-memory' --conf spark.app.name='zeppelin-k8s-spark' --conf spark.executor.memory='1g' --conf spark.master='k8s://https://kubernetes:443' --conf spark.kubernetes.namespace='default' --conf spark.kubernetes.executor.docker.image='datalayer/spark-k8s-executor:2.2.0-0.5.0' --conf spark.kubernetes.driver.docker.image='datalayer/spark-k8s-driver:2.2.0-0.5.0' --conf spark.kubernetes.initcontainer.docker.image='datalayer/spark-k8s-init:2.2.0-0.5.0' --conf spark.kubernetes.resourceStagingServer.uri='http://10.108.197.6:10000' --conf

@matyix
Member Author

matyix commented Nov 30, 2017

From the command line it seems that the default SparkInterpreter is launched, not the K8S-specific one. Could you please check if spark.submit.deployMode is set to cluster, since that's the other condition to launch SparkK8InterpreterLauncher?

@@ -625,6 +625,32 @@
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>io.fabric8</groupId>
Contributor

Would this be better? https://github.com/kubernetes-client/java/
It is the official Kubernetes Java client library.

Member Author

hello @zjffdu - currently I'd say the fabric8 library is more mature (68 individual contributors, hundreds of releases, currently at version 3.17) and it's used by the Spark project as well. If this is an issue I can change it in one day, though - we use the fabric8 lib in-house extensively but can't comment on the kubernetes-client one (never used it). Let me know, please.

Member

Is kubernetes-client the official client? Sounds to me like we should stick to fabric8 for the reasons @matyix gives.

spec:
containers:
- name: zeppelin-server
image: banzaicloud/zeppelin-server:v0.8.0-k8s-1.0.34
Contributor

What if the user wants to change the Zeppelin configuration and restart it? Does that require rebuilding the image? Or attaching a volume with the Zeppelin conf to this image?

Member Author

Yes, in this example you have to rebuild the image, but you can map the config path, or you can also use ConfigMaps.
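
A minimal sketch of the ConfigMap variant (names and paths are illustrative):

```
# create a ConfigMap from the local Zeppelin conf directory...
kubectl create configmap zeppelin-conf --from-file=conf/
# ...then mount it as a volume at /opt/zeppelin/conf in the pod spec, so a
# config change only needs a pod restart instead of an image rebuild
```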

- name: zeppelin-server
image: banzaicloud/zeppelin-server:v0.8.0-k8s-1.0.34
env:
- name: SPARK_SUBMIT_OPTIONS
Contributor

SPARK_SUBMIT_OPTIONS is a global setting which affects all the Spark interpreters. It would be better to only change the interpreter setting.

Member Author

The assumption here is that you will run all Spark interpreters on K8S. I used SPARK_SUBMIT_OPTIONS since this is the default way to add custom params; however, you can also set these up in the interpreter settings as properties prefixed with 'spark', and they will be added as config params automatically.
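
To illustrate the two routes (the property value is just an example):

```
# globally, for every Spark interpreter:
export SPARK_SUBMIT_OPTIONS="--conf spark.kubernetes.namespace=default"
# or per interpreter: add a property named spark.kubernetes.namespace in the
# interpreter settings; 'spark'-prefixed properties are turned into --conf
# flags for spark-submit automatically
```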

EOF
```

## Edit SPARK_SUBMIT_OPTIONS:
Contributor

RESOURCE_STAGING_SERVER_ADDRESS -> RESOURCE_STAGING_SERVER_ADDRESS ?

Member Author

It means that you have to retrieve the RESOURCE_STAGING_SERVER_ADDRESS with kubectl and set it in SPARK_SUBMIT_OPTIONS in the YAML.
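
For example (the service name depends on how the resource staging server was deployed, so this one is illustrative):

```
RESOURCE_STAGING_SERVER_ADDRESS=$(kubectl get svc spark-resource-staging-service \
  -o jsonpath='{.spec.clusterIP}')
```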

spec:
containers:
- name: zeppelin-server
image: banzaicloud/zeppelin-server:v0.8.0-k8s-1.0.34
Contributor

Is it v2.2.0-k8s-1.0.34 instead? I don't see v0.8.0-k8s-1.0.34 in https://hub.docker.com/r/banzaicloud/zeppelin-server/tags/

Member Author

Renamed docker tags with the latest push.

Member

@felixcheung felixcheung left a comment

I think this is great to have, but I have a few concerns.

  • as discussed with the spark-k8s group during the SIG meeting, a lot of changes are going in for in-cluster client support soon (it is not supported yet), and it is likely better for Zeppelin to go that way
  • we need to be very careful about adding features that depend on source code or binaries from a fork of another ASF project (and not the ASF project itself)
  • we also need to be very careful about documenting steps that, when users run them, would get them Zeppelin binaries (in a Docker image) that are not part of the official Apache Zeppelin release. While we may or may not have any licensing issue, it can create confusion for users of Apache Zeppelin

@matyix
Member Author

matyix commented Jan 3, 2018

@zjffdu @felixcheung added the requested changes/suggestions, updated the PR.

@echarles
Member

echarles commented Jan 8, 2018

Feedback/questions on the latest commit:

  • For cluster mode, I sometimes receive a null pointer exception on the first run; running a second time directly after is fine.
  • Are you enforcing the executor pod name?
  • Additional deps for cluster mode work with %spark.dep, but not via the Spark interpreter settings UI (see screenshot)
    [screenshot from 2018-01-07 13-05-54]

@matyix
Member Author

matyix commented Jan 8, 2018

hello @echarles

  • we're also running into this NPE problem; my colleague @sancyx already commented on the issue: https://issues.apache.org/jira/browse/ZEPPELIN-2475 [Magyari Sandor Szilard added a comment - 05/Dec/17 10:14] and he's going to submit a PR to solve it
  • the executor pod name is generated by the Spark driver using the spark.app.name specified in spark-submit as a prefix. The groupId of the interpreter group is appended to the spark.app.name prefix specified in the interpreter settings.
  • are the dependencies added to the spark-submit command as --jars? If so, can you please share the logs?

@echarles
Member

echarles commented Jan 8, 2018

thx @matyix I had taken logs but don't have them anymore... but I can confirm that in both cases (deps via %spark.dep or via the UI) the command generated by interpreter.sh is the same and does not contain the --jars option. If you have a cluster at hand, you should easily be able to confirm that it does not work via the UI.

@matyix
Member Author

matyix commented Jan 9, 2018

@echarles usually when you add a dependency in the UI, it should be downloaded to the local-repo/spark folder, and those jars will then be set in the --jars param. Could you please check your local-repo/spark folder to see if there are any jars?

@naveenkumargp

@matyix
can you please share any information about following up on this PR so that Zeppelin works on K8S?
In the developer mailing list, some modifications and queries have been posted. Would be interested to hear from you.

@sancyx
Contributor

sancyx commented May 9, 2018

Hi @naveenkumargp, there was a refactor around the interpreter packaging which caused the ClassNotFound problems. Previously there was a big jar containing the interpreter class as well, which doesn't exist anymore. We've updated the PR so that all jar files from the local repo are enumerated with the --jars option to spark-submit, which is probably a better approach.
With regards to deploy mode: I just tried to add the spark.submit.deployMode property via the UI and it worked for me. We do set this by default to cluster in our custom interpreter.json, as well as the master property to k8s://https://kubernetes:443, however we intentionally didn't include these in the patch, since they are only optional.
We have been using this patch for about half a year now in our deployments and it works fine for us; we would be glad to contribute it to the community. Please try this latest version to see if it works for you.
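
As a sketch of that --jars enumeration (paths are illustrative; spark-submit expects --jars as a single comma-separated list):

```
# join the downloaded jars into the comma-separated form --jars expects
JARS=$(ls ${ZEPPELIN_HOME}/local-repo/spark/*.jar | paste -sd, -)
spark-submit --jars "${JARS}" ...   # remaining options elided
```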

@naveenkumargp

Hi @sancyx
thanks for the update. So now, if we want to use this PR, how should we do it? Should we clone the latest Zeppelin source code, add your changes manually, and build? Or is there another git URI for the particular commit which includes this PR?

regards
naveen

@matyix
Member Author

matyix commented May 10, 2018

@naveenkumargp Ideally, this should be merged (chances are unlikely if you check the history of the PR, and in a few days it will conflict with the master branch, so it will be even more unlikely); otherwise, use the fork we maintain and build from there: https://github.com/banzaicloud/zeppelin/tree/spark-interpreter-k8s, following the instructions from the https://banzaicloud.com/blog/zeppelin-spark-k8/ blog series.

We will be launching this as a service in late May, so you can get the platform and have all of the above done for you: https://github.com/banzaicloud/pipeline

@naveenkumargp

@sancyx
we tried taking the source code at commit 7f23780 and building it without modifying interpreter.sh; when we submit the Spark job, we get the following error when the Spark driver is launched.

Error: Could not find or load main class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer
regards
Naveen

@sancyx
Contributor

sancyx commented May 14, 2018

@naveenkumargp All jars from ZEPPELIN_HOME/lib/interpreter should be added with the --jars option. Could you please attach the generated spark-submit command from the Zeppelin logs?

@naveenkumargp

@sancyx
we have tried adding the --jars option in interpreter.sh; we still get the following when the Spark job is submitted.

java.lang.RuntimeException: Unable to start SparkK8RemoteInterpreterManagedProcess: Spark Driver not found.

and the spark-submit command which is generated is given below:
/opt/spark/bin/spark-submit --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer --driver-class-path ":/opt/zeppelin-0.9.0-SNAPSHOT/interpreter/spark/:/opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/::/opt/zeppelin-0.9.0-SNAPSHOT/interpreter/spark/spark-interpreter-0.9.0-SNAPSHOT.jar:/opt/hadoop/etc/hadoop/" --driver-java-options " -Dfile.encoding=UTF-8 -Dlog4j.configuration=log4j_k8_cluster.properties" --conf spark.executor.instances=2 --jars /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/aether-api-1.12.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/aether-connector-file-1.12.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/aether-connector-wagon-1.12.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/aether-impl-1.12.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/aether-spi-1.12.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/aether-util-1.12.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/bcpkix-jdk15on-1.52.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/bcprov-jdk15on-1.52.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/commons-codec-1.5.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/commons-configuration-1.9.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/commons-exec-1.3.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/commons-httpclient-3.1.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/commons-io-2.4.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/commons-lang-2.5.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/commons-logging-1.1.1.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/commons-pool2-2.3.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/gson-2.2.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/gson-extras-0.2.1.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/httpclient-4.5.1.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/httpcore-4.4.1.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/jline-2.12.1.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/jsoup-1.6.1.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/libthrift-0.9.2.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/log4j-1.2.17.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/maven-aether-provider-3.0.3.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/maven-artifact-3.0.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/maven-model-3.0.3.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/maven-model-builder-3.0.3.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/maven-plugin-api-3.0.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/maven-repository-metadata-3.0.3.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/plexus-classworlds-2.4.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/plexus-component-annotations-1.5.5.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/plexus-interpolation-1.14.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/plexus-utils-2.0.7.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/sisu-guice-3.0.2-no_aop.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/sisu-inject-bean-2.2.2.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/sisu-inject-plexus-2.2.2.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/slf4j-api-1.7.10.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/slf4j-log4j12-1.7.10.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/wagon-http-1.0.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/wagon-http-lightweight-1.0.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/wagon-http-shared-1.0.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/wagon-provider-api-1.0.jar /opt/zeppelin-0.9.0-SNAPSHOT/lib/interpreter/zeppelin-interpreter-0.9.0-SNAPSHOT.jar --master 
k8s://https://k8s-apiserver.bcmt.cluster.local:8443 --conf spark.submit.deployMode='cluster' --conf spark.app.name='Zeppelin-spark-shared-process' --conf spark.kubernetes.driver.label.interpreter-processId='spark-shared-process-1526377697510' --conf spark.metrics.namespace='Zeppelin-spark-shared-process' /opt/zeppelin-0.9.0-SNAPSHOT/interpreter/spark/spark-interpreter-0.9.0-SNAPSHOT.jar 30000

@naveenkumargp

@matyix

we forked your Banzai Cloud Zeppelin code from https://github.com/banzaicloud/zeppelin/tree/spark-interpreter-k8s and tried building it; while building, we get the following issue.

npm ERR! Linux 3.10.0-327.28.2.el7.x86_64
npm ERR! argv "/usr/bin/node" "/usr/bin/npm" "run" "build:dist" "--https-proxy=http://87.254.212.120:8080" "--proxy=http://87.254.212.120:8080" "-verbose"
npm ERR! node v6.9.4
npm ERR! npm v3.10.10
npm ERR! code ELIFECYCLE
npm ERR! zeppelin-web build:dist: npm-run-all prebuild && grunt pre-webpack-dist && webpack && grunt post-webpack-dist
npm ERR! Exit status 3
npm ERR!
npm ERR! Failed at the zeppelin-web build:dist script 'npm-run-all prebuild && grunt pre-webpack-dist && webpack && grunt post-webpack-dist'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the zeppelin-web package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! npm-run-all prebuild && grunt pre-webpack-dist && webpack && grunt post-webpack-dist
npm ERR! You can get information on how to open an issue for this project with:
npm ERR! npm bugs zeppelin-web
npm ERR! Or if that isn't available, you can get their info via:
npm ERR! npm owner ls zeppelin-web
npm ERR! There is likely additional logging output above.
npm verb exit [ 1, true ]
npm verb stack Error: Unknown system error -122: Unknown system error -122, open 'npm-debug.log.1139106934'
npm verb stack at Error (native)
npm verb cwd /u/npanchap/banzaizeppelin/zeppelin/zeppelin-web
npm ERR! Linux 3.10.0-327.28.2.el7.x86_64
npm ERR! argv "/usr/bin/node" "/usr/bin/npm" "run" "build:dist" "--https-proxy=http://87.254.212.120:8080" "--proxy=http://87.254.212.120:8080" "-verbose"
npm ERR! node v6.9.4
npm ERR! npm v3.10.10
npm ERR! path npm-debug.log.1139106934
npm ERR! code Unknown system error -122
npm ERR! errno -122
npm ERR! syscall open

npm ERR! Unknown system error -122: Unknown system error -122, open 'npm-debug.log.1139106934'
npm ERR!
npm ERR! If you need help, you may report this error at:
npm ERR! https://github.com/npm/npm/issues
npm verb exit [ -122, true ]

npm ERR! Please include the following file with any support request:
npm ERR! /u/npanchap/banzaizeppelin/zeppelin/zeppelin-web/npm-debug.log

@matyix
Member Author

matyix commented May 15, 2018

@naveenkumargp Not sure this is the best place to discuss your problem... anyways:

The message Unable to start SparkK8RemoteInterpreterManagedProcess: Spark Driver not found. means that Zeppelin was not able to find a running Spark driver pod. I don't see the Resource Staging Server and image params in your spark-submit command; these should be set in SPARK_SUBMIT_OPTIONS - check out the description in the docs that are part of this PR.

You can build an image from the Banzai Cloud branch (check out our .circleci/config.yaml), or, even better and easier, you may use/check Banzai Cloud's zeppelin-spark Helm chart to set up a working example on K8S.

I am not sure I can give more options than this - I might suggest reading/understanding how Spark/Zeppelin on Kubernetes works from our blog, or just using the Pipeline platform...

@naveenkumargp

@matyix
Thanks for the information. The resource staging server and Spark (driver/executor) images have been set in spark-defaults.conf inside the Zeppelin Docker image. As per our understanding, the issue is not because of that.
The class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer is not present in spark-interpreter-0.9.0-SNAPSHOT.jar.
The class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer is present in zeppelin-interpreter-0.9.0-SNAPSHOT.jar.
Using that one works for us. Maybe we are missing something.

The documentation and information in the blogs are very useful and informative.

If this is not the right forum for this discussion, can you please share where this can be discussed further?

@naveenkumargp

@matyix @sancyx
when we took the snapshot, it was missing the code related to adding jars from zConf.getZeppelinHome() + "/lib/interpreter".
With the latest code, without any changes, we are able to run Zeppelin with Spark on K8S.
Thanks for all your help.

@matyix
Member Author

matyix commented Jun 1, 2018

@naveenkumargp you are welcome. If you find bugs or have feature requests related to our codebase, feel free to open a GH issue on our fork.

Also, we are constantly maintaining/rebasing/patching the branch on the fork, so make sure you rebase once in a while.

@matyix
Member Author

matyix commented Jun 1, 2018

@naveenkumargp one more thing: there is another alternative we are experimenting with - we are adding Kubernetes integration to Livy, so you would be able to use the Livy interpreter (no Zeppelin code modification required) and spin up Spark on K8s (and you can use Jupyter as well). Let me know if you are interested - drop me a mail.

@naveenkumargp

@matyix

yes, we are very much interested in running the Livy interpreter on K8s.

Please let us know how we can proceed further.

regards
naveen

@matyix matyix closed this Jun 29, 2018
@nrchakradhar

@matyix
Will this change be followed up for upstreaming, or will it now only be pursued with Livy?
Even with Livy, it would be good to have this change as a feature in Zeppelin.

@sancyx sancyx deleted the spark-interpreter-k8s branch July 17, 2018 13:19