Add support to run Spark interpreter on a Kubernetes cluster #2637
Conversation
Thanks @matyix for this contribution, could you add some doc to illustrate how to use this feature in Zeppelin?
a couple of comments, could you elaborate on the design choices you have considered?
In this implementation it looks like it's running Zeppelin outside of k8s and connecting to the driver pod running on k8s. I wonder if Zeppelin use cases will more closely match the in-cluster client feature in discussion on apache-spark-on-k8s PR # 456 (not linking directly here) - because, in general, it seems to me it makes more sense to run everything on top of k8s.
bin/interpreter.sh
Outdated
@@ -205,7 +214,9 @@ if [[ ! -z "$ZEPPELIN_IMPERSONATE_USER" ]]; then
fi

if [[ -n "${SPARK_SUBMIT}" ]]; then
if [[ -n "$ZEPPELIN_IMPERSONATE_USER" ]] && [[ "$ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER" != "false" ]]; then
if [[ -n "${RUN_SPARK_ON_K8}" ]]; then
INTERPRETER_RUN_COMMAND+=' '`echo ${SPARK_SUBMIT} --class ${ZEPPELIN_SERVER} ${SPARK_SUBMIT_OPTIONS} --conf spark.app.name=zri-${INTERPRETER_GROUP_NAME} --conf spark.kubernetes.driver.label.interpreter-processId=${INTERPRETER_PROCESS_ID} --conf spark.metrics.namespace=zeppelin_${INTERPRETER_GROUP_NAME} ${SPARK_APP_JAR} ${PORT}`
I think we need to match the other modes and include --driver-class-path \"${ZEPPELIN_INTP_CLASSPATH_OVERRIDES}:${ZEPPELIN_INTP_CLASSPATH}\" --driver-java-options \"${JAVA_INTP_OPTS}\" ${SPARK_SUBMIT_OPTIONS} ${ZEPPELIN_SPARK_CONF}
The issue is that these don't make sense for the apache-spark-on-k8s version of spark-submit, because it launches the Spark driver in a separate pod, so any dependency should be passed via the ResourceStagingServer or built into the Spark Docker images.
Currently you can't pass additional Java options to the Spark driver; the supported options (https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html) can be passed in SPARK_SUBMIT_OPTIONS.
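To make the comment above concrete, here is a hedged sketch of passing the supported options through SPARK_SUBMIT_OPTIONS. The namespace and image names are placeholders for illustration, not values taken from this PR:

```shell
# Sketch: apache-spark-on-k8s options are passed via SPARK_SUBMIT_OPTIONS,
# since --driver-java-options is not honored by the forked spark-submit.
# All values below are illustrative placeholders.
export SPARK_SUBMIT_OPTIONS="--conf spark.kubernetes.namespace=default \
--conf spark.kubernetes.driver.docker.image=example/spark-driver:latest \
--conf spark.kubernetes.executor.docker.image=example/spark-executor:latest"

echo "$SPARK_SUBMIT_OPTIONS"
```

interpreter.sh then interpolates `${SPARK_SUBMIT_OPTIONS}` into the command shown in the diff above.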
zeppelin-dockerfiles/Dockerfile
Outdated
# limitations under the License.
#

FROM shipyardlabs/spark-base:zeppelin
I'm not sure what the ASF policy is on basing a Docker image on a 3rd-party one?
zeppelin-dockerfiles/Dockerfile
Outdated
# command should be invoked from the top level directory of the Spark distribution. E.g.:
# docker build -t spark-driver:latest -f dockerfiles/driver/Dockerfile .

COPY zeppelin-distribution/target/zeppelin-0.8.0-SNAPSHOT/zeppelin-0.8.0-SNAPSHOT /opt/zeppelin
I think we should parameterize this to not hardcode a release version number
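One way to parameterize the version, sketched in shell (the variable name ZEPPELIN_VERSION and the image tag are illustrative assumptions, not part of this PR):

```shell
# Sketch: build the COPY source path from a version variable instead of
# hardcoding the release number in the Dockerfile.
ZEPPELIN_VERSION=0.8.0-SNAPSHOT
DIST="zeppelin-distribution/target/zeppelin-${ZEPPELIN_VERSION}/zeppelin-${ZEPPELIN_VERSION}"
echo "COPY ${DIST} /opt/zeppelin"

# With `ARG ZEPPELIN_VERSION` declared in the Dockerfile, the build becomes:
#   docker build --build-arg ZEPPELIN_VERSION="${ZEPPELIN_VERSION}" -t zeppelin:dev .
```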
zeppelin-dockerfiles/Dockerfile
Outdated
COPY zeppelin-distribution/target/zeppelin-0.8.0-SNAPSHOT/zeppelin-0.8.0-SNAPSHOT /opt/zeppelin

ADD https://storage.googleapis.com/kubernetes-release/release/v1.7.4/bin/linux/amd64/kubectl /usr/local/bin
is there a particular reason for v1.7, instead of 1.8?
https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl-binary-via-curl
&& properties.getProperty("master").startsWith("k8s")) {
  this.launcher = new SparkK8InterpreterLauncher(this.conf);
} else {
  this.launcher = new SparkInterpreterLauncher(this.conf);
@zjffdu - I think we need a fast way to abstract out the different launchers, instead of bringing everything in here (plus artifacts like kubernetes-client). Do you have any thought on that?
Eventually I was thinking about this as a long-term solution, applicable not just to this launcher. If there was something like a pluggable/configurable launcher, we wouldn't have to include the k8s client and SparkK8Launcher in core Zeppelin code: just implement it as a pluggable launcher added to the Zeppelin classpath and configure Zeppelin to use it.
if (group.equals("spark")) {
  this.launcher = new SparkInterpreterLauncher(this.conf);
  if (properties.getProperty("master") != null
      && properties.getProperty("master").startsWith("k8s")) {
I think the proper match should be k8s://
https://github.com/apache-spark-on-k8s/spark/pull/498/files#diff-817de38fee3505bdca2e40ce393857f6R123
This can be changed to properties.getProperty("master").startsWith("k8s://")
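The reviewers' point can be illustrated in shell, mirroring the Java `startsWith("k8s://")` check; matching on the full scheme avoids false positives on master URLs that merely begin with "k8s":

```shell
# Sketch: prefix-match on the full "k8s://" scheme, as suggested above,
# rather than the bare "k8s" prefix.
is_k8s_master() {
  case "$1" in
    k8s://*) echo yes ;;
    *)       echo no ;;
  esac
}

is_k8s_master "k8s://https://10.0.0.1:6443"   # yes
is_k8s_master "k8s-local"                     # no: would wrongly match "k8s"
is_k8s_master "yarn"                          # no
```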
Hello @zjffdu @felixcheung - I have added docs about how to use/build this PR as well. Also, the Dockerfile has been removed; the documentation contains details on using a third-party Dockerfile (mine) or building your own. It was added to the repo as a convenience, but due to licensing concerns I removed it. There was no particular reason why it was 1.7 (at the time I started to work on this, 1.8 was not released yet - and I usually wait for the first patch version before I switch).
@zjffdu @felixcheung I have updated the original PR description with design considerations and a typical K8S cluster and Zeppelin flow showing how I am currently using this feature/PR on the https://github.com/apache-spark-on-k8s/spark Spark version.
@matyix I have a local spark-k8s setup and have tried this branch (without success so far, so I am debugging SparkK8RemoteInterpreterManagedProcess to track down the issue). A few questions:
Would the classical …?
Just tried the … It works fine out-of-the-box (with the ad-hoc Spark interpreter parameters). @matyix Do you see any reason to keep working on this PR? Maybe you want to address additional goals with this?
@echarles @zjffdu @felixcheung It absolutely makes sense to keep this PR and continue working on it. Just to re-emphasize, the goal is to enable Zeppelin to submit notebooks to a Kubernetes cluster invoking …. Please find below a couple of advantages the cluster mode has compared to the client mode:
Overall this is a way better and cleaner approach which fits the K8S ecosystem and at the same time has no side effect for those not willing to use K8S. I will update the PR regardless to fix the merge conflicts and add some minor changes/improvements - I am using this PR extensively on a few large K8S clusters; it works/fits our needs on K8S and complies with our K8S cluster standards/best practices.
@matyix I have tested your last commits and was able to make it work in my env (with both Zeppelin …). You implement a new (spark-k8s-specific) launcher and remote executor. In another local branch, I have tried to stick as much as possible to the current Zeppelin paradigm (thrift servers on both sides of the interpreter processes with CallbackInfo) and 2 parameters (host, port) for interpreter.sh - I still have an issue with the callback, so I finally think the approach you propose is good and does the job. My feedback:
WDYT? Do you prefer me to submit a PR on your PR, and will you make another push?
PS1: I have pushed my fixes in the zeppelin-k8s/tree/spark-interpreter-k8s-fixes branch (which has a merge conflict with master due to the latest commit 3b1a03f that touches the launchers and remote executors). PS2: I have opened a PR to document this on the spark-k8s docs repo: apache-spark-on-k8s/userdocs/pull/21
Hello @echarles, thanks for the feedback. I followed up on that and made the suggested changes, please see below:
The branch can be merged as of now - unluckily master is moving fast. I saw your comment/branch a bit late but pretty much made the same changes (like fixing the build).
@matyix I tried to run Spark Pi on spark-k8s, but hit the following error. Am I missing anything? Thanks
@zjffdu Spark submit needs a resource staging server (RSS) to be specified as a parameter; you should start the RSS, get its address - as described in the documentation which is part of the PR - https://github.com/banzaicloud/zeppelin/blob/spark-interpreter-k8s/docs/interpreter/spark-interpreter-k8s.md - and specify these:
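The conf values referenced above were lost in rendering; a hedged sketch of building the RSS conf entry, assuming port 10000 and the `spark.kubernetes.resourceStagingServer.uri` key from the apache-spark-on-k8s docs (the service name in the comment is an assumption - use whatever name the RSS was deployed under):

```shell
# Sketch: construct the --conf flag for the resource staging server.
# The cluster IP would normally come from something like:
#   kubectl get svc spark-resource-staging-service -o jsonpath='{.spec.clusterIP}'
build_rss_conf() {
  echo "--conf spark.kubernetes.resourceStagingServer.uri=http://$1:10000"
}

build_rss_conf 10.0.0.42
```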
Thanks @matyix. Looks like the doc here needs to be updated: https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html
@zjffdu Not sure, I never followed that one, but I'll check with the folks over there as I have a few PRs on that repo as well. I've added Spark on K8S docs in the PR just to be on the safe side, so that people can start quickly with the confidence of using the Zeppelin docs. We can always modify that and link to the Spark on K8S docs once that is fixed.
Tested the latest commit on an AWS K8S cluster and it works great as well in ….
Thanks @echarles. Let me know if anything else is needed to get this merged.
@matyix If my tests are correct, for now we can not set …. Besides that not being user-friendly, the side effect is that the driver and executor pod names start with …. IMHO it is a matter of passing the correct parameters to spark-submit depending on the presence or not of those properties.
Hello @echarles You can set both ….
Hi @matyix I am playing in cluster mode, setting/removing … via the interpreter page. Yes, checking if a specific …. It also seems to me that Zeppelin adds …. I guess we should for now rely on that behavior and ensure in this PR that enough intuitive and documented material is available for the end user.
@matyix I made a few more tests, and now ….
@echarles Added some nice-to-have features like a separate log4j config for K8S (same as for YARN), updated the doc, and now I'm only checking the ….
Thx @matyix. I will test and give feedback. Did you try adding external dependencies (via the interpreter page)? It works on my setup in ….
@echarles Currently there are two ways to add external dependencies: add a new paragraph to the notebook using ….
@matyix There is a long history in Zeppelin on …. I don't see in the Spark doc that --packages adds the jars to the executor classpath. The ….
@echarles The packages option for spark-submit is described here and it seems to work. Using this option seems to be a better alternative than spark.jars, because it makes more sense to download jars right inside the driver and executor pods, where they will be used. I think we may address this in a separate PR because this could be useful for ….
@matyix Sure, we can address the deps in a separate PR, especially if it is beneficial for all deploy modes. The downside is that we will have to keep everybody happy with any change to deps management (if you search the mailing list, you will see a lot of questions and discussions around this). An additional consideration is the definition of "local" dependencies that don't come from any Maven repo on the Internet. I regularly have to define local jars on the local disk file system (or even resource files that are not jars), and with the current …. If we could inject this now while this branch is being discussed, that would be really a good thing.
@echarles Jars from Zeppelin's local-repo are set for spark-submit in ….
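As a hedged sketch of what the thread describes (the directory layout and function name are assumptions based on the discussion, not code from this PR): jars dropped into the local-repo folder are collected into a single comma-separated value suitable for spark-submit's --jars flag.

```shell
# Sketch: join every *.jar in a local-repo directory into one comma-separated
# string for spark-submit --jars. Paths below are illustrative.
collect_jars() {
  local dir="$1" jars="" f
  for f in "$dir"/*.jar; do
    [ -e "$f" ] || continue            # directory empty: emit nothing
    jars="${jars:+$jars,}$f"
  done
  echo "$jars"
}

mkdir -p /tmp/local-repo/spark
touch /tmp/local-repo/spark/dep-a.jar /tmp/local-repo/spark/dep-b.jar
collect_jars /tmp/local-repo/spark
```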
@matyix I've given a try to your last commit and can not get the additional deps (in the settings page) working. I don't see the ….
From the command line it seems that the default ….
zeppelin-zengine/pom.xml
Outdated
@@ -625,6 +625,32 @@
      </exclusion>
    </exclusions>
  </dependency>

  <dependency>
    <groupId>io.fabric8</groupId>
Would this be better? https://github.com/kubernetes-client/java/ It is the official Kubernetes Java library.
Hello @zjffdu - currently I'd say the fabric8 library is more mature (68 individual contributors, hundreds of releases, currently at version 3.17) and it's used by the Spark project as well. If this is an issue I can change it in one day, though - we use the fabric8 lib in-house extensively, but I can't comment on the kubernetes-client one (never used it). Let me know, please.
Is kubernetes-client the official client? Sounds to me like we should stick to fabric8 for the reasons @matyix gives.
spec:
  containers:
  - name: zeppelin-server
    image: banzaicloud/zeppelin-server:v0.8.0-k8s-1.0.34
What if a user wants to change the Zeppelin configuration and restart it? Does it require rebuilding the image? Or attaching a volume with the Zeppelin conf to this image?
Yes, in this example you have to rebuild the image, but you can map the config path, or you can use ConfigMaps as well.
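A hedged sketch of the ConfigMap option mentioned above (all names here are illustrative, not from this PR): create the ConfigMap from the conf directory with `kubectl create configmap zeppelin-conf --from-file=conf/`, then mount it over the config path so a config change only needs a pod restart, not an image rebuild.

```yaml
# Sketch: mount Zeppelin's conf directory from a ConfigMap. Names are
# illustrative; the image tag matches the example pod spec above.
spec:
  containers:
  - name: zeppelin-server
    image: banzaicloud/zeppelin-server:v0.8.0-k8s-1.0.34
    volumeMounts:
    - name: zeppelin-conf
      mountPath: /opt/zeppelin/conf
  volumes:
  - name: zeppelin-conf
    configMap:
      name: zeppelin-conf
```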
  - name: zeppelin-server
    image: banzaicloud/zeppelin-server:v0.8.0-k8s-1.0.34
    env:
    - name: SPARK_SUBMIT_OPTIONS
SPARK_SUBMIT_OPTIONS is a global setting which affects all the Spark interpreters. It would be better to only change the interpreter setting.
The assumption here is that you will run all Spark interpreters on K8S. I used SPARK_SUBMIT_OPTIONS since this is the default way to add custom params; however, you can also set these properties in the interpreter settings as properties prefixed with 'spark.', and they will be added as config params automatically.
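A minimal sketch of the prefix mechanism described above, turning 'spark.'-prefixed interpreter properties into --conf flags for spark-submit (property names and values are made up for illustration):

```shell
# Sketch: interpreter-setting properties prefixed with "spark." become
# --conf flags on the spark-submit command line.
props="spark.executor.instances=3 spark.kubernetes.namespace=zeppelin"

conf=""
for p in $props; do
  conf="$conf --conf $p"
done

echo "$conf"
```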
EOF
```

## Edit SPARK_SUBMIT_OPTIONS:
RESOURCE_STAGING_SERVER_ADDRESS?
It means that you have to retrieve RESOURCE_STAGING_SERVER_ADDRESS with kubectl and set it in SPARK_SUBMIT_OPTIONS in the yaml.
spec:
  containers:
  - name: zeppelin-server
    image: banzaicloud/zeppelin-server:v0.8.0-k8s-1.0.34
Is it v2.2.0-k8s-1.0.34 instead? I don't see v0.8.0-k8s-1.0.34 in https://hub.docker.com/r/banzaicloud/zeppelin-server/tags/
Renamed docker tags with the latest push.
I think this is great to have but I have a few concerns.
- as discussed with the spark-k8s group during the SIG meeting, a lot of changes are going in for in-cluster client support soon (it is not supported yet), and it is likely better for Zeppelin to work that way
- we need to be very careful about adding features that depend on source code or binaries from a fork of another ASF project (and not the ASF project itself)
- we also need to be very careful about documenting steps that, when a user runs them, would pull in Zeppelin binaries (in a Docker image) that are not part of the official Apache Zeppelin release. While we may or may not have a licensing issue, it can create confusion for users of Apache Zeppelin
@zjffdu @felixcheung Added the requested changes/suggestions, updated the PR.
Feedback/question on the latest commit: ….
hello @echarles
Thx @matyix. I had taken logs but don't have them anymore... but I can confirm that in both cases (deps via %spark.dep or via the UI) the command generated by interpreter.sh is the same and does not contain the ….
@echarles Usually a dependency added on the UI is downloaded to the local-repo/spark folder, and those jars are then set in the --jars param. Could you please check whether there are any jars in your local-repo/spark folder?
@matyix …
Hi @naveenkumargp, there was a refactor around the interpreter packaging which caused the ClassNotFound problems. Previously there was a big jar containing the interpreter class as well, which doesn't exist anymore. We've updated the PR so that all jar files from the local repo are enumerated with the --jars option to spark-submit, which is probably a better approach.
Hi @sancyx … regards
@naveenkumargp Ideally, this should be merged (chances are unlikely if you check the history of the PR, and in a few days this will conflict with the master branch, so it will become even more unlikely); otherwise use the fork we maintain and build from there: https://github.com/banzaicloud/zeppelin/tree/spark-interpreter-k8s, following the instructions from the https://banzaicloud.com/blog/zeppelin-spark-k8/ blog series. We will be launching this as a service in late May, so you can get the platform and have all of the above done for you: https://github.com/banzaicloud/pipeline
@sancyx Error: Could not find or load main class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer
@naveenkumargp All jars from ZEPPELIN_HOME/lib/interpreter should be added with the --jars option. Could you please attach the generated spark-submit command from the Zeppelin logs?
@sancyx java.lang.RuntimeException: Unable to start SparkK8RemoteInterpreterManagedProcess: Spark Driver not found. The spark command being generated is given below: …
We forked your Banzai Cloud Zeppelin code from https://github.com/banzaicloud/zeppelin/tree/spark-interpreter-k8s and tried building it; while building we are getting the following issue:
npm ERR! Linux 3.10.0-327.28.2.el7.x86_64
npm ERR! Unknown system error -122: Unknown system error -122, open 'npm-debug.log.1139106934'
npm ERR! Please include the following file with any support request:
@naveenkumargp Not sure this is the best place to discuss your problem... anyway: the message …. You can build an image from the Banzai Cloud branch; check out our …. I am not sure I can give more options than this - I might suggest reading/understanding how Spark/Zeppelin on Kubernetes works from our blog, or just use the Pipeline platform...
@matyix The documentation and information in the blogs are very useful and informative. If this is not the right forum for this discussion, can you please share where this can be discussed further?
@naveenkumargp You are welcome. If you find bugs or have feature requests related to our codebase, feel free to open a GH issue at our fork. Also, we are constantly maintaining/rebasing/patching the branch on the fork, so make sure you rebase once in a while.
@naveenkumargp One more thing. There is another alternative we are experimenting with - we are adding Kubernetes integration to Livy, so you would be able to use the Livy interpreter (no Zeppelin code modification required) and spin up Spark on K8S (and you can use Jupyter as well). Let me know if you are interested - drop me a mail.
Yes, we are very much interested in running the Livy interpreter on K8S. Please let us know how we can proceed further. regards
@matyix …
@matyix Can you please help me with this?
What is this PR for?
The goal of this PR is to be able to execute Spark notebooks on Kubernetes in cluster mode, so that the Spark driver runs inside the Kubernetes cluster - based on https://github.com/apache-spark-on-k8s/spark. Zeppelin uses spark-submit to start RemoteInterpreterServer, which is able to execute notebooks on Spark. Kubernetes-specific spark-submit parameters like the driver, executor, init container and shuffle images should be set in the SPARK_SUBMIT_OPTIONS environment variable. In case the Spark interpreter is configured with a K8S-specific Spark master URL (k8s://https...), RemoteInterpreterServer is launched inside a Spark driver pod on Kubernetes, thus the Zeppelin server has to be able to connect to the remote server. In a Kubernetes cluster the best solution for this is creating a K8S service for RemoteInterpreterServer. This is the reason for having SparkK8RemoteInterpreterManagedProcess - extending the functionality of RemoteInterpreterManagedProcess - which creates the Kubernetes service, maps the port of RemoteInterpreterServer in the driver pod, and connects to this service once the Spark driver pod is in Running state.

Design considerations: As described in spark-interpreter-k8s.md, the Zeppelin server is running inside the Kubernetes cluster - thus we can choose where to run the Zeppelin server; the benefit of running the server inside K8S is that we don't have to deal with authentication. However, it is not enough to start only the Zeppelin server inside the Kubernetes cluster, as by default Zeppelin will start spark-submit in the same pod and run every Spark job locally. The scope of this PR is to run spark-submit (the apache-spark-on-k8s version) properly configured with Docker images etc., so that the Spark driver is started in a separate pod in the cluster, also starting separate pods for the Spark executors; thus we benefit from dynamic scaling of executors inside the Kubernetes cluster (while all the scheduling, pod allocation and resource management is done by the Kubernetes scheduler).

Please see below how this is running/used:
The cluster:
The flow:
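The cluster and flow diagrams were images; the core of the flow is the spark-submit call from the interpreter.sh diff earlier in this conversation. A sketch with placeholder values for the interpreter group, process id, jar and port (only the class name and --conf keys are taken from the PR):

```shell
# Sketch: the spark-submit command interpreter.sh assembles when
# RUN_SPARK_ON_K8 is set. Values below are illustrative placeholders.
INTERPRETER_GROUP_NAME=spark
INTERPRETER_PROCESS_ID=abc123
SPARK_APP_JAR=spark-interpreter.jar
PORT=12345

CMD="spark-submit --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer \
--conf spark.app.name=zri-${INTERPRETER_GROUP_NAME} \
--conf spark.kubernetes.driver.label.interpreter-processId=${INTERPRETER_PROCESS_ID} \
--conf spark.metrics.namespace=zeppelin_${INTERPRETER_GROUP_NAME} \
${SPARK_APP_JAR} ${PORT}"

echo "$CMD"
```

The interpreter-processId label is what SparkK8RemoteInterpreterManagedProcess later uses to find the driver pod and create the service for it.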
What type of PR is it?
Feature
What is the Jira issue?
How should this be tested?
Unit and functional tests - running notebooks on Spark on K8S.
Questions: