[SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Scheduler Backend #19468
@@ -46,8 +46,6 @@ private[spark] trait ExecutorPodFactory {
private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)
  extends ExecutorPodFactory {

  import ExecutorPodFactoryImpl._

  private val executorExtraClasspath =
    sparkConf.get(org.apache.spark.internal.config.EXECUTOR_CLASS_PATH)
@@ -76,7 +74,6 @@ private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)

  private val executorDockerImage = sparkConf.get(EXECUTOR_DOCKER_IMAGE)
  private val dockerImagePullPolicy = sparkConf.get(DOCKER_IMAGE_PULL_POLICY)
  private val executorPort = sparkConf.getInt("spark.executor.port", DEFAULT_STATIC_PORT)
  private val blockManagerPort = sparkConf
    .getInt("spark.blockmanager.port", DEFAULT_BLOCKMANAGER_PORT)
Are we assuming the ports will always be available (for the executor and driver) to bind to?

Yes - because we're deploying the driver and executors in a containerized setting, we're a lot freer to make looser assumptions about available ports. The exception will be if Spark applications run sidecar containers alongside the main driver/executor containers, but support for that is further out in the future, when/if we expect Pod Presets to interact with our code.

Sounds good. I wanted to make sure I understood it right that we are making the assumption that the port is unbound and available for Spark.
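To make the assumption concrete: because each executor runs in its own pod with an isolated network namespace, the backend can declare fixed container ports instead of probing for free ones. A minimal sketch with the fabric8 client (the port names are illustrative, not necessarily this PR's exact constants):

```scala
import io.fabric8.kubernetes.api.model.{ContainerPort, ContainerPortBuilder}

// Sketch: declare fixed ports on the executor container. In a pod-per-executor model
// no other process competes for these ports inside the pod's network namespace.
def executorContainerPorts(executorPort: Int, blockManagerPort: Int): Seq[ContainerPort] =
  Seq("executor" -> executorPort, "blockmanager" -> blockManagerPort).map {
    case (name, port) =>
      new ContainerPortBuilder()
        .withName(name)
        .withContainerPort(port)
        .build()
  }
```

If sidecar containers are added later, they would share the pod's network namespace, so a port-conflict check would become necessary at that point.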
@@ -139,7 +136,6 @@ private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)
        }
      }.getOrElse(Seq.empty[EnvVar])
How is this getting used? I see it getting set, but not used anywhere.

This is used in the executor Docker file included in #19717.

Thanks, somehow it did not show up in my searches.
    val executorEnv = (Seq(
      (ENV_EXECUTOR_PORT, executorPort.toString),
      (ENV_DRIVER_URL, driverUrl),
      // Executor backend expects integral value for executor cores, so round it up to an int.
      (ENV_EXECUTOR_CORES, math.ceil(executorCores).toInt.toString),
@@ -159,7 +155,6 @@ private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)
        .build()
Can we replace the use of IP with hostname?

There is no stable hostname for a Kubernetes Pod unless we put the Pod behind a Service. We can't put executors behind services because Services are considered heavyweight objects, and we want to constrain the number of them we have to create.

Hm, the above comment should be more precise. DNS is handled by kube-dns in any given Kubernetes context. (kube-dns is an optional component, but in future commits it will become clear that Spark will require kube-dns to be installed on a cluster, and we will document as such.) kube-dns creates a DNS entry for services that routes to the IPs of the pods in the system, but it does not create a DNS entry for every pod unless there is a service that maps to it.

There are various parts of Spark which assert that a hostname is used and not an IP - that is, Spark assumes that executors and drivers are reachable via hostnames.

Just to clarify, the hostname need not be stable in general - it needs to be stable only for the duration the container is alive, and must not be immediately reused (in case some other executor container comes up with the exact same hostname after the previous one exits for the same Spark app).

To add, if the list above is not accurate (needs additions/mods/deletions), that is fine - I was listing potential concerns off the top of my head.

@mridulm In response to each of your four points:

As a general note, locality is non-trivial in Kubernetes because no two pods will ever share the same IP address, and a pod does not share the IP address of the physical host that runs it. The Kubernetes code needs to be intelligent about knowing which pods are co-located on the same underlying Kubelet. And of course, it's reasonable to believe that the above four considerations are not exhaustive, but we'll address unforeseen factors as they come up.

Thanks for clarifying @mccheah. One clarification though: when I referred to IPv6 support, I was not referring to interop between IPv4 and IPv6 (that is a different can of worms!). What I wanted to clarify was that IPv6 is not supported as an IP (it is supported via hostnames, though). This would be critical to get the HOST locality level working properly; the performance implications of not getting this right would be non-trivial. Looking forward to how this has been addressed!

Was doing another pass over this thread. I think Matt covered this mostly. IPv6 support is being added in kubernetes/enhancements#508 but will be in alpha for now. We should target testing with that once it goes beta (probably the Kubernetes 1.10 timeframe). As for routing, that's entirely handled by Kubernetes network plugins and we shouldn't have to do anything special to take advantage of it. Conceptually, the notion of needing a network identity like DNS is separated from containers/pods themselves by design. We could, if necessary, create headless services if there's a strong need for a DNS name for each executor, but that would be a heavyweight approach that I would prefer we avoid unless we absolutely have to. Lastly, +1 on the locality comment. I do think that's very important, and there are ways we can use the Kubernetes control plane to figure out a global view - different executors and their corresponding physical (or virtual) nodes.

Sounds good, thanks for the details.
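For reference, one way the backend can give each executor a per-pod address without creating a Service per executor is to read the pod IP through the Kubernetes downward API. A minimal sketch with the fabric8 client (the environment variable name is an assumption, not necessarily the constant used in this PR):

```scala
import io.fabric8.kubernetes.api.model.{EnvVar, EnvVarBuilder, EnvVarSourceBuilder}

// Hypothetical helper: expose the pod's own IP to the executor process as an
// environment variable via the downward API field reference status.podIP.
def executorPodIpEnv(): EnvVar =
  new EnvVarBuilder()
    .withName("SPARK_EXECUTOR_POD_IP")
    .withValueFrom(
      new EnvVarSourceBuilder()
        .withNewFieldRef("v1", "status.podIP")
        .build())
    .build()
```

The executor entrypoint can then bind to and advertise this address, which remains stable for the lifetime of the pod even though it is not a DNS name.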
      ) ++ executorExtraJavaOptionsEnv ++ executorExtraClasspathEnv.toSeq
    val requiredPorts = Seq(
      (EXECUTOR_PORT_NAME, executorPort),
      (BLOCK_MANAGER_PORT_NAME, blockManagerPort))
      .map { case (name, port) =>
        new ContainerPortBuilder()
@@ -220,7 +215,3 @@ private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)
      .build()
  }
How is logging handled? Aggregation of stdout/stderr/other custom log files configured via log4j?

Logs are written to stdout/stderr, so they are handled by the Kubernetes logging system. See https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level.

Is this a limitation of what can be supported in the current integration, or a general restriction of Kubernetes itself? This allows us to have multiple loggers in log4j based on user customization.

It's somewhat a limitation of the current integration. Allowing users to use a customized log4j configuration needs the ability to inject user-specified configuration files like log4j.properties.

Thanks for clarifying, that sounds fine - do we have this documented anywhere? Of course, this does not need to be part of this PR! Just something to add in a follow-up PR before 2.3.

Yes, https://github.com/apache-spark-on-k8s/userdocs/blob/master/src/jekyll/running-on-kubernetes.md. We will add it to the upstream documentation.
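As a sketch of what injecting a user-specified log4j configuration could look like once supported (the ConfigMap name and mount path below are hypothetical, not part of this PR):

```scala
import io.fabric8.kubernetes.api.model.{ConfigMapVolumeSourceBuilder, Volume, VolumeBuilder, VolumeMount, VolumeMountBuilder}

// Hypothetical: a ConfigMap named "spark-log4j" holds the user's log4j.properties.
val log4jVolume: Volume = new VolumeBuilder()
  .withName("spark-log4j")
  .withConfigMap(
    new ConfigMapVolumeSourceBuilder()
      .withName("spark-log4j")
      .build())
  .build()

// Mount it over the conf directory baked into the executor image (path is an assumption).
val log4jMount: VolumeMount = new VolumeMountBuilder()
  .withName("spark-log4j")
  .withMountPath("/opt/spark/conf")
  .build()
```

With such a mount in place, user-supplied appenders would still ultimately write to stdout/stderr or to files inside the container, so node-level log aggregation would remain the collection mechanism.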
  }
}

private object ExecutorPodFactoryImpl {
  private val DEFAULT_STATIC_PORT = 10000
}
@@ -224,6 +224,7 @@ private[spark] class KubernetesClusterSchedulerBackend(
  override def stop(): Unit = {
    // stop allocation of new resources and caches.
    allocatorExecutor.shutdown()
Added
    allocatorExecutor.awaitTermination(30, TimeUnit.SECONDS)

    // send stop message to executors so they shut down cleanly
    super.stop()
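A minimal sketch of a more defensive variant of this shutdown sequence, in case the allocator thread does not finish within the timeout (a hypothetical helper, not the PR's code):

```scala
import java.util.concurrent.{ScheduledExecutorService, TimeUnit}

// Hypothetical helper: stop a backend-owned allocator thread, waiting a bounded
// time and escalating to shutdownNow() instead of hanging inside stop().
object AllocatorShutdown {
  def shutdownGracefully(allocator: ScheduledExecutorService, timeoutSeconds: Long = 30): Boolean = {
    allocator.shutdown()
    try {
      val terminated = allocator.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)
      if (!terminated) {
        allocator.shutdownNow()  // give up on pending allocation rounds
      }
      terminated
    } catch {
      case _: InterruptedException =>
        allocator.shutdownNow()
        Thread.currentThread().interrupt()  // preserve the interrupt for the caller
        false
    }
  }
}
```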
0.8 for KUBERNETES mode?

Ah, totally misinterpreted it. Fixed.
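The exchange above is terse in this export; assuming the 0.8 refers to the default minimum registered resources ratio (which the YARN backend also defaults to 0.8 when the user does not set it), a sketch of that pattern would be:

```scala
import org.apache.spark.SparkConf

// Assumption: "0.8" means the default fraction of executors that must register before
// scheduling starts, used when spark.scheduler.minRegisteredResourcesRatio is unset.
def defaultMinRegisteredRatio(conf: SparkConf): Double =
  conf.getOption("spark.scheduler.minRegisteredResourcesRatio")
    .map(_.toDouble)
    .getOrElse(0.8)
```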