Summary

Spark Core / Web UI

  1. Web UI — Spark Application’s Web Console

  2. JobsTab

  3. StagesTab — Stages for All Jobs

  4. StorageTab

  5. EnvironmentTab

  6. ExecutorsTab

  7. SparkUI — Web UI of Spark Application

  8. BlockStatusListener Spark Listener

  9. EnvironmentListener Spark Listener

  10. ExecutorsListener Spark Listener

  11. JobProgressListener Spark Listener

  12. StorageStatusListener Spark Listener

  13. StorageListener — Spark Listener for Tracking Persistence Status of RDD Blocks

  14. RDDOperationGraphListener Spark Listener

  15. WebUI — Framework For Web UIs

  16. RDDStorageInfo

  17. RDDInfo

  18. LiveEntity

  19. UIUtils

  20. JettyUtils

  21. web UI Configuration Properties

Spark MLlib

  1. Spark MLlib — Machine Learning in Spark

  2. ML Pipelines (spark.ml)

  3. ML Persistence — Saving and Loading Models and Pipelines

  4. Example — Text Classification

  5. Example — Linear Regression

  6. Logistic Regression

  7. Latent Dirichlet Allocation (LDA)

  8. Vector

  9. LabeledPoint

  10. Streaming MLlib

  11. GeneralizedLinearRegression

  12. Alternating Least Squares (ALS) Matrix Factorization

  13. Instrumentation

  14. MLUtils

Spark Core / RDD

  1. Anatomy of Spark Application

  2. SparkConf — Programmable Configuration for Spark Applications

  3. SparkContext

  4. RDD — Resilient Distributed Dataset

  5. Operators

  6. Caching and Persistence

  7. Partitions and Partitioning

  8. Shuffling

  9. Checkpointing

  10. RDD Dependencies

  11. Map/Reduce-side Aggregator

  12. AppStatusStore

  13. AppStatusPlugin

  14. AppStatusListener

  15. KVStore

  16. InterruptibleIterator — Iterator With Support For Task Cancellation

Spark Core / Services

  1. SerializerManager

  2. MemoryManager — Memory Management

  3. SparkEnv — Spark Runtime Environment

  4. DAGScheduler — Stage-Oriented Scheduler

  5. TaskScheduler — Spark Scheduler

  6. SchedulerBackend — Pluggable Scheduler Backends

  7. ExecutorBackend — Pluggable Executor Backends

  8. BlockManager — Key-Value Store of Blocks of Data

  9. MapOutputTracker — Shuffle Map Output Registry

  10. ShuffleManager — Pluggable Shuffle Systems

  11. Serialization

  12. ExternalClusterManager — Pluggable Cluster Managers

  13. BroadcastManager

  14. ContextCleaner — Spark Application Garbage Collector

  15. Dynamic Allocation (of Executors)

  16. HTTP File Server

  17. Data Locality

  18. Cache Manager

  19. OutputCommitCoordinator

  20. RpcEnv — RPC Environment

  21. TransportConf — Transport Configuration

  22. Utils Helper Object

Spark Core / Security

Execution Model

Further Learning

(separate book) Spark Structured Streaming