BlockDataManager — Block Storage Management API

BlockDataManager is the contract for managing the storage of blocks of data (the block storage management API).

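package org.apache.spark.network

import scala.reflect.ClassTag

import org.apache.spark.network.buffer.ManagedBuffer
import org.apache.spark.storage.{BlockId, StorageLevel}

trait BlockDataManager {
  // Fetches the data of a local block.
  def getBlockData(blockId: BlockId): ManagedBuffer
  // Stores a block locally at the given storage level; true on success.
  def putBlockData(
    blockId: BlockId,
    data: ManagedBuffer,
    level: StorageLevel,
    classTag: ClassTag[_]): Boolean
  // Releases the lock acquired by getBlockData or putBlockData.
  def releaseLock(blockId: BlockId, taskAttemptId: Option[Long]): Unit
}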
Note
BlockDataManager is a private[spark] contract.
Table 1. BlockDataManager Contract
Method Description

getBlockData

Fetches the data of a local block by blockId

Used when:

putBlockData

Stores block data locally under blockId. The return value indicates whether the operation succeeded (true) or failed (false).

Used when…​FIXME

releaseLock

Releases the lock acquired by getBlockData or putBlockData

Used when…​FIXME
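
The three operations of the contract can be illustrated with a toy, in-memory sketch. It does not depend on Spark: SimpleBlockId, SimpleBuffer, SimpleLevel, SimpleBlockDataManager, InMemoryBlockDataManager and lockCount are hypothetical names invented for this example, and the locking (a per-block counter taken only by getBlockData) and the refusal to overwrite an existing block are assumptions, not a description of how BlockManager behaves.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's BlockId, ManagedBuffer and StorageLevel.
final case class SimpleBlockId(name: String)
final case class SimpleBuffer(bytes: Array[Byte])
sealed trait SimpleLevel
case object MemoryOnly extends SimpleLevel

// Simplified contract with the same three operations as BlockDataManager.
trait SimpleBlockDataManager {
  def getBlockData(blockId: SimpleBlockId): SimpleBuffer
  def putBlockData(blockId: SimpleBlockId, data: SimpleBuffer, level: SimpleLevel): Boolean
  def releaseLock(blockId: SimpleBlockId, taskAttemptId: Option[Long]): Unit
}

// Toy in-memory implementation; a "lock" is just a counter per block.
class InMemoryBlockDataManager extends SimpleBlockDataManager {
  private val blocks = mutable.Map.empty[SimpleBlockId, SimpleBuffer]
  private val locks = mutable.Map.empty[SimpleBlockId, Int].withDefaultValue(0)

  override def getBlockData(blockId: SimpleBlockId): SimpleBuffer = synchronized {
    val data = blocks.getOrElse(
      blockId,
      throw new NoSuchElementException(s"Block ${blockId.name} not found"))
    locks(blockId) += 1 // the caller now holds one lock on the block
    data
  }

  override def putBlockData(
      blockId: SimpleBlockId,
      data: SimpleBuffer,
      level: SimpleLevel): Boolean = synchronized {
    // Assumption of this sketch: an existing block is never overwritten.
    if (blocks.contains(blockId)) false
    else {
      blocks(blockId) = data
      true
    }
  }

  override def releaseLock(blockId: SimpleBlockId, taskAttemptId: Option[Long]): Unit =
    synchronized {
      if (locks(blockId) > 0) locks(blockId) -= 1
    }

  // Helper for inspection only; not part of the contract.
  def lockCount(blockId: SimpleBlockId): Int = synchronized(locks(blockId))
}
```

A caller would typically pair every successful getBlockData with a releaseLock once it has finished reading the buffer, which is why the sketch exposes lockCount for checking that no locks are left behind.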

A block is identified by a BlockId, which has a globally unique identifier (name), and its data is stored as a ManagedBuffer.

Table 2. BlockIds
Name Description

RDDBlockId

Described by an RDD ID (rddId) and a partition index (splitIndex)

Created when an RDD is requested to get or compute a partition (identified by splitIndex).

ShuffleBlockId

Described by shuffleId, mapId and reduceId

ShuffleDataBlockId

Described by shuffleId, mapId and reduceId

ShuffleIndexBlockId

Described by shuffleId, mapId and reduceId

BroadcastBlockId

Described by a broadcastId identifier and an optional field

TaskResultBlockId

Described by taskId

StreamBlockId

Described by streamId and uniqueId
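
The identifiers in Table 2 can be sketched as a small, self-contained hierarchy that derives a unique name from the listed fields. The code below does not depend on Spark; SketchBlockId is a hypothetical base type, the field types (for example Int for mapId) are assumptions that differ between Spark versions, and the name formats follow the conventions recalled from Spark's BlockId sources, so they should be checked against the Spark version in use.

```scala
// Self-contained sketch mirroring Table 2; not Spark's own classes.
sealed abstract class SketchBlockId {
  // Globally unique identifier of the block.
  def name: String
  override def toString: String = name
}

final case class RDDBlockId(rddId: Int, splitIndex: Int) extends SketchBlockId {
  override def name: String = s"rdd_${rddId}_$splitIndex"
}

final case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int)
    extends SketchBlockId {
  override def name: String = s"shuffle_${shuffleId}_${mapId}_$reduceId"
}

final case class ShuffleDataBlockId(shuffleId: Int, mapId: Int, reduceId: Int)
    extends SketchBlockId {
  override def name: String = s"shuffle_${shuffleId}_${mapId}_${reduceId}.data"
}

final case class ShuffleIndexBlockId(shuffleId: Int, mapId: Int, reduceId: Int)
    extends SketchBlockId {
  override def name: String = s"shuffle_${shuffleId}_${mapId}_${reduceId}.index"
}

// The optional field defaults to the empty string and is appended only if set.
final case class BroadcastBlockId(broadcastId: Long, field: String = "")
    extends SketchBlockId {
  override def name: String =
    s"broadcast_$broadcastId" + (if (field.isEmpty) "" else s"_$field")
}

final case class TaskResultBlockId(taskId: Long) extends SketchBlockId {
  override def name: String = s"taskresult_$taskId"
}

final case class StreamBlockId(streamId: Int, uniqueId: Long) extends SketchBlockId {
  override def name: String = s"input-$streamId-$uniqueId"
}
```

Deriving the name from the fields means two identifiers built from the same values always compare equal and map to the same block, which is what lets a name serve as a lookup key.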

Note
BlockManager is the only known implementation of the BlockDataManager contract in Apache Spark.