MetadataLog — Contract for Metadata Storage

MetadataLog is the contract for storing metadata of type T per batch, keyed by batch id.

MetadataLog Contract

package org.apache.spark.sql.execution.streaming

trait MetadataLog[T] {
  def add(batchId: Long, metadata: T): Boolean
  def get(batchId: Long): Option[T]
  def get(startId: Option[Long], endId: Option[Long]): Array[(Long, T)]
  def getLatest(): Option[(Long, T)]
  def purge(thresholdBatchId: Long): Unit
}
Table 1. MetadataLog Contract
Method Description

add

get

getLatest

Retrieves the id of the latest committed batch together with its metadata from the metadata storage, if available.

Note
Callers (e.g. FileStreamSink) assume that the latest batch id identifies a batch that has already been committed and that a streaming query can start from.

purge
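
To make the contract more concrete, here is a minimal in-memory sketch of it. `InMemoryMetadataLog` and `MetadataLogDemo` are hypothetical names used only for illustration and are not part of Spark. The trait is repeated locally so the sketch compiles without Spark on the classpath. The semantics chosen here are assumptions of the sketch: `add` returns `false` when the batch id is already present, the range variant of `get` treats both bounds as inclusive, and `purge` removes entries with ids strictly below the threshold.

```scala
import scala.collection.mutable

// Local copy of the contract above, so the sketch is self-contained.
trait MetadataLog[T] {
  def add(batchId: Long, metadata: T): Boolean
  def get(batchId: Long): Option[T]
  def get(startId: Option[Long], endId: Option[Long]): Array[(Long, T)]
  def getLatest(): Option[(Long, T)]
  def purge(thresholdBatchId: Long): Unit
}

// Hypothetical in-memory implementation (illustration only, not Spark code).
class InMemoryMetadataLog[T] extends MetadataLog[T] {
  // Sorted by batch id, so the last entry is the latest batch.
  private val entries = mutable.TreeMap.empty[Long, T]

  // Assumption: an existing batch id is never overwritten.
  override def add(batchId: Long, metadata: T): Boolean =
    if (entries.contains(batchId)) false
    else { entries.put(batchId, metadata); true }

  override def get(batchId: Long): Option[T] = entries.get(batchId)

  // Assumption: both bounds are inclusive; None means "unbounded".
  override def get(startId: Option[Long], endId: Option[Long]): Array[(Long, T)] =
    entries.iterator
      .filter { case (id, _) => startId.forall(id >= _) && endId.forall(id <= _) }
      .toArray

  override def getLatest(): Option[(Long, T)] = entries.lastOption

  // Assumption: removes every entry with an id strictly below the threshold.
  override def purge(thresholdBatchId: Long): Unit = {
    val stale = entries.keysIterator.takeWhile(_ < thresholdBatchId).toList
    stale.foreach(id => entries.remove(id))
  }
}

object MetadataLogDemo {
  def main(args: Array[String]): Unit = {
    val log = new InMemoryMetadataLog[String]
    log.add(0L, "batch-0")
    log.add(1L, "batch-1")

    // Treat the latest batch id as already committed; -1 means nothing has been committed yet.
    val latestCommitted = log.getLatest().map(_._1).getOrElse(-1L)
    val incomingBatchId = 1L
    if (incomingBatchId <= latestCommitted) println(s"Skipping committed batch $incomingBatchId")
    else log.add(incomingBatchId, s"batch-$incomingBatchId")
  }
}
```

The demo reads the latest batch id the way the note above describes: a batch whose id is not greater than the latest one in the log is treated as committed and is skipped. A sorted map keeps `getLatest` and `purge` simple; a durable implementation would have to persist each entry instead.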