Spark Data Standardization Library

A library for Spark that helps to standardize any input data (DataFrame) so that it adheres to the provided schema.

  • DataFrame in
  • Standardized DataFrame out (see the sketch below)
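
As a rough illustration of the flow, here is a hypothetical sketch -- it is not the library's documented API; the entry point named in the comment below is an assumption, so consult the project's sources or Scaladoc for the real call:

// Hypothetical sketch only: illustrates "DataFrame in, standardized DataFrame out"
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder().appName("standardization-example").getOrCreate()

val rawDf: DataFrame = spark.read.json("input.json")                     // DataFrame in
val targetSchema: StructType = spark.read.json("reference.json").schema  // schema the data should adhere to (placeholder)

// Assumed entry point -- verify against the library before use:
// val standardizedDf = za.co.absa.standardization.Standardization.standardize(rawDf, targetSchema)
// standardizedDf would be the standardized DataFrame out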

Usage

Needed Provided Dependencies

The library needs the following dependencies to be included in your project:

"org.apache.spark" %% "spark-core" % SPARK_VERSION,
"org.apache.spark" %% "spark-sql" % SPARK_VERSION,
"za.co.absa" %% s"spark-commons-spark${SPARK_MAJOR}.${SPARK_MINOR}" % "0.6.1",

Usage in SBT:

"za.co.absa" %% "spark-data-standardization" % VERSION 

Usage in Maven

Scala 2.11

<dependency>
   <groupId>za.co.absa</groupId>
   <artifactId>spark-data-standardization_2.11</artifactId>
   <version>${latest_version}</version>
</dependency>

Scala 2.12

<dependency>
   <groupId>za.co.absa</groupId>
   <artifactId>spark-data-standardization_2.12</artifactId>
   <version>${latest_version}</version>
</dependency>

Scala 2.13

<dependency>
   <groupId>za.co.absa</groupId>
   <artifactId>spark-data-standardization_2.13</artifactId>
   <version>${latest_version}</version>
</dependency>

Spark and Scala compatibility

            Scala 2.11   Scala 2.12   Scala 2.13
Spark       2.4.7        3.2.1        3.2.1

How to Release

Please see this file for more details.

How to generate a code coverage report

sbt ++<scala.version> jacoco

The code coverage report will be generated at the following path:

{project-root}/target/scala-{scala_version}/jacoco/report/html
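
For instance, when building against Scala 2.12 (the concrete patch version below is illustrative):

sbt ++2.12.17 jacoco

which would place the HTML report under {project-root}/target/scala-2.12/jacoco/report/html, assuming sbt's usual binary-version directory naming.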
