Besides the interactive use of ArchiveSpark as a data analysis / corpus building platform, as described in Use ArchiveSpark with Jupyter, it can also serve as an API to access archival collections from your own software, where it is integrated as a library.
The recommended way to include ArchiveSpark as a library in your project is through Maven. For this purpose, we have published ArchiveSpark on Maven Central.
To include it from Maven Central in your Scala SBT project, add the following line to your build.sbt file (please check for the latest version):
libraryDependencies += "com.github.helgeho" %% "archivespark" % "3.0"
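For a complete project, the dependency fits into a minimal build.sbt like the sketch below. The project name and the Scala and Spark versions are illustrative assumptions; check Maven Central for the versions the current ArchiveSpark release is built against:

name := "my-archivespark-app"
version := "0.1"
// Illustrative; match the Scala version of the ArchiveSpark release you use
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Spark is supplied by the cluster at runtime, hence "provided"
  "org.apache.spark" %% "spark-core" % "2.4.0" % "provided",
  "com.github.helgeho" %% "archivespark" % "3.0"
)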
In addition, releases with plain JAR files are available on GitHub: https://github.com/helgeho/ArchiveSpark/releases. These can be added to the classpath of an existing Spark deployment, for example through the --jars option of spark-submit.
The general usage of ArchiveSpark is described in this article: General Usage
These instructions require a SparkContext to exist, which is automatically available if you use ArchiveSpark with Jupyter as described in Use ArchiveSpark with Jupyter. If you would like to use it as a library in your own project, the SparkContext needs to be created manually, as follows:
import org.apache.spark.{SparkConf, SparkContext}

val appName = "ArchiveSpark"
// Spark 2.x expects "yarn" as the master; the former "yarn-client" value was removed
val master = "yarn"
val conf = new SparkConf().setAppName(appName).setMaster(master)
val sc = new SparkContext(conf)
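With the SparkContext in place, collections can be loaded just as in the interactive setting. The following is a minimal sketch along the lines of the General Usage examples; the CDX and WARC paths are placeholders for your own collection:

import org.archive.archivespark._
import org.archive.archivespark.specific.warc._

// Placeholder paths to a CDX-indexed WARC collection
val cdxPath = "/data/collection/cdx/*.cdx.gz"
val warcPath = "/data/collection/warc"

// Load the collection as an RDD of WARC records
val records = ArchiveSpark.load(WarcSpec.fromFiles(cdxPath, warcPath))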
More details about this can be found in the official Spark documentation: Spark Programming Guide.
For more information on the usage of ArchiveSpark, the available DataSpecs and Enrichment Functions, as well as the operations it provides, please read the API Docs.
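As a pointer to what these look like in practice, the following sketch applies an Enrichment Function to the records loaded above. It is again modeled on the General Usage examples; Html and HtmlText are Enrichment Functions shipped with ArchiveSpark:

import org.archive.archivespark.functions._

// Derive the text of each page's <title> element
val Title = HtmlText.of(Html.first("title"))

// enrich adds the derived value to each record, keeping the existing metadata
val titles = records.enrich(Title)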