This repository has been archived by the owner on Dec 7, 2019. It is now read-only.

GlobalWebIndex/storage-partitioner

storage-partitioner


"net.globalwebindex" %% "storage-partitioner-api" % "x.y.z"
"net.globalwebindex" %% "storage-partitioner-all" % "x.y.z"
"net.globalwebindex" %% "storage-partitioner-s3" % "x.y.z"
"net.globalwebindex" %% "storage-partitioner-gcs" % "x.y.z"
"net.globalwebindex" %% "storage-partitioner-cql" % "x.y.z"
"net.globalwebindex" %% "storage-partitioner-druid" % "x.y.z"

This project primarily targets storages such as file systems, S3, or FTP that:

  • do not have any built-in partitioning the way databases do
  • cannot be searched easily, so you want to narrow down the area that has to be searched the hard way

Even columnar databases, however, need some kind of partition management, because they persist data in denormalized form and tracking partition state is not easy.

For such storages, partitioning has to be implemented on the client side, and that is what this library helps with. Currently only time series data is supported, and implementations are provided for S3, Druid, and Cassandra or ScyllaDB.
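To illustrate what client-side partitioning of time series data can look like, here is a minimal sketch that maps timestamps to hourly partition keys and lists the partitions that cover a time range. It is not this library's API: the `TimePartitions` object, its method names, and the `yyyy/MM/dd/HH` key layout are all assumptions made for the example.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

// Hypothetical helper, not part of storage-partitioner.
object TimePartitions {
  // Assumed key layout: one partition per UTC hour.
  private val hourFormat =
    DateTimeFormatter.ofPattern("yyyy/MM/dd/HH").withZone(ZoneOffset.UTC)

  /** Key of the hourly partition that contains `ts`. */
  def partitionKey(prefix: String, ts: Instant): String =
    s"$prefix/${hourFormat.format(ts)}"

  /** Keys of all hourly partitions that overlap the half-open range [from, until). */
  def partitionsBetween(prefix: String, from: Instant, until: Instant): Vector[String] =
    if (!from.isBefore(until)) Vector.empty // empty or inverted range
    else
      Iterator
        .iterate(from.truncatedTo(ChronoUnit.HOURS))(_.plus(1, ChronoUnit.HOURS))
        .takeWhile(_.isBefore(until))
        .map(partitionKey(prefix, _))
        .toVector
}
```

A reader that only needs data from a given interval can then list or fetch just the keys returned by `partitionsBetween` instead of scanning the whole storage.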

When building an ETL pipeline that extracts and loads data with the same partitioning across various storage types, the user should be able to focus on Transform rather than on Extract and Load.
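The idea of keeping the same partitioning on both ends of a pipeline can be sketched as follows. Everything here is hypothetical and only meant to convey the shape of the problem: `PartitionedStorage`, `InMemoryStorage`, and `Etl.run` are names invented for this example and do not reflect the library's actual interfaces.

```scala
import scala.collection.mutable

// Hypothetical interface, NOT the library's API: a storage addressed by partition key.
trait PartitionedStorage[A] {
  def partitions: Vector[String]
  def read(partition: String): Vector[A]
  def write(partition: String, records: Vector[A]): Unit
}

// In-memory stand-in so the sketch runs without S3, Druid, or Cassandra.
final class InMemoryStorage[A] extends PartitionedStorage[A] {
  // LinkedHashMap keeps partitions in insertion order.
  private val data = mutable.LinkedHashMap.empty[String, Vector[A]]
  def partitions: Vector[String] = data.keys.toVector
  def read(partition: String): Vector[A] = data.getOrElse(partition, Vector.empty)
  def write(partition: String, records: Vector[A]): Unit = data.update(partition, records)
}

object Etl {
  /** Copies each partition from `source` to `sink` under the same key,
    * applying `transform` to every record. */
  def run[A, B](source: PartitionedStorage[A], sink: PartitionedStorage[B])(transform: A => B): Unit =
    source.partitions.foreach { p =>
      sink.write(p, source.read(p).map(transform))
    }
}
```

With Extract and Load reduced to a loop over shared partition keys, the only pipeline-specific code left is the `transform` function, for example `Etl.run(source, sink)(record => record.toString)`.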

Note that:

  • this library is very much a work in progress; adding one more storage could lead to heavy API changes.
  • this kind of "integration by abstraction" might seem a bit wrong, and an approach based on storage "Sinks" and "Sources" might make better sense. For time series data, however, once partitioning and granularity are taken into account, such generic Sinks and Sources would be very hard to implement. The library might still move in that direction later on.

About

Abstraction over storages with partitioned data
