Dorothy is a unified solution for data management, versioning, hosting, and distribution. It aims to be accessible to researchers in any field, working from anywhere and managing any kind of data, from initial curation through to publication and long-term archiving.
While Dorothy is still a work in progress, we have four ambitious objectives:
- Make data more transparent. Researchers will be able to easily track versions of their data over time, linking specific versions to particular analyses.
- Increase data accessibility. Anyone with an internet connection will be able to quickly download and contribute data products, either via centralized repositories or from the peer-to-peer network, opening up possibilities for collaboration and access that would otherwise be impossible.
- Improve reproducibility. Datasets are referenced by their content, not their names, so subsequent efforts built on such data can be certain that the data assets are identical to those used previously.
- Further inclusive practices. Dorothy will provide both the tools and the venue for diverse and inclusive communities of researchers around the world, analogous to GitHub for software developers. Dorothy will also provide data storage and dissemination resources to those without the means to run their own Dorothy node.
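The idea behind the reproducibility objective, referencing data by content rather than by name, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: it uses a plain SHA-256 digest as a stand-in identifier, and the `content_id` helper and the sample data are hypothetical. The identifier format Dorothy actually uses is not described here.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a hex SHA-256 digest that identifies `data` by its content.

    Hypothetical helper for illustration; not part of Dorothy.
    """
    return hashlib.sha256(data).hexdigest()


# Two copies of the same bytes, as might be stored under different file names.
original = b"site,temp_c\nA,11.5\nB,12.1\n"
renamed_copy = bytes(original)

# A revised version in which one value has changed.
revised = b"site,temp_c\nA,11.5\nB,12.3\n"

# Identical content yields an identical identifier, whatever the file is called.
same = content_id(original) == content_id(renamed_copy)

# Any change to the content yields a different identifier.
changed = content_id(original) != content_id(revised)

print(same, changed)
```

Because the identifier is derived from the bytes themselves, an analysis that records it can later check that it is reading exactly the data it was built on, regardless of where the data was downloaded from or what the file was named.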
Ideally, updating a dataset should be as simple as:
# Clone an existing dataset to your machine
$ dorothy clone https://dorothy.39alpharesearch.org/team/dataset
$ cd dataset
# View the history
$ dorothy log
# Check out a version
$ dorothy checkout Qm123 data
# Edit the data
# Commit a new version
$ dorothy commit data
# Push the changes back to the remote host
$ dorothy push
Dorothy comes with a "dataforge", analogous to GitLab or GitHub but built specifically for managing datasets.
$ dorothy serve
Anyone can host a Dorothy dataforge if they choose, or use a
$ git clone https://github.com/39alpha/dorothy
$ cd dorothy
$ make
$ make install
$ sudo mv dorothy /usr/bin/dorothy # not ideal, but it's what we've got ATM
- Foundations and Inspiration
- Alternatives
Copyright © 2023-2024 39 Alpha Research. Free use of this software is granted under the terms of the MIT License.
This project was supported by the National Aeronautics and Space Administration (NASA) under Grant Number 22-HPOSS22-0021, through Research Opportunities in Space and Earth Science (ROSES-2022), Program Element F.15 High Priority Open-Source Science.
If you wish to further support this project, or 39 Alpha Research in general, please visit https://39alpharesearch.org/donate.