Releases: odissei-data/ingestion-workflow-orchestrator
V2.3.1 release
Updates
First .1 release for v2
- Included Postgres as DB
- Moved settings to .env
- Agent now runs nicely inside the container as intended.
Updating dependencies and checking build
V2.0.11
Update docker-image.yml
2.0.0 (May 15, 2023)
Dynaconf configuration
The settings for the different data providers and target Dataverse instances have been moved to settings TOML files in the configuration directory. An entry ingestion workflow can now take a parameter that specifies which settings dictionary it will use. With this setup, the separate workflows for specific dataverses or subverses have been removed.
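A minimal sketch of how such an entry workflow might pick its settings dictionary, assuming Dynaconf loads the TOML files from the configuration directory; the file paths, block name, and setting keys below are illustrative, not the repository's actual ones.

```python
# Hypothetical sketch: selecting a provider/target settings block by name.
from dynaconf import Dynaconf
from prefect import flow

settings = Dynaconf(
    settings_files=[
        "configuration/settings.toml",
        "configuration/.secrets.toml",
    ]
)

@flow
def entry_ingestion_workflow(settings_dict_name: str):
    # e.g. "dataversenl" selects the [dataversenl] table from the TOML files
    provider = settings[settings_dict_name]
    # Placeholder keys; the actual setting names may differ.
    print(provider.SOURCE_OAI_ENDPOINT, provider.DESTINATION_DATAVERSE_URL)
```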
MinIO
The data that is ingested by the workflows in the orchestrator is now expected to be in a bucket in a MinIO object store. Local data ingestion is no longer possible. The setup for the object store needs to be added to the .secrets.toml in the configuration directory.
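A sketch of what fetching harvested metadata from such a bucket could look like with the MinIO Python client; the endpoint, credentials, bucket, and object names are placeholders that would come from .secrets.toml in practice.

```python
# Hypothetical sketch: reading harvested metadata from a MinIO bucket.
from minio import Minio

client = Minio(
    "minio.example.org:9000",   # placeholder endpoint from .secrets.toml
    access_key="ACCESS_KEY",    # placeholder credentials
    secret_key="SECRET_KEY",
    secure=True,
)

response = client.get_object("metadata-bucket", "dataversenl/record-001.xml")
try:
    xml_metadata = response.read()
finally:
    response.close()
    response.release_conn()
```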
Universal Dataverse2Dataverse ingestion workflow
All dataverse to dataverse ingestion is now done using the same workflow. Any source-specific refinements that the metadata needs are handled in the metadata-refiner service.
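A rough sketch of what this single flow could look like, with per-source refinement delegated to the metadata-refiner service; the endpoint parameter and payload shape are assumptions, not the service's documented API.

```python
# Hypothetical sketch: one flow for every source, refinement delegated to a service.
import requests
from prefect import flow, task

@task
def refine_metadata(metadata: dict, refiner_endpoint: str) -> dict:
    # Source-specific fixes live in the metadata-refiner service, not in the flow.
    response = requests.post(refiner_endpoint, json=metadata, timeout=60)
    response.raise_for_status()
    return response.json()

@flow
def dataverse_to_dataverse_ingestion(metadata: dict, refiner_endpoint: str) -> dict:
    # The same flow is used for every source dataverse; only the refinement differs.
    return refine_metadata(metadata, refiner_endpoint)
```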
Minor changes
- Dockerfile update to allow for the easy addition of new poetry packages.
- Added jmespath for querying fields from JSON metadata (see the sketch after this list).
- Mapping file has been updated for use with the new dataverse-mapper.
- CBS ingestion workflow now includes an email sanitization task.
- Added metadata refinement task that is used in d2d ingestion workflow.
- Refactored xml2json task to work with metadata fetched from minio.
- Metadata fetcher service no longer needs a Dataverse source API key.
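For illustration, this is roughly how jmespath can pull a field out of Dataverse-style JSON metadata; the query path and the metadata snippet are examples only, not the repository's actual mapping.

```python
import jmespath

# Example metadata shaped like a Dataverse dataset JSON export (illustrative only).
metadata = {
    "datasetVersion": {
        "metadataBlocks": {
            "citation": {
                "fields": [{"typeName": "title", "value": "Example dataset"}]
            }
        }
    }
}

# Query the title value from the citation block.
title = jmespath.search(
    "datasetVersion.metadataBlocks.citation.fields[?typeName=='title'].value | [0]",
    metadata,
)
print(title)  # Example dataset
```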
v1.0.0-beta
Beta v1.0.0
DataverseNL workflow
The DataverseNL workflow can be used to ingest metadata from the DataverseNL Dataverse instance into another Dataverse instance. The XML metadata is harvested using OAI-PMH as oai_dc (Dublin Core). The harvested metadata is ingested using a Prefect workflow. Every subverse in DataverseNL that contains social science data has its own entry point workflow. All subverses use the same DataverseNL workflow for the actual ingestion of the metadata of the datasets.
The data is transformed to JSON, the ID is extracted and used to fetch the Dataverse JSON metadata. This metadata is then cleaned and imported into Dataverse. Finally, the publication date is updated and the dataset is published. All the tasks use external services except for the cleaning step.
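As an illustration of the harvesting step, this is what fetching oai_dc records over OAI-PMH could look like with the Sickle client; the release notes do not say which OAI-PMH client the workflow actually uses, and the endpoint and set name below are placeholders.

```python
# Hypothetical sketch of the OAI-PMH harvesting step.
from sickle import Sickle

harvester = Sickle("https://dataverse.nl/oai")  # placeholder OAI-PMH endpoint
records = harvester.ListRecords(metadataPrefix="oai_dc", set="example_subverse")

for record in records:
    identifier = record.header.identifier  # ID later used to fetch the Dataverse JSON
    oai_dc_xml = record.raw                # Dublin Core XML to be transformed to JSON
```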
File management
The entry workflows that the data providers use to start the ingestion process have been put into their own directory. The dataset ingestion workflows have also been given a dedicated folder. Both live under the flows directory in scripts.
Workflow versioning
A URL pointing to the workflow version dictionary of a specific workflow is added to the ingested metadata. The URL is placed in a field of the provenance metadata block. The function that creates the dictionary is called in the entry point workflow. You can specify which services are used by a workflow. For every service you get a dictionary that contains the latest GitHub release, the latest Docker image tag, the service version, and its endpoint.
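A rough sketch of the shape such a version dictionary could take; the key names, the GitHub organisation used for the release lookup, and the placeholder values are assumptions rather than the function's actual implementation.

```python
# Hypothetical sketch of building a per-service version dictionary.
import requests

def github_latest_release(owner: str, repo: str) -> str | None:
    # The GitHub REST API exposes the latest release tag for a repository.
    url = f"https://api.github.com/repos/{owner}/{repo}/releases/latest"
    response = requests.get(url, timeout=30)
    return response.json().get("tag_name") if response.ok else None

def create_version_dict(services: dict[str, str]) -> dict:
    # `services` maps a service name to its endpoint; the names here are placeholders.
    return {
        name: {
            "github_release": github_latest_release("odissei-data", name),
            "docker_image_tag": "latest",   # placeholder for the real image tag lookup
            "service_version": None,        # placeholder for the service's own version
            "endpoint": endpoint,
        }
        for name, endpoint in services.items()
    }

version_dict = create_version_dict({"dataverse-mapper": "https://example.org/mapper"})
```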
Alpha v0.2.0
Alpha v0.2.0
Features
New features for the orchestrator:
- Possible to deploy multiple workflows from different data providers at the same time
- Uses different env variables for each data provider (check new dot_env_example)
- Completed first version of the DataverseNL workflow
- Smaller fixes and changes to tasks and flows
Alpha v0.1.0
Alpha v0.1.0
Description
This first version of the orchestrator works with local files and a .env for a specific data provider. It needs to be redeployed to switch the workflow to a different data provider. The next version will make it possible to deploy flows for all providers at the same time. Included data providers are EASY, CBS, LISS and DataverseNL. All workflows are still experimental and are not the finished product.