Releases: odissei-data/ingestion-workflow-orchestrator

V2.3.1 release

23 Nov 13:42
ed3b1a1
Pre-release

Updates

First .1 release for v2

25 Oct 08:36
8826a51
  • Included Postgres as DB
  • Moved settings to .env
  • Agent now runs inside the container as intended.

Updating dependencies and checking build

13 Oct 10:02
9f21a34
V2.0.11

Update docker-image.yml

2.0.0 (May 15, 2023)

15 May 14:06
511a0e7

Dynaconf configuration

The settings for the different data providers and target Dataverse instances have been moved to settings TOML files in the configuration directory. An entry ingestion workflow can now take a parameter that specifies which settings dictionary it uses. With this setup, the separate workflows for specific dataverses or subverses have been removed.
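As a rough illustration of this layout (the file name, keys, and provider names below are hypothetical, not the repository's actual configuration), a settings TOML with one dictionary per data provider might look like:

```toml
# configuration/settings.toml -- hypothetical example layout
[cbs]
# settings dictionary selected by the CBS entry workflow
destination_dataverse_url = "https://dataverse.example.org"
metadata_directory = "cbs-metadata"

[dataversenl]
# settings dictionary selected by the DataverseNL entry workflow
destination_dataverse_url = "https://dataverse.example.org"
metadata_directory = "dataversenl-metadata"
```

An entry ingestion workflow would then pass the dictionary name (e.g. "cbs") as its settings parameter.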

min.io

The data ingested by the workflows in the orchestrator is now expected to be in a bucket in a MinIO object store. Local data ingestion is no longer possible. The object store setup needs to be added to the .secrets.toml in the configuration directory.
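The object store entry in .secrets.toml might look roughly like the following; the key names are illustrative and may differ from the actual configuration:

```toml
# configuration/.secrets.toml -- illustrative key names, replace with real values
[default]
minio_server_url = "minio.example.org:9000"
minio_access_key = "<access-key>"
minio_secret_key = "<secret-key>"
minio_bucket = "ingestion-data"
```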

Universal Dataverse2Dataverse ingestion workflow

All dataverse-to-dataverse ingestion is now done using the same workflow. Any provider-specific differences in the source metadata that need refining are handled in the metadata-refiner service.

Minor changes

  • Dockerfile update to allow for the easy addition of new poetry packages.
  • Added jmespath for querying fields from JSON metadata.
  • Mapping file has been updated for use with the new dataverse-mapper.
  • CBS ingestion workflow now includes an email sanitization task.
  • Added a metadata refinement task used in the d2d ingestion workflow.
  • Refactored xml2json task to work with metadata fetched from minio.
  • Metadata fetcher service now no longer needs a Dataverse source API key.

v1.0.0-beta

30 Jan 15:34
16bdd08

DataverseNL workflow

The DataverseNL workflow can be used to ingest metadata from the DataverseNL Dataverse instance into a Dataverse instance. The XML metadata is harvested using OAI-PMH as oai_dc (Dublin Core). The harvested metadata is ingested using a Prefect workflow. Every subverse in DataverseNL that contains social science data has its own entry point workflow. All subverses use the same DataverseNL workflow for the actual ingestion of the datasets' metadata.

The data is transformed to JSON, the ID is obtained and used to fetch the Dataverse JSON metadata. This metadata is then cleaned and imported into Dataverse. Finally, the publication date is updated and the dataset is published. All tasks use external services except for the cleaning step.
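The sequence of steps above can be sketched in plain Python. The function names mirror the prose but are hypothetical; in the real workflow each step except cleaning delegates to an external service, so the bodies here are placeholders:

```python
# Illustrative sketch of the DataverseNL ingestion steps; function names
# and record keys are hypothetical, not the repository's actual API.

def transform_to_json(record):
    record["format"] = "json"  # external service: oai_dc XML -> JSON
    return record

def fetch_dataverse_json(record):
    record["dataverse_json"] = True  # fetched using the dataset ID
    return record

def clean_metadata(record):
    record["cleaned"] = True  # the only step performed locally
    return record

def import_into_dataverse(record):
    record["imported"] = True
    return record

def update_publication_date(record):
    record["publication_date_updated"] = True
    return record

def publish_dataset(record):
    record["published"] = True
    return record

STEPS = [
    transform_to_json,
    fetch_dataverse_json,
    clean_metadata,
    import_into_dataverse,
    update_publication_date,
    publish_dataset,
]

def run_ingestion(record):
    """Run each ingestion step in order, passing the record along."""
    for step in STEPS:
        record = step(record)
    return record
```

In the actual orchestrator these steps are Prefect tasks rather than plain functions, but the order of operations is the same.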

File management

The entry workflows that start the ingestion process for the data providers have been put into a dedicated directory, and the dataset ingestion workflows into another. Both live under the flows directory in scripts.

Workflow versioning

A URL to the workflow version dictionary of a specific workflow is added to the metadata of the ingested dataset, in a field of the provenance metadata block. The function that creates the dictionary is called in the entry point workflow. You can specify which services a workflow uses; for every service, the dictionary contains the latest GitHub release, the latest Docker image tag, the service version, and its endpoint.
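A minimal sketch of building such a dictionary, assuming the per-service fields listed above (the function and key names are hypothetical; the real implementation queries GitHub and the Docker registry for each service):

```python
# Hypothetical sketch of the workflow version dictionary; key names are
# illustrative, not the repository's actual schema.

def build_service_entry(latest_release, docker_tag, version, endpoint):
    """Assemble the version info recorded for a single service."""
    return {
        "latest_github_release": latest_release,
        "latest_docker_image_tag": docker_tag,
        "service_version": version,
        "endpoint": endpoint,
    }

def build_workflow_versioning_dict(services):
    """Map each service name to its version entry.

    `services` maps a service name to a tuple of
    (latest_release, docker_tag, version, endpoint).
    """
    return {name: build_service_entry(*info) for name, info in services.items()}
```

A URL pointing at the stored dictionary would then be written into the provenance metadata block of the ingested dataset.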

Alpha v0.2.0

15 Dec 10:47
b56a711
Pre-release

Features

New features for the orchestrator:

  • Possible to deploy multiple workflows from different data providers at the same time
  • Uses different env variables for each data provider (check new dot_env_example)
  • Completed first version of the DataverseNL workflow
  • Smaller fixes and changes to tasks and flows

Alpha v0.1.0

13 Dec 11:18
Pre-release

Description

This first version of the orchestrator works with local files and a .env for a specific data provider. It needs to be redeployed to switch the workflow to a different data provider. The next version will make it possible to deploy flows for all providers at the same time. Included data providers are EASY, CBS, LISS, and DataverseNL. All workflows are still experimental and are not the finished product.