This document serves as a starting point for documentation of the SBDI system and the services and applications managed by NRM; it can be expanded in the future.
More documentation can also be found in the wiki.
- Cloud servers are hosted by Safespring using OpenStack.
- Domains (biodiversitydata.se plus a few more) are managed by Loopia.
- SSL/TLS Certificates are provided by Sectigo (through the IT department at NRM).
- Applications run in Docker, the majority in a Docker Swarm setup consisting of several manager and worker nodes. Some applications run on separate servers.
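For orientation, the state of the Docker Swarm can be inspected from any manager node with standard Docker CLI commands; this is a minimal sketch, and the service name at the end is a hypothetical example:

```bash
# List the swarm's manager and worker nodes and their status
docker node ls

# List the services (applications) deployed to the swarm
docker service ls

# Show where the tasks of a given service are running
# ("collectory" is a hypothetical service name)
docker service ps collectory
```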
Many DevOps tasks are automated using Ansible and Terraform; the automation, along with its documentation, can be found in the sbdi-install repository.
This includes:
- Cloud server creation and management
- Application deployment
- Backups
- Monitoring
- and more
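As an illustration, most of these tasks boil down to running an Ansible playbook from sbdi-install against an inventory, or the standard Terraform workflow for the cloud servers. The playbook, tag, and inventory below are ones that appear later in this document; exact options are documented in the repository:

```bash
# Run an Ansible playbook from the sbdi-install repository against the prod inventory
ansible-playbook -i inventories/prod monitoring.yml -t observer --ask-become-pass

# Cloud servers are managed with the standard Terraform workflow
terraform init
terraform plan
terraform apply
```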
Most of the applications are forked from ALA. All of the forked repositories have an sbdi folder containing SBDI-specific documentation and configuration. In most repositories there is also a GitHub issue called SBDI modifications which lists and describes the SBDI-specific changes we have made to the code. The applications are built using GitHub Actions and published as Docker images.
Follow these instructions when updating an ALA fork.
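As a rough sketch of what updating a fork involves, the upstream ALA repository is added as a remote and its changes are reviewed and merged. The repository and branch names below are placeholders; the linked instructions describe the actual SBDI procedure:

```bash
# Add the ALA upstream remote to a local clone of the fork
# (repository and branch names are placeholders)
git remote add upstream https://github.com/AtlasOfLivingAustralia/collectory.git
git fetch upstream

# Review and merge the upstream changes into the SBDI fork
git log --oneline HEAD..upstream/master
git merge upstream/master
```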
Bioatlas repos are listed here.
Datasets are published by the various data providers in the GBIF Sweden IPT. Dataset metadata is synced from the IPT to the Atlas using the Collectory. The occurrence records are then loaded into the Atlas using the pipelines application.
Some datasets are loaded from GBIF. These are datasets that contain occurrence records located in Sweden but are published by non-Swedish organizations. They are called repatriated datasets. Only the Swedish records are used from these datasets.
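As an illustration of what repatriation means in practice, the public GBIF occurrence API can be used to see how many Swedish records a foreign dataset contains. The dataset key below is a placeholder; the actual repatriation is handled through the Collectory, as described below:

```bash
# Count occurrence records located in Sweden for a given GBIF dataset
# (<datasetKey> is a placeholder for the dataset's GBIF UUID)
curl "https://api.gbif.org/v1/occurrence/search?datasetKey=<datasetKey>&country=SE&limit=0"
```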
Dataset metadata is synced from the IPT (GBIF Sweden) data provider page in the Collectory admin interface. Use the Update data resources button. To view differences between datasets in the IPT and the Atlas, click Compare IPT vs Atlas.
When a new dataset has been added to the IPT, it will be created in the Atlas by the sync procedure above. However, it requires some additional configuration.
On the created Data resource:
- DOI
- Institution
- Resource type (if needed, defaults to `records`)
- Darwin Core terms that uniquely identify a record (if other than `catalogNumber`)
- Default values for DwC fields (if needed)
- Record consumers - institution and collection (after you've created the collection)
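To verify the resulting configuration, the data resource can also be fetched from the Collectory web service. This is a sketch assuming the standard ALA Collectory API; the host name and the `dr` uid below are placeholders:

```bash
# Fetch a data resource as JSON from the Collectory web service
# (host and uid are placeholders; adjust to the actual SBDI Collectory URL)
curl "https://collections.biodiversitydata.se/ws/dataResource/dr123"
```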
Create a new Collection:
- Public description (copy from data resource)
- Contacts (copy from data resource)
- Provider codes - institution and collection (make sure these are present in `occurrence.txt`; otherwise add them to Default values for DwC fields on the data resource)
Dataset metadata is synced from the GBIF Repatriated data provider page in the Collectory admin interface. To view differences between datasets in GBIF and the Atlas, click Compare GBIF vs Atlas. From that view, the datasets that have changed can be updated.
Adding new repatriated datasets is done by clicking Repatriate data from the Collectory admin start page.
The major steps are listed below; detailed documentation for data ingestion can be found in the pipelines repository and in sbdi-install (Terraform and Ansible).
- Create and start the pipelines machines: `./utils/pipelines/startup.sh`

  or manually:

  - Create the live-pipelines machines using Terraform
  - Machine keys stored in `.ssh/known_hosts` (on your local machine) will have changed and need to be updated in order to connect to the machines (see the sketch after this list for a manual alternative): `ansible-playbook -i inventories/prod pipelines_local_access.yml`
  - Update the machine keys for the hadoop and spark users (on live-pipelines) and then start Hadoop and Spark: `ansible-playbook -i inventories/prod pipelines.yml -t update_host_keys,start_cluster --ask-become-pass`
  - (Optional) To monitor the pipelines machines, uncomment live-pipelines in the `monitoring_target` section of the prod inventory, then deploy and restart Prometheus: `ansible-playbook -i inventories/prod monitoring.yml -t observer --ask-become-pass`
- Run pipelines (an illustrative invocation is sketched after this list)
- Backup to NRM: `./utils/pipelines/backup-to-nrm.sh`

  or manually: back up UUIDs and logs to nrm-sbdibackup (see the sketch after this list)
- Remove the live-pipelines machines: `./utils/pipelines/shutdown.sh`

  or using the OpenStack API:

      openstack server stop live-pipelines-1 live-pipelines-2 live-pipelines-3 live-pipelines-4 live-pipelines-5 live-pipelines-6 live-pipelines-7
      openstack server delete live-pipelines-1 live-pipelines-2 live-pipelines-3 live-pipelines-4 live-pipelines-5 live-pipelines-6 live-pipelines-7

  or manually in the Safespring UI (Shut Off Instance followed by Delete Instance)
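If you prefer to clear the stale host keys by hand instead of running the `pipelines_local_access.yml` playbook, the standard `ssh-keygen` tooling can be used. This is a minimal sketch; the host names are assumptions based on the machine names above, and your `known_hosts` entries may use IP addresses instead:

```bash
# Remove the old host keys for the recreated machines from your local known_hosts
# (host names are assumptions; use the names or IP addresses from your inventory)
for host in live-pipelines-1 live-pipelines-2 live-pipelines-3 live-pipelines-4 live-pipelines-5 live-pipelines-6 live-pipelines-7; do
  ssh-keygen -R "$host"
done

# The new keys are then accepted on the first connection to each machine
ssh live-pipelines-1
```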
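For the Run pipelines step, the work is driven by the `la-pipelines` CLI from the ALA pipelines project. The invocation below is only an illustration, with an assumed data resource id and subcommand names taken from the upstream ALA documentation; check the pipelines repository for the exact SBDI procedure:

```bash
# Illustrative only: load and interpret a single data resource on the cluster
# (dr123 is a placeholder id; subcommands and flags may differ in the SBDI setup)
./la-pipelines dwca-avro dr123
./la-pipelines interpret dr123 --cluster
```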
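For the manual backup alternative, copying the UUID and log data to nrm-sbdibackup can be done with `rsync`. The user, host alias, and paths below are hypothetical placeholders; check `backup-to-nrm.sh` in sbdi-install for the actual source and destination:

```bash
# Hypothetical example: copy pipeline UUIDs and logs to the backup server
# (user, paths and host alias are placeholders; see backup-to-nrm.sh for the real ones)
rsync -av /data/pipelines-data/ backup-user@nrm-sbdibackup:/backups/pipelines-data/
rsync -av /data/pipelines-logs/ backup-user@nrm-sbdibackup:/backups/pipelines-logs/
```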