This document serves as a starting point for documentation of the SBDI system and the services and applications managed by NRM; it can be expanded in the future.
More documentation can also be found in the wiki.
- Cloud servers are hosted by Safespring using OpenStack.
- Domains (biodiversitydata.se plus a few more) are managed by Loopia.
- SSL/TLS Certificates are provided by Sectigo (through the IT department at NRM).
- Applications run in Docker, the majority in a Docker Swarm setup consisting of several manager and worker nodes. Some applications run on separate servers.
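For orientation, the state of the Docker Swarm can be inspected from any manager node with standard Docker CLI commands; this is a minimal sketch, and the service name at the end is a hypothetical example:

```bash
# List the swarm's manager and worker nodes and their status
docker node ls

# List the services (applications) deployed to the swarm
docker service ls

# Show where the tasks of a given service are running
# ("collectory" is a hypothetical service name)
docker service ps collectory
```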
Many DevOps tasks are automated using Ansible and Terraform; the automation, along with its documentation, can be found in the sbdi-install repository.
This includes:
- Cloud server creation and management
- Application deployment
- Backups
- Monitoring
- and more
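As an illustration, most of these tasks boil down to running an Ansible playbook from sbdi-install against an inventory, or the standard Terraform workflow for the cloud servers. The playbook, tag, and inventory below are ones that appear later in this document; exact options are documented in the repository:

```bash
# Run an Ansible playbook from the sbdi-install repository against the prod inventory
ansible-playbook -i inventories/prod monitoring.yml -t observer --ask-become-pass

# Cloud servers are managed with the standard Terraform workflow
terraform init
terraform plan
terraform apply
```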
Most of the applications are forked from ALA. All of the forked repositories have an sbdi folder containing SBDI-specific documentation and configuration. In most repositories there is also a GitHub issue called SBDI modifications which lists and describes the SBDI-specific changes we have made to the code. The applications are built using GitHub Actions and published as Docker images.
Follow these instructions when updating an ALA fork.
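As a rough sketch of what updating a fork involves, the upstream ALA repository is added as a remote and its changes are reviewed and merged. The repository and branch names below are placeholders; the linked instructions describe the actual SBDI procedure:

```bash
# Add the ALA upstream remote to a local clone of the fork
# (repository and branch names are placeholders)
git remote add upstream https://github.com/AtlasOfLivingAustralia/collectory.git
git fetch upstream

# Review and merge the upstream changes into the SBDI fork
git log --oneline HEAD..upstream/master
git merge upstream/master
```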
Bioatlas repos are listed here.
Datasets are published by the various data providers in the GBIF Sweden IPT. Dataset metadata is synced from the IPT to the Atlas using the Collectory. The occurrence records are then loaded into the Atlas using the pipelines application.
Some datasets are loaded from GBIF. These are datasets that contain occurrence records located in Sweden but are published by non-Swedish organizations. They are called repatriated datasets. Only the Swedish records are used from these datasets.
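As an illustration of what repatriation means in practice, the public GBIF occurrence API can be used to see how many Swedish records a foreign dataset contains. The dataset key below is a placeholder; the actual repatriation is handled through the Collectory, as described below:

```bash
# Count occurrence records located in Sweden for a given GBIF dataset
# (<datasetKey> is a placeholder for the dataset's GBIF UUID)
curl "https://api.gbif.org/v1/occurrence/search?datasetKey=<datasetKey>&country=SE&limit=0"
```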
Dataset metadata is synced from the IPT (GBIF Sweden) data provider page in the Collectory admin interface. Use the Update data resources button. To view differences between datasets in the IPT and the Atlas, click Compare IPT vs Atlas.
When a new dataset has been added to the IPT, it will be created in the Atlas by the sync procedure above. However, it requires some additional configuration.
On the created Data resource:
- DOI
- Institution
- Resource type (if needed, defaults to `records`)
- Darwin Core terms that uniquely identify a record (if other than `catalogNumber`)
- Default values for DwC fields (if needed)
- Record consumers - institution and collection (after you've created the collection)
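To verify the resulting configuration, the data resource can also be fetched from the Collectory web service. This is a sketch assuming the standard ALA Collectory API; the host name and the `dr` uid below are placeholders:

```bash
# Fetch a data resource as JSON from the Collectory web service
# (host and uid are placeholders; adjust to the actual SBDI Collectory URL)
curl "https://collections.biodiversitydata.se/ws/dataResource/dr123"
```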
Create a new Collection:
- Public description (copy from data resource)
- Contacts (copy from data resource)
- Provider codes - institution and collection (make sure these are present in `occurrence.txt`; otherwise add them to Default values for DwC fields on the data resource)
Dataset metadata is synced from the GBIF Repatriated data provider page in the Collectory admin interface. To view differences between datasets in GBIF and the Atlas, click Compare GBIF vs Atlas. From that view, the datasets that have changed can be updated.
Adding new repatriated datasets is done by clicking Repatriate data from the Collectory admin start page.
The major steps are listed below; detailed documentation for data ingestion can be found in the pipelines repository and in sbdi-install (Terraform and Ansible).
- Create and start the pipelines machines: `./utils/pipelines/startup.sh`

  or manually:

  - Create the live-pipelines machines using Terraform
  - Machine keys stored in `.ssh/known_hosts` (on your local machine) will have changed and need to be updated in order to connect to the machines (see the sketch after this list for a manual alternative): `ansible-playbook -i inventories/prod pipelines_local_access.yml`
  - Update the machine keys for the hadoop and spark users (on live-pipelines) and then start Hadoop and Spark: `ansible-playbook -i inventories/prod pipelines.yml -t update_host_keys,start_cluster --ask-become-pass`
  - (Optional) To monitor the pipelines machines, uncomment live-pipelines in the `monitoring_target` section of the prod inventory, then deploy and restart Prometheus: `ansible-playbook -i inventories/prod monitoring.yml -t observer --ask-become-pass`
- Run pipelines (an illustrative invocation is sketched after this list)
- Backup to NRM: `./utils/pipelines/backup-to-nrm.sh`

  or manually: back up UUIDs and logs to nrm-sbdibackup (see the sketch after this list)
- Remove the live-pipelines machines: `./utils/pipelines/shutdown.sh`

  or using the OpenStack API:

      openstack server stop live-pipelines-1 live-pipelines-2 live-pipelines-3 live-pipelines-4 live-pipelines-5 live-pipelines-6 live-pipelines-7
      openstack server delete live-pipelines-1 live-pipelines-2 live-pipelines-3 live-pipelines-4 live-pipelines-5 live-pipelines-6 live-pipelines-7

  or manually in the Safespring UI (Shut Off Instance followed by Delete Instance)
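If you prefer to clear the stale host keys by hand instead of running the `pipelines_local_access.yml` playbook, the standard `ssh-keygen` tooling can be used. This is a minimal sketch; the host names are assumptions based on the machine names above, and your `known_hosts` entries may use IP addresses instead:

```bash
# Remove the old host keys for the recreated machines from your local known_hosts
# (host names are assumptions; use the names or IP addresses from your inventory)
for host in live-pipelines-1 live-pipelines-2 live-pipelines-3 live-pipelines-4 live-pipelines-5 live-pipelines-6 live-pipelines-7; do
  ssh-keygen -R "$host"
done

# The new keys are then accepted on the first connection to each machine
ssh live-pipelines-1
```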
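For the Run pipelines step, the work is driven by the `la-pipelines` CLI from the ALA pipelines project. The invocation below is only an illustration, with an assumed data resource id and subcommand names taken from the upstream ALA documentation; check the pipelines repository for the exact SBDI procedure:

```bash
# Illustrative only: load and interpret a single data resource on the cluster
# (dr123 is a placeholder id; subcommands and flags may differ in the SBDI setup)
./la-pipelines dwca-avro dr123
./la-pipelines interpret dr123 --cluster
```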
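For the manual backup alternative, copying the UUID and log data to nrm-sbdibackup can be done with `rsync`. The user, host alias, and paths below are hypothetical placeholders; check `backup-to-nrm.sh` in sbdi-install for the actual source and destination:

```bash
# Hypothetical example: copy pipeline UUIDs and logs to the backup server
# (user, paths and host alias are placeholders; see backup-to-nrm.sh for the real ones)
rsync -av /data/pipelines-data/ backup-user@nrm-sbdibackup:/backups/pipelines-data/
rsync -av /data/pipelines-logs/ backup-user@nrm-sbdibackup:/backups/pipelines-logs/
```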