Add standard for volume backup functionality #567

Merged on Sep 17, 2024 (21 commits)
Showing changes from 16 commits

Commits:
47957ab Add standard for volume backup functionality (markus-hentsch, Apr 16, 2024)
8e5eb04 Add test script to check volume backup API (markus-hentsch, Apr 16, 2024)
1563b7c Add README for test script (markus-hentsch, Apr 16, 2024)
db12159 Mention test suite in standard (markus-hentsch, Apr 16, 2024)
b91620c Degrade standard to proposal for now (markus-hentsch, Apr 16, 2024)
73441e4 Test script: wait for resource status in cleanup (markus-hentsch, Apr 22, 2024)
0beb9f1 Test script: wait between interdependent cleanups (markus-hentsch, Apr 22, 2024)
c51cf02 Update Standards/scs-XXXX-v1-volume-backup-service.md (markus-hentsch, Apr 23, 2024)
1fd8aae Test script: minor error message improvement (markus-hentsch, Apr 23, 2024)
2910844 Test script: adjust CLI description (markus-hentsch, Apr 29, 2024)
61b5454 Apply review suggestion (markus-hentsch, Jun 27, 2024)
3a853e7 Add new design consideration about storage backend choice (markus-hentsch, Jun 27, 2024)
a485855 Add documentation link regarding backup drivers (markus-hentsch, Jun 28, 2024)
6967778 Mention that tests don't cover the optional part (markus-hentsch, Jul 26, 2024)
58f1c07 Merge branch 'main' into issue/541-volume-backup-standard (markus-hentsch, Aug 19, 2024)
9ba095b Align header naming with latest standards template (markus-hentsch, Aug 19, 2024)
b8e73a3 Merge branch 'main' into issue/541-volume-backup-standard (markus-hentsch, Sep 2, 2024)
6cb1cc0 Address review comments (markus-hentsch, Sep 2, 2024)
c58b535 Merge branch 'main' into issue/541-volume-backup-standard (markus-hentsch, Sep 17, 2024)
66cdd6a Assign standard number (markus-hentsch, Sep 17, 2024)
125b3bb Change to draft state (markus-hentsch, Sep 17, 2024)
Standards/scs-XXXX-v1-volume-backup-service.md: 94 changes (94 additions, 0 deletions)
@@ -0,0 +1,94 @@
---
title: Volume Backup Functionality
type: Standard
status: Proposal
track: IaaS
---

## Introduction

OpenStack offers a variety of resources that allow users to transfer and store data in the infrastructure.
A prime example of such resources are volumes, which are attached to virtual machines as virtual block storage devices.
As such, they carry potentially large amounts of user data that changes constantly at runtime.
It is important for users to be able to create backups of this data in a reliable and efficient manner.

## Terminology

| Term | Meaning |
|---|---|
| CSP | Cloud Service Provider, provider managing the OpenStack infrastructure |

## Motivation

The [volume backup functionality of OpenStack](https://docs.openstack.org/cinder/latest/admin/volume-backups.html) is not available by default in every OpenStack cloud.
The feature requires a backend to be prepared and configured correctly before it can be used.
In Cinder, this backend is configured separately from the general storage backend of the volume service and is not mandatory.
Thus, an arbitrary OpenStack cloud may or may not offer this feature.

This standard aims to make this functionality the default in SCS clouds so that customers can expect the feature to be usable.

## Design Considerations

The standard should make sure that the feature is available and usable but should not limit the exact implementation (e.g. choice of backend driver) any further than necessary.

### Options considered

#### Only recommend volume backup feature, use images as alternative

As an alternative to the volume backup feature of the Block Storage API, Glance images can be created from volumes and act as backups under certain circumstances.
As an option, this standard could keep the actual integration of the volume backup feature optional and instead guide users on how to use images as backup targets in case the feature is unavailable.

However, it is not guaranteed that the Glance backend storage is separate from the volume storage.
For instance, both could be using the same Ceph cluster.
In such a case, the images would not count as genuine backups, since a failure of that shared storage would affect both the volumes and their image copies.

Although users are able to download images and transfer them to a different storage location, this approach might also prove infeasible depending on the image size and on whether appropriate target storage exists on the user side.

Furthermore, creating Glance images from volumes does not support incremental backups.
Every backup therefore requires a full copy of the volume, which makes backup operations time-consuming.

#### Focus on feature availability, make feature mandatory

This option is pretty straightforward.
It would make the volume backup feature mandatory for SCS clouds.
This way users can expect the feature to be available and usable.

With this, users can leverage functionalities like incremental backups and benefit from optimized performance of the backup process due to the tight integration with the volume service.
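
As a rough illustration of what incremental backups mean at the API level, the following Python sketch builds a request body for `POST /v3/{project_id}/backups` and sets the `incremental` field only when the volume already has backups that could act as a parent. The helper name `backup_request_body` and its input shape (dicts with `volume_id` and `status`) are hypothetical. The rule it applies is a conservative assumption: to our understanding, Cinder rejects an incremental backup if the volume has no previous backup or if the parent backup it picks is not `available`.

```python
from typing import Any


def backup_request_body(
    volume_id: str, name: str, existing_backups: list[dict[str, Any]]
) -> dict[str, Any]:
    """Build a request body for POST /v3/{project_id}/backups.

    Hypothetical helper: an incremental backup is only requested if the
    volume already has at least one backup and all of them are "available".
    This conservative rule is an assumption of this sketch.
    """
    # Only previous backups of the same volume can serve as a parent.
    own = [b for b in existing_backups if b.get("volume_id") == volume_id]
    incremental = bool(own) and all(b.get("status") == "available" for b in own)
    return {
        "backup": {
            "volume_id": volume_id,
            "name": name,
            # False means the whole volume is copied (full backup).
            "incremental": incremental,
        }
    }


if __name__ == "__main__":
    history = [{"volume_id": "vol-1", "status": "available"}]
    print(backup_request_body("vol-1", "nightly", history))  # incremental backup
    print(backup_request_body("vol-2", "nightly", history))  # full backup
```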

However, it does not seem feasible to also mandate a separate storage backend for volume backups at the same time, because infrastructure limitations on the CSP side could make this hard or even impossible to offer.
Without such a mandate, a separate storage backend is not guaranteed, which makes the actual reliability and security benefits of the backups questionable.

This approach would focus on feature availability rather than backup reliability.

#### Focus on backup reliability, make separate backend mandatory

As an alternative, the availability of the volume backup feature could be made optional, but if a CSP chooses to offer it, the standard would mandate a separate storage backend for volume backups.
This way, failures of the volume storage backend would not directly impact the availability and safety of volume backups, making them actually live up to their name.

In contrast to the above, this approach would focus on backup reliability rather than feature availability.

## Standard

This standard decides to go with the second option and makes the volume backup feature mandatory in the following way:

In an SCS cloud, the volume backup functionality MUST be configured properly and its API, as defined by the `/v3/{project_id}/backups` endpoint of the Block Storage API, MUST be offered to customers.
If using Cinder, a suitable [backup driver](https://docs.openstack.org/cinder/latest/configuration/block-storage/backup-drivers.html) MUST be set up.
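
To make the mandated API more tangible, the Python sketch below lists the Block Storage v3 requests involved in a simple backup life cycle, relative to `/v3/{project_id}/backups` as described in the API reference. The helper `backup_lifecycle_requests` is hypothetical and only builds (method, path, body) tuples without sending anything; in practice, `backup_id` would be taken from the response to the create request, and the bodies contain only a minimal subset of the optional fields.

```python
from typing import Any, Optional

# (HTTP method, path, JSON body or None)
Request = tuple[str, str, Optional[dict[str, Any]]]


def backup_lifecycle_requests(
    project_id: str, volume_id: str, backup_id: str, restored_name: str
) -> list[Request]:
    """Return the Block Storage v3 calls of a minimal backup life cycle.

    Hypothetical helper: it only builds request tuples and sends nothing.
    backup_id would normally come from the response to the create request.
    """
    base = f"/v3/{project_id}/backups"
    return [
        # Create a backup of an existing volume (full backup by default).
        ("POST", base, {"backup": {"volume_id": volume_id}}),
        # Poll the backup until its status is "available".
        ("GET", f"{base}/{backup_id}", None),
        # Restore without "volume_id": the service creates a new volume.
        ("POST", f"{base}/{backup_id}/restore", {"restore": {"name": restored_name}}),
        # Delete the backup once it is no longer needed.
        ("DELETE", f"{base}/{backup_id}", None),
    ]


if __name__ == "__main__":
    for method, path, body in backup_lifecycle_requests(
        "my-project", "my-volume-id", "my-backup-id", "restored-volume"
    ):
        print(method, path, body or "")
```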

The volume backup target storage SHOULD be a separate storage system from the one used for volumes themselves.

Review comment (Contributor):
As a user, it is not possible to know whether the volume backup storage is different from the normal volume storage, so we should encourage CSPs to give that information to customers. This could also be done via Gaia-X Credentials, couldn't it? Maybe we need another issue for this part.

Reply (Contributor Author):
This would require a deep dive into the Gaia-X Ontology to figure out whether this can be expressed appropriately using Gaia-X Credentials (formerly Self-Descriptions).

As long as we don't have a standardized and proven way of representing arbitrary self-description information about an SCS cloud and its services in Gaia-X Credentials, I'm hesitant to add any such suggestions to the standard yet.

Reply (Contributor):
Yeah, I thought so. Maybe we can discuss in the next Standardization SIG meeting, or tomorrow, whether and how we can use Gaia-X Credentials for tests.

## Related Documents

- [OpenStack Block Storage v3 Backup API reference](https://docs.openstack.org/api-ref/block-storage/v3/index.html#backups-backups)
- [OpenStack Volume Backup Drivers](https://docs.openstack.org/cinder/latest/configuration/block-storage/backup-drivers.html)

## Conformance Tests

Conformance tests act as a non-admin user: they create a volume, create a backup of it through the `/v3/{project_id}/backups` Block Storage API endpoint, and subsequently restore the backup onto a new volume, verifying the success of each operation.
These tests verify the mandatory part of the standard: providing the Volume Backup API.
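
The following Python sketch outlines this round trip using the block storage proxy of openstacksdk (`create_volume`, `create_backup`, `restore_backup`, `find_volume`, `wait_for_status`). It is not the code of `volume-backup-tester.py`: the function name, the `scs-test-` prefix and the 1 GiB volume size are assumptions, and the proxy is passed in so that the control flow can also be exercised without a real cloud.

```python
"""Sketch of a volume backup round trip with openstacksdk.

Assumptions (not taken from volume-backup-tester.py): the function name,
the "scs-test-" prefix and the 1 GiB volume size are illustrative only.
"""
import sys


def backup_roundtrip(block_storage, prefix: str = "scs-test-", timeout: int = 300):
    """Create a volume, back it up and restore the backup onto a new volume.

    ``block_storage`` is expected to behave like ``conn.block_storage`` from
    openstacksdk. wait_for_status raises if a resource reaches the "error"
    state or does not become "available" within ``timeout`` seconds.
    """
    volume = block_storage.create_volume(name=f"{prefix}volume", size=1)
    volume = block_storage.wait_for_status(
        volume, status="available", failures=["error"], wait=timeout)

    backup = block_storage.create_backup(volume_id=volume.id, name=f"{prefix}backup")
    backup = block_storage.wait_for_status(
        backup, status="available", failures=["error"], wait=timeout)

    # Passing only a name (no volume_id) asks Cinder to create a new volume.
    restored_name = f"{prefix}restored"
    block_storage.restore_backup(backup, name=restored_name)
    restored = block_storage.find_volume(restored_name, ignore_missing=False)
    return block_storage.wait_for_status(
        restored, status="available", failures=["error"], wait=timeout)


if __name__ == "__main__":
    import openstack  # pip3 install openstacksdk

    # Looks up the named entry (default "mycloud") in clouds.yaml.
    conn = openstack.connect(cloud=sys.argv[1] if len(sys.argv) > 1 else "mycloud")
    result = backup_roundtrip(conn.block_storage)
    print(f"restored volume {result.id} is {result.status}")
```

Resources created this way are left in place; removing them is a separate cleanup step.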

There is a test suite in [`volume-backup-tester.py`](https://github.com/SovereignCloudStack/standards/blob/main/Tests/iaas/volume-backup/volume-backup-tester.py).
The test suite connects to the OpenStack API and executes basic operations using the volume backup API to verify that the functionality requested by the standard is available.
Please consult the associated [README.md](https://github.com/SovereignCloudStack/standards/blob/main/Tests/iaas/volume-backup/README.md) for detailed setup and testing instructions.

Note that these tests don't verify the optional part of the standard: providing a separate storage backend for Cinder volume backups.
This cannot be checked from outside, as it is an architectural property of the infrastructure itself and transparent to customers.
Tests/iaas/volume-backup/README.md: 70 changes (70 additions, 0 deletions)
@@ -0,0 +1,70 @@
# Volume Backup API Test Suite

## Test Environment Setup

### Test Execution Environment

> **NOTE:** The test execution procedure does not require cloud admin rights.

To execute the test suite, a valid cloud configuration for the OpenStack SDK in the form of a "`clouds.yaml`" file is mandatory[^1].
**The file is expected to be located in the current working directory where the test script is executed unless configured otherwise.**

[^1]: [OpenStack Documentation: Configuring OpenStack SDK Applications](https://docs.openstack.org/openstacksdk/latest/user/config/configuration.html)

The test execution environment can be located on any system outside of the cloud infrastructure that has OpenStack API access.
Make sure that the API access is configured properly in "`clouds.yaml`".

It is recommended to use a Python virtual environment[^2].
Next, install the OpenStack SDK required by the test suite:

```bash
pip3 install openstacksdk
```

Within this environment, execute the test suite.

[^2]: [Python 3 Documentation: Virtual Environments and Packages](https://docs.python.org/3/tutorial/venv.html)

## Test Execution

The test suite is executed as follows:

```bash
python3 volume-backup-tester.py --os-cloud mycloud
```

Instead of "`--os-cloud`", the "`OS_CLOUD`" environment variable may be set.
The parameter is used to look up the correct cloud configuration in "`clouds.yaml`".
For the example command above, this file should contain a `clouds.mycloud` section like this:

```yaml
---
clouds:
  mycloud:
    auth:
      auth_url: ...
      ...
    ...
```
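
The relationship between the command-line flag and the environment variable can be sketched in Python as follows. `resolve_cloud` is a hypothetical helper, and the precedence it implements (an explicit `--os-cloud` wins over `OS_CLOUD`) is an assumption; consult the `--help` output of the actual script for its exact behavior. The resolved name is what would be passed to `openstack.connect(cloud=...)`, which then looks up the matching entry in "`clouds.yaml`".

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def resolve_cloud(argv: Optional[Sequence[str]] = None,
                  environ: Mapping[str, str] = os.environ) -> str:
    """Return the cloud name from --os-cloud, falling back to OS_CLOUD.

    Hypothetical helper: the flag-over-environment precedence is assumed.
    Exits with status 2 (argparse error) if neither source provides a name.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--os-cloud", default=environ.get("OS_CLOUD"),
                        help="name of the cloud entry in clouds.yaml")
    args = parser.parse_args(argv)
    if not args.os_cloud:
        parser.error("specify --os-cloud or set OS_CLOUD")
    return args.os_cloud


if __name__ == "__main__":
    cloud = resolve_cloud()
    print(f"using cloud entry {cloud!r} from clouds.yaml")
```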

If the test suite fails and leaves test resources behind, the "`--cleanup-only`" flag may be used to delete those leftover resources:

```bash
python3 volume-backup-tester.py --os-cloud mycloud --cleanup-only
```

For any further options consult the output of "`python3 volume-backup-tester.py --help`".

### Script Behavior & Test Results

> **NOTE:** Before any execution of test batches, the script will automatically perform a cleanup of volumes and volume backups matching a special prefix (see the "`--prefix`" flag).
> This cleanup behavior is identical to "`--cleanup-only`".
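
Prefix-based cleanup can be sketched as a pure selection step. The helper `plan_cleanup` is hypothetical, and listing backups before volumes is an ordering assumed by this sketch rather than documented behavior of the script; with openstacksdk, the names could be gathered from `conn.block_storage.volumes()` and `conn.block_storage.backups()`.

```python
from typing import Iterable


def plan_cleanup(prefix: str, volume_names: Iterable[str],
                 backup_names: Iterable[str]) -> list[tuple[str, str]]:
    """Return (kind, name) pairs of leftover test resources to delete.

    Hypothetical helper: only names starting with ``prefix`` are selected,
    so unrelated resources in the project are never touched. Backups are
    listed before volumes (an ordering assumed by this sketch).
    """
    backups = [("backup", n) for n in sorted(backup_names) if n.startswith(prefix)]
    volumes = [("volume", n) for n in sorted(volume_names) if n.startswith(prefix)]
    return backups + volumes


if __name__ == "__main__":
    for kind, name in plan_cleanup(
        "scs-test-",
        ["scs-test-volume", "prod-db", "scs-test-restored"],
        ["scs-test-backup", "nightly"],
    ):
        print(f"would delete {kind} {name}")
```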

The script will print all cleanup actions and passed tests to `stdout`.

If all tests pass, the script will exit with an exit code of `0`.

If any test fails, the script will halt, print the exact error to `stderr` and exit with a non-zero exit code.

In case of a failed test, cleanup is not performed automatically, allowing for manual inspection of the cloud state for debugging purposes.
Although unnecessary due to automatic cleanup upon next execution, you can manually trigger a cleanup using the "`--cleanup-only`" flag of this script.
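
The reporting and exit-code behavior described above can be sketched as a small runner. `run_tests` and its "PASSED"/"FAILED" message formats are hypothetical and not taken from `volume-backup-tester.py`; the sketch only mirrors the described contract: passed tests on `stdout`, the first failure on `stderr`, an immediate halt without cleanup, and the exit code.

```python
import sys
from typing import Callable, Sequence, TextIO


def run_tests(tests: Sequence[tuple[str, Callable[[], None]]],
              out: TextIO = sys.stdout, err: TextIO = sys.stderr) -> int:
    """Run tests in order and return the process exit code.

    Hypothetical runner: passed tests are reported on ``out``. The first
    failing test stops the run, its error goes to ``err`` and 1 is returned.
    No cleanup is attempted, so the cloud state can be inspected afterwards.
    """
    for name, test in tests:
        try:
            test()
        except Exception as exc:  # report any failure, then halt
            print(f"FAILED {name}: {exc}", file=err)
            return 1
        print(f"PASSED {name}", file=out)
    return 0


if __name__ == "__main__":
    sys.exit(run_tests([("noop", lambda: None)]))
```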