nautilus OSD operations
The following chapters give an overview of the scenarios we need to take care of when migrating from ceph-disk to ceph-volume (and the associated change in proposal format).
In this scenario you are still running ceph-disk-based OSDs and also use the legacy proposal format. Adding a new OSD consists of multiple steps:
- Add a new device to the node
At this point you can't use the old proposal runner anymore, since it has been removed. Instead you have to create a corresponding new-style proposal (ref#jans_wiki); see the sketch after this list.
- Follow the normal procedure to deploy any disks
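As a purely illustrative sketch of what such a descriptive proposal could look like, expressed here as a Python structure (the key names and filters are assumptions; the authoritative format is the one in the referenced wiki):

```python
# Hypothetical sketch of a descriptive (drive-group style) proposal.
# Key names and filter semantics are assumptions for illustration only.
proposal = {
    "default_drive_group": {
        "target": "data*",          # minions this rule applies to
        "data_devices": {
            "rotational": True,     # spinning disks become data devices
        },
        "db_devices": {
            "rotational": False,    # solid-state devices hold DB/WAL
        },
    },
}

# Instead of naming /dev/sdX explicitly, the rule describes device
# properties; a newly added disk that matches is picked up automatically.
```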
No action has to be taken.
No action has to be taken. The replace procedure from SES5 can be re-used.
The steps that are taken internally differ slightly; a sketch of the underlying commands follows the list.
- Remove the OSD (mark destroyed, ceph-volume zap, etc., which is encapsulated in the runner)
- Physically remove the disk
- Add a new disk
- Follow the normal procedure to deploy any disks
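For the removal step, a rough outline of the underlying calls the runner encapsulates might look like the following; the actual orchestration lives in DeepSea, so treat this only as a sketch:

```python
import subprocess

def destroy_and_zap(osd_id: int, device: str) -> None:
    """Sketch of the 'remove' step: mark the OSD destroyed so its ID can
    be re-used later, then wipe the underlying device with ceph-volume."""
    # Keep the OSD ID allocated but mark the OSD as destroyed in the cluster.
    subprocess.run(
        ["ceph", "osd", "destroy", str(osd_id), "--yes-i-really-mean-it"],
        check=True,
    )
    # Remove LVM/partition metadata so the device can be re-deployed.
    subprocess.run(
        ["ceph-volume", "lvm", "zap", "--destroy", device],
        check=True,
    )

# Example (hypothetical OSD 7 on /dev/sdd):
# destroy_and_zap(7, "/dev/sdd")
```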
It should be noted that we no longer try to store information on disk but rather compute/retrieve the data directly from the cluster. The information that is needed in this case is:
For a standalone OSD:
- osd_id
For an OSD with a dedicated WAL/DB:
- osd_id
- wal disk & partition number
- db disk & partition number
Previously we stored the information in /etc/ceph/destroyedOSDs.yml. Now we try to gather the osd_id from ceph itself by querying OSDs that were previously marked 'destroyed'.
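A minimal sketch of that query, assuming `ceph osd tree destroyed --format json` is available and uses the usual `nodes`/`type` fields in its JSON output:

```python
import json
import subprocess

def destroyed_osd_ids() -> list:
    """Return the IDs of OSDs that were previously marked 'destroyed'."""
    out = subprocess.run(
        ["ceph", "osd", "tree", "destroyed", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    tree = json.loads(out)
    # 'nodes' contains hosts and OSDs; OSD entries have type == "osd".
    return [n["id"] for n in tree.get("nodes", []) if n.get("type") == "osd"]

# print(destroyed_osd_ids())  # e.g. [3, 7]
```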
Regarding the WAL/DB detection: since ceph-volume can also create a VG on a partition and subsequently use it as a WAL/DB, we can leverage this ability to re-use those partitions (we may get them through ceph-volume inventory).
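For illustration, a sketch of how such partitions could be discovered via the inventory; the JSON field names used here (`available`, `sys_api.type`) are assumptions and may differ between releases:

```python
import json
import subprocess

def available_partitions() -> list:
    """Sketch: list partitions that ceph-volume reports as usable, so they
    can be re-used as WAL/DB devices for a replaced OSD."""
    out = subprocess.run(
        ["ceph-volume", "inventory", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    devices = json.loads(out)
    return [
        d["path"]
        for d in devices
        if d.get("available") and d.get("sys_api", {}).get("type") == "part"
    ]
```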
This is a great chance to move to the new proposal format. Nothing will change here.
The old format is very prescriptive while the new format tries to be descriptive.
This chapter will explain the reasons behind the change.
Pre-nautilus ceph used ceph-disk as the default disk-handling tool. It was a Python script that accepted bare disks and pre-partitioned disks as parameters to create OSDs from. Day-2/management operations like replacing disks required us to keep track of things like 'which OSD is deployed on which shared device(s) and on which partition?'
If an operation was performed manually (without DeepSea), we ended up with stale data, which led to issues.
ceph-volume, however, deploys an OSD on top of logical volumes. One of the reasons for this is flexibility and easier management of the OSDs (lvm tags, metadata retrieval, resizing, etc.).
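For example, the metadata kept in the LVM tags can be read back at any time; a minimal sketch, assuming the JSON report of `ceph-volume lvm list` is keyed by OSD ID:

```python
import json
import subprocess

def osd_metadata(osd_id: int) -> list:
    """Sketch: read the metadata ceph-volume keeps in LVM tags for one OSD."""
    out = subprocess.run(
        ["ceph-volume", "lvm", "list", str(osd_id), "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    listing = json.loads(out)
    # Each entry carries tags such as the data/db/wal devices and the OSD fsid.
    return listing.get(str(osd_id), [])
```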
ceph-volume also implements a batch command that allows passing multiple devices together with a ruleset. It then internally arranges the disks into a sensible layout and deploys it, and it detects already-deployed disks and skips them.
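A sketch of how such a batch call could be previewed before deploying anything (device paths are placeholders):

```python
import subprocess

# Placeholder devices: two spinning data disks and one SSD for DB/WAL.
devices = ["/dev/sda", "/dev/sdb", "/dev/nvme0n1"]

# --report only prints the layout ceph-volume would compute; dropping the
# flag performs the actual deployment.
subprocess.run(["ceph-volume", "lvm", "batch", "--report", *devices], check=True)
```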
ceph-volume now also brings an inventory function that makes it easy to detect OSD and non-OSD disks.
Having all these features in place, we can now describe a layout in a more abstract way, which allows us to avoid defining each device with all its properties. As a result we no longer face the problems described above.
We felt that having a file-based layout is nice to get a good visual representation of the layout. Time and experience have proven that it adds more complications than benefits. There is one issue we have carried with us for quite a long time which should be noted explicitly here.