nautilus OSD operations
The following chapters give an overview of the scenarios we need to take care of when migrating from ceph-disk to ceph-volume (and the associated change in proposal format).
In this scenario you are still running ceph-disk-based OSDs and also use the legacy proposal format. Adding a new OSD consists of multiple steps:
- Add a new device to the node
At this point you can't use the old proposal runner anymore, since it has been removed. Instead you have to create a corresponding new-style proposal (ref#jans_wiki); see the sketch after this list.
- Follow the normal procedure to deploy any disks
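As a purely illustrative sketch of what such a descriptive proposal could look like, expressed here as a Python structure (the key names and filters are assumptions; the authoritative format is the one in the referenced wiki):

```python
# Hypothetical sketch of a descriptive (drive-group style) proposal.
# Key names and filter semantics are assumptions for illustration only.
proposal = {
    "default_drive_group": {
        "target": "data*",          # minions this rule applies to
        "data_devices": {
            "rotational": True,     # spinning disks become data devices
        },
        "db_devices": {
            "rotational": False,    # solid-state devices hold DB/WAL
        },
    },
}

# Instead of naming /dev/sdX explicitly, the rule describes device
# properties; a newly added disk that matches is picked up automatically.
```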
No action has to be taken.
No action has to be taken. The replace procedure from SES5 can be re-used.
The steps that are taken internally differ slightly; a sketch of the underlying commands follows the list.
- Remove the OSD (mark destroyed, ceph-volume zap, etc., which is encapsulated in the runner)
- Physically remove the disk
- Add a new disk
- Follow the normal procedure to deploy any disks
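For the removal step, a rough outline of the underlying calls the runner encapsulates might look like the following; the actual orchestration lives in DeepSea, so treat this only as a sketch:

```python
import subprocess

def destroy_and_zap(osd_id: int, device: str) -> None:
    """Sketch of the 'remove' step: mark the OSD destroyed so its ID can
    be re-used later, then wipe the underlying device with ceph-volume."""
    # Keep the OSD ID allocated but mark the OSD as destroyed in the cluster.
    subprocess.run(
        ["ceph", "osd", "destroy", str(osd_id), "--yes-i-really-mean-it"],
        check=True,
    )
    # Remove LVM/partition metadata so the device can be re-deployed.
    subprocess.run(
        ["ceph-volume", "lvm", "zap", "--destroy", device],
        check=True,
    )

# Example (hypothetical OSD 7 on /dev/sdd):
# destroy_and_zap(7, "/dev/sdd")
```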
It should be noted that we no longer try to store information on disk but rather compute/retrieve the data directly from the cluster. The information that is needed in this case is:
For a standalone OSD:
- osd_id
For an OSD with a dedicated WAL/DB:
- osd_id
- wal disk & partition number
- db disk & partition number
Previously we stored the information in /etc/ceph/destroyedOSDs.yml. Now we try to gather the osd_id from ceph itself by querying OSDs that were previously marked 'destroyed'.
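A minimal sketch of that query, assuming `ceph osd tree destroyed --format json` is available and uses the usual `nodes`/`type` fields in its JSON output:

```python
import json
import subprocess

def destroyed_osd_ids() -> list:
    """Return the IDs of OSDs that were previously marked 'destroyed'."""
    out = subprocess.run(
        ["ceph", "osd", "tree", "destroyed", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    tree = json.loads(out)
    # 'nodes' contains hosts and OSDs; OSD entries have type == "osd".
    return [n["id"] for n in tree.get("nodes", []) if n.get("type") == "osd"]

# print(destroyed_osd_ids())  # e.g. [3, 7]
```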
Regarding the WAL/DB detection: since ceph-volume can also create a VG on a partition and subsequently use it as a WAL/DB, we can leverage this ability to re-use those partitions (we may get them through ceph-volume inventory).
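For illustration, a sketch of how such partitions could be discovered via the inventory; the JSON field names used here (`available`, `sys_api.type`) are assumptions and may differ between releases:

```python
import json
import subprocess

def available_partitions() -> list:
    """Sketch: list partitions that ceph-volume reports as usable, so they
    can be re-used as WAL/DB devices for a replaced OSD."""
    out = subprocess.run(
        ["ceph-volume", "inventory", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    devices = json.loads(out)
    return [
        d["path"]
        for d in devices
        if d.get("available") and d.get("sys_api", {}).get("type") == "part"
    ]
```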
This is a great chance to move to the new proposal format. Nothing will change here.
The old format is very prescriptive while the new format tries to be descriptive.
This chapter will explain the reasons behind the change.
Pre-nautilus ceph used ceph-disk as the default disk-handling tool. It was a Python script that accepted bare disks and pre-partitioned disks as parameters to create OSDs from. Day-2/management operations like replacing disks required us to keep track of things like 'which OSD is deployed on which shared device(s) and on which partition?'
If an operation was performed manually (without DeepSea), we ended up with stale data, which led to issues.
ceph-volume, however, deploys an OSD on top of logical volumes. One of the reasons for this is flexibility and easier management of the OSDs (lvm tags, metadata retrieval, resizing, etc.).
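For example, the metadata kept in the LVM tags can be read back at any time; a minimal sketch, assuming the JSON report of `ceph-volume lvm list` is keyed by OSD ID:

```python
import json
import subprocess

def osd_metadata(osd_id: int) -> list:
    """Sketch: read the metadata ceph-volume keeps in LVM tags for one OSD."""
    out = subprocess.run(
        ["ceph-volume", "lvm", "list", str(osd_id), "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    listing = json.loads(out)
    # Each entry carries tags such as the data/db/wal devices and the OSD fsid.
    return listing.get(str(osd_id), [])
```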
ceph-volume also implements a batch command that allows passing multiple devices together with a ruleset. It then internally arranges the disks into a sensible layout and deploys it, and it detects already-deployed disks and skips them.
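A sketch of how such a batch call could be previewed before deploying anything (device paths are placeholders):

```python
import subprocess

# Placeholder devices: two spinning data disks and one SSD for DB/WAL.
devices = ["/dev/sda", "/dev/sdb", "/dev/nvme0n1"]

# --report only prints the layout ceph-volume would compute; dropping the
# flag performs the actual deployment.
subprocess.run(["ceph-volume", "lvm", "batch", "--report", *devices], check=True)
```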
ceph-volume now also brings an inventory function that makes it easy to detect OSD and non-OSD disks.
Having all these features in place, we can now describe a layout in a more abstract way, which allows us to avoid defining each device with all its properties. As a result we no longer face the problems described above.
We felt that having a file-based layout is nice to get a good visual representation of the layout. Time and experience have proven that it adds more complications than benefits. There is one issue we have carried with us for quite a long time which should be noted explicitly here.