New deployment and day-2-ops tooling software defined storage (ceph) - ADR #515

Open · 7 of 9 tasks
brueggemann opened this issue Jan 8, 2024 · 9 comments · May be fixed by SovereignCloudStack/standards#461 or SovereignCloudStack/standards#737
Labels: IaaS (Issues or pull requests relevant for Team1: IaaS) · SCS-VP03 (Related to tender lot SCS-VP03)

@brueggemann commented Jan 8, 2024

As an SCS Operator, I want a well-considered and justified decision on a reliable method to deploy and operate Ceph that replaces ceph-ansible.

Criteria:

  • The solution should be designed for a lifetime of at least 10 years
  • The decision has to be well considered and justified, based on real scenarios from cloud providers
  • The migration to the new deployment method should be as easy as possible, with minimal downtime.

Tasks (see decision tracking document for detailed status):

  • Find and document further criteria
  • Gather information about reference setups from cloud providers (criteria, migration path)
  • Check what similar projects like OSISM are doing
  • Research which deployment method for Ceph is preferable (pros/cons)
    • cephadm
    • rook
  • Create proof-of-concept setups for the considered deployment methods
  • Document decision making process

Definition of Done:

  • There exists an ADR that documents the evaluation process and outlines the path forward (e.g. which solution we have committed to)

Decision tracking document

@yeoldegrove commented Jan 10, 2024

For task "Gather information about reference setups from cloud providers (criteria, migration path)"

Questions to cloud providers and/or customers:

We want to get a better understanding of which Ceph setups, deployed by OSISM, you are currently running. We hope this input will help us decide how to move forward on a possible replacement of ceph-ansible in OSISM.

  • Which ceph release are you running?
  • What is the size of your ceph cluster?
    • Are Ceph workloads sharing the hardware with other workloads (hyperconverged)? If yes, why?
    • Are you running multiple pools or even multiple clusters? If yes, why?
  • Which ceph features/daemons are you using and how are they integrated into OpenStack and/or other services?
  • Which hardware are you using (either sizing or specs)?
    • CPU/RAM
    • HDDs/SSDs/NVMEs(/Controllers)
      • Are you splitting "OSD setup" and "BlueStore WAL+DB"?
    • NICs/speed/latency
      • Are you splitting the data plane and the control plane?
  • Which Ceph config is deployed by OSISM?
    • Do you mind sharing the actual config/yaml?
  • Which Ceph config is deployed "unknown to" or "on top of" OSISM?
    • e.g. special crush maps, special configs
  • Would it be nice to have more Ceph features deployable via OSISM?
  • Are there any Ceph features that have to be deployed by OSISM currently that OSISM should not handle?
  • What is your justified opinion on a new deployment method for Ceph (instead of ceph-ansible)?
    • What about Cephadm?
      • Are you maybe already using it in your current cluster (deployed by hand)?
    • What about Rook?
      • Are you maybe already using it on top of a k8s deployed in OpenStack?
  • Are there any other exciting facts about your Ceph setup you would like to share?

@horazont

We are using Rook for all our new Ceph deployments. Previously, we used Ceph-Chef.

Deployment Method: Rook is the natural choice for us because we are running Kubernetes on bare metal already for YAOOK Operator. It integrates well with YAOOK because both of them are using Kubernetes.

In addition, we have had excellent experiences with the performance, maintainability and reliability of Rook.io clusters, in particular compared to our previous static deployment method (Ceph-Chef).

All methods have their downsides, and so does the Rook method. In particular:

  • Rook suffers in particular from ceph-volume shortcomings, because it's not that easy to bypass these shortcomings when going through Rook. (We suffered a lot when we had multipath devices which weren't handled correctly by ceph-volume.)
  • You need to have some knowledge in Kubernetes concepts in addition to Ceph concepts to run it. (Though in contrast to e.g. cephadm specific knowledge, the Kubernetes knowledge to obtain is likely to be useful in other situations, too.)
  • No support for RadosGW + Keystone so far. We are working on that together with Uhurutec though.

Version: With Rook, we are running 16.x with the plan to upgrade to 17.x soon-ish, though we are blocked there for non-Ceph and non-Rook reasons.

Hardware: Varying and historically grown, I'd have to look that up. Hit me up via email if you need that information: mailto:[email protected].

Features: We use RBD exclusively with Rook so far (see above); we intend to enable S3 and Swift frontends once we have implemented support for that (currently these needs are served by our old Ceph-Chef cluster). We use CephFS in non-bare-metal cases, too.
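For readers unfamiliar with the Rook approach: a minimal sketch of how such a deployment is driven, following the upstream Rook quickstart layout (the version tag and manifest contents are illustrative, not our exact setup):

```sh
# Install the Rook operator from the upstream example manifests
# (version tag is illustrative).
git clone --single-branch --branch v1.13.0 https://github.com/rook/rook.git
cd rook/deploy/examples
kubectl create -f crds.yaml -f common.yaml -f operator.yaml

# Declare the desired Ceph cluster; the operator reconciles
# mons, mgr and OSDs from this spec.
kubectl create -f cluster.yaml

# Watch the cluster converge.
kubectl -n rook-ceph get cephcluster --watch
```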

@berendt commented Jan 11, 2024

We use the Quincy release (17.2.6) provided by OSISM 6.0.2 everywhere.

We have a single small hyperconverged cluster for a specific customer workload. Otherwise we only use dedicated Ceph clusters. We currently have a single cluster that provides HDD and NVMe SSD as RBDs for Cinder/Nova. In addition, we have a cluster that is used exclusively for RGW and is offered as a Swift and S3 endpoint (integrated in Keystone and Glance, in future also Cinder (for backups)).

At the moment we are deploying the control plane on the Ceph OSD nodes and do not have any dedicated nodes for the control plane. We also do not split the data plane and control plane on the network side. We currently have 2x 100G in the Ceph nodes there. The compute nodes have 2x 25G (will be 2x 100G in the future as well). Latencies between the nodes are approx. 0.05ms (ICMP).

We have a separate pool for each OpenStack service (images, vms, volumes). We have several pools for Cinder so that we can partially separate customers.
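A sketch of what such a pool-per-service layout looks like at the Ceph level (pool names follow the usual OpenStack convention; PG counts and the per-customer pool name are placeholders):

```sh
# One pool per OpenStack service, all RBD-backed.
for pool in images vms volumes; do
    ceph osd pool create "$pool" 128
    ceph osd pool application enable "$pool" rbd
done

# Additional Cinder pools to partially separate customers
# (name is hypothetical); each is wired up as its own
# Cinder backend/volume type.
ceph osd pool create volumes-customer-a 128
ceph osd pool application enable volumes-customer-a rbd
```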

We use the following services: osd, mon, mgr, rgw, crash. We would also like to take a look at mds in the future in order to be able to offer CephFS via Manila if necessary.

We can share details about hardware and the configuration in full if required.

We do not optimize the systems directly with the Ceph-Ansible part of OSISM, but use the tuned, sysctl and network roles from OSISM for this.

We are satisfied with what we can currently do with OSISM. We would only need more functionality for day-2 operations in the future.

We have also recently added the option of deploying Kubernetes directly on all nodes in OSISM. We are open to Rook and cephadm. We are currently tending towards Rook, as we believe it is the more consistent step.

@fkr commented Jan 22, 2024

@flyersa Can you give feedback as well? I think it would be helpful.

@Nils98Ar commented Jan 24, 2024

  • We deploy Ceph Quincy completely with OSISM, replica count=3 and osds_per_device=2. Balancer is currently configured to warn and we adjust the pg_count if there's a warning.
  • Nodes: 5 storage nodes (osd, mds, rgw, crash) and 3 control nodes (mon, mgr, crash). The control nodes are also OpenStack control nodes.
  • Pools: we have only the default pools for .mgr, volumes, images, metrics, vms, cephfs (data and metadata) and rgw (multiple)
  • Services: RBD for OpenStack Cinder (and currently also Nova), CephFS for OpenStack Manila and RGW for OpenStack S3, like this: https://osism.github.io/docs/guides/configuration-guide/ceph#rgw-service.
  • CPUs and Memory:
    - 1x10 core Intel CPUs + 64GB memory in the control nodes
    - 2x12 core Intel CPUs + 128GB memory in the storage nodes
  • Disks/Controllers:
    Control nodes/storage nodes OS each: 240 GB SATA SSD (on the control nodes RAID 1 with controller)
    Storage nodes Ceph each: 2x 1.6 TB NVMe SSD and 10x 3.2 TB SAS SSD
  • We are not splitting "OSD setup" and "BlueStore WAL+DB", as to our understanding Ceph handles this automatically.
  • Interfaces: 1G interfaces for console and two 10G balance-xor bonds for Ceph frontend/backend (separate).
  • We would like to deploy CephFS HA active-active, which seems to be possible at least with cephadm (see the sketch below)
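A sketch of what active-active CephFS looks like through cephadm (the filesystem name and placement count are placeholders):

```sh
# Deploy MDS daemons via the cephadm orchestrator:
# three daemons for two active ranks plus one standby.
ceph orch apply mds cephfs --placement=3

# Allow two active MDS ranks; this is what makes the
# filesystem active-active rather than active-standby.
ceph fs set cephfs max_mds 2
```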

Maybe we will switch to a hyper-converged setup (compute/storage/maybe network) and 25G interfaces in the future.

@frosty-geek commented Jan 25, 2024

Which ceph release are you running?

  • Quincy 17.2.6 → 42on note: recommends updating to the latest Quincy, 17.2.7

What is the size of your ceph cluster?

  • ~0.5 PB cluster

Are Ceph workloads sharing the hardware with other workloads (hyperconverged)? If yes, why?

  • Yes, we're hyperconverged, because of the business case and the way we started.

Are you running multiple pools or even multiple clusters? If yes, why?

  • yes → VMs (nova disk), RGW, Images (glance), Volumes (cinder), Backup...
  • we're not running multiple clusters, we have 1 cluster per "region" (scs1, prod1, prod2, prod3...)

Which ceph features/daemons are you using and how are they integrated into OpenStack and/or other services?

  • RadosGW → Ceph Object Gateway Swift API
  • Rados Block Devices for Nova/Cinder/Glance

Wishlist:

  • CephFS → k8s RWX → needs multi tenancy
  • Manila with CephFS Backend (ganesha) → check ceph reef release with active/active backend

Which hardware are you using (either sizing or specs)?
CPU/RAM

  • CPU/RAM are not reserved/dedicatedly assigned at the moment, but that's planned

HDDs/SSDs/NVMEs(/Controllers)

  • SSD only/HBAs → we don't split metadata + data and are not considering it for now

Are you splitting "OSD setup" and "BlueStore WAL+DB"?
→ 3x Controller Nodes running Ceph Management Components (Mons, MGRs, RGWs...), Hypervisors running only OSDs

NICs/speed/latency

  • 2x active/passive 25 GBit NIC (fibre) → 0.09 - 0.2 ms → VLAN+NPAR
    → in the past: different physical NICs for frontend (control plane) + backend (data plane); moving back to 1 physical device with NPAR/QoS

Which Ceph config is deployed by OSISM?
Do you mind sharing the actual config/yaml?
Default config shipped with the reference implementation

  • OSISM default config with additions: rack-aware CRUSH rule, dramatically increased PG counts (as advised by 42on), quotas for RGW, dashboard + admin user enabled, auth_allow_insecure_global_id_reclaim, OSD memory target
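For illustration, roughly what those additions look like as Ceph commands (rule name, pool name and memory value are placeholders, not the actual config):

```sh
# Rack-aware replicated CRUSH rule: one replica per rack.
ceph osd crush rule create-replicated rack-aware default rack
ceph osd pool set volumes crush_rule rack-aware

# Raise the OSD memory target and enable the dashboard.
ceph config set osd osd_memory_target 8589934592   # 8 GiB, illustrative
ceph mgr module enable dashboard
```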

Which Ceph config is deployed "unknown to" or "on top of" OSISM"? e.g. special crush maps, special configs

  • see above

Would it be nice to have more Ceph features deployable via OSISM?

  • see above wishlist
    → dashboard enablement + user for it
    → get rid of ceph credentials/secrets from config git repo

What is your justified opinion on a new deployment method for Ceph (instead of ceph-ansible)?
What about Cephadm?
Are you maybe already using it in your current cluster (deployed by hand)?
What about Rook?
Are you maybe already using it on top of a k8s deployed in OpenStack?

  • We plan to use cephadm in the future; depending on the future use of k3s (being able to also run on the hypervisors), we may consider Rook

@flyersa commented Jan 25, 2024

Which ceph release are you running?

  • Pacific and Reef

What is the size of your ceph cluster?

  • 3PB

Are Ceph workloads sharing the hardware with other workloads (hyperconverged)? If yes, why?

  • No

Are you running multiple pools or even multiple clusters? If yes, why?

  • different pools for different storage classes, such as magnetic, SSD and NVMe

Which ceph features/daemons are you using and how are they integrated into OpenStack and/or other services?

  • mainly rbd and radosgw with swift support

Which hardware are you using (either sizing or specs)?

Mainly HPE such as Apollo 4200 or similar

Are you splitting "OSD setup" and "BlueStore WAL+DB"?

Of course.
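A minimal sketch of that split at the ceph-volume level (device paths are placeholders): object data lives on the slow device, while the RocksDB metadata goes to a fast NVMe partition.

```sh
# Data on the large rotational/SAS device; the DB (and implicitly
# the WAL, which follows block.db unless given its own device)
# on a fast NVMe partition.
ceph-volume lvm create --bluestore \
    --data /dev/sdb \
    --block.db /dev/nvme0n1p1
```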

NICs/speed/latency

2x 40 Gbit or 4x 10 Gbit, depending on scenario and expected throughput

Are you splitting the data plane and the control plane?

No, monitors and MGRs usually go on the storage nodes.

Which Ceph config is deployed by OSISM?

None; we never deploy with OSISM and use cephadm instead. In the past we already had customer faults where user error damaged Ceph clusters, so we focus on a strong separation between storage and OpenStack.

Would it be nice to have more Ceph features deployable via OSISM?

For others maybe; as I said, for various operational reasons that does not belong in the same system with which I manage my compute resources.

What is your justified opinion on a new deployment method for Ceph (instead of ceph-ansible)?

We should use what is used upstream; for Ceph the tool is now cephadm, so of course we should use it.
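For context, the cephadm workflow is roughly this (IPs and hostnames are placeholders):

```sh
# Bootstrap a minimal cluster on the first host.
cephadm bootstrap --mon-ip 10.0.0.1

# Enroll more hosts and let the orchestrator place daemons.
ceph orch host add storage-02 10.0.0.2
ceph orch apply mon 3
ceph orch apply osd --all-available-devices
```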

What about Rook?

While Rook adds a lot in regards to fault tolerance and so on, it adds complexity too. I am not a huge fan of Rook: in a CSP environment you usually have dedicated servers for storage (if not HCI), so there is no need to add a k8s cluster on top of it...

@yeoldegrove

Our current decision tracking is done here: https://input.scs.community/3aZ-xdnRS-y11lZkrtAvxw

@flyersa commented Feb 6, 2024

Btw, another point for getting rid of ceph-ansible... Ever done an upgrade? In the time this tool takes just to upgrade a single monitor, I upgrade complete datacenters to a new Ceph version with cephadm...
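For comparison, a cephadm upgrade is a single orchestrated, rolling operation (the target version is illustrative):

```sh
# Start a rolling, health-aware upgrade of the whole cluster.
ceph orch upgrade start --ceph-version 17.2.7

# Monitor progress; the upgrade can also be paused and resumed.
ceph orch upgrade status
```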
