Create Availability Zone Standard #640

Merged: 36 commits, Oct 14, 2024

Conversation

josephineSei
Contributor

closes #539

@josephineSei changed the title from "Create scs-XXXX-vN-Availability-Zones-Standard.md" to "Draft: scs-XXXX-vN-Availability-Zones-Standard.md" on Jun 17, 2024
@josephineSei marked this pull request as draft on June 19, 2024 13:42
@josephineSei marked this pull request as ready for review on June 24, 2024 10:15
@josephineSei changed the title from "Draft: scs-XXXX-vN-Availability-Zones-Standard.md" to "Create Availability Zone Standard" on Jun 24, 2024
Contributor

@artificial-intelligence left a comment

A first round of comments, I'll need to get back to this later.
Notice there are still some spelling mistakes, which I didn't have the time to address one by one just yet.

Thanks for all the effort put into this!

Co-authored-by: Sven <[email protected]>
Signed-off-by: josephineSei <[email protected]>
Contributor Author

@josephineSei left a comment

I tried to put in all information I got from CSPs, while keeping the focus on Availability Zones.

This keeps much of the physical redundancy part out, but if this should also be part of it, we need to discuss to what extent or if it may be better to refer to some document outside of this standard.

@josephineSei
Contributor Author

In today's IaaS call, we discussed a few open questions:

Network AZ

In the standard I discussed that it is possible to have Network AZs, but that this has downsides for users. Thus I did not make any recommendations. We discussed whether we even want to discourage CSPs from using them ("SHOULD NOT"):

  • it has been brought up that it is hard to configure and not nice to use for users
  • @garloff: discourage or even forbid usage of network AZs
  • @berendt: should not be forbidden, there are use cases
  • Network AZs are really not nice for users; we should discourage them (but not disallow them)
    • ToDo: Ask for more use cases; maybe we cannot even discourage them

Cross-Attach AZ

The question was whether we want to encourage, allow, discourage or disallow this.

  • so far, nearly no CSP uses this according to Hedgedoc input
  • @garloff: unlike for network it is not obvious that I can attach volumes from other AZs
  • when using Ceph, you'd normally have a global cross-AZ for storage (but not several storage AZs)
  • if not using Ceph, implementation would be hard, we should not request this from CSPs
    • Use-case wavecon: Local dedicated (per AZ) ceph clusters, no support for x-attaching
  • @artificial-intelligence: X-attach would negatively impact isolation between AZs (and performance)
  • Maybe transparency is the most important feature here?
  • important to distinguish between replicating storage between AZs vs. cross-attaching volumes across AZs
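
The difference between "no cross-attach" and "cross-attach allowed" from the points above can be illustrated with a minimal sketch. This is only a toy model, not OpenStack code: the `Server` and `Volume` classes, the `can_attach` function and the `cross_az_attach` flag are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str
    az: str  # storage AZ the volume lives in (hypothetical model)


@dataclass(frozen=True)
class Server:
    name: str
    az: str  # compute AZ the server runs in (hypothetical model)


def can_attach(server: Server, volume: Volume, cross_az_attach: bool) -> bool:
    """Return True if the volume may be attached to the server.

    With cross_az_attach disabled, server and volume must share an AZ.
    With it enabled, any combination is allowed, which weakens the
    isolation between AZs as noted in the discussion.
    """
    return cross_az_attach or server.az == volume.az


# Illustrative data only; the AZ names are made up.
server = Server("web-1", az="az1")
local_volume = Volume("data-local", az="az1")
remote_volume = Volume("data-remote", az="az2")

print(can_attach(server, local_volume, cross_az_attach=False))
print(can_attach(server, remote_volume, cross_az_attach=False))
print(can_attach(server, remote_volume, cross_az_attach=True))
```

Replicating storage between AZs is a separate concern that this sketch deliberately does not model: a replicated volume would still be attached through the AZ the server runs in.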

Overall

  • We cannot define all the details of how DCs should be built for highest availability
  • Reference DC taxonomies / BSI taxonomy for this
  • SCS can be useful by providing some minimal bounds that allow users to have a meaningfully higher chance to survive by spreading over several AZs
  • Highest level of redundancy will always be achieved by replicating data over several regions
    • Can we define something with "AZ"s that's better than nothing (though never as good as regions)?
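
Why spreading over several AZs can help, and why the independence of AZs matters, can be sketched with a small calculation. This is a simplified model under a strong assumption: every AZ fails independently with the same probability. The function name and the failure probability used below are made up for illustration and are not measurements from any CSP.

```python
def all_zones_down_probability(p_zone_failure: float, zones: int) -> float:
    """Probability that every one of `zones` AZs is down at the same time.

    Assumes AZ failures are statistically independent. If AZs share a
    failure domain (same fire zone, power feed or core router), this
    assumption does not hold and the real value is higher.
    """
    if not 0.0 <= p_zone_failure <= 1.0:
        raise ValueError("p_zone_failure must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    return p_zone_failure ** zones


# Illustrative per-AZ failure probability, not a real-world figure.
P_FAILURE = 0.01

for zones in (1, 2, 3):
    print(zones, all_zones_down_probability(P_FAILURE, zones))
```

The benefit shrinks as soon as AZs share infrastructure, which is the reason for asking for minimal bounds on their independence.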

@josephineSei
Contributor Author

I sent a mail to the ML asking for feedback on the network AZ topic.

@horazont
Member

Single network AZ is not a problem for us. Neutron's HA capabilities are strong enough and our networks are small enough that we wouldn't gain anything from separate AZs.

@josephineSei
Contributor Author

After my vacation I read through the standard and looked through the minutes of the IaaS calls that happened in the meantime. I think we still need feedback from CSPs, so I wrote a mail to the SCS ML.

Contributor

@markus-hentsch left a comment

Added some comments and suggestions mostly revolving around spelling and phrasing.

I did notice a mix of capitalization for some terms: "Storage" and "Compute" are often capitalized (though not everywhere), whereas "network" is lowercase in the network AZ section but capitalized in others. I think this could be aligned better across the whole document.

Standards/scs-XXXX-vN-Availability-Zones-Standard.md (outdated)
There might only be a loss of a few packets within the lost network resources.

With having Compute and Storage in a good state (e.g. through having fire zones with a compute AZ each and storage being replicated over the fire zones) it would not have downsides to not have Availability Zones for the network service.
It might even be the opposite: Having resources running in certain Availability Zones might permit them from being scheduled in other AZs[^3].
Contributor

permit [...] from

Did you mean "prevent [...] from" here?

To be honest, I can't really tell as I haven't fully understood why there are no downsides from omitting AZs in network from this whole paragraph.
Maybe one or two details could be added to explain the reasoning behind this general statement?

Contributor Author

I added a few more lines to this paragraph to explain this better. And yes, it was meant to be "prevent".

Contributor

@gtema left a comment

Generally I am fine with this, but I agree with the one comment about rephrasing or dropping a statement.

@frosty-geek
Member

FTR, plusserver's definition of AZ: https://docs.plusserver.com/en/general/plusserver-region-az/


## Physical Audits

In cases where it is reasonable to mistrust the provided documentation, a physical audit by a natural person - called auditor - sent by the OSBA (?) should be performed.
Contributor Author

@garloff If we want to have someone auditing deployments in special cases, we need to define who will name such a person. Will that be the OSBA?

@josephineSei
Contributor Author

@artificial-intelligence and @markus-hentsch we've got feedback from CSPs and I added a note on manual testing. Could you check whether all your comments are addressed now?

josephineSei and others added 2 commits September 25, 2024 16:22
Co-authored-by: Markus Hentsch <[email protected]>
Signed-off-by: josephineSei <[email protected]>
@josephineSei
Contributor Author

@artificial-intelligence can you please check whether all your comments have been addressed?

@josephineSei merged commit 2be5604 into main on Oct 14, 2024
9 checks passed
@josephineSei deleted the availability-zones-standard branch on October 14, 2024 07:59
Development

Successfully merging this pull request may close these issues.

Availability Zones: standardized levels of independecies.
10 participants