
S3 reliability: write across 2 AZ #1166

Open
chibenwa opened this issue Aug 27, 2024 · 24 comments
Labels
customer enhancement New feature or request

Comments

@chibenwa
Member

chibenwa commented Aug 27, 2024

Description

A customer really wishes that data loss never happens again and is paranoid about it.

We wish to offer them a Twake Mail feature to write across 2 availability zones, synchronously.

(disclaimer: I personally advocate against this feature...)

Thus in case of failure:

  • We could still read emails (falling back to the available zone)
  • But we could no longer write, unless a manual reconfiguration to the remaining AZ is performed

Configuration changes

In blob.properties

objectstorage.s3.secondary.enabled=true
objectstorage.s3.secondary.endPoint=${env:TMAIL_S3_ENDPOINT}
objectstorage.s3.secondary.region=${env:TMAIL_S3_REGION}
objectstorage.s3.secondary.accessKeyId=${env:TMAIL_S3_ACCESS_KEY}
objectstorage.s3.secondary.secretKey=${env:TMAIL_S3_SECRET_KEY}

Plugged into a TMail backend module chooser.

Code & location

maven module: tmail-backend/blob/secondary-blob-store

Write a SecondaryBlobStoreDAO class that takes 2 blob store DAOs

  • Write operations must complete on both in order to be considered successful (including deletes)
  • Write operations must be idempotent in the face of partial failures (i.e. add + delete support the payload being present in A and not in B, and still complete successfully)
  • Read operations are performed on A, and fall back to B in case of error.

Plug this into the TMail blob module chooser
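A minimal sketch of the intended semantics, assuming a hypothetical simplified stand-in for the real BlobStoreDAO interface (the actual James interface is reactive and richer; all names here are illustrative):

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the real BlobStoreDAO interface.
interface SimpleBlobDao {
    void save(String blobId, byte[] payload);
    Optional<byte[]> read(String blobId);
    void delete(String blobId);
}

class InMemoryBlobDao implements SimpleBlobDao {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();
    public void save(String blobId, byte[] payload) { blobs.put(blobId, payload); }
    public Optional<byte[]> read(String blobId) { return Optional.ofNullable(blobs.get(blobId)); }
    public void delete(String blobId) { blobs.remove(blobId); } // idempotent: absent key is a no-op
}

// Writes (including deletes) must succeed on both DAOs; reads try the
// primary and fall back to the secondary on error or missing blob.
class SecondaryBlobStoreDao implements SimpleBlobDao {
    private final SimpleBlobDao primary;
    private final SimpleBlobDao secondary;

    SecondaryBlobStoreDao(SimpleBlobDao primary, SimpleBlobDao secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    public void save(String blobId, byte[] payload) {
        primary.save(blobId, payload);   // both must succeed for the write to succeed
        secondary.save(blobId, payload);
    }

    public Optional<byte[]> read(String blobId) {
        try {
            Optional<byte[]> result = primary.read(blobId);
            if (result.isPresent()) {
                return result;
            }
        } catch (RuntimeException e) {
            // fall through to the secondary on error
        }
        return secondary.read(blobId);
    }

    public void delete(String blobId) {
        primary.delete(blobId);
        secondary.delete(blobId);
    }
}
```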

Definition of done:

  • Unit tests for SecondaryBlobStoreDAO
  • Distributed app integration tests: given a correctly configured TMail distributed server, when I receive a mail, it gets stored in A and B
  • Video: local docker compose + 2 Zenko servers. I save a mail and objects appear in both Zenko servers.
  • Helm chart configuration toggles (5.0, master and cnb branches) - i.e. ready to use for CNB preprod
@chibenwa chibenwa added enhancement New feature or request customer labels Aug 27, 2024
@vttranlina
Member

My personal view: this does not look like good practice.
It should be the responsibility of S3.

I recently received a claim from the Ops team regarding an S3 bucket.
@ducnm0711 do you have any better idea?

Looks like the S3 replication feature: https://aws.amazon.com/s3/features/replication/#:~:text=Amazon%20S3%20CRR%20automatically%20replicates,access%20in%20different%20geographic%20regions.

@chibenwa
Member Author

My personal view: this does not look like good practice.
It should be the responsibility of S3.

Agreed, but it is not supported by OVH: we do not have a choice here.

@tk-nguyen
Contributor

@quantranhong1999
Member

It just got added: ovh/public-cloud-roadmap#179 (comment)

I have just seen that too. We can likely make use of it now.

@tk-nguyen
Contributor

tk-nguyen commented Aug 27, 2024

Just tested, it seems to work OK. I'm following this tutorial: https://help.ovhcloud.com/csm/asia-public-cloud-storage-s3-asynchronous-replication-buckets?id=kb_article_view&sysparm_article=KB0062424#using-the-cli

Note: it only works on objects uploaded after the replication rule is applied. See https://help.ovhcloud.com/csm/asia-public-cloud-storage-s3-asynchronous-replication-buckets?id=kb_article_view&sysparm_article=KB0062424#what-is-replicated-and-what-is-not

@chibenwa
Member Author

Let's validate if S3 asynchronous replication is acceptable by the customer first.

@chibenwa
Member Author

chibenwa commented Sep 6, 2024

Edit: @PatrickPereiraLinagora will further check with the customer whether async replication is acceptable to them.

TL;DR: following the March incident, our margin for maneuver is not great. We might be forced to eat our hats, but at least we will try!

ALSO it turns out I misunderstood the ticket, and we would also want to maintain automatic write availability.

Namely:

Nominal case

GIVEN we write in parallel to blobStoreA and blobStoreB
WHEN both operations succeed
THEN we return a storage success

Partial failure

GIVEN we write in parallel to blobStoreA and blobStoreB
WHEN write on blobStoreA succeeds and write on blobStoreB fails (or the reverse)
THEN we publish a message on RabbitMQ to retry the write operation later
AND the write succeeds

This means we need to set up a RabbitMQ queue to retry failed writes. The listener of the queue would then asynchronously read blobStoreA to complete the write on blobStoreB.

Total failure

GIVEN we write in parallel to blobStoreA and blobStoreB
WHEN write on blobStoreA fails and write on blobStoreB fails
THEN the write fails
AND no message is published on RabbitMQ

Read path

Read operations are performed on A, and fall back to B in case of error, **or if the object is not found in A**.
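The three cases above could be sketched roughly as follows; this is an illustrative Java sketch under stated assumptions, not the actual TMail implementation: a hypothetical in-memory `FlakyStore` stands in for each S3 AZ, and a plain in-process queue stands in for RabbitMQ.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical in-memory store that can be toggled to fail, to exercise each scenario.
class FlakyStore {
    final Map<String, byte[]> blobs = new ConcurrentHashMap<>();
    boolean failing = false;

    void save(String blobId, byte[] payload) {
        if (failing) {
            throw new RuntimeException("AZ unavailable");
        }
        blobs.put(blobId, payload);
    }
}

class ReplicatingWriter {
    private final FlakyStore storeA;
    private final FlakyStore storeB;
    final Queue<String> retryQueue = new ConcurrentLinkedQueue<>(); // stands in for RabbitMQ

    ReplicatingWriter(FlakyStore storeA, FlakyStore storeB) {
        this.storeA = storeA;
        this.storeB = storeB;
    }

    private boolean trySave(FlakyStore store, String blobId, byte[] payload) {
        try {
            store.save(blobId, payload);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    boolean write(String blobId, byte[] payload) {
        boolean okA = trySave(storeA, blobId, payload);
        boolean okB = trySave(storeB, blobId, payload);
        if (okA && okB) {
            return true;               // nominal case: storage success
        }
        if (okA || okB) {
            retryQueue.add(blobId);    // partial failure: queue a retry...
            return true;               // ...and still report the write as successful
        }
        return false;                  // total failure: fail, publish nothing
    }
}
```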

@Arsnael
Member

Arsnael commented Sep 19, 2024

Remark: we can plug it into the blob module chooser to encrypt with AES just once for both S3 blob stores

@hungphan227
Contributor

Should we have a cron job to ensure the consistency of the 2 AZs?

@vttranlina
Member

vttranlina commented Sep 24, 2024

Should we have a cron job to ensure the consistency of the 2 AZs?

A cron job to trigger what?

Ah, a cron job to trigger webadmin and rerun tasks from the dead letter queue

@hungphan227
Contributor

Should we have a cron job to ensure the consistency of the 2 AZs?

A cron job to trigger what?

maybe checking for any mismatch between the 2 AZs, or executing the event dead letter queue in case the retry fails

@vttranlina
Member

maybe checking for any mismatch between the 2 AZs, or executing the event dead letter queue in case the retry fails

As I understand it, we always trust objectstorage.s3.primary.
When the write to blobStoreA fails -> total failure (even if the save to blobStoreB succeeds)

@hungphan227
Contributor

hungphan227 commented Sep 24, 2024

"When the write to blobStoreA fails -> total failure (even if the save to blobStoreB succeeds)" -----------> isn't this a partial failure?

@Arsnael
Member

Arsnael commented Sep 24, 2024

When the write to blobStoreA fails -> total failure (even if the save to blobStoreB succeeds)

No. The reverse is true too, read again #1166 (comment)

@vttranlina
Member

When the write to blobStoreA fails -> total failure (even if the save to blobStoreB succeeds)

No. The reverse is true too, read again #1166 (comment)

I tried to read it again, but I do not see what is wrong.
We have 2 blob stores:

  • blobStoreA: primary -> responsible for all current mailbox logic
  • blobStoreB: secondary -> just for backup

Benoit gave 3 examples; they do not cover the case where A fails and B succeeds, but I think it is closest to the Total failure case.

Where did I go wrong?

@Arsnael
Member

Arsnael commented Sep 24, 2024

From Benoit:

Partial failure

GIVEN we write in parallel to blobStoreA and blobStoreB
WHEN write on blobStoreA succeeds and write on blobStoreB fails (or the reverse)
THEN we publish a message on RabbitMQ to retry the write operation later
AND the write succeeds

This means we need to set up a RabbitMQ queue to retry failed writes. The listener of the queue would then asynchronously read blobStoreA to complete the write on blobStoreB.

I think the step
WHEN write on blobStoreA succeeds and write on blobStoreB fails (or the reverse)
is clear :)

@Arsnael
Member

Arsnael commented Sep 24, 2024

The reverse meaning:
WHEN write on blobStoreA fails and write on blobStoreB succeeds

@vttranlina
Member

I see "(or the reverse)".
With this logic, we have 2 "primary" blob stores.
With this logic, common issues may arise, such as mistaking out-of-sync data between the two sides for data to add or delete, the event ordering problem, and so on.

I propose: WHEN the write to blobStoreA fails and the write to blobStoreB succeeds -> TOTAL FAILURE

WDYT?

@Arsnael
Member

Arsnael commented Sep 24, 2024

Isn't it the job of the RabbitMQ queue to retry the failed item on one of the blob stores? Read blobStoreA to get the blob. Missing? Then read it from blobStoreB and write it to A.

Not sure about your concern here
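The healing step described here could look roughly like this; a hypothetical sketch where plain Maps stand in for the two S3 blob stores, and the class name is illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the retry-queue listener healing a partial write: read from the
// store that has the blob and copy it to the one missing it.
class HealingListener {
    private final Map<String, byte[]> storeA;
    private final Map<String, byte[]> storeB;

    HealingListener(Map<String, byte[]> storeA, Map<String, byte[]> storeB) {
        this.storeA = storeA;
        this.storeB = storeB;
    }

    // Invoked for each blobId consumed from the retry queue.
    void heal(String blobId) {
        byte[] inA = storeA.get(blobId);
        byte[] inB = storeB.get(blobId);
        if (inA != null && inB == null) {
            storeB.put(blobId, inA);      // copy A -> B
        } else if (inB != null && inA == null) {
            storeA.put(blobId, inB);      // copy B -> A
        }
        // Both present (already consistent) or both absent (deleted): nothing to do.
        // Safe because blobs are immutable: copying in either direction cannot
        // overwrite newer data.
    }
}
```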

@Arsnael
Member

Arsnael commented Sep 26, 2024

@chibenwa thoughts on @vttranlina concern above?

@chibenwa
Member Author

We are dealing with immutable data. It is not a concern as long as we rely on RabbitMQ for resiliency.

We would only get residual data on failure anyway, the same way we get it with our current architecture.

Not a concern.

Though I will be short on time to provide you with a formal demonstration.

@Arsnael Arsnael closed this as completed Oct 30, 2024
@chibenwa chibenwa reopened this Oct 30, 2024
@chibenwa
Member Author

Need it at least deployed on CNB preprod for me to consider this done!

@Arsnael
Member

Arsnael commented Oct 30, 2024

Sorry my bad
