Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EPIC] S3: support SSE-C #5262

Open
7 tasks
chibenwa opened this issue Aug 30, 2024 · 16 comments
Open
7 tasks

[EPIC] S3: support SSE-C #5262

chibenwa opened this issue Aug 30, 2024 · 16 comments

Comments

@chibenwa
Copy link
Member

Why

Protect data in the object store: if data is read then no data leak appears thanks to encryption.

Today James applies himself AES for a single private key.

Limitation:

  • performance: Java code... Would be better to get the cloud provider do it for us.
  • security: we do not apply key derivation principles. Breacking encryption for one itm mean breacking it for all items.
  • Non standard.
  • performance: limits of the tink crypto library forces us to use blocking abstraction and forces us to rely on buffering as the input size mismatch.

Proposal

Enabling SSE-C

Allow to use https://help.ovhcloud.com/csm/en-public-cloud-storage-s3-encrypt-objects-sse-c?id=kb_article_view&sysparm_article=KB0047314 (client provided keys)

(Meaning we can either do AES james side OR do SSE-C OR do nothing)

Configuration blob.properties:

encryption.s3.sse.c.enable=true
encryption.s3.sse.c.master.key.algorithm=AES256
encryption.s3.sse.c.master.key.salt=XXXXXXXXXX
encryption.s3.sse.c.master.key.password=XXXXXXXXXX

Within blob-s3 maven project, reuse dependecy blob-aes and use PBKDF2StreamingAeadFactory to derive the new key.

Integration with the S3 driver is straight forward:

Screenshot from 2024-08-30 11-19-42

Definition of done:

  • Failure for partial configuration eg no salt/algo/password
  • Unit tests for S3BlobStoreDAP in SSE mode
  • A signle integration test for SSE mode
  • Documentation
  • Performance tests on top of OVH S3
  • See if we can decript blob-eas data with sse-c (are the two inter-operable? Likely not but worth trying!
  • Try it also with SSE-C in compulsory mode
{
    "Version": "2012-10-17",
    "Id": "PutObjectPolicy",
    "Statement": [
        {
            "Sid": "RestrictSSECObjectUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*",
            "Condition": {
                "Null": {
                    "s3:x-amz-server-side-encryption-customer-algorithm": "false"
                }
            }
        }
    ]
}	

(and document the put policy as well!)

Option for key derivation

Allows using an encryption key per blob (derived from the blobId) hence making each encryption key unique: if I break one key, I decrypt only one object, not all.

Configuration:

encryption.s3.sse.c.key.derivation.enabled=true

Investigate base way to derivate the key:

Key derivation will be performed on each blob so this operation NEEDS to be cheap.

Notes

Preliminary to a stronger isolation policy where tenant get encoded in the key derivation salt. This would be handy for Postgres multi-tenancy.

@vttranlina
Copy link
Member

About key derivation

Some code from Chat GPT (not test)

import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import java.security.SecureRandom;
import java.util.Base64;

public class KeyDerivationUtil {

    // Derive a key from the master key and blobId
    public static String deriveKey(String masterKey, String blobId) throws Exception {
        char[] masterKeyChars = masterKey.toCharArray();
        byte[] salt = blobId.getBytes(); // BlobId is used as the salt

        // Using PBKDF2 with HMAC-SHA-256
        PBEKeySpec spec = new PBEKeySpec(masterKeyChars, salt, 65536, 256); // 65536 iterations, 256-bit key
        SecretKeyFactory skf = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
        byte[] derivedKey = skf.generateSecret(spec).getEncoded();

        // Encode the derived key to Base64 for use in requests
        return Base64.getEncoder().encodeToString(derivedKey);
    }
}

then
upload:

 PutObjectRequest putRequest = PutObjectRequest.builder()
                .bucket(bucketName)
                .key(objectKey)
                .serverSideEncryptionCustomerAlgorithm("AES256")  // Use AES256 for SSE-C
                .serverSideEncryptionCustomerKey(derivedKey)      // Provide the derived key for encryption
                .build();

download:

 GetObjectRequest getRequest = GetObjectRequest.builder()
                .bucket(bucketName)
                .key(objectKey)
                .serverSideEncryptionCustomerAlgorithm("AES256")  // Use AES256 for SSE-C
                .serverSideEncryptionCustomerKey(derivedKey)      // Provide the derived key for decryption
                .build();

@vttranlina
Copy link
Member

Note: blobId is not a only input for salt
It can be combine: domain (tenant) + blobId
when multi-tenancy.mode=ssec #5263

@chibenwa
Copy link
Member Author

Sure using the blobId as salt within PBKDF2WithHmacSHA256 is gold but what's the cost of it?

Ideally we shall have JMH micro benchmarks on key derivation, and in terms of perfs it shall be well inferior to ms

Note: blobId is not a only input for salt
It can be combine: domain (tenant) + blobId
when multi-tenancy.mode=ssec #5263

Yep yep yep!

@Arsnael
Copy link
Member

Arsnael commented Sep 12, 2024

Confusion in the team regarding encryption.s3.sse.c.master.key.salt=XXXXXXXXXX

Different opinions here. Is it used all the time?

Some of us think if key derivation disabled not used but only when key derivation enabled (salt + blobId for example)

but others think it is used : salt for all blobs if no key derivation, salt + blobId (or sth else) if enabled.

What's the correct way here?

@chibenwa
Copy link
Member Author

but others think it is used : salt for all blobs if no key derivation, salt + blobId (or sth else) if enabled.

IMO that one.

But i think we shall experiment with key derivation further... (JMH benchmarks...)

Ideally:

  • Master key formed with salt and PBKDF2WithHmacSHA256
  • And (optional) key derivative using domain and blobId (actually S3 object key!)

IMO experiment a bit and see what can be done on this topic...

@Arsnael
Copy link
Member

Arsnael commented Sep 17, 2024

@vttranlina
Copy link
Member

When I tried to lab SSEC with S3Minio,
I got an error:

software.amazon.awssdk.services.s3.model.S3Exception: Requests specifying Server Side Encryption with Customer provided keys must be made over a secure connection. 

It require "SSL" connection,

@quantranhong1999
Copy link
Member

It require "SSL" connection,

Does our S3 client support SSL mode?

@Arsnael
Copy link
Member

Arsnael commented Sep 24, 2024

You really think ovh exposes their S3 endpoints in http and not https? Come on guys... Check again all the s3 endpoints in our deployments and you will see everything is in https

@vttranlina
Copy link
Member

I know ssl is used for prod,
My mean raise stuck with test container (S3minio container)

@Arsnael
Copy link
Member

Arsnael commented Sep 24, 2024

Try to add a nginx locally maybe for ssl?

EDIT: or I'm sure we should be able to setup ssl for minio

@hungphan227
Copy link

I know ssl is used for prod, My mean raise stuck with test container (S3minio container)

If ssl is required, then setup ssl for it

@hungphan227
Copy link

https://min.io/docs/minio/linux/operations/network-encryption.html

@chibenwa
Copy link
Member Author

https://github.com/linagora/socat-client-server to easily to the job...

@vttranlina
Copy link
Member

Sure using the blobId as salt within PBKDF2WithHmacSHA256 is gold but what's the cost of it?

Bellow is my Benchmark test the generate deriveKey by using jmh-core

ALGORITHM: PBKDF2WithHmacSHA256
ITERATION_COUNT: 65536
KEY_LENGTH: 256

Benchmark                          Mode  Cnt    Score   Error  Units
DeriveKeyBenchmark.testDeriveKey  thrpt    5  103.055 ± 0.710  ops/s
  1. Decrease ITERATION_COUNT 65536 -> 10000
    The ops/s is significantly better.
ALGORITHM: PBKDF2WithHmacSHA256
ITERATION_COUNT: 10000
KEY_LENGTH: 256

Benchmark                          Mode  Cnt    Score    Error  Units
DeriveKeyBenchmark.testDeriveKey  thrpt    5  671.076 ± 14.413  ops/s
  1. Decrease KEY_LENGTH 256->128
    The ops/s has not changed significantly.
ALGORITHM: PBKDF2WithHmacSHA256
ITERATION_COUNT: 65536
KEY_LENGTH: 128
Benchmark                          Mode  Cnt    Score   Error  Units
DeriveKeyBenchmark.testDeriveKey  thrpt    5  102.869 ± 0.274  ops/s
  1. Change ALGORITHM PBKDF2WithHmacSHA256 -> PBKDF2WithHmacSHA512
    the ops/s is significantly slower.
ALGORITHM: PBKDF2WithHmacSHA512
ITERATION_COUNT: 65536
KEY_LENGTH: 256

Benchmark                          Mode  Cnt   Score   Error  Units
DeriveKeyBenchmark.testDeriveKey  thrpt    5  27.564 ± 0.292  ops/s

// Code lab: https://gist.github.com/vttranlina/c856f0f1ef5ac8c66708396cd7754f19

One sample result
DeriveKeyBenchmark_PBKDF2WithHmacSHA256_65536_256.txt

@chibenwa
Copy link
Member Author

Thanks for putting the JMH benchmarks together.

I discussed with Xavier and we think that derivation with PBKDF2WithHmacSHA256 is ok IF we use a cache (Guava loading cache) so that AES key is computed only once per domain.

Commenting on the related ticket BTW...

@Arsnael Arsnael removed the grooming label Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants