
Add option to take a snapshot of a managed service source cluster #1028


mikaylathompson
Collaborator

Description

We discovered that our existing CDK is not sufficient for taking a snapshot of a managed service source cluster. A dedicated snapshot role must be created, and there needs to be a trust relationship between the console task role and this snapshot role that allows the console task role to pass it.

With this PR, if managedServiceSourceSnapshotEnabled is set to true in cdk.context.json, the CDK sets up these roles and relationships.
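In plain IAM terms, the setup described here likely comes down to three policy fragments. The sketch below writes them out as JSON-shaped objects in TypeScript rather than as CDK constructs, so it runs without aws-cdk-lib. The function names are hypothetical, and the exact S3 actions are an assumption based on AWS's general guidance for manual OpenSearch Service snapshots, not a copy of what this PR deploys.

```typescript
// Hypothetical sketch: plain IAM policy documents, not the CDK constructs used in this PR.
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string | string[];
  Resource?: string | string[];
  Principal?: { Service?: string; AWS?: string };
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Trust policy: lets the OpenSearch Service principal assume the snapshot role.
function snapshotRoleTrustPolicy(): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { Service: "es.amazonaws.com" },
        Action: "sts:AssumeRole",
      },
    ],
  };
}

// Permissions policy (assumed action list): S3 access for writing snapshots to the artifacts bucket.
function snapshotRoleS3Policy(bucketArn: string): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      { Effect: "Allow", Action: "s3:ListBucket", Resource: bucketArn },
      {
        Effect: "Allow",
        Action: ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
        Resource: `${bucketArn}/*`,
      },
    ],
  };
}

// Statement for the console task role: allows it to pass the snapshot role to the domain.
function passRoleStatement(snapshotRoleArn: string): PolicyStatement {
  return { Effect: "Allow", Action: "iam:PassRole", Resource: snapshotRoleArn };
}
```

Keeping the statements as plain data makes them easy to inspect in a unit test; in the CDK, equivalent statements would be attached to the snapshot role and to the console task role.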

This PR does not add the relevant role ARN to services.yaml or pass it through automatically. That would be an excellent follow-up, but here I'm focusing on unblocking the particularly time-consuming and messy parts rather than making this a fully supported experience.

One caveat is that passing in the snapshot role requires the source cluster to use SigV4 auth. I added a check for this in the CDK.
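A minimal sketch of the kind of synth-time check this caveat describes. The context shape (`sourceCluster.auth.type`) and the function name `validateManagedServiceSnapshot` are hypothetical and may not match the repo's actual types.

```typescript
// Hypothetical context shapes; the real CDK context types in this repo may differ.
type AuthType = "none" | "basic" | "sigv4";

interface SourceClusterConfig {
  endpoint: string;
  auth: { type: AuthType };
}

interface StackContext {
  managedServiceSourceSnapshotEnabled?: boolean;
  sourceCluster?: SourceClusterConfig;
}

// Fail fast at synth time if the snapshot option is enabled without SigV4 auth on the source.
function validateManagedServiceSnapshot(ctx: StackContext): void {
  if (!ctx.managedServiceSourceSnapshotEnabled) {
    return; // option is off: nothing to validate
  }
  const authType = ctx.sourceCluster?.auth.type;
  if (authType !== "sigv4") {
    throw new Error(
      `managedServiceSourceSnapshotEnabled requires sigv4 auth on the source cluster, found: ${authType ?? "no source cluster"}`
    );
  }
}
```

Throwing during synthesis surfaces the misconfiguration before any resources are deployed, rather than as a 401 at snapshot time.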

Manually tested:

Before:

(.venv) bash-5.2# console snapshot create
2024-09-28 03:23:13,956 INFO o.o.m.u.ProcessHelpers [main] getNodeInstanceName()=ip-10-0-5-198.us-east-2.compute.internal
2024-09-28 03:23:14,679 INFO o.o.m.CreateSnapshot [main] Running CreateSnapshot with --snapshot-name rfs-snapshot --source-host https://vpc-cisco-or1-j5nojtdsafdmocqgnzmpjpcbwe.us-east-2.es.amazonaws.com --source-username admin --source-password Admin123! --otel-collector-endpoint http://localhost:4317 --s3-repo-uri s3://migration-artifacts-293444901541-new-cdk-us-east-2/rfs-snapshot-repo --s3-region us-east-2 --no-wait
2024-09-28 03:23:15,144 INFO o.o.m.b.w.SnapshotRunner [main] Attempting to initiate the snapshot...
2024-09-28 03:23:16,573 ERROR o.o.m.b.c.OpenSearchClient [reactor-http-nio-2] Could not register snapshot repo: _snapshot/migration_assistant_repo. Response Code: 401, Response Message: Unauthorized, Response Body: {"Message":"settings.role_arn is needed for snapshot registration."}

With this enabled:

(.venv) bash-5.2# console snapshot create --s3-role-arn arn:aws:iam::293444901541:role/OSMigrations-new-cdk-us-east-2-SnapshotRole53D7C789-18fLjvR4jHRc
2024-09-28 19:34:29,439 INFO o.o.m.u.ProcessHelpers [main] getNodeInstanceName()=ip-10-0-5-155.us-east-2.compute.internal
2024-09-28 19:34:30,381 INFO o.o.m.CreateSnapshot [main] Running CreateSnapshot with --snapshot-name rfs-snapshot --source-host https://vpc-demo-target-cluster-3v7rbwdjxvf6xaxmdxcswmrjgm.us-east-2.es.amazonaws.com --source-aws-service-signing-name es --source-aws-region us-east-2 --otel-collector-endpoint http://localhost:4317 --s3-repo-uri s3://migration-artifacts-293444901541-new-cdk-us-east-2/rfs-snapshot-repo --s3-region us-east-2 --no-wait --s3-role-arn arn:aws:iam::293444901541:role/OSMigrations-new-cdk-us-east-2-SnapshotRole53D7C789-18fLjvR4jHRc
2024-09-28 19:34:30,871 INFO o.o.m.b.w.SnapshotRunner [main] Attempting to initiate the snapshot...
2024-09-28 19:34:32,953 INFO o.o.m.b.c.SnapshotCreator [main] Snapshot repo registration successful
2024-09-28 19:34:33,040 INFO o.o.m.b.c.SnapshotCreator [main] Snapshot rfs-snapshot creation initiated
2024-09-28 19:34:33,040 INFO o.o.m.b.w.SnapshotRunner [main] Snapshot in progress...
Snapshot rfs-snapshot creation initiated successfully
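The "Before" failure comes from the repository registration request: the managed service refuses to register an S3 repository unless `settings.role_arn` is present. The hypothetical helper below shows the general shape of that registration body, built from the same `--s3-repo-uri`, `--s3-region`, and `--s3-role-arn` values seen in the logs. It is only an illustration; the real request is issued by the Java `CreateSnapshot` tool, and its exact body may differ.

```typescript
// Hypothetical illustration of an S3 snapshot repository registration body.
interface S3RepoSettings {
  bucket: string;
  base_path?: string;
  region: string;
  role_arn?: string;
}

interface RepoRegistration {
  type: "s3";
  settings: S3RepoSettings;
}

// Split "s3://bucket/some/prefix" into the bucket name and the key prefix (base_path).
function parseS3Uri(uri: string): { bucket: string; basePath: string } {
  const match = /^s3:\/\/([^/]+)\/?(.*)$/.exec(uri);
  if (!match) {
    throw new Error(`Not an S3 URI: ${uri}`);
  }
  return { bucket: match[1], basePath: match[2].replace(/\/+$/, "") };
}

// Body for PUT _snapshot/<repo-name>; role_arn is the field the managed service required.
function buildRepoRegistration(s3RepoUri: string, region: string, roleArn?: string): RepoRegistration {
  const { bucket, basePath } = parseS3Uri(s3RepoUri);
  const settings: S3RepoSettings = { bucket, region };
  if (basePath) settings.base_path = basePath;
  if (roleArn) settings.role_arn = roleArn;
  return { type: "s3", settings };
}
```

The resulting object would be serialized with `JSON.stringify` and sent as a SigV4-signed `PUT _snapshot/migration_assistant_repo`.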

Issues Resolved

#2001

I also fixed a mistake I made in the sample cdk.context.json and made the logic around targetVersion vs. engineVersion cleaner. It was still requiring an engineVersion in some cases where it wasn't relevant.

Testing

Manual.

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


codecov bot commented Sep 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.28%. Comparing base (00f7ce3) to head (17ad2c7).
Report is 6 commits behind head on main.

@@             Coverage Diff              @@
##               main    #1028      +/-   ##
============================================
- Coverage     80.30%   80.28%   -0.02%     
- Complexity     2727     2728       +1     
============================================
  Files           366      366              
  Lines         13617    13617              
  Branches        942      942              
============================================
- Hits          10935    10933       -2     
- Misses         2108     2109       +1     
- Partials        574      575       +1     
| Flag | Coverage Δ |
|---|---|
| gradle-test | 78.27% <ø> (-0.02%) ⬇️ |
| python-test | 90.11% <ø> (ø) |

Flags with carried forward coverage won't be shown.


@@ -187,6 +187,31 @@ export function createDefaultECSTaskRole(scope: Construct, serviceName: string):
return serviceTaskRole
}

export function createSnapshotOnAOSRole(scope: Construct, artifactS3Arn: string, migrationConsoleTaskRoleArn: string): Role {
const snapshotRole = new Role(scope, `SnapshotRole`, {
assumedBy: new ServicePrincipal('es.amazonaws.com'),
Member

Note: AOS doesn't support snapshots yet

export function createSnapshotOnAOSRole(scope: Construct, artifactS3Arn: string, migrationConsoleTaskRoleArn: string): Role {
const snapshotRole = new Role(scope, `SnapshotRole`, {
assumedBy: new ServicePrincipal('es.amazonaws.com'),
description: 'Role that grants OpenSearch Service permissions to access S3 to create snapshots',
Member

Can we add a deterministic name to avoid the generated suffix in names like OSMigrations-new-cdk-us-east-2-SnapshotRole53D7C789-18fLjvR4jHRc?
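One way to address this naming request, sketched under assumptions: build the name from the stage and region (the `OSMigrations-<stage>-<region>` prefix mirrors the generated name but is not a confirmed repo convention, and `snapshotRoleName` is hypothetical) and validate it against IAM's 64-character limit and allowed character set before passing it as `roleName` to the CDK `Role`.

```typescript
// Hypothetical naming helper; the prefix and parameters are assumptions for illustration.
const IAM_ROLE_NAME_MAX = 64;
// IAM role names allow alphanumerics plus these characters: + = , . @ _ -
const IAM_ROLE_NAME_PATTERN = /^[\w+=,.@-]+$/;

function snapshotRoleName(stage: string, region: string): string {
  const name = `OSMigrations-${stage}-${region}-SnapshotRole`;
  if (name.length > IAM_ROLE_NAME_MAX) {
    throw new Error(`Role name exceeds ${IAM_ROLE_NAME_MAX} characters: ${name}`);
  }
  if (!IAM_ROLE_NAME_PATTERN.test(name)) {
    throw new Error(`Role name contains characters IAM does not allow: ${name}`);
  }
  return name;
}
```

IAM role names are unique per account rather than per region, so including the region keeps deployments of the same stage in different regions from colliding. The trade-off of a fixed name is that two stacks with the same stage and region in one account can no longer coexist.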

@@ -396,7 +421,7 @@ export function parseClusterDefinition(json: any): ClusterYaml {
}
const auth = parseAuth(json.auth)
if (!auth) {
throw new Error(`Invalid auth type when parsing cluster definition: ${json.auth.type}`)
throw new Error(`Invalid auth type when parsing cluster definition: ${json.auth}`)
Member

Can we hold on this change until the plaintext password option is removed from this?
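The concern here is that interpolating the whole `json.auth` object into the error can echo a plaintext `password` into synth output or logs. A hypothetical alternative is to redact secret fields before formatting the message; `describeAuthForError` and its list of secret keys are illustrative assumptions, not code from this PR.

```typescript
// Hypothetical helper: describe an auth block for error messages without echoing secrets.
const SECRET_KEYS: readonly string[] = ["password"];

function describeAuthForError(auth: unknown): string {
  if (auth === null || typeof auth !== "object") {
    // Primitives (or a missing value) carry no nested secrets; format them directly.
    return JSON.stringify(auth) ?? String(auth);
  }
  // Only top-level keys are redacted; nested objects are copied as-is.
  const redacted: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(auth as Record<string, unknown>)) {
    redacted[key] = SECRET_KEYS.includes(key) ? "<redacted>" : value;
  }
  return JSON.stringify(redacted);
}
```

With a helper like this, the error could include the full auth shape (useful for debugging an invalid `type`) once the plaintext password option is gone or while it still exists.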

@@ -324,6 +327,10 @@ export class MigrationConsoleStack extends MigrationServiceCore {
...props
});

if (props.managedServiceSourceSnapshotEnabled) {
const consoleServiceRoleName = "migration-console-TaskRole";
Member

Where is this used?

Collaborator Author

Ah, not anymore, removing.

| reindexFromSnapshotServiceEnabled | boolean | true | Create resources for deploying and configuring the RFS ECS service |
| reindexFromSnapshotExtraArgs | string | "--target-aws-region us-east-1 --target-aws-service-signing-name es" | Extra arguments to provide to the Document Migration command with space separation. See [RFS Arguments](../../../DocumentsFromSnapshotMigration/README.md#Arguments). [^1] |
| sourceClusterEndpoint | string | `"https://source-cluster.elb.us-east-1.endpoint.com"` | The endpoint for the source cluster from which RFS will take a snapshot |
| managedServiceSourceSnapshotEnabled | boolean | true | Create the necessary roles and trust relationships to take a snapshot of a managed service source cluster. This is only compatible with SigV4 auth. |
Member

Do we want to make this a top level argument instead of within source cluster?

Collaborator Author

I went back and forth on this: it's a mix of RFS-related and source-cluster-related configuration, so it didn't fit perfectly anywhere. Would you prefer it in the source object?

Member

I'm good with this for now

Signed-off-by: Mikayla Thompson <[email protected]>
@@ -7,7 +7,7 @@
"type": "none | basic | sigv4",
"// basic auth documentation": "The next two lines are releavant for basic auth only",
"username": "<USERNAME>",
"password_from_secret_arn": "<ARN_OF_SECRET_CONTAINING_PASSWORD>",
"passwordFromSecretArn": "<ARN_OF_SECRET_CONTAINING_PASSWORD>",
Member

Thanks for this; it could have really tripped up a customer.

mikaylathompson merged commit 5db2d5b into opensearch-project:main on Sep 30, 2024
14 checks passed
mikaylathompson deleted the snapshot-of-managed-service-source branch on September 30, 2024 at 16:57