diff --git a/_migrations/getting-started-data-migration.md b/_migrations/getting-started-data-migration.md index 8ae1a7f457..b788522d5a 100644 --- a/_migrations/getting-started-data-migration.md +++ b/_migrations/getting-started-data-migration.md @@ -181,7 +181,7 @@ To deploy Migration Assistant, use the following steps: These commands deploy the following stacks: * Migration Assistant network stack -* Reindex From Snapshot stack +* `Reindex-from-snapshot` stack * Migration console stack --- @@ -253,7 +253,7 @@ Run the following command to migrate metadata: console metadata migrate [...] ``` -For more information, see [Metadata migration]. +For more information, see [Migrating metadata]({{site.url}}{{site.baseurl}}/migrations/migration-phases/migrating-metadata/). --- @@ -285,7 +285,7 @@ You can now use RFS to migrate documents from your original cluster: console backfill stop ``` -For more information, see [Backfill execution]. +For more information, see [Backfill]({{site.url}}{{site.baseurl}}/migrations/migration-phases/backfill/). --- @@ -328,4 +328,3 @@ fields @message If any failed documents are identified, you can index the failed documents directly as opposed to using RFS. -For more information, see [Backfill migration]. diff --git a/_migrations/migration-console/accessing-the-migration-console.md b/_migrations/migration-console/accessing-the-migration-console.md index 9c09515ab6..d6cf9ec150 100644 --- a/_migrations/migration-console/accessing-the-migration-console.md +++ b/_migrations/migration-console/accessing-the-migration-console.md @@ -1,12 +1,15 @@ +--- +layout: default +title: Accessing the migration console +nav_order: 35 +parent: Migration console +--- +# Accessing the migration console +The Bootstrap box deployed through Migration Assistant contains a script that simplifies access to the migration console through that instance. 
-The Migrations Assistant deployment includes an ECS task that hosts tools to run different phases of the migration and check the progress or results of the migration. - -## SSH into the Migration Console -Following the AWS Solutions deployment, the bootstrap box contains a script that simplifies access to the migration console through that instance. - -To access the Migration Console, use the following commands: +To access the migration console, use the following commands: ```shell export STAGE=dev @@ -16,13 +19,7 @@ export AWS_REGION=us-west-2 When opening the console a message will appear above the command prompt, `Welcome to the Migration Assistant Console`. -
- - -SSH from any machine into Migration Console - - -On a machine with the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) ↗ and the [AWS Session Manager Plugin](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html) ↗, you can directly connect to the migration console. Ensure you've run `aws configure` with credentials that have access to the environment. +On a machine with the [AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and the [AWS Session Manager plugin](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html), you can directly connect to the migration console. Ensure that you've run `aws configure` with credentials that have access to the environment. Use the following commands: @@ -32,10 +29,6 @@ export SERVICE_NAME=migration-console export TASK_ARN=$(aws ecs list-tasks --cluster migration-${STAGE}-ecs-cluster --family "migration-${STAGE}-${SERVICE_NAME}" | jq --raw-output '.taskArns[0]') aws ecs execute-command --cluster "migration-${STAGE}-ecs-cluster" --task "${TASK_ARN}" --container "${SERVICE_NAME}" --interactive --command "/bin/bash" ``` -
- -## Troubleshooting -### Deployment Stage -Typically, `STAGE` is `dev`, but this may vary based on what the user specified during deployment. \ No newline at end of file +Typically, `STAGE` is set to `dev`, but this may vary based on the value specified during deployment. \ No newline at end of file diff --git a/_migrations/migration-console/index.md b/_migrations/migration-console/index.md index 78a8011b57..7ebac65836 100644 --- a/_migrations/migration-console/index.md +++ b/_migrations/migration-console/index.md @@ -3,4 +3,9 @@ layout: default title: Migration console nav_order: 30 has_children: true ---- \ No newline at end of file +--- + +The Migration Assistant deployment includes an Amazon Elastic Container Service (Amazon ECS) task that hosts tools for running the different phases of the migration and checking the progress or results of the migration. This ECS task is called the **migration console**. The migration console is a command line interface used to interact with the deployed components of the solution. + +This section provides information about how to access the migration console and what commands are supported. + diff --git a/_migrations/migration-console/migration-console-commands-references.md b/_migrations/migration-console/migration-console-commands-references.md index 8b906ff9b2..55731229e0 100644 --- a/_migrations/migration-console/migration-console-commands-references.md +++ b/_migrations/migration-console/migration-console-commands-references.md @@ -1,83 +1,106 @@ +--- +layout: default +title: Command reference +nav_order: 35 +parent: Migration console +--- +# Migration console command reference -The Migration Assistant Console is a command line interface to interact with the deployed components of the solution. +Migration console commands follow this syntax: `console [component] [action]`. The components include `clusters`, `backfill`, `snapshot`, `metadata`, and `replay`. 
The console is configured with a registry of the deployed services and the source and target cluster, generated from the `cdk.context.json` values. -The commands are in the form of `console [component] [action]`. The components include `clusters`, `backfill` (e.g the Reindex from Snapshot service), `snapshot`, `metadata`, `replay`, etc. The console is configured with a registry of the deployed services and the source and target cluster, generated from the `cdk.context.json` values. - -## Commonly Used Commands +## Commonly used commands The exact commands used will depend heavily on use-case and goals, but the following are a series of common commands with a quick description of what they do. +### Check connection + +Reports whether both the source and target clusters can be reached and provides their versions. + ```sh console clusters connection-check ``` -Reports whether both the source and target clusters can be reached and their versions. +### Run `cat-indices` + +Runs the `_cat/indices` API on each cluster and prints the results. ```sh console clusters cat-indices ``` -Runs the `_cat/indices` command on each cluster and prints the results. -*** +### Create a snapshot + +Creates a snapshot of the source cluster and stores it in a preconfigured Amazon Simple Storage Service (Amazon S3) bucket. ```sh console snapshot create ``` -Initiates creating a snapshot on the source cluster, into a pre-configured S3 bucket. + +### Check snapshot status + +Runs a detailed check on the snapshot creation status, including estimated completion time: ```sh console snapshot status --deep-check ``` -Runs a detailed check on the status of the snapshot creation, including estimated completion time. -*** +### Evaluate metadata + +Performs a dry run of metadata migration, showing which indexes, templates, and other objects will be migrated to the target cluster. 
```sh console metadata evaluate ``` -Perform a dry run of metadata migration, showing which indices, templates, and other objects will be migrated to the target cluster. + +### Migrate metadata + +Migrates the metadata from the source cluster to the target cluster. ```sh console metadata migrate ``` -Perform an actual metadata migration. -*** +### Start a backfill -```sh -console backfill start -``` -If the Reindex From Snapshot service is enabled, start an instance of the service to begin moving documents to the target cluster. +If `Reindex-From-Snapshot` (RFS) is enabled, this command starts an instance of the service to begin moving documents to the target cluster. There are similar `scale UNITS` and `stop` commands to change the number of active instances for RFS. + ```sh -console backfill status --deep-check +console backfill start ``` -See the current status of the backfill migration, with the number of instances operating and the progress of the shards. -*** +### Check backfill status + +Gets the current status of the backfill migration, including the number of operating instances and the progress of the shards. To run this check, use `console backfill status --deep-check`. + + +### Start Traffic Replayer + +If Traffic Replayer is enabled, this command starts an instance of Traffic Replayer to begin replaying traffic against the target cluster. +The `stop` command stops all active instances. ```sh console replay start ``` -If the Traffic Replayer service is enabled, start an instance of the service to begin replaying traffic against the target cluster. -The `stop` command stops all active instances. -*** +### Read logs + +Reads any logs that exist when running Traffic Replayer. Use tab completion on the path to fill in the available `NODE_IDs` and, if applicable, log file names. The tuple logs roll over at a certain size threshold, so there may be many files named with timestamps. The `jq` command pretty-prints each line of the tuple output before writing it to file. 
```sh console tuples show --in /shared-logs-output/traffic-replayer-default/[NODE_ID]/tuples/console.log | jq > readable_tuples.json ``` -Use tab completion on the path to fill in the available node ids and, if applicable, log file names. The tuples logs roll over at a certain size threshold, so there may be many files named with timestamps. The `jq` command pretty-prints each line of the tuple output before writing it to file. -## Command Reference -All commands and options can be explored within the tool itself by using the `--help` option, either for the entire `console` application or for individual components (e.g. `console backfill --help`). The console also has command autocomplete set up to assist with usage. +## Help command -``` +All commands and options can be explored within the tool itself by using the `--help` option, either for the entire `console` application or for individual components (for example, `console backfill --help`). The following example shows the output of `console --help`: + +```bash $ console --help Usage: console [OPTIONS] COMMAND [ARGS]... diff --git a/_migrations/migration-phases/assessing-your-cluster-for-migration.md b/_migrations/migration-phases/assessing-your-cluster-for-migration.md new file mode 100644 index 0000000000..d056754555 --- /dev/null +++ b/_migrations/migration-phases/assessing-your-cluster-for-migration.md @@ -0,0 +1,44 @@ +--- +layout: default +title: Assessing your cluster for migration +nav_order: 60 +has_children: true +parent: Migration phases +--- + +# Assessing your cluster for migration + +The goal of Migration Assistant is to streamline the process of migrating from one location or version of Elasticsearch/OpenSearch to another. However, completing a migration sometimes requires resolving client compatibility issues before clients can communicate directly with the target cluster. + +## Understanding breaking changes + +Before performing any upgrade or migration, you should review any breaking changes documentation. 
Even if the cluster is migrated, there may be changes required in order for clients to connect to the new cluster. + +## Upgrade and breaking changes guides + +For migration paths between Elasticsearch 6.8 and OpenSearch 2.x, you should be familiar with the following documentation, depending on your specific use case: + +* [Upgrading Amazon OpenSearch Service domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/version-migration.html). + +* [Amazon OpenSearch Service rename - Summary of changes](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/rename.html). + +* [OpenSearch breaking changes](https://opensearch.org/docs/latest/breaking-changes/). + +The next step is to set up a proper test bed to verify that your applications will work as expected on the target version. + +## Impact of data transformations + +Any time you apply a transformation to your data, such as changing index names, modifying field names or field mappings, or splitting indexes with type mappings, these changes may need to be reflected in your client configurations. For example, if your clients are reliant on specific index or field names, you must ensure that their queries are updated accordingly. + + + +We recommend running production-like queries against the target cluster before switching to actual production traffic. This helps verify that the client can: + +- Communicate with the target cluster. +- Locate the necessary indexes and fields. +- Retrieve the expected results. + +For complex migrations involving multiple transformations or breaking changes, we highly recommend performing a trial migration with representative, non-production data (for example, in a staging environment) to fully test client compatibility with the target cluster. 
+ + + diff --git a/_migrations/migration-phases/assessment/index.md b/_migrations/migration-phases/assessment/index.md deleted file mode 100644 index feda45c3f7..0000000000 --- a/_migrations/migration-phases/assessment/index.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -layout: default -title: Assessing your cluster -nav_order: 60 -has_children: true -parent: Migration phases ---- \ No newline at end of file diff --git a/_migrations/migration-phases/assessment/required-client-changes.md b/_migrations/migration-phases/assessment/required-client-changes.md deleted file mode 100644 index 0cce4624bf..0000000000 --- a/_migrations/migration-phases/assessment/required-client-changes.md +++ /dev/null @@ -1,41 +0,0 @@ - - - -The goal of the Migration Assistant is to streamline the process of migrating from one location or version of Elasticsearch/OpenSearch to another. However, completing a migration sometimes requires resolving client compatibility issues before they can communicate directly with the target cluster. - -It's crucial to understand and plan for any necessary changes before beginning the migration process. The previous page on [[breaking changes between versions|Understanding-breaking-changes]] is a useful resource for identifying potential issues. - -## Data Transformations and Client Impact - -Any time you apply a transformation to your data, such as: - -- Changing index names -- Modifying field names or field mappings -- Splitting indices with type mappings - -These changes may need to be reflected in your client configurations. For example, if your clients are reliant on specific index or field names, you must ensure that their queries are updated accordingly. - -We recommend running production-like queries against the target cluster before switching over actual production traffic. 
This helps verify that the client can: - -- Communicate with the target cluster -- Locate the necessary indices and fields -- Retrieve the expected results - -For complex migrations involving multiple transformations or breaking changes, we highly recommend performing a trial migration with representative, non-production data (e.g., in a staging environment) to fully test client compatibility with the target cluster. - -## Troubleshooting - -### Migrating from Elasticsearch (Post-Fork) to OpenSearch - -Migrating from post-fork Elasticsearch (7.10.2+) to OpenSearch presents additional challenges because some Elasticsearch clients include license or version checks that can artificially break compatibility. - -No post-fork Elasticsearch clients are fully compatible with OpenSearch 2.x. We recommend switching to the latest version of the [OpenSearch Clients](https://opensearch.org/docs/latest/clients/) ↗. - -### Inspecting the tuple output - -The Replayer outputs that show the exact requests being sent to both the source and target clusters. Examining these tuples can help you identify any transformations between requests, allowing you to ensure that these changes are reflected in your client code. See [[In-flight Validation]] for details. - -### Related Links - -For more information about OpenSearch clients, refer to the official documentation: - diff --git a/_migrations/migration-phases/assessment/understanding-breaking-changes.md b/_migrations/migration-phases/assessment/understanding-breaking-changes.md deleted file mode 100644 index 73eaee6e66..0000000000 --- a/_migrations/migration-phases/assessment/understanding-breaking-changes.md +++ /dev/null @@ -1,16 +0,0 @@ - - - -Before performing any upgrade or migration, you should review any documentation of breaking changes. 
Even if the cluster is migrated there might be changes required for clients to connect to the new cluster - -## Upgrade and breaking changes guides - -For migrations paths between Elasticsearch 6.8 and OpenSearch 2.x users should be familiar with documentation in the links below that apply to their specific case: - -* [Upgrading Amazon Service Domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/version-migration.html) ↗ - -* [Changes from Elasticsearch to OpenSearch fork](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/rename.html) ↗ - -* [OpenSearch Breaking Changes](https://opensearch.org/docs/latest/breaking-changes/) ↗ - -The next step is to set up a proper test bed to verify that your applications will work as expected on the target version. diff --git a/_migrations/migration-phases/backfill.md b/_migrations/migration-phases/backfill.md new file mode 100644 index 0000000000..ccdbadd042 --- /dev/null +++ b/_migrations/migration-phases/backfill.md @@ -0,0 +1,174 @@ +--- +layout: default +title: Backfill +nav_order: 90 +parent: Migration phases +--- + +# Backfill + +After the [metadata]({{site.url}}{{site.baseurl}}/migrations/migration-phases/migrating-metadata/) for your cluster has been migrated, you can use capture proxy data replication and snapshots to backfill your data into the target cluster. + +## Capture proxy data replication + +If you're interested in capturing live traffic during your migration, Migration Assistant includes an Application Load Balancer for routing traffic to the capture proxy and the target cluster. Upstream client traffic must be routed through the capture proxy in order to replay the requests later. Before using the capture proxy, remember the following: + +* The layer upstream from the Application Load Balancer is compatible with the certificate on the Application Load Balancer listener, whether it's for clients or a Network Load Balancer. 
The `albAcmCertArn` in the `cdk.context.json` may need to be provided to ensure that clients trust the Application Load Balancer certificate. +* If a Network Load Balancer is used directly upstream of the Application Load Balancer, it must use a TLS listener. +* Upstream resources and security groups must allow network access to the Migration Assistant Application Load Balancer. + +To set up the capture proxy, go to the AWS Management Console and navigate to **EC2 > Load Balancers > Migration Assistant Application Load Balancer**. Copy the Application Load Balancer URL. With the URL copied, you can use one of the following options. + + +### If you are using **Network Load Balancer → Application Load Balancer → Cluster** + +1. Ensure that ingress is provided directly to the Application Load Balancer for the capture proxy. +2. Create a target group for the Migration Assistant Application Load Balancer on port `9200`, and set the health check to `HTTPS`. +3. Associate this target group with your existing Network Load Balancer on a new listener for testing. +4. Verify that the health check is successful, and perform smoke testing with some clients through the new listener port. +5. Once you are ready to migrate all clients, detach the Migration Assistant Application Load Balancer target group from the testing Network Load Balancer listener and modify the existing Network Load Balancer listener to direct traffic to this target group. +6. Now client requests will be routed through the proxy (once they establish a new connection). Verify the application metrics. + +### If you are using **Network Load Balancer → Cluster** + +If you do not want to modify application logic, add an Application Load Balancer in front of your cluster and follow the **Network Load Balancer → Application Load Balancer → Cluster** steps. Otherwise: + +1. Create a target group for the Application Load Balancer on port `9200` and set the health check to `HTTPS`. +2. 
Associate this target group with your existing Network Load Balancer on a new listener. +3. Verify that the health check is successful, and perform smoke testing with some clients through the new listener port. +4. Once you are ready to migrate all clients, deploy a change so that clients hit the new listener. + + +### If you are **not using a Network Load Balancer** + +If you're only using backfill as your migration technique, make a client/DNS change to route clients to the Migration Assistant Application Load Balancer on port `9200`. + + +### Kafka connection + +After you have routed client traffic based on your use case, verify that HTTP requests are being recorded in Kafka using the following steps: + +1. In the migration console, run the following command: + + ```shell + console kafka describe-topic-records + ``` + + Note the records in the logging topic. + +2. After a short period, execute the same command again and compare the increase in the number of records against the number of expected HTTP requests. + + +## Creating a snapshot + +Create a snapshot for your backfill using the following command: + +```bash +console snapshot create +``` + +To check the progress of your snapshot, use the following command: + +```bash +console snapshot status --deep-check +``` + +Depending on the size of the data in the source cluster and the bandwidth allocated for snapshots, the process can take some time. Adjust the maximum rate at which the source cluster's nodes create the snapshot using the `--max-snapshot-rate-mb-per-node` option. Increasing the snapshot rate will consume more node resources, which may affect the cluster's ability to handle normal traffic. + +## Backfilling documents to the target cluster + +From the snapshot you created of your source cluster, you can begin backfilling documents into the target cluster. Once you have started this process, a fleet of workers will spin up to read the snapshot and reindex documents into the target cluster. 
This fleet of workers can be scaled to increase the speed at which documents are reindexed into the target cluster. + +### Checking the starting state of the clusters + +You can check the indexes and document counts of the source and target clusters by running the `cat-indices` command. This can be used to monitor the difference between the source and target for any migration scenario. Check the indexes of both clusters using the following command: + +```shell +console clusters cat-indices +``` + +You should receive a response similar to the following: + +```shell +SOURCE CLUSTER +health status index uuid pri rep docs.count docs.deleted store.size pri.store.size +green open my-index WJPVdHNyQ1KMKol84Cy72Q 1 0 8 0 44.7kb 44.7kb + +TARGET CLUSTER +health status index uuid pri rep docs.count docs.deleted store.size pri.store.size +green open .opendistro_security N3uy88FGT9eAO7FTbLqqqA 1 0 10 0 78.3kb 78.3kb +``` + +### Starting the backfill + +Use the following command to start the backfill and deploy the workers: + +```shell +console backfill start +``` + +After the backfill has started, running `console backfill status --deep-check` returns a response similar to the following: + +```shell +BackfillStatus.RUNNING +Running=1 +Pending=0 +Desired=1 +Shards total: 48 +Shards completed: 48 +Shards incomplete: 0 +Shards in progress: 0 +Shards unclaimed: 0 +``` + +The status will be `RUNNING` even if all the shards have been migrated. + +### Scaling up the fleet + +To speed up the transfer, you can scale the number of workers. It may take a few minutes for these additional workers to come online. The following command will update the worker fleet to a size of 5: + +```shell +console backfill scale 5 +``` + +We recommend slowly scaling up the fleet while monitoring the health metrics of the target cluster to avoid over-saturating it. [Amazon OpenSearch Service domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/monitoring.html) provide a number of metrics and logs that can provide this insight. 
+ +### Stopping the migration + +Backfill requires manually stopping the fleet. Once all the data has been migrated, you can shut down the fleet and all its workers using the following command: + +```shell +console backfill stop +``` + +### Amazon CloudWatch metrics and dashboard + +Migration Assistant creates an Amazon CloudWatch dashboard that you can use to visualize the health and performance of the backfill process. It combines the metrics for the backfill workers and, for those migrating to Amazon OpenSearch Service, the target cluster. + +You can find the backfill dashboard in the CloudWatch console based on the AWS Region in which you have deployed Migration Assistant. The metric graphs for your target cluster will be blank until you select the OpenSearch domain you're migrating to from the dropdown menu at the top of the dashboard. + +## Validating the backfill + +After the backfill is complete and the workers have stopped, run the [Refresh API](https://opensearch.org/docs/latest/api-reference/index-apis/refresh/) and the [Flush API](https://opensearch.org/docs/latest/api-reference/index-apis/flush/) on the target cluster before examining its contents so that document counts and metrics are reported accurately. 
The following example uses the console CLI with the Refresh API to check the backfill status: + +```shell +console clusters cat-indices --refresh +``` + +This will display the number of documents in each of the indexes in the target cluster, as shown in the following example response: + +```shell +SOURCE CLUSTER +health status index uuid pri rep docs.count docs.deleted store.size pri.store.size +green open my-index -DqPQDrATw25hhe5Ss34bQ 1 0 3 0 12.7kb 12.7kb + +TARGET CLUSTER +health status index uuid pri rep docs.count docs.deleted store.size pri.store.size +green open .opensearch-observability 8HOComzdSlSWCwqWIOGRbQ 1 1 0 0 416b 208b +green open .plugins-ml-config 9tld-PCJToSUsMiyDhlyhQ 5 1 1 0 9.5kb 4.7kb +green open my-index bGfGtYoeSU6U6p8leR5NAQ 1 0 3 0 5.5kb 5.5kb +green open .migrations_working_state lopd47ReQ9OEhw4ZuJGZOg 1 1 2 0 18.6kb 6.4kb +green open .kibana_1 +``` + +You can run additional queries against the target cluster to mimic your production workflow and closely examine the results. diff --git a/_migrations/migration-phases/backfill/backfill-execution.md b/_migrations/migration-phases/backfill/backfill-execution.md deleted file mode 100644 index 828317bada..0000000000 --- a/_migrations/migration-phases/backfill/backfill-execution.md +++ /dev/null @@ -1,94 +0,0 @@ - - - -After the [[Metadata Migration]] has been completed; begin the backfill of documents from the snapshot of the source cluster. - -## Document Migration -Once started a fleet of workers will spin up to read the snapshot and reindex documents on the target cluster. This fleet of workers can be scaled to increased the speed that documents are reindexed onto target cluster. - -### Check the starting state of the clusters - -You can see the indices and rough document counts of the source and target cluster by running the cat-indices command. This can be used to monitor the difference between the source and target for any migration scenario. 
Check the indices of both clusters with the following command: - -```shell -console clusters cat-indices -``` - -
- -Example cat-indices command output - - -```shell -SOURCE CLUSTER -health status index uuid pri rep docs.count docs.deleted store.size pri.store.size -green open my-index WJPVdHNyQ1KMKol84Cy72Q 1 0 8 0 44.7kb 44.7kb - -TARGET CLUSTER -health status index uuid pri rep docs.count docs.deleted store.size pri.store.size -green open .opendistro_security N3uy88FGT9eAO7FTbLqqqA 1 0 10 0 78.3kb 78.3kb -``` -
- -### Start the backfill - -By starting the backfill by running the following command, which creates a fleet with a single worker: - -```shell -console backfill start -``` - -### Monitor the status - -You can use the status check command to see more detail about things like the number of shards completed, in progress, remaining, and the overall status of the operation: - -```shell -console backfill status --deep-check -``` - -
- -Example status output - - -``` -BackfillStatus.RUNNING -Running=1 -Pending=0 -Desired=1 -Shards total: 48 -Shards completed: 48 -Shards incomplete: 0 -Shards in progress: 0 -Shards unclaimed: 0 -``` -
- ->[!Note] -> The status will be "RUNNING" even if all the shards have been migrated. - -### Scale up the fleet - -To speed up the transfer, you can scale the number of workers. It may take a few minutes for these additional workers to come online. The following command will update the worker fleet to a size of ten: - -```shell -console backfill scale 5 -``` - -### Stopping the migration -Backfill requires manually stopping the fleet. Once all the data has been migrated using by checking the status. You can spin down the fleet and all its workers with the command: - -```shell -console backfill stop -``` - - -## Troubleshooting - -### How to scaling the fleet - -It is recommended to scale up the fleet slowly while monitoring the health metrics of the Target Cluster to avoid over-saturating it. Amazon OpenSearch Domains provide a number of metrics and logs that can provide this insight; refer to [the official documentation on the subject](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/monitoring.html) ↗. The AWS Console for Amazon Opensearch Service surfaces details that can be useful for this as well. - -## Related Links - -- [Technical details for RFS](https://github.com/opensearch-project/opensearch-migrations/blob/main/RFS/docs/DESIGN.md) diff --git a/_migrations/migration-phases/backfill/backfill-result-validation.md b/_migrations/migration-phases/backfill/backfill-result-validation.md deleted file mode 100644 index e6cbd97482..0000000000 --- a/_migrations/migration-phases/backfill/backfill-result-validation.md +++ /dev/null @@ -1,37 +0,0 @@ - - - -After the backfill has been completed and the fleet has been stopped - -## Refresh the target cluster - -Before examining the contents of the target cluster, it is recommended to run a `_refresh` and `_flush` on the target cluster. This will help ensure that the report and metrics of the cluster will be accurate portrayed. 
- -## Validate documents on target cluster -You can check the contents of the Target Cluster after the migration using the Console CLI: - -``` -console clusters cat-indices --refresh -``` -Example cat-indices command output - -```shell -SOURCE CLUSTER -health status index uuid pri rep docs.count docs.deleted store.size pri.store.size -green open my-index -DqPQDrATw25hhe5Ss34bQ 1 0 3 0 12.7kb 12.7kb - -TARGET CLUSTER -health status index uuid pri rep docs.count docs.deleted store.size pri.store.size -green open .opensearch-observability 8HOComzdSlSWCwqWIOGRbQ 1 1 0 0 416b 208b -green open .plugins-ml-config 9tld-PCJToSUsMiyDhlyhQ 5 1 1 0 9.5kb 4.7kb -green open my-index bGfGtYoeSU6U6p8leR5NAQ 1 0 3 0 5.5kb 5.5kb -green open .migrations_working_state lopd47ReQ9OEhw4ZuJGZOg 1 1 2 0 18.6kb 6.4kb -green open .kibana_1 -``` - -This will display the number of documents on each of the indices on the Target Cluster. It is further recommended to run some queries against the Target Cluster that mimic your production workflow and closely examine the results returned. - -## Related Links - -- [Refresh API](https://opensearch.org/docs/latest/api-reference/index-apis/refresh/) ↗ -- [Flush API](https://opensearch.org/docs/latest/api-reference/index-apis/flush/) ↗ diff --git a/_migrations/migration-phases/backfill/capture-proxy-data-replication.md b/_migrations/migration-phases/backfill/capture-proxy-data-replication.md deleted file mode 100644 index 2ed3b1f4c4..0000000000 --- a/_migrations/migration-phases/backfill/capture-proxy-data-replication.md +++ /dev/null @@ -1,45 +0,0 @@ - - - -The Migration Assistant includes an Application Load Balancer (ALB) for routing traffic to the capture proxy and/or target. Upstream client traffic must be routed through the capture proxy in order to replay the requests later. - -## Assumptions - -* The upstream layer from the ALB is compatible with the certificate on the ALB listener (whether it’s clients or a Network Load Balancer, NLB). 
- * The `albAcmCertArn` in the `cdk.context.json` may need to be provided to ensure that clients trust the ALB certificate. -* If an NLB is used directly upstream of the ALB, it must use a TLS listener. -* Upstream resources and security groups must allow network access to the Migration Assistant ALB. - -## Steps - -1. In the AWS Console, navigate to **EC2 > Load Balancers > Migration Assistant ALB**. -2. Note down the ALB URL. -3. If you are using **NLB → ALB → Cluster**: - 1. Ensure ingress is provided directly to the ALB for the capture proxy. - 2. Create a target group for the Migration Assistant ALB on port 9200, and set the health check to HTTPS. - 3. Associate this target group with your existing NLB on a new listener (for testing). - 4. Verify the health check is successful, and perform smoke testing with some clients through the new listener port. - 5. Once ready to migrate all clients, detach the Migration Assistant ALB target group from the testing NLB listener and modify the existing NLB listener to direct traffic to this target group. - 6. Now, client requests will be routed through the proxy (once they establish a new connection). Verify application metrics. -4. If you are using **NLB → Cluster**: - 1. If you do not wish to modify application logic, add an ALB in front of your cluster and follow the **NLB → ALB → Cluster** steps. Otherwise: - 2. Create a target group for the ALB on port 9200 and set the health check to HTTPS. - 3. Associate this target group with your existing NLB on a new listener. - 4. Verify the health check is successful, and perform smoke testing with some clients through the new listener port. - 5. Once ready to migrate all clients, deploy a change so that clients hit the new listener. -5. If you are **not using an NLB**: - 1. Make a client/DNS change to route clients to the Migration Assistant ALB on port 9200. -6. 
In the Migration Console, execute the following command: - ```shell - console kafka describe-topic-records - ``` - Note the records in the logging topic. -7. After a short period, execute the same command again and compare the increase in records against the expected HTTP requests. - -### Troubleshooting - -* Investigate the ALB listener security policy, security groups, ALB certificates, and the proxy's connection to Kafka. - -### Related Links - -- [Migration Console ALB Documentation](https://github.com/opensearch-project/opensearch-migrations/blob/main/docs/ClientTrafficSwinging.md) \ No newline at end of file diff --git a/_migrations/migration-phases/backfill/index.md b/_migrations/migration-phases/backfill/index.md deleted file mode 100644 index 8bdb1e2a8d..0000000000 --- a/_migrations/migration-phases/backfill/index.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -layout: default -title: Backfill -nav_order: 90 -has_children: true -parent: Migration phases ---- \ No newline at end of file diff --git a/_migrations/migration-phases/client-traffic-switchover/Switching-Traffic-from-Source-to-Target.md b/_migrations/migration-phases/client-traffic-switchover/Switching-Traffic-from-Source-to-Target.md deleted file mode 100644 index ea7182989a..0000000000 --- a/_migrations/migration-phases/client-traffic-switchover/Switching-Traffic-from-Source-to-Target.md +++ /dev/null @@ -1,37 +0,0 @@ - - -After the source and target clusters are in sync traffic needs to be switched to the target cluster so the source cluster can be taken offline. - -## Assumptions -- All client traffic is being routed through switchover listener in MigrationAssistant ALB -- Client traffic has been verified to be compatible with Target Cluster -- Target cluster is in a good state to accept client traffic (i.e. backfill/replay is complete as needed) -- Target Proxy Service is deployed - -## Switch Traffic to the Source Cluster -1. Within the AWS Console, navigate to ECS > Migration Assistant Cluster -1. 
Note down the desired count of the Capture Proxy (it should be > 1) -1. Update the ECS Service of the Target Proxy to be at least as large as the Traffic Capture Proxy -1. Wait for tasks to startup, verify all targets healthy within Target Proxy Service "Load balancer health" -1. Within the AWS Console, navigate to EC2 > Load Balancers > Migration Assistant ALB -1. Navigate to ALB Metrics and examine any information which may be useful - 1. Specifically look at Active Connection Count and New Connection Count and note if theres a big discrepancy, this can indicate a reused connections which affect how traffic will switchover. Once an ALB is re-routed, existing connections will still be routed to the capture proxy until the client/source cluster terminates those. -1. Navigate to the Capture Proxy Target Group (ALBSourceProxy--TG) > Monitoring -1. Examine Metrics Requests, Target (2XX, 3XX, 4XX, 5XX), and Target Response Time, Metrics - 1. Verify that this looks as expected and includes all traffic expected to be included in the switchover - 1. Note details that would help identify anomalies during the switchover including expected response time and response code rate. -1. Navigate back to ALB and click on Target Proxy Target Group (ALBTargetProxy--TG) -1. Verify all expected targets are healthy and none are in draining state -1. Navigate back to ALB and to the Listener on port 9200 -1. Click on the Default rule and Edit -1. Modify the weights of the targets to shift desired traffic over to the target proxy - 1. To perform a full switchover, modify the weight to 1 on Target Proxy and 0 on Source Proxy -1. Click Save Changes -1. Navigate to both SourceProxy and TargetProxy TG Monitoring metrics and verify traffic is shifting over as expected - 1. If connections are being reused by clients, perform any actions if needed to terminate those to get the clients to shift over. - 1. 
Monitor until SourceProxy TG shows 0 requests when it is known all clients have switchover - -## Troubleshooting - -### Fallback -If needed to fallback, revert the Default rule to have the ALB route to the SourceProxy Target Group \ No newline at end of file diff --git a/_migrations/migration-phases/client-traffic-switchover/index.md b/_migrations/migration-phases/client-traffic-switchover/index.md deleted file mode 100644 index 1a381a0ef6..0000000000 --- a/_migrations/migration-phases/client-traffic-switchover/index.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -layout: default -title: Client traffic switchover -nav_order: 110 -has_children: true -parent: Migration phases ---- \ No newline at end of file diff --git a/_migrations/migration-phases/index.md b/_migrations/migration-phases/index.md index 95cd93b495..b637d4a28d 100644 --- a/_migrations/migration-phases/index.md +++ b/_migrations/migration-phases/index.md @@ -3,4 +3,11 @@ layout: default title: Migration phases nav_order: 50 has_children: true ---- \ No newline at end of file +--- + +This page details how to conduct a migration with Migration Assistant. It encompasses a variety of scenarios including: + +- [**Metadata migration**]({{site.url}}{{site.baseurl}}/migrations/migration-phases/migrating-metadata/): Migrating cluster metadata, such as index settings, aliases, and templates. +- [**Backfill migration**]({{site.url}}{{site.baseurl}}/migrations/migration-phases/backfill/): Migrating existing or historical data from a source to a target cluster. +- **Live traffic migration**: Replicating live ongoing traffic from a source to a target cluster. 
+ diff --git a/_migrations/migration-phases/metadata/Snapshot-Creation.md b/_migrations/migration-phases/metadata/Snapshot-Creation.md deleted file mode 100644 index 683b097d5e..0000000000 --- a/_migrations/migration-phases/metadata/Snapshot-Creation.md +++ /dev/null @@ -1,56 +0,0 @@ ---- -layout: default -title: Snapshot creation -nav_order: 90 -parent: Metadata -grand_parent: Migration phases ---- - - -# Snapshot creation - -Creating a snapshot of the source cluster capture all the metadata and documents to migrate onto a new target cluster. - -## Limitations - -Incremental or "delta" snapshots are not yet supported. For more information, refer to the [tracking issue MIGRATIONS-1624](https://opensearch.atlassian.net/browse/MIGRATIONS-1624). A single snapshot must be used for a backfill. - -## Snapshot Creation from the Console - -Create the initial snapshot on the source cluster with the following command: - -```shell -console snapshot create -``` - -To check the progress of the snapshot in real-time, use the following command: - -```shell -console snapshot status --deep-check -``` - -
-Example Output When a Snapshot is Completed - -```shell -console snapshot status --deep-check - -SUCCESS -Snapshot is SUCCESS. -Percent completed: 100.00% -Data GiB done: 29.211/29.211 -Total shards: 40 -Successful shards: 40 -Failed shards: 0 -Start time: 2024-07-22 18:21:42 -Duration: 0h 13m 4s -Anticipated duration remaining: 0h 0m 0s -Throughput: 38.13 MiB/sec -``` -
- -## Troubleshooting - -### Slow Snapshot Speed - -Depending on the size of the data on the source cluster and the bandwidth allocated for snapshots, the process can take some time. Adjust the maximum rate at which the source cluster's nodes create the snapshot using the `--max-snapshot-rate-mb-per-node` option. Increasing the snapshot rate will consume more node resources, which may affect the cluster's ability to handle normal traffic. If not specified, the default rate for the source cluster's version will be used. For more details, refer to the [Elasticsearch 7.10 snapshot documentation](https://www.elastic.co/guide/en/elasticsearch/reference/7.10/put-snapshot-repo-api.html#put-snapshot-repo-api-request-body) ↗. \ No newline at end of file diff --git a/_migrations/migration-phases/metadata/index.md b/_migrations/migration-phases/metadata/index.md deleted file mode 100644 index ac5618ffd4..0000000000 --- a/_migrations/migration-phases/metadata/index.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -layout: default -title: Metadata -nav_order: 80 -has_children: true -parent: Migration phases ---- \ No newline at end of file diff --git a/_migrations/migration-phases/metadata/Metadata-Migration.md b/_migrations/migration-phases/migrating-metadata.md similarity index 61% rename from _migrations/migration-phases/metadata/Metadata-Migration.md rename to _migrations/migration-phases/migrating-metadata.md index 543a249639..2a4079ca3f 100644 --- a/_migrations/migration-phases/metadata/Metadata-Migration.md +++ b/_migrations/migration-phases/migrating-metadata.md @@ -1,17 +1,53 @@ --- layout: default -title: Metadata migration +title: Migrating metadata nav_order: 85 -parent: Metadata -grand_parent: Migration phases +parent: Migration phases --- -# Metadata migration +# Migrating metadata -Metadata migration is a relatively fast process to execute so we recommend attempting this workflow as early as possible to discover any issues which could impact longer running migration steps. 
+Metadata migration involves creating a snapshot of your cluster and then migrating the metadata from the snapshot using the migration console. -## Prerequisites -A snapshot of the cluster state will need to be taken, [[guide to create a snapshot|Snapshot Creation]]. +This tool gathers information from a source cluster through a snapshot or through HTTP requests against the source cluster. These snapshots are fully compatible with the backfill process for `Reindex-From-Snapshot` (RFS) scenarios. + +After collecting information on the source cluster, comparisons are made against the target cluster. If running a migration, any metadata items that do not already exist will be created on the target cluster. + +## Creating the snapshot + +Creating a snapshot of the source cluster captures all the metadata and documents to be migrated to a new target cluster. + +Create the initial snapshot of the source cluster using the following command: + +```shell +console snapshot create +``` + +To check the progress of the snapshot in real time, use the following command: + +```shell +console snapshot status --deep-check +``` + +You should receive the following response when the snapshot is created: + +```shell +SUCCESS +Snapshot is SUCCESS. +Percent completed: 100.00% +Data GiB done: 29.211/29.211 +Total shards: 40 +Successful shards: 40 +Failed shards: 0 +Start time: 2024-07-22 18:21:42 +Duration: 0h 13m 4s +Anticipated duration remaining: 0h 0m 0s +Throughput: 38.13 MiB/sec +``` + +### Managing slow snapshot speeds + +Depending on the size of the data in the source cluster and the bandwidth allocated for snapshots, the process can take some time. Adjust the maximum rate at which the source cluster's nodes create the snapshot using the `--max-snapshot-rate-mb-per-node` option. Increasing the snapshot rate will consume more node resources, which may affect the cluster's ability to handle normal traffic. 
## Command Arguments @@ -47,9 +83,9 @@ INFO:console_link.models.metadata:Migrating metadata with command: /root/metadat . . ``` - -## Metadata verification with evaluate command + +## Using the `evaluate` command By scanning the contents of the source cluster, applying filtering, and applying modifications a list of all items that will be migrated will be created. Any items not seen in this output will not be migrated onto the target cluster if the migrate command was to be run. This is a safety check before making modifications on the target cluster. @@ -57,12 +93,9 @@ By scanning the contents of the source cluster, applying filtering, and applying console metadata evaluate [...] ``` -
- -Example evaluate command output - +You should receive a response similar to the following: -``` +```bash Starting Metadata Evaluation Clusters: Source: @@ -89,9 +122,9 @@ Migration Candidates: Results: 0 issue(s) detected ``` -
-## Metadata migration with migrate command + +## Using the `migrate` command Running through the same data as the evaluate command all of the migrated items will be applied onto the target cluster. If re-run multiple times items that were previously migrated will not be recreated. If any items do need to be re-migrated, please delete them from the target cluster and then rerun the evaluate then migrate commands to ensure the desired changes are made. @@ -99,12 +132,9 @@ Running through the same data as the evaluate command all of the migrated items console metadata migrate [...] ``` -
-Example migrate command output - +You should receive a response similar to the following: -``` +```shell Starting Metadata Migration Clusters: @@ -138,9 +168,12 @@ Results: Before moving on to additional migration steps, it is recommended to confirm details of your cluster. Depending on your configuration, this could be checking the sharding strategy or making sure index mappings are correctly defined by ingesting a test document. -## Troubleshoot issues +## Troubleshooting + +Use these instructions to help troubleshoot the following issues. + +### Access detailed logs -### Access Detailed Logs Metadata migration creates a detailed log file that includes low level tracing information for troubleshooting. For each execution of the program a log file is created inside a shared volume on the migration console named `shared-logs-output` the following command will list all log files, one for each run of the command. ```shell @@ -153,23 +186,21 @@ To inspect the file within the console `cat`, `tail` and `grep` commands line to tail /shared-logs-output/migration-console-default/*/metadata/*.log ``` -### Warnings / Errors inline -There might be `WARN` or `ERROR` elements inline the output, they will be accompanied by a short message, such as `WARN - my_index already exists`. Full information will be in the detailed logs associated with this warnings or errors. +### Warnings and errors -### OpenSearch running in compatibility mode -There might be an error about being unable to update an ES 7.10.2 cluster, this can occur when compatibility mode has been enabled on an OpenSearch cluster please disable it to continue, see [Enable compatibility mode](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/rename.html#rename-upgrade) ↗. +`WARN` or `ERROR` elements in the response are accompanied by a short message, such as `WARN - my_index already exists`.
More information can be found in the detailed logs associated with the warning or error. -## How the tool works +### OpenSearch running in compatibility mode -This tool gathers information from a source cluster, through a snapshot or through HTTP requests against the source cluster. These snapshots are fully compatible with the backfill process for Reindex-From-Snapshot (RFS) scenarios, [[learn more|Backfill-Execution]]. +You might receive an error about being unable to update an ES 7.10.2 cluster. This can occur when compatibility mode has been enabled on an OpenSearch cluster. Disable compatibility mode to continue. For more information, see [Enable compatibility mode](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/rename.html#rename-upgrade). -After collecting information on the source cluster comparisons are made on the target cluster. If running a migration, any metadata items do not already exist will be created on the target cluster. ### Breaking change compatibility -Metadata migration needs to modify data from the source to the target versions to recreate items. Sometimes these features are no longer supported and have been removed from the target version. Sometimes these features are not available on the target version, which is especially true when downgrading. While this tool is meant to make this process easier, it is not exhaustive in its support. When encountering a compatibility issue or an important feature gap for your migration, please [search the issues](https://github.com/opensearch-project/opensearch-migrations/issues) and comment + upvote or a [create a new](https://github.com/opensearch-project/opensearch-migrations/issues/new/choose) issue if one cannot be found. +Metadata migration requires modifying data from the source to the target versions to recreate items. Sometimes these features are no longer supported and have been removed from the target version.
Sometimes these features are not available in the target version, which is especially true when downgrading. While this tool is meant to make this process easier, it is not exhaustive in its support. When encountering a compatibility issue or an important feature gap for your migration, [search the issues and comment on the existing issue](https://github.com/opensearch-project/opensearch-migrations/issues) or [create a new](https://github.com/opensearch-project/opensearch-migrations/issues/new/choose) issue if one cannot be found. #### Deprecation of Mapping Types + In Elasticsearch 6.8 the mapping types feature was discontinued in Elasticsearch 7.0+ which has created complexity in migrating to newer versions of Elasticsearch and OpenSearch, [learn more](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/removal-of-types.html) ↗. As Metadata migration supports migrating from ES 6.8 on to the latest versions of OpenSearch this scenario is handled by removing the type mapping types and restructuring the template or index properties. Note that, at the time of this writing multiple type mappings are not supported, [tracking task](https://opensearch.atlassian.net/browse/MIGRATIONS-1778) ↗. @@ -203,4 +234,4 @@ As Metadata migration supports migrating from ES 6.8 on to the latest versions o } ``` -*Technical details are available, [view source code](https://github.com/opensearch-project/opensearch-migrations/blob/main/transformation/src/main/java/org/opensearch/migrations/transformation/rules/IndexMappingTypeRemoval.java).* \ No newline at end of file +For additional technical details, [view the mapping type removal source code](https://github.com/opensearch-project/opensearch-migrations/blob/main/transformation/src/main/java/org/opensearch/migrations/transformation/rules/IndexMappingTypeRemoval.java). 
diff --git a/_migrations/migration-phases/post-migration-cleanup/index.md b/_migrations/migration-phases/post-migration-cleanup/index.md deleted file mode 100644 index 3f76d3ed34..0000000000 --- a/_migrations/migration-phases/post-migration-cleanup/index.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -layout: default -title: Migration infrastructure teardown -nav_order: 120 -has_children: true -parent: Migration phases ---- \ No newline at end of file diff --git a/_migrations/migration-phases/post-migration-cleanup/Migration-Infrastructure-Teardown.md b/_migrations/migration-phases/removing-migration-infrastructure.md similarity index 60% rename from _migrations/migration-phases/post-migration-cleanup/Migration-Infrastructure-Teardown.md rename to _migrations/migration-phases/removing-migration-infrastructure.md index 779dda81c5..75413f25f0 100644 --- a/_migrations/migration-phases/post-migration-cleanup/Migration-Infrastructure-Teardown.md +++ b/_migrations/migration-phases/removing-migration-infrastructure.md @@ -1,16 +1,20 @@ +--- +layout: default +title: Removing migration infrastructure +nav_order: 120 +parent: Migration phases +--- - +# Removing migration infrastructure After a migration is complete all resources should be removed except for the target cluster, and optionally your Cloudwatch Logs, and Replayer logs. -## Remove Migration Assistant Infrastructure To remove all the CDK stack(s) which get created during a deployment you can execute a command similar to below within the CDK directory -``` +```bash cdk destroy "*" --c contextId= ``` Follow the instructions on the command-line to remove the deployed resources from the AWS account. -> [!Note] -> The AWS Console can also be used to verify, remove, and confirm resources for the Migration Assistant are no longer in the account. \ No newline at end of file +The AWS Management Console can also be used to remove Migration Assistant resources and confirm that they are no longer in the account. 
\ No newline at end of file diff --git a/_migrations/migration-phases/replay-activation-and-validation/In-flight-Validation.md b/_migrations/migration-phases/replay-activation-and-validation/In-flight-Validation.md deleted file mode 100644 index b2003700f7..0000000000 --- a/_migrations/migration-phases/replay-activation-and-validation/In-flight-Validation.md +++ /dev/null @@ -1,152 +0,0 @@ - - - -The Replayer is a long-running process that makes requests to a target cluster to maintain synchronization with the source cluster and enable users to compare the performance between the two clusters. There are two primary ways to assess how the target requests are being handled: through logs and metrics. - -## Result Logs - -HTTP transactions from the source capture and those resent to the target cluster are logged in files located at `/shared-logs-output/traffic-replayer-default/*/tuples/tuples.log`. The `/shared-logs-output` directory is shared across containers, including the migration console. Users can access these files from the migration console using the same path. Previous runs are also available in gzipped form. - -Each log entry is a newline-delimited JSON object, which contains details of the source and target requests/responses along with other transaction details, such as response times. - -> **Note:** These logs contain the contents of all requests, including Authorization headers and the contents of all HTTP messages. Ensure that access to the migration environment is restricted as these logs serve as a source of truth for determining what happened on both the source and target clusters. Response times for the source refer to the time between the proxy sending the end of a request and receiving the response. While response times for the target are recorded in the same manner, keep in mind that the locations of the capture proxy, replayer, and target may differ, and these logs do not account for the client's location. - -
- -Example Log Entry - - -Below is an example log entry for a `/_cat/indices?v` request sent to both the source and target clusters: - -```json -{ - "sourceRequest": { - "Request-URI": "/_cat/indices?v", - "Method": "GET", - "HTTP-Version": "HTTP/1.1", - "Host": "capture-proxy:9200", - "Authorization": "Basic YWRtaW46YWRtaW4=", - "User-Agent": "curl/8.5.0", - "Accept": "*/*", - "body": "" - }, - "sourceResponse": { - "HTTP-Version": {"keepAliveDefault": true}, - "Status-Code": 200, - "Reason-Phrase": "OK", - "response_time_ms": 59, - "content-type": "text/plain; charset=UTF-8", - "content-length": "214", - "body": "aGVhbHRoIHN0YXR1cyBpbmRleCAgICAgICB..." - }, - "targetRequest": { - "Request-URI": "/_cat/indices?v", - "Method": "GET", - "HTTP-Version": "HTTP/1.1", - "Host": "opensearchtarget", - "Authorization": "Basic YWRtaW46bXlTdHJvbmdQYXNzd29yZDEyMyE=", - "User-Agent": "curl/8.5.0", - "Accept": "*/*", - "body": "" - }, - "targetResponses": [{ - "HTTP-Version": {"keepAliveDefault": true}, - "Status-Code": 200, - "Reason-Phrase": "OK", - "response_time_ms": 721, - "content-type": "text/plain; charset=UTF-8", - "content-length": "484", - "body": "aGVhbHRoIHN0YXR1cyBpbmRleCAgICAgICB..." - }], - "connectionId": "0242acfffe13000a-0000000a-00000005-1eb087a9beb83f3e-a32794b4.0", - "numRequests": 1, - "numErrors": 0 -} -``` -
- -### Decoding Log Content - -The contents of HTTP message bodies are Base64 encoded to handle various types of traffic, including compressed data. To view the logs in a more human-readable format, use the console library `tuples show`. Running the script as follows will produce a `readable-tuples.log` in the home directory: - -```shell -console tuples show --in /shared-logs-output/traffic-replayer-default/d3a4b31e1af4/tuples/tuples.log > readable-tuples.log -``` - -
- -Example log entry would look after running the script - - -```json -{ - "sourceRequest": { - "Request-URI": "/_cat/indices?v", - "Method": "GET", - "HTTP-Version": "HTTP/1.1", - "Host": "capture-proxy:9200", - "Authorization": "Basic YWRtaW46YWRtaW4=", - "User-Agent": "curl/8.5.0", - "Accept": "*/*", - "body": "" - }, - "sourceResponse": { - "HTTP-Version": {"keepAliveDefault": true}, - "Status-Code": 200, - "Reason-Phrase": "OK", - "response_time_ms": 59, - "content-type": "text/plain; charset=UTF-8", - "content-length": "214", - "body": "health status index uuid ..." - }, - "targetRequest": { - "Request-URI": "/_cat/indices?v", - "Method": "GET", - "HTTP-Version": "HTTP/1.1", - "Host": "opensearchtarget", - "Authorization": "Basic YWRtaW46bXlTdHJvbmdQYXNzd29yZDEyMyE=", - "User-Agent": "curl/8.5.0", - "Accept": "*/*", - "body": "" - }, - "targetResponses": [{ - "HTTP-Version": {"keepAliveDefault": true}, - "Status-Code": 200, - "Reason-Phrase": "OK", - "response_time_ms": 721, - "content-type": "text/plain; charset=UTF-8", - "content-length": "484", - "body": "health status index uuid ..." - }], - "connectionId": "0242acfffe13000a-0000000a-00000005-1eb087a9beb83f3e-a32794b4.0", - "numRequests": 1, - "numErrors": 0 -} -``` -
- -## Metrics - -The Replayer emits various OpenTelemetry metrics to CloudWatch, and traces are sent through AWS X-Ray. Here are some useful metrics that help evaluate cluster performance: - -### `sourceStatusCode` - -This metric tracks the HTTP status codes for both the source and target clusters, with dimensions for the HTTP verb (e.g., GET, POST) and the status code families (e.g., 200-299). These dimensions help quickly identify discrepancies between the source and target, such as when DELETE 200s become 4xx or GET 4xx errors turn into 5xx errors. - -### `lagBetweenSourceAndTargetRequests` - -This metric shows the delay between requests hitting the source and target clusters. With a speedup factor greater than 1 and a target cluster that can handle requests efficiently, this value should decrease as the replay progresses, indicating a reduction in replay lag. - -### Additional Metrics - -- **Throughput**: `bytesWrittenToTarget` and `bytesReadFromTarget` indicate the throughput to and from the cluster. -- **Retries**: `numRetriedRequests` tracks the number of requests retried due to status-code mismatches between the source and target. -- **Event Counts**: Various `(*)Count` metrics track the number of specific events that have completed. -- **Durations**: `(*)Duration` metrics measure the duration of each step in the process. -- **Exceptions**: `(*)ExceptionCount` shows the number of exceptions encountered during each processing phase. - -## Troubleshooting - -### CloudWatch Considerations - -Metrics pushed to CloudWatch may experience around a 5-minute visibility lag. CloudWatch also retains higher-resolution data for a shorter period than lower-resolution data. For more details, see [CloudWatch Metrics Retention Policies](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html) ↗. 
diff --git a/_migrations/migration-phases/replay-activation-and-validation/Synchronized-Cluster-Validation.md b/_migrations/migration-phases/replay-activation-and-validation/Synchronized-Cluster-Validation.md deleted file mode 100644 index a5c1759ae7..0000000000 --- a/_migrations/migration-phases/replay-activation-and-validation/Synchronized-Cluster-Validation.md +++ /dev/null @@ -1,161 +0,0 @@ - - - -This guide covers how to use the Replayer to replay captured traffic from a source cluster to a target cluster during the migration process. The Replayer allows users to verify that the target cluster can handle requests in the same way as the source cluster and catch up to real-time traffic for a smooth migration. - -## Replayer Configurations - -[Replayer settings](Configuration-Options) are configured during the deployment of the Migration Assistant. Make sure to set the authentication mode for the Replayer so it can properly communicate with the target cluster. Refer to the **Limitations** section below for details on how different traffic types are handled. - -### Speedup Factor - -The `--speedup-factor` option, passed via `trafficReplayerExtraArgs`, adjusts the wait times between requests. For example: -- A speedup factor of `2` sends requests at twice the original speed (e.g., a request originally sent every minute will now be sent every 30 seconds). -- A speedup factor of `0.5` will space requests further apart (e.g., requests every 2 minutes instead of every minute). - -This setting can be used to stress test the target cluster or to catch up to real-time traffic, ensuring the target cluster is ready for production client switchover. - -## When to Run the Replayer - -After deploying the Migration Assistant, the Replayer is not running by default. It should be started only after all metadata and documents have been migrated to ensure that recent changes to the source cluster are properly reflected in the target cluster. 
- -For example, if a document was deleted after a snapshot was taken, starting the Replayer before the document migration is complete may cause the deletion request to execute before the document is even added to the target. Running the Replayer after all other migration processes ensures that the target cluster will be consistent with the source cluster. - -## Using the Replayer - -To manage the Replayer, use the `console replay` command: - -- **Start the Replayer**: - ```bash - console replay start - ``` - This starts the Replayer with the options specified at deployment. - -- **Check Replayer Status**: - ```bash - console replay status - ``` - This command shows whether the Replayer is running, pending, or desired. "Running" shows how many container instances are actively running, "Pending" indicates how many are being provisioned, and "Desired" shows the total number of instances that should be running. - -- **Stop the Replayer**: - ```bash - console replay stop - ``` - -
- -Example Interactions - - -Check the status of the Replayer: -```bash -root@ip-10-0-2-66:~# console replay status -(, 'Running=0\nPending=0\nDesired=0') -``` - -Start the Replayer: -```bash -root@ip-10-0-2-66:~# console replay start -Replayer started successfully. -Service migration-dev-traffic-replayer-default set to 1 desired count. Currently 0 running and 0 pending. -``` - -Stop the Replayer: -```bash -root@ip-10-0-2-66:~# console replay stop -Replayer stopped successfully. -Service migration-dev-traffic-replayer-default set to 0 desired count. Currently 0 running and 0 pending. -``` -
- -### Delivery Guarantees - -The Replayer pulls traffic from Kafka and advances its commit cursor after requests have been sent to the target cluster. This provides an "at least once" delivery guarantee—requests will be replayed, but success is not guaranteed. You will need to monitor metrics, tuple outputs, or external validation to ensure the target cluster is performing as expected. - -## Time Scaling - -The Replayer sends requests in the same order they were received on each connection to the source. However, relative timing between different connections is not guaranteed. For example: - -- **Scenario**: Two connections exist—one sends a PUT request every minute, and the other sends a GET request every second. -- **Behavior**: The Replayer will maintain the sequence within each connection, but the relative timing between the connections (PUTs and GETs) is not preserved. - -### Speedup Factor Example - -Assume a source cluster responds to requests (GETs and PUTs) within 100ms: -- With a **speedup factor of 1**, the target will experience the same request rates and idle periods as the source. -- With a **speedup factor of 2**, requests will be sent twice as fast, with GETs sent every 500ms and PUTs every 30 seconds. -- At a **speedup factor of 10**, requests will be sent 10x faster, and as long as the target responds quickly, the Replayer can keep pace. - -If the target cannot respond fast enough, the Replayer will wait for the previous request to complete before sending the next one. This may cause delays and affect global relative ordering. - -## Transformations - -During migrations, some requests may need to be transformed between versions. For example, Elasticsearch supported multiple type mappings in indices, but this is no longer the case in OpenSearch. Clients may need to adjust accordingly by splitting documents into multiple indices or transforming request data. 
- -The Replayer automatically rewrites host and authentication headers, but for more complex transformations, custom transformation rules can be passed via the `--transformer-config` option (as described in the [Traffic Replayer README](https://github.com/opensearch-project/opensearch-migrations/blob/c3d25958a44ec2e7505892b4ea30e5fbfad4c71b/TrafficCapture/trafficReplayer/README.md#transformations)). - -### Example Transformation - -Suppose a source request includes a "tagToExcise" element that needs to be removed and its children promoted, and the URI path includes "extraThingToRemove" which should also be removed. The following Jolt script handles this transformation: - -```json -[{ "JsonJoltTransformerProvider": -[ - { - "script": { - "operation": "shift", - "spec": { - "payload": { - "inlinedJsonBody": { - "top": { - "tagToExcise": { - "*": "payload.inlinedJsonBody.top.&" - }, - "*": "payload.inlinedJsonBody.top.&" - }, - "*": "payload.inlinedJsonBody.&" - }, - "*": "payload.&" - }, - "*": "&" - } - } - }, - { - "script": { - "operation": "modify-overwrite-beta", - "spec": { - "URI": "=split('/extraThingToRemove',@(1,&))" - } - } - }, - { - "script": { - "operation": "modify-overwrite-beta", - "spec": { - "URI": "=join('',@(1,&))" - } - } - } -] -}] -``` - -The resulting request to the target will look like this: - -```http -PUT /oldStyleIndex/moreStuff HTTP/1.0 -host: testhostname - -{"top":{"properties":{"field1":{"type":"text"},"field2":{"type":"keyword"}}}} -``` - -You can pass Base64-encoded transformation scripts via `--transformer-config-base64` for convenience. - -## Troubleshooting - -### Client changes -See [[Required Client Changes]] for more information on how clients will need to be updated. - -### Request Delivery -The Replayer provides an "at least once" delivery guarantee but does not ensure request success when a replayed request arrives at the target cluster. 
diff --git a/_migrations/migration-phases/replay-activation-and-validation/index.md b/_migrations/migration-phases/replay-activation-and-validation/index.md deleted file mode 100644 index ab8aa175cb..0000000000 --- a/_migrations/migration-phases/replay-activation-and-validation/index.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -layout: default -title: Replay activation and validation -nav_order: 100 -has_children: true -parent: Migration phases ---- \ No newline at end of file diff --git a/_migrations/migration-phases/setup-verification/Client-Traffic-Switchover-Verification.md b/_migrations/migration-phases/setup-verification/Client-Traffic-Switchover-Verification.md deleted file mode 100644 index 617e16db41..0000000000 --- a/_migrations/migration-phases/setup-verification/Client-Traffic-Switchover-Verification.md +++ /dev/null @@ -1,23 +0,0 @@ - - - -The Migrations ALB is deployed with a listener that shifts traffic between the source and target clusters through proxy services. The ALB should start in **Source Passthrough** mode. - -## Verify traffic switchover has completed -1. In the AWS Console, navigate to **EC2 > Load Balancers**. -2. Select the **MigrationAssistant ALB**. -3. Examine the listener on port 9200 and verify that 100% of traffic is directed to the **Source Proxy**. -4. Navigate to the **Migration ECS Cluster** in the AWS Console. -5. Select the **Target Proxy Service**. -6. Verify that the desired count for the service is running: - * If the desired count is not met, update the service to increase it to at least 1 and wait for the service to start. -7. In the **Health and Metrics** tab under **Load balancer health**, verify that all targets are reporting as healthy: - * This confirms the ALB can connect to the target cluster through the target proxy. -8. (Reset) Update the desired count for the **Target Proxy Service** back to its original value in ECS. 
- -## Troubleshooting - -### Unexpected traffic patterns -* Verify that the target cluster allows traffic ingress from the **Target Proxy Security Group**. -* Navigate to the **Target Proxy ECS Tasks** to investigate any failing tasks: - * Set the "Filter desired status" to "Any desired status" to view all tasks, then navigate to the logs for any stopped tasks. \ No newline at end of file diff --git a/_migrations/migration-phases/setup-verification/Snapshot-Creation-Verification.md b/_migrations/migration-phases/setup-verification/Snapshot-Creation-Verification.md deleted file mode 100644 index 41f55b7eaf..0000000000 --- a/_migrations/migration-phases/setup-verification/Snapshot-Creation-Verification.md +++ /dev/null @@ -1,100 +0,0 @@ - - - -Verify that a snapshot can be created and used for metadata and backfill scenarios. - -### Install the Elasticsearch S3 Repository Plugin - -The snapshot needs to be stored in a location that the Migration Assistant can access. We use AWS S3 as that location, and the Migration Assistant creates an S3 bucket for this purpose. Therefore, it is necessary to install the Elasticsearch S3 Repository Plugin on your source nodes [as described here](https://www.elastic.co/guide/en/elasticsearch/plugins/7.10/repository-s3.html) ↗. - -Additionally, ensure that the plugin has been configured with AWS credentials that allow it to read and write to AWS S3. If your Elasticsearch cluster is running on EC2 or ECS instances with an execution IAM Role, include the necessary S3 permissions. Alternatively, you can store the credentials in the Elasticsearch Key Store [as described here](https://www.elastic.co/guide/en/elasticsearch/plugins/7.10/repository-s3-client.html) ↗. - -### Verifying S3 Repository Plugin Configuration - -You can verify that the S3 Repository Plugin is configured correctly by creating a test snapshot. 
- -Create an S3 bucket for the snapshot using the following AWS CLI command: - -```shell -aws s3api create-bucket --bucket --region -``` - -Register a new S3 Snapshot Repository on your source cluster using this curl command: - -```shell -curl -X PUT "http://:9200/_snapshot/test_s3_repository" -H "Content-Type: application/json" -d '{ - "type": "s3", - "settings": { - "bucket": "", - "region": "" - } -}' -``` - -You should receive a response like: `{"acknowledged":true}`. - -Create a test snapshot that captures only the cluster's metadata: - -```shell -curl -X PUT "http://:9200/_snapshot/test_s3_repository/test_snapshot_1" -H "Content-Type: application/json" -d '{ - "indices": "", - "ignore_unavailable": true, - "include_global_state": true -}' -``` - -
-Example Response - -You should receive a response like: `{"accepted":true}`. - -Check the AWS Console to confirm that your bucket contains the snapshot. It will appear similar to this: - -![Screenshot 2024-08-06 at 3 25 25 PM](https://github.com/user-attachments/assets/200818a5-e259-4837-aa2a-44c0bd7b099c) -
- -### Cleaning Up After Verification - -To remove the resources created during verification: - -Delete the snapshot: - -```shell -curl -X DELETE "http://:9200/_snapshot/test_s3_repository/test_snapshot_1?pretty" -``` - -Delete the snapshot repository: - -```shell -curl -X DELETE "http://:9200/_snapshot/test_s3_repository?pretty" -``` - -Delete the S3 bucket and its contents: - -```shell -aws s3 rm s3:// --recursive -aws s3api delete-bucket --bucket --region -``` - -### Troubleshooting - -#### Access Denied Error (403) - -If you encounter an error like `AccessDenied (Service: Amazon S3; Status Code: 403)`, verify the following: - -- The IAM role assigned to your Elasticsearch cluster has the necessary S3 permissions. -- The bucket name and region provided in the snapshot configuration match the actual S3 bucket you created. - -#### Older versions of Elasticsearch - -Older versions of the Elasticsearch S3 Repository Plugin may have trouble reading IAM Role credentials embedded in EC2 and ECS Instances. This is due to the copy of the AWS SDK shipped in them being being too old to read the new standard way of retrieving those credentials - [the Instance Metadata Service v2 (IMDSv2) specification](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html). 
This can result in Snapshot creation failures, with an error message like: - -``` -{"error":{"root_cause":[{"type":"repository_verification_exception","reason":"[migration_assistant_repo] path [rfs-snapshot-repo] is not accessible on master node"}],"type":"repository_verification_exception","reason":"[migration_assistant_repo] path [rfs-snapshot-repo] is not accessible on master node","caused_by":{"type":"i_o_exception","reason":"Unable to upload object [rfs-snapshot-repo/tests-s8TvZ3CcRoO8bvyXcyV2Yg/master.dat] using a single upload","caused_by":{"type":"amazon_service_exception","reason":"Unauthorized (Service: null; Status Code: 401; Error Code: null; Request ID: null)"}}},"status":500} -``` - -If you encounter this issue, you can resolve it by temporarily enabling IMDSv1 on the Instances in your source cluster for the duration of the snapshot. There is a toggle for it available in [the AWS Console](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-options.html), as well as in [the AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/ec2/modify-instance-metadata-options.html#options). Flipping this toggle will turn on the older access model and enable the Elasticsearch S3 Repository Plugin to work as normal. - -### Related Links - -- [Elasticsearch S3 Repository Plugin Configuration Guide](https://www.elastic.co/guide/en/elasticsearch/plugins/7.10/repository-s3-client.html) ↗. diff --git a/_migrations/migration-phases/setup-verification/System-Reset-Before-Migration.md b/_migrations/migration-phases/setup-verification/System-Reset-Before-Migration.md deleted file mode 100644 index 708dbe5589..0000000000 --- a/_migrations/migration-phases/setup-verification/System-Reset-Before-Migration.md +++ /dev/null @@ -1,22 +0,0 @@ - - - -The following steps outline how to reset resources with Migration Assistant before executing the actual migration. At this point all verifications are expected to have been completed. 
These steps can be performed after [[Accessing the Migration Console]] - -## Replayer Stoppage -To stop a running Replayer service, the following command can be executed: -``` -console replay stop -``` - -## Kafka Reset -The clear all captured traffic from the Kafka topic, the following command can be executed. **Note**: This command will result in the loss of any captured traffic data up to this point by the capture proxy and thus should be used with caution. -``` -console kafka delete-topic -``` - -## Target Cluster Reset -To clear non-system indices from the target cluster that may have been created from testing, the following command can be executed. **Note**: This command will result in the loss of all data on the target cluster and should be used with caution. -``` -console clusters clear-indices --cluster target -``` diff --git a/_migrations/migration-phases/setup-verification/Traffic-Capture-Verification.md b/_migrations/migration-phases/setup-verification/Traffic-Capture-Verification.md deleted file mode 100644 index 8a99a941d3..0000000000 --- a/_migrations/migration-phases/setup-verification/Traffic-Capture-Verification.md +++ /dev/null @@ -1,36 +0,0 @@ - - - -This guide will describe how once the traffic capture proxy is deployed the captured traffic can be verified. - -## Replication setup and validation -1. Navigate to Migration ECS Cluster in AWS Console -1. Navigate to Capture Proxy Service -1. Verify > 0 desired count and running - * if not, update service to increase to at least 1 and wait for startup -1. Within "Load balancer health" on "Health and Metrics" tab, verify all targets are reporting healthy - * This means the ALB is able to connect to the source cluster through the capture proxy -1. Navigate to the Migration Console Terminal -1. Execute `console kafka describe-topic-records` -1. Wait 30 seconds for another elb health check to be recorded -1. Execute `console kafka describe-topic-records` again, Verify RECORDS increased between runs -1. 
Execute `console replay start` to start the replayer -1. Run `tail -f /shared-logs-output/traffic-replayer-default/*/tuples/tuples.log | jq '.targetResponses[]."Status-Code"'` to confirm that the Kafka requests were sent to the the target and that it responded as expected... If responses don't appear - * check that the migration-console can access the target cluster by running `./catIndices.sh`, which should show indices on the source and target. - * confirm that messages are still being recorded to Kafka. - * check for errors in the replayer logs ("/migration/STAGE/default/traffic-replayer-default") via CloudWatch -1. (Reset) Update Traffic Capture Proxy service desired count back to original value in ECS - -## Troubleshooting - -### Health checks response with 401/403 status code -If the source cluster is configured to require authentication the capture proxy will not be able to verify beyond receiving 401/403 status code for ALB healthchecks - -### Traffic does not reach the source cluster -Verify the Source Cluster allows traffic ingress from Capture Proxy Security Group. - -Look for failing tasks by navigating to Traffic Capture Proxy ECS Tasks. Change "Filter desired status" to "Any desired status" in order to see all tasks and navigate to logs for stopped tasks. 
- -### Related Links - -- [Traffic Capture Proxy Failure Modes](https://github.com/opensearch-project/opensearch-migrations/blob/main/TrafficCapture/trafficCaptureProxyServer/README.md#failure-modes) diff --git a/_migrations/migration-phases/setup-verification/index.md b/_migrations/migration-phases/setup-verification/index.md deleted file mode 100644 index 5126b7cab6..0000000000 --- a/_migrations/migration-phases/setup-verification/index.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -layout: default -title: Setup verification -nav_order: 70 -has_children: true -parent: Migration phases ---- \ No newline at end of file diff --git a/_migrations/migration-phases/switching-traffic-from-the-source-cluster.md b/_migrations/migration-phases/switching-traffic-from-the-source-cluster.md new file mode 100644 index 0000000000..c0fe834943 --- /dev/null +++ b/_migrations/migration-phases/switching-traffic-from-the-source-cluster.md @@ -0,0 +1,52 @@ +--- +layout: default +title: Switching traffic from the source cluster +nav_order: 110 +parent: Migration phases +--- + +# Switching traffic from the source cluster + +After the source and target clusters are synchronized, traffic needs to be switched to the target cluster so that the source cluster can be taken offline. + +## Assumptions + +This page assumes that the following has occurred before making the switch: + +- All client traffic is being routed through a switchover listener in the [MigrationAssistant Application Load Balancer]({{site.url}}{{site.baseurl}}/migrations/migration-phases/backfill/). +- Client traffic has been verified as compatible with the target cluster. +- The target cluster is in a good state to accept client traffic. +- The target proxy service is deployed. + +## Switching traffic + +Use the following steps to switch traffic from the source cluster to the target cluster: + +1. In the AWS Management Console, navigate to **ECS** > **Migration Assistant Cluster**. 
Note the desired count of the capture proxy, which should be greater than 1. + +2. Update the **ECS Service** of the target proxy to be at least as large as the traffic capture proxy. Wait for tasks to start up, and verify that all targets are healthy in the target proxy service's **Load balancer health** section. + +3. Navigate to **EC2** > **Load Balancers** > **Migration Assistant ALB**. + +4. Navigate to **ALB Metrics** and examine any useful information, specifically looking at **Active Connection Count** and **New Connection Count**. Note any large discrepancies, which can indicate reused connections affecting traffic switchover. + +5. Navigate to **Capture Proxy Target Group** (`ALBSourceProxy--TG`) > **Monitoring**. + +6. Examine the **Metrics Requests**, **Target (2XX, 3XX, 4XX, 5XX)**, and **Target Response Time** metrics. Verify that this appears as expected and includes all traffic expected to be included in the switchover. Note details that could help identify anomalies during the switchover, including the expected response time and response code rate. + +7. Navigate back to **ALB Metrics** and choose **Target Proxy Target Group** (`ALBTargetProxy--TG`). Verify that all expected targets are healthy and that none are in a draining state. + +8. Navigate back to **ALB Metrics** and to the **Listener** on port `9200`. + +9. Choose the **Default rule** and **Edit**. + +10. Modify the weights of the targets to switch the desired traffic to the target proxy. To perform a full switchover, modify the **Target Proxy** weight to `1` and the **Source Proxy** weight to `0`. + +11. Choose **Save Changes**. + +12. Navigate to both **SourceProxy** and **TargetProxy TG Monitoring** metrics and verify that traffic is switching over as expected. If connections are being reused by clients, perform any necessary actions to terminate them. Monitor these metrics until **SourceProxy TG** shows 0 requests when all clients have switched over. 
+ + +## Fallback + +If you need to fall back to the source cluster at any point during the switchover, revert the **Default rule** so that the Application Load Balancer routes to the **SourceProxy Target Group**. \ No newline at end of file diff --git a/_migrations/migration-phases/using-traffic-replayer.md b/_migrations/migration-phases/using-traffic-replayer.md new file mode 100644 index 0000000000..1c812be211 --- /dev/null +++ b/_migrations/migration-phases/using-traffic-replayer.md @@ -0,0 +1,307 @@ +--- +layout: default +title: Using Traffic Replayer +nav_order: 100 +parent: Migration phases +--- + +# Using Traffic Replayer + +This guide covers how to use Traffic Replayer to replay captured traffic from a source cluster to a target cluster during the migration process. Traffic Replayer allows you to verify that the target cluster can handle requests in the same way as the source cluster and catch up to real-time traffic for a smooth migration. + +## When to run Traffic Replayer + +After deploying Migration Assistant, Traffic Replayer does not run by default. It should be started only after all metadata and documents have been migrated to ensure that recent changes to the source cluster are properly reflected in the target cluster. + +For example, if a document was deleted after a snapshot was taken, starting Traffic Replayer before the document migration is complete may cause the deletion request to execute before the document is added to the target. Running Traffic Replayer after all other migration processes ensures that the target cluster will be consistent with the source cluster. + +## Configuration options + +[Traffic Replayer settings]({{site.url}}{{site.baseurl}}/migrations/deploying-migration-assisstant/configuation-options/) are configured during the deployment of Migration Assistant. Make sure to set the authentication mode for Traffic Replayer so that it can properly communicate with the target cluster. 
For more information about different types of traffic that are handled by Traffic Replayer, see [limitations](#limitations). + +## Using Traffic Replayer + +To manage Traffic Replayer, use the `console replay` command. The following examples show the available commands. + +### Start Traffic Replayer + +The following command starts Traffic Replayer with the options specified at deployment: + +```bash +console replay start +``` + +When starting Traffic Replayer, you should receive an output similar to the following: + +```bash +root@ip-10-0-2-66:~# console replay start +Replayer started successfully. +Service migration-dev-traffic-replayer-default set to 1 desired count. Currently 0 running and 0 pending. +``` + +### Check the status of Traffic Replayer + +Use the following command to show the status of Traffic Replayer: + +```bash +console replay status +``` + +The status output reports the following values: + +- `Running` shows how many container instances are actively running. +- `Pending` indicates how many instances are being provisioned. +- `Desired` shows the total number of instances that should be running. + +You should receive an output similar to the following: + +```bash +root@ip-10-0-2-66:~# console replay status +(, 'Running=0\nPending=0\nDesired=0') +``` + +### Stop Traffic Replayer + +The following command stops Traffic Replayer: + +```bash +console replay stop +``` + +You should receive an output similar to the following: + +```bash +root@ip-10-0-2-66:~# console replay stop +Replayer stopped successfully. +Service migration-dev-traffic-replayer-default set to 0 desired count. Currently 0 running and 0 pending. +``` + + + +### Delivery guarantees + +Traffic Replayer retrieves traffic from Kafka and updates its commit cursor after sending requests to the target cluster. This provides an "at least once" delivery guarantee; however, success isn't always guaranteed. 
Therefore, you should monitor metrics and tuple outputs or perform external validation to ensure that the target cluster is functioning as expected. + +## Time scaling + +Traffic Replayer sends requests in the same order that they were received from each connection to the source. However, relative timing between different connections is not guaranteed. For example: + +- **Scenario**: Two connections exist: one sends a PUT request every minute, and the other sends a GET request every second. +- **Behavior**: Traffic Replayer will maintain the sequence within each connection, but the relative timing between the connections (PUTs and GETs) is not preserved. + +Continuing this scenario, assume that the source cluster responds to requests (GETs and PUTs) within 100 ms: + +- With a **speedup factor of 1**, the target will experience the same request rates and idle periods as the source. +- With a **speedup factor of 2**, requests will be sent twice as fast, with GETs sent every 500 ms and PUTs every 30 seconds. +- With a **speedup factor of 10**, requests will be sent 10x faster, and as long as the target responds quickly, Traffic Replayer can maintain the pace. + +If the target cannot respond fast enough, Traffic Replayer will wait for the previous request to complete before sending the next one. This may cause delays and affect global relative ordering. + +## Transformations + +During migrations, some requests may need to be transformed between versions. For example, Elasticsearch previously supported multiple type mappings in indexes, but this is no longer the case in OpenSearch. Clients may need to be adjusted accordingly by splitting documents into multiple indexes or transforming request data. + +Traffic Replayer automatically rewrites host and authentication headers, but for more complex transformations, custom transformation rules can be specified using the `--transformer-config` option. 
For more information, see the [Traffic Replayer README](https://github.com/opensearch-project/opensearch-migrations/blob/c3d25958a44ec2e7505892b4ea30e5fbfad4c71b/TrafficCapture/trafficReplayer/README.md#transformations). + +### Example transformation + +Suppose that a source request contains a `tagToExcise` element that needs to be removed and its children promoted and that the URI path includes `extraThingToRemove`, which should also be removed. The following Jolt script handles this transformation: + +```json +[{ "JsonJoltTransformerProvider": +[ + { + "script": { + "operation": "shift", + "spec": { + "payload": { + "inlinedJsonBody": { + "top": { + "tagToExcise": { + "*": "payload.inlinedJsonBody.top.&" + }, + "*": "payload.inlinedJsonBody.top.&" + }, + "*": "payload.inlinedJsonBody.&" + }, + "*": "payload.&" + }, + "*": "&" + } + } + }, + { + "script": { + "operation": "modify-overwrite-beta", + "spec": { + "URI": "=split('/extraThingToRemove',@(1,&))" + } + } + }, + { + "script": { + "operation": "modify-overwrite-beta", + "spec": { + "URI": "=join('',@(1,&))" + } + } + } +] +}] +``` + +The resulting request sent to the target will appear similar to the following: + +```http +PUT /oldStyleIndex/moreStuff HTTP/1.0 +host: testhostname + +{"top":{"properties":{"field1":{"type":"text"},"field2":{"type":"keyword"}}}} +``` + +You can pass Base64-encoded transformation scripts using `--transformer-config-base64`. + +## Result logs + +HTTP transactions from the source capture and those resent to the target cluster are logged in files located at `/shared-logs-output/traffic-replayer-default/*/tuples/tuples.log`. The `/shared-logs-output` directory is shared across containers, including the migration console. You can access these files from the migration console using the same path. Previous runs are also available in a `gzipped` format. 
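Because each line of `tuples.log` is a JSON object that records the source response and the target responses, a short script can surface tuples in which the two clusters answered differently. The following is a minimal sketch rather than part of Migration Assistant: the function name `find_status_mismatches` is hypothetical, and the field names (`sourceResponse`, `targetResponses`, `Status-Code`, `Request-URI`, `connectionId`) are taken from the example log entry in this section. It flags a tuple when any target response has a status code that differs from the source status code.

```python
import json
from typing import Iterable, Iterator


def find_status_mismatches(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a summary for each tuple whose target status differs from the source.

    Hypothetical helper: field names follow the example log entry in this guide.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        entry = json.loads(line)
        source_code = (entry.get("sourceResponse") or {}).get("Status-Code")
        target_codes = [
            response.get("Status-Code")
            for response in (entry.get("targetResponses") or [])
        ]
        # Flag the tuple if any target response disagrees with the source.
        if any(code != source_code for code in target_codes):
            yield {
                "connectionId": entry.get("connectionId"),
                "uri": (entry.get("sourceRequest") or {}).get("Request-URI"),
                "source": source_code,
                "targets": target_codes,
            }
```

To use it from the migration console, open a tuples file and pass the file object to the function, for example `with open(path) as log_file: mismatches = list(find_status_mismatches(log_file))`, where `path` points to one of the `tuples.log` files described above. Keep in mind that these logs can contain authorization headers, so treat any derived output with the same care.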
+ +Each log entry is a newline-delimited JSON object, containing information about the source and target requests/responses along with other transaction details, such as response times. + +These logs contain the contents of all requests, including authorization headers and the contents of all HTTP messages. Ensure that access to the migration environment is restricted, as these logs serve as a source of truth for determining what happened in both the source and target clusters. Response times for the source refer to the amount of time between the proxy sending the end of a request and receiving the response. While response times for the target are recorded in the same manner, keep in mind that the locations of the capture proxy, Traffic Replayer, and target may differ and that these logs do not account for the client's location. +{: .note} + + +### Example log entry + +The following example log entry shows a `/_cat/indices?v` request sent to both the source and target clusters: + +```json +{ + "sourceRequest": { + "Request-URI": "/_cat/indices?v", + "Method": "GET", + "HTTP-Version": "HTTP/1.1", + "Host": "capture-proxy:9200", + "Authorization": "Basic YWRtaW46YWRtaW4=", + "User-Agent": "curl/8.5.0", + "Accept": "*/*", + "body": "" + }, + "sourceResponse": { + "HTTP-Version": {"keepAliveDefault": true}, + "Status-Code": 200, + "Reason-Phrase": "OK", + "response_time_ms": 59, + "content-type": "text/plain; charset=UTF-8", + "content-length": "214", + "body": "aGVhbHRoIHN0YXR1cyBpbmRleCAgICAgICB..." 
+ }, + "targetRequest": { + "Request-URI": "/_cat/indices?v", + "Method": "GET", + "HTTP-Version": "HTTP/1.1", + "Host": "opensearchtarget", + "Authorization": "Basic YWRtaW46bXlTdHJvbmdQYXNzd29yZDEyMyE=", + "User-Agent": "curl/8.5.0", + "Accept": "*/*", + "body": "" + }, + "targetResponses": [{ + "HTTP-Version": {"keepAliveDefault": true}, + "Status-Code": 200, + "Reason-Phrase": "OK", + "response_time_ms": 721, + "content-type": "text/plain; charset=UTF-8", + "content-length": "484", + "body": "aGVhbHRoIHN0YXR1cyBpbmRleCAgICAgICB..." + }], + "connectionId": "0242acfffe13000a-0000000a-00000005-1eb087a9beb83f3e-a32794b4.0", + "numRequests": 1, + "numErrors": 0 +} +``` + + +### Decoding log content + +The contents of HTTP message bodies are Base64 encoded in order to handle various types of traffic, including compressed data. To view the logs in a more human-readable format, use the console library `tuples show`. Running the script as follows will produce a `readable-tuples.log` in the home directory: + +```shell +console tuples show --in /shared-logs-output/traffic-replayer-default/d3a4b31e1af4/tuples/tuples.log > readable-tuples.log +``` + +The `readable-tuples.log` should appear similar to the following: + +```json +{ + "sourceRequest": { + "Request-URI": "/_cat/indices?v", + "Method": "GET", + "HTTP-Version": "HTTP/1.1", + "Host": "capture-proxy:9200", + "Authorization": "Basic YWRtaW46YWRtaW4=", + "User-Agent": "curl/8.5.0", + "Accept": "*/*", + "body": "" + }, + "sourceResponse": { + "HTTP-Version": {"keepAliveDefault": true}, + "Status-Code": 200, + "Reason-Phrase": "OK", + "response_time_ms": 59, + "content-type": "text/plain; charset=UTF-8", + "content-length": "214", + "body": "health status index uuid ..." 
+ }, + "targetRequest": { + "Request-URI": "/_cat/indices?v", + "Method": "GET", + "HTTP-Version": "HTTP/1.1", + "Host": "opensearchtarget", + "Authorization": "Basic YWRtaW46bXlTdHJvbmdQYXNzd29yZDEyMyE=", + "User-Agent": "curl/8.5.0", + "Accept": "*/*", + "body": "" + }, + "targetResponses": [{ + "HTTP-Version": {"keepAliveDefault": true}, + "Status-Code": 200, + "Reason-Phrase": "OK", + "response_time_ms": 721, + "content-type": "text/plain; charset=UTF-8", + "content-length": "484", + "body": "health status index uuid ..." + }], + "connectionId": "0242acfffe13000a-0000000a-00000005-1eb087a9beb83f3e-a32794b4.0", + "numRequests": 1, + "numErrors": 0 +} +``` + + +## Metrics + +Traffic Replayer emits various OpenTelemetry metrics to Amazon CloudWatch, and traces are sent through AWS X-Ray. The following are some useful metrics that can help evaluate cluster performance. + +### `sourceStatusCode` + +This metric tracks the HTTP status codes for both the source and target clusters, with dimensions for the HTTP verb, such as `GET` or `POST`, and the status code families (200--299). These dimensions can help quickly identify discrepancies between the source and target, such as when `DELETE 200s` becomes `4xx` or `GET 4xx` errors turn into `5xx` errors. + +### `lagBetweenSourceAndTargetRequests` + +This metric shows the delay between requests hitting the source and target clusters. With a speedup factor greater than 1 and a target cluster that can handle requests efficiently, this value should decrease as the replay progresses, indicating a reduction in replay lag. + +### Additional metrics + +The following metrics are also reported: + +- **Throughput**: `bytesWrittenToTarget` and `bytesReadFromTarget` indicate the throughput to and from the cluster. +- **Retries**: `numRetriedRequests` tracks the number of requests retried due to status code mismatches between the source and target. +- **Event counts**: Various `(*)Count` metrics track the number of completed events. 
+- **Durations**: `(*)Duration` metrics measure the duration of each step in the process. +- **Exceptions**: `(*)ExceptionCount` shows the number of exceptions encountered during each processing phase. + + +## CloudWatch considerations + +Metrics pushed to CloudWatch may experience a visibility lag of around 5 minutes. CloudWatch also retains higher-resolution data for a shorter period than lower-resolution data. For more information, see [Amazon CloudWatch concepts](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html). \ No newline at end of file diff --git a/_migrations/migration-phases/verifying-migration-tools.md b/_migrations/migration-phases/verifying-migration-tools.md new file mode 100644 index 0000000000..498ed50feb --- /dev/null +++ b/_migrations/migration-phases/verifying-migration-tools.md @@ -0,0 +1,196 @@ +--- +layout: default +title: Verifying migration tools +nav_order: 70 +parent: Migration phases +--- + +# Verifying migration tools + +Before using Migration Assistant, take the following steps to verify that your cluster is ready for migration. + +## Snapshot creation verification + +Verify that a snapshot of your source cluster can be created and used for metadata and backfill scenarios. + +### Installing the Elasticsearch S3 Repository plugin + +The snapshot needs to be stored in a location that Migration Assistant can access. This guide uses Amazon Simple Storage Service (Amazon S3). By default, Migration Assistant creates an S3 bucket for storage. Therefore, it is necessary to install the [Elasticsearch S3 repository plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/7.10/repository-s3.html) on your source nodes. + +Additionally, make sure that the plugin has been configured with AWS credentials that allow it to read and write to Amazon S3.
If your Elasticsearch cluster is running on Amazon Elastic Compute Cloud (Amazon EC2) or Amazon Elastic Container Service (Amazon ECS) instances with an AWS Identity and Access Management (IAM) execution role, include the necessary S3 permissions in that role. Alternatively, you can store the credentials in the [Elasticsearch keystore](https://www.elastic.co/guide/en/elasticsearch/plugins/7.10/repository-s3-client.html). + +### Verifying the S3 repository plugin configuration + +You can verify that the S3 repository plugin is configured correctly by creating a test snapshot. + +Create an S3 bucket for the snapshot using the following AWS Command Line Interface (AWS CLI) command, replacing the placeholders with your bucket name and AWS Region: + +```shell +aws s3api create-bucket --bucket <bucket-name> --region <aws-region> +``` + +Register a new S3 snapshot repository on your source cluster using the following cURL command, replacing the host placeholder with the address of a source cluster node: + +```shell +curl -X PUT "http://<source-cluster-host>:9200/_snapshot/test_s3_repository" -H "Content-Type: application/json" -d '{ + "type": "s3", + "settings": { + "bucket": "<bucket-name>", + "region": "<aws-region>" + } +}' +``` + +Next, create a test snapshot that captures only the cluster's metadata: + +```shell +curl -X PUT "http://<source-cluster-host>:9200/_snapshot/test_s3_repository/test_snapshot_1" -H "Content-Type: application/json" -d '{ + "indices": "", + "ignore_unavailable": true, + "include_global_state": true +}' +``` + +Check the AWS Management Console to confirm that your bucket contains the snapshot. 
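In addition to checking the bucket in the console, you can query the snapshot state from the cluster itself. The following is a minimal sketch, assuming the Elasticsearch get snapshot API (`GET /_snapshot/test_s3_repository/test_snapshot_1`) is reachable over HTTP on port `9200` without authentication. The `SOURCE_HOST` variable and both function names are hypothetical and introduced only for illustration:

```shell
#!/usr/bin/env bash
# Sketch only: SOURCE_HOST and the function names below are illustrative
# assumptions, not part of Migration Assistant.

# Succeeds (exit 0) if a get-snapshot JSON response reports the SUCCESS state.
snapshot_succeeded() {
  printf '%s' "$1" | grep -Eq '"state"[[:space:]]*:[[:space:]]*"SUCCESS"'
}

# Fetches the test snapshot from the source cluster and reports its state.
check_test_snapshot() {
  local host="${SOURCE_HOST:?set SOURCE_HOST to your source cluster host}"
  local response
  response="$(curl -sS "http://${host}:9200/_snapshot/test_s3_repository/test_snapshot_1")" || return 1
  if snapshot_succeeded "$response"; then
    echo "test_snapshot_1 completed successfully"
  else
    echo "test_snapshot_1 is not in the SUCCESS state:" >&2
    echo "$response" >&2
    return 1
  fi
}
```

After sourcing the file, set `SOURCE_HOST` and run `check_test_snapshot`. A snapshot request returns before the snapshot finishes unless `wait_for_completion=true` is set, so a state of `IN_PROGRESS` means that the check should be repeated after a short wait.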
+ +### Removing test snapshots after verification + +To remove the resources created during verification, you can use the following deletion commands, replacing the placeholders with the values used earlier: + +**Test snapshot** + +```shell +curl -X DELETE "http://<source-cluster-host>:9200/_snapshot/test_s3_repository/test_snapshot_1?pretty" +``` + +**Test snapshot repository** + +```shell +curl -X DELETE "http://<source-cluster-host>:9200/_snapshot/test_s3_repository?pretty" +``` + +**S3 bucket** + +```shell +aws s3 rm s3://<bucket-name> --recursive +aws s3api delete-bucket --bucket <bucket-name> --region <aws-region> +``` + +### Troubleshooting + +Use this guidance to troubleshoot any of the following snapshot verification issues. + +#### Access denied error (403) + +If you encounter an error like `AccessDenied (Service: Amazon S3; Status Code: 403)`, verify the following: + +- The IAM role assigned to your Elasticsearch cluster has the necessary S3 permissions. +- The bucket name and AWS Region provided in the snapshot configuration match the actual S3 bucket you created. + +#### Older versions of Elasticsearch + +Older versions of the Elasticsearch S3 repository plugin may have trouble reading IAM role credentials embedded in Amazon EC2 and Amazon ECS instances. This is because the copy of the AWS SDK shipped with them is too old to support the newer standard method of retrieving those credentials, as described in [the Instance Metadata Service v2 (IMDSv2) specification](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html).
This can result in snapshot creation failures, with an error message similar to the following: + +```json +{"error":{"root_cause":[{"type":"repository_verification_exception","reason":"[migration_assistant_repo] path [rfs-snapshot-repo] is not accessible on master node"}],"type":"repository_verification_exception","reason":"[migration_assistant_repo] path [rfs-snapshot-repo] is not accessible on master node","caused_by":{"type":"i_o_exception","reason":"Unable to upload object [rfs-snapshot-repo/tests-s8TvZ3CcRoO8bvyXcyV2Yg/master.dat] using a single upload","caused_by":{"type":"amazon_service_exception","reason":"Unauthorized (Service: null; Status Code: 401; Error Code: null; Request ID: null)"}}},"status":500} +``` + +If you encounter this issue, you can resolve it by temporarily enabling IMDSv1 on the instances in your source cluster for the duration of the snapshot. There is a toggle for this available in the AWS Management Console as well as in the AWS CLI. Switching this toggle will turn on the older access model and enable the Elasticsearch S3 repository plugin to work as normal. For more information about IMDSv1, see [Modify instance metadata options for existing instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-IMDS-existing-instances.html). + +## Switching over client traffic + +The Migration Assistant Application Load Balancer is deployed with a listener that shifts traffic between the source and target clusters through proxy services. The Application Load Balancer should start in **Source Passthrough** mode. + +### Verifying that the traffic switchover is complete + +Use the following steps to verify that the traffic switchover is complete: + +1. In the AWS Management Console, navigate to **EC2 > Load Balancers**. +2. Select the **MigrationAssistant ALB**. +3. Examine the listener on port `9200` and verify that 100% of the traffic is directed to the **Source Proxy**. +4. 
Navigate to the **Migration ECS Cluster** in the AWS Management Console. +5. Select the **Target Proxy Service**. +6. Verify that the desired count for the service is running: + * If the desired count is not met, update the service to increase it to at least 1 and wait for the service to start. +7. On the **Health and Metrics** tab under **Load balancer health**, verify that all targets are reporting as healthy: + * This confirms that the Application Load Balancer can connect to the target cluster through the target proxy. +8. (Reset) Update the desired count for the **Target Proxy Service** back to its original value in Amazon ECS. + +### Fixing unidentified traffic patterns + +When switching over traffic to the target cluster, you might encounter unidentified traffic patterns. To help identify the cause of these patterns, use the following steps: +* Verify that the target cluster allows traffic ingress from the **Target Proxy Security Group**. +* Navigate to **Target Proxy ECS Tasks** to investigate any failing tasks. +Set the **Filter desired status** to **Any desired status** to view all tasks, then navigate to the logs for any stopped tasks. + + +## Verifying replication + +Use the following steps to verify that replication is working once the traffic capture proxy is deployed: + + +1. Navigate to the **Migration ECS Cluster** in the AWS Management Console. +2. Navigate to **Capture Proxy Service**. +3. Verify that the capture proxy is running with the desired proxy count. If it is not, update the service to increase it to at least 1 and wait for startup. +4. Under **Health and Metrics** > **Load balancer health**, verify that all targets are healthy. This means that the Application Load Balancer is able to connect to the source cluster through the capture proxy. +5. Navigate to the **Migration Console Terminal**. +6. Run `console kafka describe-topic-records`. Wait 30 seconds for another Application Load Balancer health check. +7. 
Run `console kafka describe-topic-records` again and verify that the number of RECORDS increased between runs. +8. Run `console replay start` to start Traffic Replayer. +9. Run `tail -f /shared-logs-output/traffic-replayer-default/*/tuples/tuples.log | jq '.targetResponses[]."Status-Code"'` to confirm that the Kafka requests were sent to the target and that it responded as expected. If the responses don't appear: + * Check that the migration console can access the target cluster by running `./catIndices.sh`, which should show the indexes in the source and target. + * Confirm that messages are still being recorded to Kafka. + * Check for errors in the Traffic Replayer logs (`/migration/STAGE/default/traffic-replayer-default`) using CloudWatch. +10. (Reset) Update the desired count for the **Capture Proxy Service** back to its original value in Amazon ECS. + +### Troubleshooting + +Use this guidance to troubleshoot any of the following replication verification issues. + +#### Health check responses with 401/403 status code + +If the source cluster is configured to require authentication, the capture proxy will not be able to verify replication beyond receiving a 401/403 status code for Application Load Balancer health checks. For more information, see [Failure Modes](https://github.com/opensearch-project/opensearch-migrations/blob/main/TrafficCapture/trafficCaptureProxyServer/README.md#failure-modes). + +#### Traffic does not reach the source cluster + +Verify that the source cluster allows traffic ingress from the Capture Proxy Security Group. + +Look for failing tasks by navigating to **Traffic Capture Proxy ECS**. Change **Filter desired status** to **Any desired status** in order to see all tasks and navigate to the logs for stopped tasks. + + +## Resetting before migration + +After all verifications are complete, reset all resources before using Migration Assistant for an actual migration.
+ +The following steps reset the resources used during verification. Perform them after [accessing the migration console]({{site.url}}{{site.baseurl}}/migrations/migration-console/accessing-the-migration-console/). + +### Traffic Replayer + +To stop running Traffic Replayer, use the following command: + +```bash +console replay stop +``` + +### Kafka + +To clear all captured traffic from the Kafka topic, you can run the following command: + +This command will result in the loss of any traffic data captured by the capture proxy up to this point and thus should be used with caution. +{: .warning} + +```bash +console kafka delete-topic +``` + +### Target cluster + +To clear non-system indexes from the target cluster that may have been created as a result of testing, you can run the following command: + +This command will result in the loss of all data in the target cluster and should be used with caution. +{: .warning} + +```bash +console clusters clear-indices --cluster target +``` +