New serverless pattern - EventBridge-Bedrock-S3-AOSS #2381

Closed
40 commits
260f396
Initial Version
rajavaid77 Jul 15, 2024
b6df0b2
removed dependencies for layer
rajavaid77 Jul 15, 2024
0e72568
added architecture diagram
rajavaid77 Jul 15, 2024
36ef8b6
renamed project
rajavaid77 Jul 15, 2024
6dec047
Merge branch 'aws-samples:main' into rajavaid-feature-eventbridge-bed…
rajavaid77 Jul 15, 2024
da625b8
Added comments to index_creation.py, updated README.md
rajavaid77 Jul 15, 2024
f73f373
updated README.md
rajavaid77 Jul 15, 2024
57c1030
updated architecture diagram
rajavaid77 Jul 15, 2024
d7568a9
updated example-pattern.json and README.md
rajavaid77 Jul 15, 2024
2cd0d23
updated README.md, fixed path typo
rajavaid77 Jul 15, 2024
1c2e333
updated README.md, fixed path typo
rajavaid77 Jul 15, 2024
c6bd96d
updated README.md, fixed path typo
rajavaid77 Jul 15, 2024
c79b080
minor updates to README.md
rajavaid77 Jul 15, 2024
d8e98e3
updated README.md and fix issue in app.py
rajavaid77 Jul 15, 2024
98b32c2
Updates to README.md, added output for kb id and ds id
rajavaid77 Jul 15, 2024
48f170c
updated example_pattern.json to fix issues
rajavaid77 Jul 18, 2024
0412138
removed unneccessary file
rajavaid77 Jul 18, 2024
bd1dd77
removed unneccessary file
rajavaid77 Jul 18, 2024
dfe003e
removed unneccessary file
rajavaid77 Jul 18, 2024
d6836c1
updated to fix stack issues
rajavaid77 Jul 18, 2024
a532873
updated to fix stack issues
rajavaid77 Jul 18, 2024
d17b2af
Merge branch 'aws-samples:main' into rajavaid-feature-eventbridge-bed…
rajavaid77 Jul 18, 2024
3dce19a
Updates to Readme
rajavaid77 Jul 18, 2024
b35c787
Updates to Readme
rajavaid77 Jul 18, 2024
75e1cd6
remove unwanted tracket files
rajavaid77 Jul 18, 2024
2ae3fbb
README.md updates and cleanup unwanted files
rajavaid77 Jul 19, 2024
6a11e11
README.md updates
rajavaid77 Jul 19, 2024
e898b50
merge in the logging stack with kb stack
rajavaid77 Jul 19, 2024
92b695d
updated testing instruction in README.md
rajavaid77 Jul 19, 2024
e177402
fixed app.py
rajavaid77 Jul 19, 2024
d3841c7
fixed ingestionstack
rajavaid77 Jul 19, 2024
5e721e8
fixed ingestionstack
rajavaid77 Jul 19, 2024
8f9f753
fixed stack name in README
rajavaid77 Jul 19, 2024
223363b
fixed stack name in README
rajavaid77 Jul 19, 2024
d6033e0
fixed layers
rajavaid77 Jul 26, 2024
7eb1104
Merge branch 'aws-samples:main' into rajavaid-feature-eventbridge-bed…
rajavaid77 Jul 27, 2024
d1ea582
Now adding resource policy to loggroup to prevent hitting resource po…
rajavaid77 Jul 27, 2024
aa7dab6
fixed instructions
rajavaid77 Jul 29, 2024
68be9cc
README changes
rajavaid77 Jul 29, 2024
7ab0fa1
README changes
rajavaid77 Jul 29, 2024
10 changes: 10 additions & 0 deletions eventbridge-bedrock-s3-aoss/.gitignore
@@ -0,0 +1,10 @@
*.swp
package-lock.json
__pycache__
.pytest_cache
.venv
*.egg-info

# CDK asset staging directory
.cdk.staging
cdk.out
196 changes: 196 additions & 0 deletions eventbridge-bedrock-s3-aoss/README.md
@@ -0,0 +1,196 @@
# EventBridge to Amazon Bedrock to Amazon OpenSearch Serverless
![architecture](architecture/architecture.png)

This pattern demonstrates an approach to automatically syncing the data source associated with a knowledge base in [Knowledge Bases for Amazon Bedrock](https://aws.amazon.com/bedrock/knowledge-bases/). Knowledge Bases for Amazon Bedrock helps you take advantage of [Retrieval Augmented Generation](https://aws.amazon.com/what-is/retrieval-augmented-generation/) (RAG), a popular technique that involves drawing information from a data store to augment the responses generated by Large Language Models (LLMs). When you set up a knowledge base with your data sources, your application can query the knowledge base to return information that answers the query, either as direct quotations from sources or as natural-language responses generated from the query results.

After you create your knowledge base, you ingest your data sources into it so that they are indexed and can be queried. Additionally, each time you add, modify, or remove files from your data source, you must sync the data source so that it is re-indexed to the knowledge base. Syncing is incremental, so Amazon Bedrock only processes documents added, modified, or deleted since the last sync.
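
To make the idea of an incremental sync concrete, here is a purely conceptual sketch. It is not how Amazon Bedrock implements syncing; it only illustrates the "added, modified, deleted since the last sync" bookkeeping by comparing two hypothetical snapshots that map object keys to content fingerprints (for example, S3 ETags).

```python
def diff_snapshots(previous, current):
    """Compare two {object_key: fingerprint} snapshots of a data source.

    Conceptual illustration only: the snapshot format is an assumption
    made for this example, not something Amazon Bedrock exposes.
    """
    added = sorted(key for key in current if key not in previous)
    deleted = sorted(key for key in previous if key not in current)
    modified = sorted(
        key for key in current
        if key in previous and current[key] != previous[key]
    )
    return {"added": added, "modified": modified, "deleted": deleted}
```

Only the keys returned by a comparison like this would need re-processing; unchanged documents are skipped, which is what makes frequent scheduled syncs inexpensive when little has changed.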

At the time of writing, Knowledge Bases for Amazon Bedrock doesn't have a feature to periodically sync the data source associated with a knowledge base automatically. Customers who need to refresh their data sources periodically to keep their knowledge base up to date therefore have to rely on a bespoke solution. This pattern shows one way of implementing such a solution, using [Amazon EventBridge Scheduler](https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html).

EventBridge Scheduler simplifies scheduling tasks by providing a centralized, serverless service that reliably executes schedules and invokes targets across various AWS services. In this pattern, we configure an EventBridge schedule that runs periodically (using a schedule expression). As part of the schedule creation, we configure a target. A target is an API operation that EventBridge Scheduler invokes on your behalf whenever the schedule runs. In this case, the target is the [`StartIngestionJob`](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) API operation of the Amazon Bedrock Agents service.
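
As a rough illustration of what such a schedule and target look like, the sketch below builds the request for the EventBridge Scheduler `CreateSchedule` API as a plain dictionary. The schedule and group names are the ones used in the Testing section; the universal-target ARN (in particular the `bedrockagent` service name) and the PascalCase input keys are assumptions to verify against the universal targets documentation, and the stack in this pattern may build the schedule differently.

```python
import json

# Assumption: universal targets use the form
# "arn:aws:scheduler:::aws-sdk:<service>:<apiAction>"; the service name
# "bedrockagent" should be checked against the EventBridge Scheduler docs.
START_INGESTION_JOB_TARGET_ARN = (
    "arn:aws:scheduler:::aws-sdk:bedrockagent:startIngestionJob"
)


def build_sync_schedule(knowledge_base_id, data_source_id, role_arn,
                        rate="rate(5 minutes)"):
    """Return keyword arguments for a CreateSchedule request."""
    return {
        "Name": "BedrockKBDataSourceSyncSchedule",
        "GroupName": "BedrockKBSyncScheduleGroup",
        "ScheduleExpression": rate,
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": START_INGESTION_JOB_TARGET_ARN,
            # Role that EventBridge Scheduler assumes to call the target API.
            "RoleArn": role_arn,
            # Assumption: input keys are PascalCase for universal targets.
            "Input": json.dumps({
                "KnowledgeBaseId": knowledge_base_id,
                "DataSourceId": data_source_id,
            }),
        },
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("scheduler").create_schedule(**request)`, provided the schedule group already exists and the role is allowed to call `bedrock:StartIngestionJob`.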

Learn more about this pattern at Serverless Land Patterns: https://serverlessland.com/patterns/eventbridge-bedrock-s3-aoss

> [!Important]
> This application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.

## Requirements

* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured
* [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) installed
* [Node and NPM](https://nodejs.org/en/download/) installed
* [AWS Cloud Development Kit](https://docs.aws.amazon.com/cdk/latest/guide/cli.html) (AWS CDK) installed

> [!Important]
> This pattern uses Knowledge Bases for Amazon Bedrock and the Amazon Titan Text Embeddings V2 model. See [Supported regions and models for Knowledge bases for Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-supported.html) to select a region where Knowledge Bases for Amazon Bedrock is supported.

## Enable Model Access on Amazon Bedrock
Knowledge Bases for Amazon Bedrock uses a foundation model to embed your data sources in a vector store. Before creating a knowledge base and selecting an embeddings model for it, you must request access to the model. If you try to use the model (with the API or console) before you have requested access to it, you receive an error message. For more information, see [Model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html).

1. In the AWS console, select the region from which you want to access Amazon Bedrock.

![Region Selection](images/region-selection.png)

1. Find **Amazon Bedrock** by searching in the AWS console.

![Bedrock Search](images/bedrock-search.png)

1. Expand the side menu.

![Bedrock Expand Menu](images/bedrock-menu-expand.png)

1. From the side menu, select **Model access**.

![Model Access](images/model-access-link.png)

1. Depending on your view, select the **Enable specific models** button or the **Modify Model Access** button.

![Model Access View](images/model-access-view.png)


6. Use the checkboxes to select the models you wish to enable. Review the applicable EULAs as needed. Click **Next** to go to the Review screen and then **Submit** to enable the required models in your account. By default, this pattern only needs Titan Text Embeddings V2 (model ID: _amazon.titan-embed-text-v2:0_).

## Deployment Instructions

1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository:
```
git clone https://github.com/aws-samples/serverless-patterns
```
1. Change directory to the pattern directory:
```
cd serverless-patterns/eventbridge-bedrock-s3-aoss
```
1. Create a virtual environment for Python:
```
python3 -m venv .venv
```
1. Activate the virtual environment:

```
source .venv/bin/activate
```
1. Install the required Python dependencies:
```
pip install -r requirements.txt
```
1. Install the dependencies used by the Lambda layer:

```
pip install --target layers/python -r layers/requirements.txt
```

1. Run the command below to bootstrap your account. The AWS CDK requires a bootstrapped environment before it can deploy:
```
cdk bootstrap
```
1. List the IDs of the stacks in the AWS CDK application:
```
cdk list
```

1. Review the CloudFormation template CDK generates for the included stacks using the following AWS CDK CLI command:


> [!NOTE]
> Substitute the stack_id with one from the list in output from the `cdk list` command
```
cdk synth <stack_id>
```

10. From the command line, use AWS CDK to deploy the AWS resources.

```
cdk deploy --all
```

Review comment by pputhran (Contributor), Jul 17, 2024:

Running into an error on deployment.
BedrockServiceRoleAccessPolicyStack and OpenSearchServerlessStack got deployed successfully.

BedrockKnowledgebaseStack is failing with the following error on KBDataSourceS3Bucket:

Resource handler returned message: "bedrock-rag-jcrob6 already exists (Service: S3, Status Code: 0, Request ID: null)" (RequestToken: f84ae2ac-9ac4-ad30-44c1-4a2c0a9bb599, HandlerErrorCode: AlreadyExists)

Reply (Contributor Author):

Fixed the errors.

Reply (Contributor):

Unfortunately ran into another error with the new changes, this time with BedrockKBStack:

Resource handler returned message: "Supplied Policy document is breaching Cloudwatch Logs policy length limit. (Service: CloudWatchLogs, Status Code: 400, Request ID: 39ff2d15-9eca-423f-bf0c-bf61d8d5bd4e)" (RequestToken: 972f5e1f-da40-7437-4d9b-5d6c5f5fea2e, HandlerErrorCode: AccessDenied)

Enter `y` if prompted `Do you wish to deploy these changes (y/n)?`

> [!NOTE]
> You can optionally change the `collection_name`, `index_name`, `knowledge_base_name`, and `kb_s3_datasource_name`
> parameters in `cdk.context.json`. These parameters name the OpenSearch Serverless collection, the index, the Amazon Bedrock knowledge base, and the associated S3 data source, respectively.
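
To see how these four parameters map to resource names, the following sketch parses a trimmed copy of `cdk.context.json` with the standard library. The values are the defaults from this pattern; the `resource_names` helper is hypothetical and exists only for illustration (a CDK app would more typically read context with `app.node.try_get_context(...)`, though the stacks here may do it differently).

```python
import json

# Trimmed copy of cdk.context.json containing only the naming parameters.
CONTEXT_JSON = """
{
  "opensearch_serverless_params": {
    "collection_name": "bedrock-kb",
    "index_name": "bedrock-kb-index"
  },
  "bedrock_knowledgebase_params": {
    "knowledge_base_name": "rag-knowledge-base",
    "kb_s3_datasource_name": "kb-s3-datasource"
  }
}
"""


def resource_names(context):
    """Flatten the naming parameters into one dictionary (hypothetical helper)."""
    aoss = context["opensearch_serverless_params"]
    kb = context["bedrock_knowledgebase_params"]
    return {
        "collection": aoss["collection_name"],
        "index": aoss["index_name"],
        "knowledge_base": kb["knowledge_base_name"],
        "data_source": kb["kb_s3_datasource_name"],
    }


names = resource_names(json.loads(CONTEXT_JSON))
```

Changing a value in the JSON file and redeploying is how you would rename the corresponding resource.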

## How it works
Upon deployment, the CDK stacks create a knowledge base in Amazon Bedrock configured with an S3 bucket as its data source and an OpenSearch Serverless collection to store vector data. A data source repository contains files or content with information that can be retrieved when your knowledge base is queried. The stacks also include an EventBridge Scheduler schedule that runs every 5 minutes and invokes the `StartIngestionJob` API on the Amazon Bedrock Agents service. Amazon Bedrock supports a monitoring system to help you understand the execution of data ingestion jobs: the stack creates the necessary CloudWatch log groups and log delivery, which give you visibility into the ingestion of your knowledge base resources. Additionally, Amazon Bedrock is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or AWS service in Amazon Bedrock. CloudTrail captures all API calls for Amazon Bedrock as events.


## Testing

### Verify Event Scheduler is ENABLED
The EventBridge schedule should be enabled by default once stack creation is complete. You can verify this by running the command below. The expected output is the text `ENABLED`, which means that the schedule is enabled and ready to run at the next scheduled time.

```
aws scheduler get-schedule --name BedrockKBDataSourceSyncSchedule --group BedrockKBSyncScheduleGroup --query 'State' --output text
```
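
The same check can be scripted. This is a minimal sketch that takes an EventBridge Scheduler client as a parameter (for example, `boto3.client("scheduler")`, which is assumed to be installed and configured) and returns the schedule state; the default names match the CLI command above.

```python
def schedule_state(scheduler_client,
                   name="BedrockKBDataSourceSyncSchedule",
                   group="BedrockKBSyncScheduleGroup"):
    """Return the schedule state ("ENABLED" or "DISABLED")."""
    response = scheduler_client.get_schedule(Name=name, GroupName=group)
    return response["State"]
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to exercise with a stub when no AWS credentials are available.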
### Upload Document(s) to S3 Bucket
Upload a sample PDF document to the S3 bucket that is configured as the knowledge base data source. You can provide your own or use one of the PDFs provided in the `examples` folder. You can find the bucket name in the Outputs section of the `cdk deploy` command output for `BedrockKBStack`.
> [!NOTE]
> Substitute the value from `BedrockKBStack.bucketname` found in the Outputs section of the `cdk deploy` command output of the `BedrockKBStack`

```
aws s3 cp examples/2022-Shareholder-Letter.pdf s3://<BedrockKBStack.bucketname>
```


> [!Important]
> Wait for the next scheduled run before running the commands below. By default, this stack configures the schedule to run every 5 minutes. You can find the schedule rate by running the command below. The expected output is `rate(5 minutes)`.
```
aws scheduler get-schedule --name BedrockKBDataSourceSyncSchedule --group BedrockKBSyncScheduleGroup --query 'ScheduleExpression' --output text
```
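
If you want to turn that expression into a wait time in a script, a small parser is enough. The sketch below handles only `rate(<value> <unit>)` expressions with minute, hour, or day units; `cron(...)` and `at(...)` expressions are deliberately not covered.

```python
import re
from datetime import timedelta

_RATE_PATTERN = re.compile(
    r"^rate\((\d+)\s+(minute|minutes|hour|hours|day|days)\)$"
)


def rate_to_timedelta(expression):
    """Convert a rate-based schedule expression to a timedelta."""
    match = _RATE_PATTERN.match(expression.strip())
    if match is None:
        raise ValueError(f"Not a supported rate expression: {expression!r}")
    value = int(match.group(1))
    # Normalize "minute"/"minutes" to the plural keyword timedelta expects.
    unit = match.group(2).rstrip("s") + "s"
    return timedelta(**{unit: value})
```

For the default schedule, `rate_to_timedelta("rate(5 minutes)").total_seconds()` gives the maximum time to wait for the next run.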

### View CloudTrail log for StartIngestionJob
1. In the CloudTrail console, click on Event history. Event history provides a viewable, searchable, downloadable, and immutable record of the past 90 days of management events.
![CloudTrail Event History](images/cloudtrail-eventhistory.png)

1. Filter by the event name `StartIngestionJob`, and by date and time (for example, Last 20 minutes).
![StartIngestionJob Event](images/startingestionjob-event.png)

1. In the event record, notice that `sessionContext.sessionIssuer.userName` mentions `EventBridgeSchedulerRole`, which is the role that was created by the CDK stack and assigned to the EventBridge schedule. Also, the `userAgent` field shows `AmazonEventBridgeScheduler` as the agent through which the request was made.
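
If you download the event record as JSON, the fields mentioned in the last step can be checked programmatically. The sketch below assumes the standard CloudTrail record layout, in which `sessionContext` is nested under `userIdentity`; the helper names are hypothetical.

```python
def summarize_event(record):
    """Pull out the fields of a CloudTrail record discussed above."""
    issuer = (
        record.get("userIdentity", {})
        .get("sessionContext", {})
        .get("sessionIssuer", {})
    )
    return {
        "eventName": record.get("eventName"),
        "issuerUserName": issuer.get("userName"),
        "userAgent": record.get("userAgent"),
    }


def invoked_by_scheduler(record, role_fragment="EventBridgeSchedulerRole"):
    """Return True if the record looks like a scheduler-initiated sync."""
    summary = summarize_event(record)
    return (
        summary["eventName"] == "StartIngestionJob"
        and role_fragment in (summary["issuerUserName"] or "")
        and "AmazonEventBridgeScheduler" in (summary["userAgent"] or "")
    )
```

The role name is matched as a substring because CDK usually adds a generated prefix and suffix to resource names.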

### Tail the CloudWatch Logs to look for Sync Events
The CDK creates resources to enable logging for an Amazon Bedrock knowledge base using the CloudWatch constructs.
See [Knowledge bases logging](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-bases-logging.html) for more information.
The following command tails the CloudWatch log to view KnowledgeBase events as they are logged.

> [!NOTE]
> Substitute the `BedrockKBStack.knowledgebaseid` found in the CDK Output section of the `cdk deploy` command output of the `BedrockKBStack`

```
aws logs tail --follow --since 20m BedrockKnowledgeBase-<BedrockKBStack.knowledgebaseid>
```

The command should output CloudWatch log entries for the various stages of the ingestion process (such as INGESTION_JOB_STARTED, CRAWLING_COMPLETED, EMBEDDING_STARTED, and so on). The final log entry for a given ingestion job ID should indicate the COMPLETED status of the job, as in the screenshot below. That entry also reports resource statistics, including the number of documents ingested into the knowledge base.

Sample Output

![cloudwatch-log](images/cloudwatch-log.png)

### View Ingestion Job timestamp and status
You can also use the following command to check the status of ingestion job(s). The command outputs the most recent ingestion job.

> [!NOTE]
> Substitute the `BedrockKBStack.knowledgebaseid` and `BedrockKBStack.datasourceid` values found in the Outputs section of the `cdk deploy` command output of the `BedrockKBStack`

```
aws bedrock-agent list-ingestion-jobs --knowledge-base-id <BedrockKBStack.knowledgebaseid> --data-source-id <BedrockKBStack.datasourceid> --query 'reverse(sort_by(ingestionJobSummaries,&startedAt))[:1].{startedAt:startedAt, updatedAt:updatedAt,ingestionJobId:ingestionJobId,status:status}'
```
Sample Output

![list-ingestion-jobs-output](images/list-ingestion-jobs-output.png)
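
The JMESPath query above sorts the job summaries by `startedAt` and keeps the newest one. A Python equivalent, operating on the response of the `ListIngestionJobs` API (for example, from `boto3.client("bedrock-agent").list_ingestion_jobs(knowledgeBaseId=..., dataSourceId=...)`), might look like the sketch below. It looks at a single response page only; pagination is not handled.

```python
def latest_ingestion_job(response):
    """Return the most recently started ingestion job, or None if there are none."""
    summaries = response.get("ingestionJobSummaries", [])
    if not summaries:
        return None
    latest = max(summaries, key=lambda job: job["startedAt"])
    # Keep the same fields that the CLI query projects.
    fields = ("startedAt", "updatedAt", "ingestionJobId", "status")
    return {field: latest[field] for field in fields}
```

Unlike the CLI query, which returns a one-element list, this helper returns a single dictionary so that the status can be read directly with `latest_ingestion_job(response)["status"]`.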

## Cleanup

1. Run the command below in the `eventbridge-bedrock-s3-aoss` directory to delete the AWS resources created by this sample.
```bash
cdk destroy --all
```

## Extra Resources
* [Amazon Bedrock API Reference](https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html)
* [Sync to ingest your data sources into the knowledge base](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ingest.html)
* [What is Amazon EventBridge Scheduler?](https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html)
* [Using universal targets with EventBridge Scheduler](https://docs.aws.amazon.com/scheduler/latest/UserGuide/managing-targets-universal.html)

----
Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: MIT-0
32 changes: 32 additions & 0 deletions eventbridge-bedrock-s3-aoss/app.py
@@ -0,0 +1,32 @@
#!/usr/bin/env python3
import os
import aws_cdk as cdk
from stacks.bedrock_knowledgebase_stack import BedrockKnowledgebaseStack
from stacks.opensearch_serverless_stack import OpenSearchServerlessStack
from stacks.ingestion_job_resources_stack import IngestionJobResourcesStack
from stacks.bedrock_service_role_stack import BedrockServiceRoleStack


app = cdk.App()

bedrock_sr_ap_stack = BedrockServiceRoleStack(
    app,
    "BedrockServiceRoleStack",
)

opensearch_serverless_stack = OpenSearchServerlessStack(
    app,
    "AOSSStack",
    bedrock_kb_service_role_arn=bedrock_sr_ap_stack.bedrock_kb_service_role_arn,
)

bedrock_kb_stack = BedrockKnowledgebaseStack(
    app,
    "BedrockKBStack",
    cfn_aoss_collection_arn=opensearch_serverless_stack.cfn_aoss_collection_arn,
    index_name=opensearch_serverless_stack.index_name,
    bedrock_kb_service_role_arn=bedrock_sr_ap_stack.bedrock_kb_service_role_arn,
)

ingestion_job_resources_stack = IngestionJobResourcesStack(
    app,
    "SchedulerStack",
    knowledge_base_id=bedrock_kb_stack.knowledge_base_id,
    data_source_id=bedrock_kb_stack.knowledgebase_datasource_id,
)

app.synth()
16 changes: 16 additions & 0 deletions eventbridge-bedrock-s3-aoss/cdk.context.json
@@ -0,0 +1,16 @@
{
  "opensearch_serverless_params": {
    "collection_name": "bedrock-kb",
    "index_name": "bedrock-kb-index"
  },
  "bedrock_knowledgebase_params": {
    "knowledge_base_name": "rag-knowledge-base",
    "kb_s3_datasource_name": "kb-s3-datasource",
    "embedding_model_id": "amazon.titan-embed-text-v2:0",
    "vector_index_metadata_field": "text-metadata",
    "vector_index_text_field": "text",
    "vector_index_vector_field": "vector",
    "kb_cw_log_group_name_prefix": "BedrockKnowledgeBase",
    "bedrock_kb_log_delivery_source": "bedrock_kb_log_delivery_source"
  }
}
52 changes: 52 additions & 0 deletions eventbridge-bedrock-s3-aoss/cdk.json
@@ -0,0 +1,52 @@
{
  "app": "python3 app.py",
  "watch": {
    "include": [
      "**"
    ],
    "exclude": [
      "README.md",
      "cdk*.json",
      "requirements*.txt",
      "source.bat",
      "**/__init__.py",
      "python/__pycache__",
      "tests"
    ]
  },
  "context": {
    "@aws-cdk/aws-lambda:recognizeLayerVersion": true,
    "@aws-cdk/core:checkSecretUsage": true,
    "@aws-cdk/core:target-partitions": [
      "aws",
      "aws-cn"
    ],
    "@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true,
    "@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true,
    "@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true,
    "@aws-cdk/aws-iam:minimizePolicies": true,
    "@aws-cdk/core:validateSnapshotRemovalPolicy": true,
    "@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true,
    "@aws-cdk/aws-s3:createDefaultLoggingPolicy": true,
    "@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true,
    "@aws-cdk/aws-apigateway:disableCloudWatchRole": true,
    "@aws-cdk/core:enablePartitionLiterals": true,
    "@aws-cdk/aws-events:eventsTargetQueueSameAccount": true,
    "@aws-cdk/aws-iam:standardizedServicePrincipals": true,
    "@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true,
    "@aws-cdk/aws-iam:importedRoleStackSafeDefaultPolicyName": true,
    "@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true,
    "@aws-cdk/aws-route53-patters:useCertificate": true,
    "@aws-cdk/customresources:installLatestAwsSdkDefault": false,
    "@aws-cdk/aws-rds:databaseProxyUniqueResourceName": true,
    "@aws-cdk/aws-codedeploy:removeAlarmsFromDeploymentGroup": true,
    "@aws-cdk/aws-apigateway:authorizerChangeDeploymentLogicalId": true,
    "@aws-cdk/aws-ec2:launchTemplateDefaultUserData": true,
    "@aws-cdk/aws-secretsmanager:useAttachedSecretResourcePolicyForSecretTargetAttachments": true,
    "@aws-cdk/aws-redshift:columnId": true,
    "@aws-cdk/aws-stepfunctions-tasks:enableEmrServicePolicyV2": true,
    "@aws-cdk/aws-ec2:restrictDefaultSecurityGroup": true,
    "@aws-cdk/aws-apigateway:requestValidatorUniqueId": true,
    "@aws-cdk/aws-kms:aliasNameRef": true
  }
}