Documentation for Azure Deployment #741

Merged: 28 commits, Aug 27, 2024
29 changes: 29 additions & 0 deletions KernelMemory.sln
@@ -31,6 +31,7 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "docs", "docs", "{7BA7F1B2-1
docs\service.md = docs\service.md
docs\_config.local.yml = docs\_config.local.yml
docs\_config.yml = docs\_config.yml
docs\azure.md = docs\azure.md
EndProjectSection
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "examples", "examples", "{0A43C65C-6007-4BB4-B3FE-8D439FC91841}"
@@ -294,6 +295,28 @@ Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "210-KM-without-builder", "e
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "S3", "extensions\AWS\S3\S3.csproj", "{5A14582B-C6D0-459E-BBB8-EA46CE8DC52E}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "azure", "azure", "{795CD089-05A9-4800-B6FF-3243CAD7D41B}"
ProjectSection(SolutionItems) = preProject
docs\azure\architecture.md = docs\azure\architecture.md
docs\azure\architecture.png = docs\azure\architecture.png
docs\azure\async.png = docs\azure\async.png
docs\azure\deployment.md = docs\azure\deployment.md
docs\azure\diagram.png = docs\azure\diagram.png
docs\azure\usage.md = docs\azure\usage.md
EndProjectSection
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "how-to", "how-to", "{6B992EFC-81B0-4E52-925F-41420BDC40B6}"
ProjectSection(SolutionItems) = preProject
docs\how-to\custom-partitioning.md = docs\how-to\custom-partitioning.md
docs\how-to\custom-pipelines.md = docs\how-to\custom-pipelines.md
docs\how-to\custom-prompts.md = docs\how-to\custom-prompts.md
docs\how-to\hugging-face.md = docs\how-to\hugging-face.md
docs\how-to\intent-detection.md = docs\how-to\intent-detection.md
docs\how-to\multitenancy.md = docs\how-to\multitenancy.md
EndProjectSection
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "211-dotnet-WebClient-Intent-Detection", "examples\211-dotnet-WebClient-Intent-Detection\211-dotnet-WebClient-Intent-Detection.csproj", "{84AEC1DD-CBAE-400A-949C-91BA373C587D}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -552,6 +575,9 @@ Global
{5A14582B-C6D0-459E-BBB8-EA46CE8DC52E}.Debug|Any CPU.Build.0 = Debug|Any CPU
{5A14582B-C6D0-459E-BBB8-EA46CE8DC52E}.Release|Any CPU.ActiveCfg = Release|Any CPU
{5A14582B-C6D0-459E-BBB8-EA46CE8DC52E}.Release|Any CPU.Build.0 = Release|Any CPU
{84AEC1DD-CBAE-400A-949C-91BA373C587D}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{84AEC1DD-CBAE-400A-949C-91BA373C587D}.Debug|Any CPU.Build.0 = Debug|Any CPU
{84AEC1DD-CBAE-400A-949C-91BA373C587D}.Release|Any CPU.ActiveCfg = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
@@ -641,6 +667,9 @@ Global
{06A507C7-46B9-4D36-B88B-B4E4A0E8C0AC} = {0A43C65C-6007-4BB4-B3FE-8D439FC91841}
{00A3DDF3-2230-4AEC-8B5B-B75F958D194B} = {0A43C65C-6007-4BB4-B3FE-8D439FC91841}
{5A14582B-C6D0-459E-BBB8-EA46CE8DC52E} = {155DA079-E267-49AF-973A-D1D44681970F}
{795CD089-05A9-4800-B6FF-3243CAD7D41B} = {7BA7F1B2-19E2-46EB-B000-513EE2F65769}
{6B992EFC-81B0-4E52-925F-41420BDC40B6} = {7BA7F1B2-19E2-46EB-B000-513EE2F65769}
{84AEC1DD-CBAE-400A-949C-91BA373C587D} = {0A43C65C-6007-4BB4-B3FE-8D439FC91841}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {CC136C62-115C-41D1-B414-F9473EFF6EA8}
10 changes: 10 additions & 0 deletions README.md
@@ -31,6 +31,16 @@ Designed for seamless integration as a Plugin with
Copilot and ChatGPT, Kernel Memory enhances data-driven features in applications
built for most popular AI platforms.

# Deployment to Azure

Kernel Memory can be deployed in various configurations, including as a **Service** in Azure.
To learn more about deploying Kernel Memory in Azure, please refer to the
[Azure deployment guide](https://microsoft.github.io/kernel-memory/azure).
For detailed instructions on deploying to Azure, you can check the [infrastructure documentation](/infra/README.md).
If you are already familiar with these resources, you can quickly deploy by clicking the following button.

[![Deploy to Azure](https://aka.ms/deploytoazurebutton)](https://aka.ms/KernelMemoryDeploy2Azure)

# Synchronous Memory API (aka "serverless")

Kernel Memory works and scales at best when running as an asynchronous **Web Service**, allowing to
19 changes: 19 additions & 0 deletions docs/azure.md
@@ -0,0 +1,19 @@
---
nav_order: 2
has_children: true
title: Kernel Memory in Azure
permalink: /azure
layout: default
---

# Kernel Memory in Azure

{: .note }
Kernel Memory is a flexible solution that can be deployed with different configurations.
This section provides instructions on how to deploy Kernel Memory in Azure.

Kernel Memory relies on various services to store data and run the ingestion pipeline.
The [Extensions](./kernel-memory/extensions) page explains how to enhance Kernel Memory with additional services.

In the following pages, we will discuss how to deploy Kernel Memory in Azure, how to configure it, how to use it, and the associated costs.
While a predefined architecture is covered, Kernel Memory can be customized to meet your specific requirements.
134 changes: 134 additions & 0 deletions docs/azure/architecture.md
@@ -0,0 +1,134 @@
---
nav_order: 1
parent: Kernel Memory in Azure
title: Architecture
permalink: /azure/architecture
layout: default
---

# Azure Infrastructure Architecture

This section explains the infrastructure of Kernel Memory in Azure, which implements an asynchronous microservices architecture.

![image](./async.png)

## Components of the Kernel Memory Architecture

The diagram illustrates the components of the Kernel Memory infrastructure in Azure and their interactions.

#### Kernel Memory Web Service

- **Function**: Acts as the entry point for data ingestion. It receives data through an API and processes user queries.
- **Operations**:
  - **Upload**: Sends data to Document Storage.
  - **Enqueue**: Sends data to the Ingestion Queue.
  - **Search**: Queries data from the Memory Database.

#### Document Storage

- **Service**: Azure Blob Storage.
- **Function**: Stores the raw documents uploaded by the Web Service.
- **Operations**:
  - **Read/Write**: Kernel Memory Async Handlers read from and write to Document Storage as they process the data.

#### Ingestion Queue

- **Service**: Azure Storage Queue.
- **Function**: Manages the flow of data between the Web Service and the Async Handlers. It temporarily holds the data before processing.
- **Operations**:
  - **Enqueue**: Web Service adds data to the queue.
  - **Receive**: Kernel Memory Async Handlers retrieve data from the queue for processing.

#### Kernel Memory Async Handlers

- **Function**: Process the data retrieved from the Ingestion Queue. This includes extracting information and transforming it as needed.
- **Operations**:
  - **Read/Write**: Access Document Storage for reading raw documents and writing processed data.
  - **Upsert/Delete**: Update or delete entries in the Memory Database based on the processed data.

#### Memory Database (Memory Db)

- **Service**: Azure AI Search.
- **Function**: Stores the processed data, making it searchable and accessible for queries.
- **Operations**:
  - **Upsert/Delete**: Kernel Memory Async Handlers update or remove data entries.
  - **Search**: Web Service queries the database to retrieve processed information for user requests.

### Overview

The Web Service receives data through an API and stores it in Azure Blob Storage.
Kernel Memory Async Handlers then process the data, which is subsequently stored in Azure AI Search.
Finally, the data can be queried by the Web Service.
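
The flow described in this overview can be sketched with in-memory stand-ins for the Azure services. This is only an illustration of the upload, enqueue, process, and search sequence, not Kernel Memory's implementation: every class and function name is hypothetical, the queue is assumed to carry a document ID rather than the document itself, and sentence splitting and keyword matching are placeholders for the real extraction and vector search.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class InMemoryDeployment:
    """Toy stand-ins for the Azure services (all names are illustrative)."""
    document_storage: dict = field(default_factory=dict)   # stands in for Azure Blob Storage
    ingestion_queue: deque = field(default_factory=deque)  # stands in for Azure Storage Queue
    memory_db: dict = field(default_factory=dict)          # stands in for Azure AI Search


def web_service_upload(env, document_id, text):
    """Web Service: store the raw document, then enqueue a pointer to it (assumption)."""
    env.document_storage[document_id] = text
    env.ingestion_queue.append(document_id)


def async_handler_run(env):
    """Async Handler: drain the queue, read each document, upsert its chunks."""
    processed = 0
    while env.ingestion_queue:
        document_id = env.ingestion_queue.popleft()
        raw = env.document_storage[document_id]
        # Splitting on periods is a placeholder for real extraction and partitioning.
        chunks = [part.strip() for part in raw.split(".") if part.strip()]
        for index, chunk in enumerate(chunks):
            env.memory_db[f"{document_id}/{index}"] = chunk
        processed += 1
    return processed


def web_service_search(env, query):
    """Web Service: naive keyword match standing in for a vector search."""
    terms = set(query.lower().split())
    return sorted(
        record_id
        for record_id, chunk in env.memory_db.items()
        if terms & set(chunk.lower().split())
    )


env = InMemoryDeployment()
web_service_upload(env, "doc-1", "Kernel Memory indexes documents. Queries run against the index.")
pending_before = len(env.ingestion_queue)  # the document waits in the queue
handled = async_handler_run(env)           # the handler processes it asynchronously
hits = web_service_search(env, "index")    # only now is the content searchable
```

The point of the sketch is the decoupling: the upload call returns as soon as the document is stored and queued, and the content becomes searchable only after a handler has processed it.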

For the Kernel Memory Web Service and Kernel Memory Async Handlers, we use Azure Container Apps to pull Docker images from Docker Hub.
The current architecture employs the `Consumption` ACA deployment type.
For production deployments, we recommend using the `Dedicated` deployment type.
Azure Container Apps are deployed with a public endpoint protected by an API key.
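
As a rough sketch of what calling an API-key-protected endpoint looks like from a client, the snippet below builds a request without sending it. The `/ask` path, the `Authorization` header name, the JSON body shape, the host name, and the key value are all assumptions made for illustration; check them against the configuration of your own deployment.

```python
import json
import urllib.request


def build_ask_request(base_url, api_key, question):
    """Build (but do not send) a POST request carrying the API key.

    Assumptions: the `/ask` path, the `Authorization` header name, and the
    JSON body shape are illustrative and must be confirmed for your deployment.
    """
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/ask",
        data=body,
        method="POST",
        headers={"Authorization": api_key, "Content-Type": "application/json"},
    )


# Placeholder host and key; replace them with your endpoint and a real key.
request = build_ask_request("https://example.invalid/", "placeholder-key", "What is Kernel Memory?")
# To send it: urllib.request.urlopen(request, timeout=30)
```

Keeping the key out of source code (for example, reading it from an environment variable) is advisable in any real client.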

{: .highlight }
It's important to note that the Kernel Memory Web Service and Kernel Memory Async Handlers can also be deployed using
Azure App Service, Azure Kubernetes Service, Azure Container Instances, or Azure Virtual Machines.

Kernel Memory Async Handlers use Azure AI Document Intelligence to extract content from images. Local authentication is disabled in favor of Managed Identity.

An Azure Managed Identity is created and assigned to the Azure Container Apps to access Azure Blob Storage and Azure AI Search.
This Managed Identity is also used to access other resources required by the Kernel Memory Web Service and Kernel Memory Async Handlers.

Azure Blob Storage is used for document storage.
Data is stored in Blob Storage and, after processing by the Kernel Memory Async Handlers, is stored in Azure AI Search for querying by the Web Service.
The architecture uses Standard Locally Redundant Storage (SKU Standard_LRS) for redundancy in the primary region.
For security, the Storage Account does not use access keys and relies on Managed Identity for access.

The architecture uses Azure Storage Queue for queuing. The Web Service sends data to the queue, and the Kernel Memory Async Handlers retrieve data from it.

{: .highlight }
It's important to note that orchestration can also be achieved using RabbitMQ.

As a Vector DB, this architecture uses Azure AI Search. Kernel Memory can also be configured to work with Qdrant, Postgres, Redis, SimpleVectorDb, or SQL Server.
Access to Azure AI Search, like other services, is protected with Managed Identity.

For AI models, this architecture leverages models deployed in Azure AI.
The `text-embedding-ada-002` model, version `2`, is used for embeddings, and the `gpt-35-turbo-16k` deployment, version `0613`, is used for inference.
Model names and versions are specified in the `infra/main.bicep` file.

All resources are deployed in the same Azure Resource Group and region.

![image](./architecture.png)

## Cost

When you deploy Kernel Memory in Azure, you will incur costs associated with the resources it uses.

{: .highlight }
It's important to understand the costs of your Kernel Memory deployment in Azure, as Azure resource usage is billed based on the resources consumed.

### Microsoft Azure Estimate

The following approximate estimate was generated using the Azure Pricing Calculator. You can review the estimate for the proposed architecture on the [Azure Pricing Calculator](https://azure.com/e/16013d6ddab34b49beee72119c8f71a9).

{: .highlight }
To minimize costs while testing Kernel Memory on Azure, delete resources after use. Please note that all uploaded data will be lost upon deletion.

| Service category | Service type | Region | Description | Estimated monthly cost |
| --------------------- | ------------------------------ | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------- |
| AI + machine learning | Azure OpenAI Service | East US | Embedding Models, Text-Embedding-3-Small, 2,000 x 1,000 Tokens | $0.04 |
| AI + machine learning | Azure OpenAI Service | East US | Language Models, GPT-3.5-Turbo-0125-16K, 5,000 x 1,000 input tokens, 2,000 x 1,000 output tokens | $5.50 |
| AI + machine learning | Azure AI Document Intelligence | West US | Azure Form Recognizer, Pay as you go, S0: 1 x 1,000 Custom pages, 0 x 1,000 Pre-built pages, 1 x 1,000 Read pages, 0 x 1,000 Add-on pages, 0 x 1,000 Query pages | $52.50 |
| Web | Azure AI Search | East US | Basic, 1 Unit(s), 1 Month | $73.73 |
| Containers | Azure Container Apps | East US | Consumption Plan Type, 10 million requests per month, Pay as you go, 20 concurrent requests per container app, 100 milliseconds execution time per request, 2 vCPUs, 1 GiB memory, Pay as you go | $3.20 |
| Storage | Storage Accounts | East US | Block Blob Storage, General Purpose V2, Flat Namespace, LRS Redundancy, Hot Access Tier, 10 GB Capacity - Pay as you go, 10 x 10,000 Write operations, 10 x 10,000 List and Create Container Operations, 10 x 10,000 Read operations, 1 x 10,000 Other operations. 1,000 GB Data Retrieval, 1,000 GB Data Write, SFTP disabled | $1.25 |
| Storage | Storage Accounts | East US | Queue Storage, General Purpose V2, LRS Redundancy, 1 GB Capacity, 1,000 Queue Class 1 operations, 1,000 Queue Class 2 operations | $8.05 |
| Networking | Virtual Network | East US | East US (Virtual Network 1): 100 GB Outbound Data Transfer; East US (Virtual Network 2): 100 GB Outbound Data Transfer | $4.00 |
| Networking | Azure Private Link | East US | 5 Private Links X 1 Endpoint X 730 Hours, 100 GB Outbound data processed, 100 GB Inbound data processed | $46.50 |
| Networking | Azure DNS | East US | 5 Private DNS records X Zone 1, DNS, Public; 0 hosted DNS zones, 0 DNS queries | $0.00 |
| Networking | Application Gateway | East US | Basic tier, Small Instance size: 1 Gateway hours instance(s) x 730 Hours, 0 GB Data processed unit(s), 5 GB Zone unit(s) | $18.25 |
| Networking | IP Addresses | East US | Basic (Classic), 0 Dynamic IP Addresses X 730 Hours, 1 Static IP Addresses X 1 Month | $2.63 |
| Support | | Support | | $0.00 |
| | | Licensing Program | Microsoft Customer Agreement (MCA) | |
| | | Total | | $195.65 |

_All prices shown are in United States – Dollar ($) USD. This is a summary estimate, not a quote. For up-to-date pricing information, please visit https://azure.microsoft.com/pricing/calculator/.
This estimate was created on 6/24/2024 2:41:50 PM UTC._

### Next steps

Follow the [deployment](./deployment) guide to deploy Kernel Memory in Azure, and the [usage](./usage) guide to learn how to use it.
Binary file added docs/azure/architecture.png
Binary file added docs/azure/async.png