This document is for developers interested in contributing to GraphRAG.
Development is best done in a Unix environment (Linux, macOS, or Windows WSL).
- Clone the GraphRAG repository.
- Follow all directions in the deployment guide to install the required tools and deploy an instance of the GraphRAG service in Azure. Alternatively, this repo provides a devcontainer with all tools preinstalled.
- Create a .env file in the root of the repository (GraphRAG/.env). A detailed description of environment variables used to configure graphrag can be found here. Add the following environment variables to the .env file:

| Environment Variable | Description |
| --- | --- |
| COSMOS_URI_ENDPOINT | Azure CosmosDB connection string from graphrag deployment |
| STORAGE_ACCOUNT_BLOB_URL | Azure Storage blob URL from graphrag deployment |
| AI_SEARCH_URL | AI Search endpoint from graphrag deployment (in the form https://<name>.search.windows.net) |
| GRAPHRAG_API_BASE | The AOAI API base URL |
| GRAPHRAG_API_VERSION | The AOAI API version (e.g. 2023-03-15-preview) |
| GRAPHRAG_LLM_MODEL | The AOAI model name (e.g. gpt-4) |
| GRAPHRAG_LLM_DEPLOYMENT_NAME | The AOAI model deployment name (e.g. gpt-4-turbo) |
| GRAPHRAG_EMBEDDING_MODEL | The AOAI embedding model name (e.g. text-embedding-ada-002) |
| GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME | The AOAI embedding model deployment name (e.g. my-text-embedding-ada-002) |
| REPORTERS | A comma-delimited list of logging reporters to enable. Possible values are blob,console,file |
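Before launching anything, it can help to confirm that the .env file actually defines every variable listed above. The following is a minimal sketch, not part of the repository: the helper names `parse_env` and `missing_vars` are hypothetical, and the parser only handles plain KEY=VALUE lines rather than the full syntax a dotenv loader may accept.

```python
from pathlib import Path

# Variable names copied from the table above.
REQUIRED_VARS = [
    "COSMOS_URI_ENDPOINT",
    "STORAGE_ACCOUNT_BLOB_URL",
    "AI_SEARCH_URL",
    "GRAPHRAG_API_BASE",
    "GRAPHRAG_API_VERSION",
    "GRAPHRAG_LLM_MODEL",
    "GRAPHRAG_LLM_DEPLOYMENT_NAME",
    "GRAPHRAG_EMBEDDING_MODEL",
    "GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME",
    "REPORTERS",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_vars(text: str, required=REQUIRED_VARS) -> list[str]:
    """Return required variables that are absent or empty, in table order."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")  # assumes the script runs from the repository root
    text = env_file.read_text() if env_file.exists() else ""
    missing = missing_vars(text)
    print("Missing variables:", ", ".join(missing) if missing else "none")
```

Run from the repository root, the script prints the names of any variables that still need to be added.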
- Developing inside the devcontainer
- Requirements
  - Docker
  - Visual Studio Code
  - Remote - Containers extension for VS Code
- Open VS Code in the directory containing your project.
- Use the Command Palette (Ctrl+Shift+P on Windows/Linux, Cmd+Shift+P on macOS) and type "Remote-Containers: Open Folder in Container...".
- Select your project folder. VS Code will then start building the Docker container based on the Dockerfile and devcontainer.json in your project. This process may take a few minutes, especially on the first run.
- A VS Code terminal prompt may appear before the installation has finished. Wait for the following prompt to appear, which indicates that the full install is complete:
  vscode@<hostname>:/graphrag$
- Adding Python packages to the dev container
  - Poetry is the Python package manager in the dev container. Python packages can be added using poetry add <package-name>
  - Every time a package is added, Poetry updates poetry.lock and pyproject.toml, the two files that track all package management. Changes to these files should be checked into the repo; that is how we keep the devcontainer consistent across users.
  - It's possible to get into a situation where a package has been added but your local poetry.lock does not contain the proper hash. This is most common after resolving a merge conflict. The easiest way to resolve the issue is to run poetry install, which will check the status of all packages and update the hash values in poetry.lock.
- Adding dependencies to the environment
  - Most dependencies (packages or tools) should be added to the environment through the Dockerfile. This allows us to maintain a consistent development environment. If you need a tool added, please make the appropriate changes to the Dockerfile and submit a pull request.
- The GraphRAG service consists of two components: a backend application and a frontend UI application (coming soon). GraphRAG can be launched in multiple ways depending on where in the application stack you are developing and debugging.
- In Azure Kubernetes Service (AKS):

Navigate to the root directory of the repository. First, build and publish the backend Docker image to an Azure Container Registry.
> az acr build --registry <my_container_registry> -f docker/Dockerfile-backend --image graphrag:backend .

Update infra/deployment.parameters.json to use your custom graphrag Docker images and re-run the deployment script to update AKS.

After deployment is complete, kubectl is used to log in and view the GraphRAG AKS resources, as well as to aid in other debugging use cases. The following commands provide quick access to AKS:
> RGNAME=<your_resource_group>
> AKSNAME=`az aks list --resource-group $RGNAME --query "[].name" --output tsv`
> az aks get-credentials -g $RGNAME -n $AKSNAME --overwrite-existing
> kubectl config set-context --current --namespace=graphrag

Some example AKS commands to get started:
> kubectl get pods # view a list of all deployed pods
> kubectl get nodes # view a list of all deployed nodes
> kubectl get jobs # view a list of all AKS jobs
> kubectl logs <pod_name> # print out useful logging information (print statements)
> kubectl exec -it <pod_name> -- bash # login to a running container
> kubectl describe pod <pod_name> # retrieve detailed info about a pod
> kubectl describe node <node_name> # retrieve detailed info about a node
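When many pods are deployed, scanning the output of kubectl get pods by eye gets tedious. The following is a minimal sketch of filtering that output in Python. It relies on the JSON that kubectl emits with `--output json` (an `items` list whose entries carry `metadata.name` and `status.phase`); the helper names and the choice of which phases count as healthy are assumptions made for illustration.

```python
import json
import subprocess

# Assumption: Running pods and Succeeded (completed job) pods need no attention.
HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pod_list: dict) -> list[tuple[str, str]]:
    """Return (pod name, phase) pairs for pods outside the healthy phases."""
    result = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            result.append((name, phase))
    return result


def fetch_pods(namespace: str = "graphrag") -> dict:
    """Run kubectl and parse its JSON output.

    Requires kubectl on the PATH and credentials obtained with
    az aks get-credentials, as in the commands above.
    """
    completed = subprocess.run(
        ["kubectl", "get", "pods", "--namespace", namespace, "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)
```

With cluster access configured, `unhealthy_pods(fetch_pods())` returns the pods worth inspecting further with kubectl describe pod or kubectl logs.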
A small collection of pytest tests has been written to test the functionality of the API. To run the tests, add the following environment variables to a .env file in the root of the repo directory.
APIM_SUBSCRIPTION_KEY
COSMOS_URI_ENDPOINT
DEPLOYMENT_URL
STORAGE_ACCOUNT_BLOB_URL
The tests assume the solution accelerator has been previously deployed and that a managed identity has been set up with RBAC access to CosmosDB and Azure Storage. To run the tests locally:
# cd to root directory of the repo
> pytest backend/src/tests/test_all_index_endpoint.py -s
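To show how DEPLOYMENT_URL and APIM_SUBSCRIPTION_KEY might fit together, here is a minimal sketch of building an authenticated request against the deployed API. It is not taken from the test suite, which may construct requests differently. The function name `build_request` and the path in the usage note are hypothetical; `Ocp-Apim-Subscription-Key` is the default header name Azure API Management uses for subscription keys.

```python
import os
import urllib.request


def build_request(path: str, env=os.environ) -> urllib.request.Request:
    """Build a GET request for the deployed API, authenticated with the APIM key."""
    base_url = env["DEPLOYMENT_URL"].rstrip("/")
    request = urllib.request.Request(f"{base_url}/{path.lstrip('/')}")
    # Default subscription-key header for Azure API Management; a deployment
    # could be configured to use a different header name.
    request.add_header("Ocp-Apim-Subscription-Key", env["APIM_SUBSCRIPTION_KEY"])
    return request
```

A request built this way could be sent with `urllib.request.urlopen(build_request("/some/endpoint"))`, where `/some/endpoint` stands in for a real API route.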
This repository uses GitHub Actions for continuous integration and continuous deployment (CI/CD).
- We follow PEP 8 standards and naming conventions as closely as possible.
- ruff is used for linting and code formatting. A pre-commit hook has been set up to automatically apply its settings to this repo. To make use of this tool without explicitly calling it, install the pre-commit hook.
> pre-commit install
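As a quick reminder of the PEP 8 naming conventions mentioned above, the sketch below shows the recommended style for constants, classes, methods, and attributes. The `IndexJob` class and `MAX_RETRIES` constant are hypothetical and exist only to illustrate naming; they are not part of the GraphRAG codebase.

```python
"""Naming styles recommended by PEP 8, shown on a hypothetical example."""

MAX_RETRIES = 3  # constants: UPPER_CASE_WITH_UNDERSCORES


class IndexJob:  # classes: CapWords
    """A hypothetical indexing job, used only to illustrate naming."""

    def __init__(self, index_name: str) -> None:
        self.index_name = index_name  # attributes: lowercase_with_underscores
        self._attempts = 0  # a leading underscore marks a non-public name

    def record_attempt(self) -> bool:  # methods: lowercase_with_underscores
        """Count one attempt and report whether another retry is allowed."""
        self._attempts += 1
        return self._attempts < MAX_RETRIES
```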
We use SemVer for semantic versioning.
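For contributors unfamiliar with SemVer, the sketch below illustrates how a MAJOR.MINOR.PATCH version changes when each part is bumped: a major bump resets minor and patch, and a minor bump resets patch. The `Version` class is a hypothetical helper for illustration only; it ignores pre-release and build metadata and is not how this repository manages its version.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        """Parse a plain MAJOR.MINOR.PATCH string (no pre-release or build tags)."""
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def bump(self, part: str) -> "Version":
        """Return the next version for a 'major', 'minor', or 'patch' change."""
        if part == "major":  # incompatible API change
            return Version(self.major + 1, 0, 0)
        if part == "minor":  # backward-compatible new functionality
            return Version(self.major, self.minor + 1, 0)
        if part == "patch":  # backward-compatible bug fix
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown version part: {part!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"
```

Because `Version` is a tuple, comparisons are numeric per field, so `Version.parse("1.10.0")` sorts after `Version.parse("1.9.9")`, unlike a plain string comparison.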