You can install Presidio locally using Docker or KIND, as a service in Kubernetes or AKS, or use it as a framework in a Python application.
Decide on a name for your Presidio project. In the examples, the project name is <my-project>.
- Development and Testing
- Production - Deploy with Kubernetes
You will need Docker installed and running, and make installed on your system.
Sync this repo and use make to build and deploy locally.
For convenience, the script build.sh at the root of this repo will run the make commands for you. If you use the script, remember to make it executable by running chmod +x build.sh after syncing the code.
NOTE: Building the deps images currently takes some time (~70 minutes, depending on the build machine). We are working on improving the build time through improving the build and providing pre-built dependencies.
NOTE: Error messages. You may see error messages like this when running the make commands:
Error response from daemon: pull access denied for presidio/presidio-golang-deps, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
These can be ignored.
NOTE: Memory requirements. If you get an error message like this while building:
tests/test_analyzer_engine.py ...............The command '/bin/sh -c pipenv run pytest' returned a non-zero code: 137
you may need to increase the Docker memory limit for your machine. Exit code 137 (128 + SIGKILL) means the process was killed, which during a build usually indicates it ran out of memory.
Once the build is complete you can verify the local deployment by running:
docker ps
You should see 4 Presidio containers and one Redis container running with the following names:
presidio-api
presidio-recognizers-store
presidio-anonymizer
presidio-analyzer
redis
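If you want to script this check, the following Python sketch runs `docker ps --format '{{.Names}}'` and reports which of the expected containers are missing. It assumes the containers carry exactly the names listed above; `missing_containers` and `check_local_deployment` are hypothetical helpers, not part of Presidio.

```python
import subprocess

# Names listed above; assumes the local build uses exactly these container names.
EXPECTED_CONTAINERS = {
    "presidio-api",
    "presidio-recognizers-store",
    "presidio-anonymizer",
    "presidio-analyzer",
    "redis",
}


def missing_containers(docker_ps_output: str) -> set:
    """Return the expected container names absent from `docker ps` output.

    The output is expected to hold one container name per line, as printed
    by `docker ps --format '{{.Names}}'`.
    """
    running = {line.strip() for line in docker_ps_output.splitlines() if line.strip()}
    return EXPECTED_CONTAINERS - running


def check_local_deployment() -> set:
    """Run `docker ps` (running containers only) and return what is missing."""
    output = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # str output; works on Python 3.6
    ).stdout
    return missing_containers(output)
```

For example, `print(check_local_deployment() or "All Presidio containers are running")` prints either the missing names or a confirmation. Splitting the parsing from the `docker` call keeps the parsing testable without Docker.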
Presidio is built for Kubernetes; you can give it a try using KIND (Kubernetes IN Docker).
- Install Docker.
- Optional (Linux): the following commands will install all prerequisites (Docker, Helm, make, kubectl):
cd deployment/
./prerequisites.sh
Depending on your environment, sudo might be needed.
- Clone Presidio.
- Run the following script, which uses KIND (Kubernetes emulation in Docker):
cd deployment/
./run-with-kind.sh
- Wait and verify that all pods are running:
kubectl get pod -n presidio
- Port forwarding of HTTP requests to the API microservice is set up automatically. To run it manually:
kubectl port-forward <presidio-api-pod-name> 8080:8080 -n presidio
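The "wait and verify" step can be automated. This Python sketch polls `kubectl get pods -n presidio -o json` until every pod reports the phase `Running`. The helper names, the timeout and the polling interval are illustrative assumptions. Note that phase `Running` is a looser check than readiness; `kubectl wait --for=condition=Ready pod --all -n presidio` is a stricter alternative.

```python
import json
import subprocess
import time


def pods_not_running(pods: dict) -> list:
    """Return names of pods whose status.phase is not "Running".

    `pods` is the parsed JSON from `kubectl get pods -n <namespace> -o json`.
    """
    return [
        item["metadata"]["name"]
        for item in pods.get("items", [])
        if item.get("status", {}).get("phase") != "Running"
    ]


def wait_for_presidio_pods(namespace: str = "presidio",
                           timeout_s: int = 600, poll_s: int = 10) -> bool:
    """Poll kubectl until all pods in `namespace` are Running or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
            check=True,
            stdout=subprocess.PIPE,
            universal_newlines=True,
        ).stdout
        pods = json.loads(out)
        pending = pods_not_running(pods)
        # An empty namespace means the pods have not been created yet.
        if pods.get("items") and not pending:
            return True
        print("Waiting for:", ", ".join(pending) or "pods to be created")
        time.sleep(poll_s)
    return False
```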
If you're interested in running the analyzer alone, you can install it as a standalone Python package by packaging it into a wheel file. Note that Presidio requires Python >= 3.6.
- In the presidio-analyzer folder, run:
python setup.py bdist_wheel
- Copy the created wheel file (from the dist folder of presidio-analyzer) into a clean virtual environment.
- Install the wheel package:
pip install wheel
- Install the presidio-analyzer wheel file:
pip install WHEEL_FILE
where WHEEL_FILE is the path to the created wheel file.
- Install the spaCy model from GitHub (it is not installed during the standard installation):
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.1.0/en_core_web_lg-2.1.0.tar.gz
Note that if you skip this step, the spaCy model will be installed lazily during the first call to the AnalyzerEngine.
- Optional: install re2 and pyre2:
- Install re2:
re2_version="2018-12-01"
wget -O re2.tar.gz https://github.com/google/re2/archive/${re2_version}.tar.gz
mkdir re2
tar --extract --file "re2.tar.gz" --directory "re2" --strip-components 1
cd re2 && make install
- Install pyre2's fork:
pip install https://github.com/torosent/pyre2/archive/release/0.2.23.zip
Note: If you don't install re2, Presidio will use the regex package for regular expression handling.
- Test the installation.
To test, run Python in the virtual environment where you installed presidio-analyzer, and make sure this code returns results:
from presidio_analyzer import AnalyzerEngine

engine = AnalyzerEngine()
text = "My name is David and I live in Miami"
response = engine.analyze(correlation_id=0,
                          text=text,
                          entities=[],
                          language='en',
                          all_fields=True,
                          score_threshold=0.5)
for item in response:
    print("Start = {}, end = {}, entity = {}, confidence = {}".format(item.start,
                                                                         item.end,
                                                                         item.entity_type,
                                                                         item.score))
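To see which of the optional pieces ended up in the virtual environment, the following sketch checks for them without loading Presidio. It assumes the spaCy model package is importable as `en_core_web_lg` and that pyre2 is importable as `re2`; the helper names are hypothetical, and the re2-then-regex order mirrors the fallback described above rather than Presidio's own selection code.

```python
import importlib.util


def first_available(module_names):
    """Return the first module name that can be imported, or None."""
    for name in module_names:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def describe_analyzer_environment() -> dict:
    """Report whether the spaCy model and a regex backend are installed."""
    return {
        # spaCy models are installed as regular Python packages.
        "spacy_model_installed": importlib.util.find_spec("en_core_web_lg") is not None,
        # re2 if pyre2 is installed, otherwise the regex package, else None.
        "regex_backend": first_available(["re2", "regex"]),
    }


print(describe_analyzer_environment())
```

If `spacy_model_installed` is False, the model will be downloaded on the first AnalyzerEngine call, as noted above.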
- A Kubernetes 1.9+ cluster with RBAC enabled. If you are using AKS, RBAC is enabled by default.
- Note the pods' resource requirements (CPU and memory) and plan the cluster accordingly.
- kubectl installed.
- Verify you can communicate with the cluster by running:
kubectl version
- A local Helm client.
- Optional: a container registry, such as ACR. This is only needed if you are using your own Presidio images and not the default ones from the Microsoft syndicates container catalog.
- A recent clone of the Presidio repo on your local machine.
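The cluster checks above (kubectl can reach the cluster, Kubernetes 1.9+) can be combined in a small Python sketch that parses `kubectl version -o json`. Some providers report the minor version with a suffix such as `9+`, so the sketch keeps only the leading digits. The helper names are hypothetical, and RBAC is not checked here.

```python
import json
import subprocess


def parse_minor(minor: str) -> int:
    """Parse a Kubernetes minor version such as "9" or "9+" into an int."""
    digits = ""
    for ch in minor:
        if not ch.isdigit():
            break
        digits += ch
    return int(digits)


def server_version_at_least(version_info: dict, minimum=(1, 9)) -> bool:
    """Compare serverVersion from `kubectl version -o json` with `minimum`."""
    server = version_info["serverVersion"]
    return (int(server["major"]), parse_minor(server["minor"])) >= minimum


def check_cluster() -> bool:
    # check=True raises CalledProcessError if kubectl exits with an error,
    # which typically happens when the cluster cannot be reached.
    out = subprocess.run(
        ["kubectl", "version", "-o", "json"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout
    return server_version_at_least(json.loads(out))
```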
- Navigate into <root>\deployment from the command line.
- If you have Helm installed but haven't run helm init, execute deploy-helm.sh from the command line. It will install tiller (the Helm server side) on your cluster and grant it sufficient permissions. Note that this script uses Helm 2.
deploy-helm.sh
- Optional: grant the Kubernetes cluster access to the container registry. This is only needed if you will use your own Presidio images. You can skip this step, and the script will pull the container images from the Microsoft syndicates container catalog.
- If using Azure Kubernetes Service, follow these instructions to grant the AKS cluster access to the ACR.
- If you already have helm and tiller configured, or if you installed them in the previous step, execute deploy-presidio.sh from the command line as follows:
deploy-presidio.sh
The script will install Presidio on your cluster using the default values.
Note: You can edit the file to use your own container registry and image.
- Install Redis (cache for storage and database scanners):
helm install --name redis stable/redis --set usePassword=false,rbac.create=true --namespace presidio-system
- Optional: an ingress controller for the Presidio API.
Note that Presidio is not deployed with an ingress controller by default. To change this behavior, deploy the Helm chart with api.ingress.enabled=true and specify the type of ingress controller to use with api.ingress.class=nginx (supported classes are nginx, traefik and istio).
- Verify that Redis and Traefik/NGINX are installed correctly.
- Deploy from /charts/presidio:
# Based on the DOCKER_REGISTRY and PRESIDIO_LABEL from the previous steps
helm install --name presidio-demo --set registry=${DOCKER_REGISTRY},tag=${PRESIDIO_LABEL} . --namespace presidio
- For more deployment options, follow the Development guide.
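The chart deployment can also be driven from a script. This Python sketch assembles the Helm 2 `helm install` arguments from DOCKER_REGISTRY and PRESIDIO_LABEL and, optionally, adds the ingress values described above. The helper names are hypothetical; the default release name and namespace are copied from the example command, and `chart_dir="charts/presidio"` assumes you run it from the repository root.

```python
import os
import subprocess

# Ingress classes listed as supported above.
SUPPORTED_INGRESS_CLASSES = {"nginx", "traefik", "istio"}


def build_helm_install_args(registry: str, tag: str, ingress_class=None,
                            release: str = "presidio-demo",
                            namespace: str = "presidio") -> list:
    """Build Helm 2 `helm install` arguments for the chart in the current directory."""
    values = ["registry=" + registry, "tag=" + tag]
    if ingress_class is not None:
        if ingress_class not in SUPPORTED_INGRESS_CLASSES:
            raise ValueError("unsupported ingress class: " + ingress_class)
        values += ["api.ingress.enabled=true", "api.ingress.class=" + ingress_class]
    return ["helm", "install", "--name", release,
            "--set", ",".join(values), ".", "--namespace", namespace]


def deploy_from_env(ingress_class=None, chart_dir: str = "charts/presidio") -> None:
    """Run helm install inside `chart_dir` using values from the environment."""
    args = build_helm_install_args(os.environ["DOCKER_REGISTRY"],
                                   os.environ["PRESIDIO_LABEL"],
                                   ingress_class)
    # cwd makes "." in the argument list refer to the chart directory.
    subprocess.run(args, check=True, cwd=chart_dir)
```

Passing the arguments as a list avoids shell quoting issues; the `${...}` expansion in the shell command is replaced here by reading `os.environ`, which raises KeyError if a variable is unset.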