Deploy Presidio as a system

You can install Presidio locally using Docker or KIND, run it as a service in Kubernetes or AKS, or use it as a framework in a Python application.

Decide on a name for your Presidio project. In the examples the project name is <my-project>.

The easy way with Docker

You will need to have Docker installed and running, and make installed on your system.

Sync this repo and use make to build and deploy locally.

For convenience, the script build.sh at the root of this repo will run the make commands for you. If you use the script, remember to make it executable by running chmod +x build.sh after syncing the code.

NOTE: Building the deps images currently takes some time (~70 minutes, depending on the build machine). We are working on reducing the build time by streamlining the build and providing pre-built dependencies.

NOTE: Error messages. When running the make commands, you may see error messages like this:

Error response from daemon: pull access denied for presidio/presidio-golang-deps, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

These can be ignored.

NOTE: Memory requirements. If you get an error message like this while building:

tests/test_analyzer_engine.py ...............The command '/bin/sh -c pipenv run pytest' returned a non-zero code: 137

you may need to increase the Docker memory limit for your machine.
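
Exit code 137 follows the common shell convention of 128 plus a signal number: 137 - 128 = 9, which is SIGKILL, the signal typically sent when a process is killed for exceeding its memory limit. As a minimal sketch (describe_exit_code is a hypothetical helper, not part of Presidio), such codes can be decoded like this:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a shell-style exit code.

    By convention, values above 128 mean the process was
    terminated by signal number (code - 128).
    """
    if code > 128:
        try:
            return "killed by " + signal.Signals(code - 128).name
        except ValueError:
            # Signal number not known on this platform
            return "killed by unknown signal {}".format(code - 128)
    return "exited with status {}".format(code)


# On Linux/macOS this prints: killed by SIGKILL
print(describe_exit_code(137))
```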

Validation

Once the build is complete you can verify the local deployment by running:

docker ps

You should see 4 Presidio containers and one Redis container running with the following names:

presidio-api
presidio-recognizers-store
presidio-anonymizer
presidio-analyzer
redis
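
The check above can also be scripted. The following is a rough sketch, assuming the containers carry exactly the names listed above and that the docker CLI is on your PATH; missing_containers and running_container_names are hypothetical helper names, not part of Presidio.

```python
import subprocess

# Container names taken from the list above
EXPECTED_CONTAINERS = {
    "presidio-api",
    "presidio-recognizers-store",
    "presidio-anonymizer",
    "presidio-analyzer",
    "redis",
}


def missing_containers(names_output: str) -> list:
    """Return the expected container names absent from `docker ps` name output."""
    running = {line.strip() for line in names_output.splitlines() if line.strip()}
    return sorted(EXPECTED_CONTAINERS - running)


def running_container_names() -> str:
    """Ask Docker for the names of running containers, one per line."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout
```

Calling missing_containers(running_container_names()) should return an empty list when all five containers are up. If your setup adds prefixes or suffixes to container names, the exact-match comparison would need to be loosened.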

Deploy locally with KIND

Presidio is built for Kubernetes; you can give it a try using KIND (Kubernetes IN Docker).

  1. Install Docker.

    • Optional (Linux) - the following command will install all prerequisites (Docker, Helm, make, kubectl).

      cd deployment/
      ./prerequisites.sh

      Depending on your environment, sudo might be needed.

  2. Clone Presidio.

  3. Run the following script, which will use KIND (Kubernetes emulation in Docker)

    cd deployment/
    ./run-with-kind.sh
  4. Wait and verify all pods are running:

    kubectl get pod -n presidio
  5. Port forwarding of HTTP requests to the API micro-service is set up automatically. To set it up manually, run:

    kubectl port-forward <presidio-api-pod-name> 8080:8080 -n presidio
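
The "wait and verify" step can be automated by reading the JSON form of the same listing (kubectl get pod -n presidio -o json). This is only a sketch: pods_not_running is a hypothetical helper, and a pod in the Running phase is not necessarily ready to serve traffic, so treat it as a first-pass check.

```python
import json


def pods_not_running(pod_list_json: str) -> list:
    """Return names of pods whose phase is not 'Running'.

    Expects the output of `kubectl get pod -n presidio -o json`,
    i.e. an object with an "items" array of pod objects.
    """
    items = json.loads(pod_list_json).get("items", [])
    return [
        item["metadata"]["name"]
        for item in items
        if item.get("status", {}).get("phase") != "Running"
    ]
```

An empty list means every pod in the namespace reports the Running phase; any returned names are pods still pending or failed.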

Install presidio-analyzer as a Python package

If you're interested in running the analyzer alone, you can install it as a standalone Python package by packaging it into a wheel file. Note that Presidio requires Python >= 3.6.

Creating the wheel file:

In the presidio-analyzer folder, run:

python setup.py bdist_wheel
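
The build writes the wheel into the dist folder of presidio-analyzer. If several builds have accumulated there, a small helper can pick the most recent one. This is a sketch only; newest_wheel is a hypothetical name and it assumes file modification time reflects build order.

```python
from pathlib import Path
from typing import Optional


def newest_wheel(dist_dir: str) -> Optional[Path]:
    """Return the most recently modified .whl file in dist_dir, or None if there is none."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None
```

For example, newest_wheel("dist") run from the presidio-analyzer folder gives the path to pass to pip install.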

Installing the wheel file

  1. Copy the created wheel file (from the dist folder of presidio-analyzer) into a clean virtual environment.

  2. Install the wheel package:

    pip install wheel
  3. Install the presidio-analyzer wheel file:

    pip install WHEEL_FILE

    Where WHEEL_FILE is the path to the created wheel file.

  4. Install the spaCy model from GitHub (it is not installed during the standard installation):

    pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.1.0/en_core_web_lg-2.1.0.tar.gz

    Note that if you skip this step, the spaCy model is installed lazily during the first call to the AnalyzerEngine.

  5. Optional: install re2 and pyre2:

    • Install re2:

      re2_version="2018-12-01"
      wget -O re2.tar.gz https://github.com/google/re2/archive/${re2_version}.tar.gz
      mkdir re2
      tar --extract --file "re2.tar.gz" --directory "re2" --strip-components 1
      cd re2 && make install
    • Install the pyre2 fork:

      pip install https://github.com/torosent/pyre2/archive/release/0.2.23.zip

    Note: If you don't install re2, Presidio will use the regex package for regular expression handling.

  6. Test the installation

To test, run Python in the virtual environment where you installed presidio-analyzer, then make sure the following code returns results:

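from presidio_analyzer import AnalyzerEngine

# Create the engine with its default configuration
engine = AnalyzerEngine()

text = "My name is David and I live in Miami"

# all_fields=True requests all entity types for the language,
# so the entities list is left empty
response = engine.analyze(correlation_id=0,
                          text=text,
                          entities=[],
                          language='en',
                          all_fields=True,
                          score_threshold=0.5)

# Print the span, entity type and confidence of each detection
for item in response:
    print("Start = {}, end = {}, entity = {}, confidence = {}".format(item.start,
                                                                      item.end,
                                                                      item.entity_type,
                                                                      item.score))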
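
To sanity-check the reported offsets, it can help to slice the original text with them. The sketch below runs without Presidio installed: Result is a stand-in for the analyzer's result objects (only the four attributes printed above are assumed), the sample values are hand-written for illustration rather than real analyzer output, and end is assumed to be an exclusive offset.

```python
from collections import namedtuple

# Stand-in for the analyzer's result objects (hypothetical, for illustration only)
Result = namedtuple("Result", ["start", "end", "entity_type", "score"])


def matched_spans(text, results):
    """Pair each entity type with the substring its offsets point to."""
    return [(r.entity_type, text[r.start:r.end]) for r in results]


text = "My name is David and I live in Miami"

# Hand-written sample results; real entity types and scores may differ
sample = [
    Result(start=11, end=16, entity_type="PERSON", score=0.85),
    Result(start=31, end=36, entity_type="LOCATION", score=0.85),
]

print(matched_spans(text, sample))
```

With a real engine, pass the response from engine.analyze in place of sample.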

Presidio As a Service with Kubernetes

Prerequisites

  1. A Kubernetes 1.9+ cluster with RBAC enabled. If you are using AKS, RBAC is enabled by default.

    • Note the pods' resource requirements (CPU and memory) and plan the cluster accordingly.
  2. kubectl installed

    • Verify you can communicate with the cluster by running:

      kubectl version
  3. Local helm client.

  4. Optional - Container Registry - such as ACR. Only needed if you are using your own Presidio images and not the default ones from Microsoft's syndicated container catalog.

  5. A recent copy of the Presidio repo cloned to your local machine.
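
The command-line prerequisites above can be checked before starting. This is a minimal sketch, assuming both tools are expected on your PATH under the names kubectl and helm; missing_tools is a hypothetical helper, and it does not verify cluster connectivity or versions.

```python
import shutil

# CLI tools named in the prerequisites above
REQUIRED_TOOLS = ("kubectl", "helm")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]
```

An empty result from missing_tools() means both executables were found; kubectl version is still needed to confirm that the cluster is reachable.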

Single click deployment

  1. Navigate into <root>/deployment from the command line.

  2. If you have helm installed but haven't run helm init, execute deploy-helm.sh from the command line. It will install tiller (the helm server side) on your cluster and grant it sufficient permissions. Note that this script uses Helm 2.

    deploy-helm.sh
  3. Optional - Grant the Kubernetes cluster access to the container registry. This is only needed if you will use your own Presidio images; otherwise, skip this step and the script will pull the container images from Microsoft's syndicated container catalog.

  4. If you already have helm and tiller configured, or if you installed them in the previous step, execute deploy-presidio.sh from the command line as follows:

    deploy-presidio.sh

The script will install Presidio on your cluster using the default values.

Note: You can edit the file to use your own container registry and image.

Step by step deployment with customizable parameters

  1. Install Helm with RBAC

  2. Install Redis (Cache for storage and database scanners)

    helm install --name redis stable/redis --set usePassword=false,rbac.create=true --namespace presidio-system
  3. Optional - Ingress controller for presidio API.

    Note that Presidio is not deployed with an ingress controller by default.
    To change this behavior, deploy the helm chart with api.ingress.enabled=true and specify the type of ingress controller to use with api.ingress.class=nginx (supported classes are: nginx, traefik or istio).

  4. Verify that Redis and Traefik/NGINX are installed correctly

  5. Deploy from /charts/presidio

    # Based on the DOCKER_REGISTRY and PRESIDIO_LABEL from the previous steps
    helm install --name presidio-demo --set registry=${DOCKER_REGISTRY},tag=${PRESIDIO_LABEL} . --namespace presidio
  6. For more deployment options, follow the Development guide.
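
When scripting the deployment, the helm command from step 5 can be assembled programmatically, optionally adding the ingress settings from step 3. This is a sketch under stated assumptions: helm_install_args is a hypothetical helper, it uses the Helm 2 syntax shown above (--name), and it only builds the argument list without running anything.

```python
def helm_install_args(registry, tag, ingress_class=None,
                      release="presidio-demo", namespace="presidio"):
    """Build the argv for the Helm 2 install command shown in step 5.

    If ingress_class is given (nginx, traefik or istio), the ingress
    values from step 3 are appended to the --set list.
    """
    values = ["registry={}".format(registry), "tag={}".format(tag)]
    if ingress_class is not None:
        values.append("api.ingress.enabled=true")
        values.append("api.ingress.class={}".format(ingress_class))
    return [
        "helm", "install",
        "--name", release,
        "--set", ",".join(values),
        ".",
        "--namespace", namespace,
    ]
```

The resulting list can be passed to subprocess.run from within /charts/presidio; registry and tag correspond to DOCKER_REGISTRY and PRESIDIO_LABEL.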