Rs/logging example #796
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,230 @@ | ||||||
# DeepSparse + Prometheus/Grafana | ||||||
|
||||||
This is a simple example that shows you how to connect DeepSparse Logging to the Prometheus/Grafana stack. | ||||||
|
||||||
#### There are four steps: | ||||||
|
||||||
- Configure DeepSparse Logging to log metrics in Prometheus format to a REST endpoint | ||||||
|
||||||
- Point Prometheus to the appropriate endpoint to scrape the data at a specified interval | ||||||
|
||||||
- Run the client script simulating a data quality/drift issue | ||||||
|
||||||
- Visualize data in Prometheus with a dashboarding tool like Grafana | ||||||
|
||||||
|
||||||
## 0. Setting Up | ||||||
|
||||||
#### Installation | ||||||
|
||||||
To run this tutorial, you need Docker, Docker Compose, and DeepSparse Server: | ||||||
- [Docker Installation](https://docs.docker.com/engine/install/) | ||||||
- [Docker Compose Installation](https://docs.docker.com/compose/install/) | ||||||
- DeepSparse Server is installed via PyPI (`pip install deepsparse[server]`) | ||||||
|
||||||
#### Code | ||||||
The repository contains all the code you need: | ||||||
|
||||||
```bash | ||||||
. | ||||||
├── client | ||||||
│   ├── client.py # simple client application for interacting with Server | ||||||
│   ├── goldfish.jpeg # photo of a goldfish | ||||||
│   └── all_black.jpeg # photo with just black pixels | ||||||
├── server-config.yaml # specifies the configuration of the DeepSparse server | ||||||
├── custom-fn.py # custom function used for the logging | ||||||
├── docker # specifies the configuration of the containerized Prometheus/Grafana stack | ||||||
│   ├── docker-compose.yaml | ||||||
│   └── prometheus.yaml | ||||||
└── grafana # specifies the design of the Grafana dashboard | ||||||
    └── dashboard.json | ||||||
``` | ||||||
## 1. Spin up the DeepSparse Server | ||||||
|
||||||
|
||||||
`server-config.yaml` specifies the configuration of the DeepSparse Server, including logging: | ||||||
|
||||||
|
||||||
```yaml | ||||||
# server-config.yaml | ||||||
|
||||||
loggers: | ||||||
  prometheus: # logs to prometheus on port 6100 | ||||||
    port: 6100 | ||||||
|
||||||
endpoints: | ||||||
  - task: image_classification | ||||||
    route: /image_classification/predict | ||||||
    model: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none | ||||||
    data_logging: | ||||||
      pipeline_inputs.images[0]: # applies to the first image (of the form target.property[idx]) | ||||||
        - func: fraction_zeros # built-in function | ||||||
          frequency: 1 | ||||||
          target_loggers: | ||||||
            - prometheus | ||||||
        - func: custom-fn.py:mean_pixel_red # custom function | ||||||
          frequency: 1 | ||||||
          target_loggers: | ||||||
            - prometheus | ||||||
``` | ||||||
|
||||||
The config file instructs the server to create an image classification pipeline. Prometheus metrics are exposed on port `6100`, system logging is turned on, and for each image sent to the server we log the mean pixel value of the red channel (a custom function) and the fraction of pixels that are 0 (a built-in function). | ||||||
|
||||||
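To make the two logged quantities concrete, here is a minimal sketch of what they compute on a single image. `fraction_zeros_sketch` is a hypothetical stand-in written for illustration and is not DeepSparse's built-in `fraction_zeros` implementation. The sketch also assumes the image arrives as a height x width x channel NumPy array with red in channel 0, and it uses synthetic arrays instead of the tutorial's JPEGs.

```python
import numpy as np


def fraction_zeros_sketch(img: np.ndarray) -> float:
    # Hypothetical stand-in for the built-in function: share of entries equal to 0.
    return float(np.mean(img == 0))


def mean_pixel_red(img: np.ndarray) -> float:
    # Same idea as the custom function in custom-fn.py; assumes channel 0 is red.
    return float(np.mean(img[:, :, 0]))


# Synthetic 4x4 RGB images (illustrative data, not the tutorial's files).
all_black = np.zeros((4, 4, 3), dtype=np.uint8)
half_black = all_black.copy()
half_black[:2, :, :] = 255  # top half white, bottom half black

for name, img in [("all_black", all_black), ("half_black", half_black)]:
    print(name, fraction_zeros_sketch(img), mean_pixel_red(img))
```

An all-black input drives the zero fraction to its maximum, which is the signal the dashboards later in this tutorial are meant to surface.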
|
||||||
Thus, once launched, the Server exposes two endpoints: | ||||||
- port `6100`: exposes the `metrics` endpoint through the [Prometheus Python client](https://github.com/prometheus/client_python). | ||||||
- port `5543`: exposes the endpoint for inference. | ||||||
|
||||||
To spin up the Server, run: | ||||||
|
||||||
``` | ||||||
deepsparse.server --config_file server-config.yaml | ||||||
``` | ||||||
|
||||||
To validate that metrics are being properly exposed, visit `localhost:6100`. The page should list the logged metrics in the plain-text format that Prometheus scrapes and that PromQL queries operate on. | ||||||
|
||||||
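The same check can be scripted. The sketch below parses the plain-text metrics format into a dictionary using only the standard library. The URL `http://localhost:6100/metrics` is an assumption based on the port configured above, the parser assumes sample lines carry no trailing timestamp, and the metric names and values in `SAMPLE` are made up for illustration rather than captured from a real server.

```python
from urllib.request import urlopen


def parse_metrics(text: str) -> dict:
    """Parse 'name value' sample lines, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumes no trailing timestamp, so the value is the last field.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def fetch_metrics(url: str = "http://localhost:6100/metrics") -> dict:
    # Not called in this sketch, since it needs a running server.
    with urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Illustrative payload with made-up metric names and values.
SAMPLE = """\
# HELP demo_fraction_zeros illustrative metric
# TYPE demo_fraction_zeros summary
demo_fraction_zeros_sum 12.5
demo_fraction_zeros_count 25.0
"""
print(parse_metrics(SAMPLE))
```

With the Server running, calling `fetch_metrics()` should return the live samples, provided the assumed URL matches your setup.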
|
||||||
## 2. Set Up the Prometheus/Grafana Stack | ||||||
|
||||||
|
||||||
For simplicity, we have provided a `docker-compose.yaml` that spins up the containerized Prometheus/Grafana stack. That file mounts `prometheus.yaml` (a [Prometheus config file](https://prometheus.io/docs/prometheus/latest/configuration/configuration/)) into the Prometheus container. Inside `prometheus.yaml`, the `scrape_configs` section points to the `metrics` endpoint exposed by the server on port `6100`. | ||||||
|
||||||
|
||||||
Docker Compose File: | ||||||
|
||||||
|
||||||
```yaml | ||||||
# docker-compose.yaml | ||||||
|
||||||
version: "3" | ||||||
|
||||||
services: | ||||||
  prometheus: | ||||||
    image: prom/prometheus | ||||||
    extra_hosts: | ||||||
      - "host.docker.internal:host-gateway" # allow a direct connection from container to the local machine | ||||||
    ports: | ||||||
      - "9090:9090" # the default port used by Prometheus | ||||||
    volumes: | ||||||
      - ${PWD}/prometheus.yaml:/etc/prometheus/prometheus.yml # mount Prometheus config file | ||||||
|
||||||
  grafana: | ||||||
    image: grafana/grafana:latest | ||||||
    depends_on: | ||||||
      - prometheus | ||||||
    ports: | ||||||
      - "3000:3000" # the default port used by Grafana | ||||||
|
||||||
``` | ||||||
|
||||||
Prometheus Config file: | ||||||
|
||||||
|
||||||
```yaml | ||||||
# prometheus.yaml | ||||||
|
||||||
global: | ||||||
  scrape_interval: 15s # how often to scrape from endpoint | ||||||
  evaluation_interval: 30s # time between each evaluation of Prometheus' alerting rules | ||||||
|
||||||
scrape_configs: | ||||||
  - job_name: deepsparse_img_classification # your project name | ||||||
    static_configs: | ||||||
      - targets: | ||||||
          - 'host.docker.internal:6100' # should match the port exposed by the PrometheusLogger in the DeepSparse Server config file | ||||||
``` | ||||||
</details> | ||||||
|
||||||
To start up a Prometheus stack to monitor the DeepSparse Server, run: | ||||||
|
||||||
```bash | ||||||
cd docker | ||||||
docker-compose up | ||||||
``` | ||||||
|
||||||
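Before moving on, it can help to confirm that all three services are listening. The following is a minimal sketch using only the standard library; it assumes everything is reachable on `localhost` at the ports used in this tutorial (`6100`, `9090`, `3000`), and it only checks that a TCP connection can be opened, not that the service behind the port is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from this tutorial's config files; the host is an assumption.
SERVICES = {"DeepSparse metrics": 6100, "Prometheus": 9090, "Grafana": 3000}

for name, port in SERVICES.items():
    status = "reachable" if is_port_open("localhost", port) else "not reachable"
    print(f"{name} (port {port}): {status}")
```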
## 3. Launch the Python Client and Run Inference | ||||||
|
||||||
|
||||||
`client.py` is a simple client that simulates the behavior of some application. In the example, we have two images: | ||||||
|
||||||
- `goldfish.jpeg`: a sample photo of a goldfish | ||||||
|
||||||
- `all_black.jpeg`: a photo that is all black (every pixel is a 0) | ||||||
|
||||||
The client sends requests to the Server, initially with only the goldfish image. Over time, it randomly | ||||||
sends the all-black image with increasing probability. This simulates a data quality issue in the | ||||||
pipeline that we can detect with the monitoring system. | ||||||
|
||||||
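The ramp described above can be sketched as a small generator. This is a simplified illustration and not the logic of `client.py` itself: it counts whole steps with integers (which avoids floating-point drift from repeatedly adding `0.1`), and the names `ramp_probabilities` and `pick_image` are hypothetical.

```python
import random
from collections import Counter


def ramp_probabilities(steps: int = 10, sends_per_step: int = 25):
    """Yield the probability of sending the corrupted image for each request.

    The probability climbs from 0 to 1 in `steps` increments, then falls back to 0.
    """
    up = range(0, steps + 1)
    down = range(steps - 1, -1, -1)
    for level in [*up, *down]:
        for _ in range(sends_per_step):
            yield level / steps


def pick_image(prob_corrupt: float, rng: random.Random,
               good: str = "goldfish.jpeg", bad: str = "all_black.jpeg") -> str:
    # rng.random() is in [0, 1), so 0.0 never picks `bad` and 1.0 always does.
    return bad if rng.random() < prob_corrupt else good


rng = random.Random(0)  # fixed seed so repeated runs pick the same images
counts = Counter(pick_image(p, rng) for p in ramp_probabilities())
print(counts)
```

Because the probability is symmetric around its peak, roughly half of all requests end up being the all-black image over a full run.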
Run the following to start inference: | ||||||
|
||||||
```bash | ||||||
python client/client.py | ||||||
``` | ||||||
|
||||||
The client prints which image was sent to the server on each request. | ||||||
|
||||||
|
||||||
## 4. Inspecting the Prometheus/Grafana Stack | ||||||
|
||||||
|
||||||
### Prometheus | ||||||
|
||||||
#### Confirm It Is Working | ||||||
|
||||||
|
||||||
Visiting `http://localhost:9090/targets` should show that an endpoint `http://host.docker.internal:6100/metrics` is in state `UP`. | ||||||
|
||||||
|
||||||
#### Query Prometheus with PromQL | ||||||
|
||||||
|
||||||
If you do not want to use Grafana, you can start with Prometheus's native graphing functionality: navigate to `http://localhost:9090/graph` and add the following `Expression`: | ||||||
|
||||||
``` | ||||||
rate(image_classification__0__pipeline_inputs__images__fraction_zeros_sum[30s]) | ||||||
/ | ||||||
rate(image_classification__0__pipeline_inputs__images__fraction_zeros_count[30s]) | ||||||
``` | ||||||
|
||||||
You should see the following: | ||||||
|
||||||
|
||||||
![prometheus-dashboard.png](image/prometheus-dashboard.png) | ||||||
|
||||||
This graph shows the fraction of zero-valued pixels in the images sent to the server. | ||||||
As the "corrupted" all-black images are sent to the server with increasing probability, | ||||||
the graph shows a clear spike, alerting us that something strange is happening with the provided input. | ||||||
|
||||||
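The `fraction_zeros` expression above divides the rate of a `_sum` series by the rate of the matching `_count` series over the same 30-second window. Ignoring Prometheus's extrapolation and counter-reset handling, that ratio is approximately the mean of the values observed in the window. A minimal arithmetic sketch, with made-up counter readings and the assumption that a goldfish image contributes roughly 0 and an all-black image contributes 1:

```python
def windowed_mean(sum_start: float, sum_end: float,
                  count_start: float, count_end: float):
    """Approximate rate(sum) / rate(count): increase in sum over increase in count."""
    delta_count = count_end - count_start
    if delta_count <= 0:
        return None  # no observations in the window, so the mean is undefined
    return (sum_end - sum_start) / delta_count


# Illustrative window: 10 requests arrive, 4 of them all-black (value 1.0 each).
mean = windowed_mean(sum_start=10.0, sum_end=14.0, count_start=100, count_end=110)
print(mean)
```

As the share of all-black images in each window grows, this windowed mean rises with it, which is what produces the spike in the graph.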
DeepSparse Server also automatically logs prediction latencies for each pipeline stage as well as | ||||||
end-to-end server-side inference time. Add the following query to inspect average latency: | ||||||
|
||||||
``` | ||||||
rate(image_classification__0__prediction_latency__total_inference_sum[30s]) | ||||||
/ | ||||||
rate(image_classification__0__prediction_latency__total_inference_count[30s]) | ||||||
``` | ||||||
|
||||||
![prometheus-dashboard-latency.png](image/prometheus-dashboard-latency.png) | ||||||
|
||||||
For more details on working with the Prometheus Query Language (PromQL), see [the official docs](https://prometheus.io/docs/prometheus/latest/querying/basics/). | ||||||
|
||||||
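PromQL expressions can also be evaluated outside the browser through Prometheus's HTTP API (`/api/v1/query`). The sketch below only builds the request URL and parses a response body, so it runs without a live Prometheus instance. The base URL `http://localhost:9090` is an assumption based on the port mapping in `docker-compose.yaml`, and `SAMPLE_RESPONSE` is an illustrative payload with made-up values, not output captured from this tutorial.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def extract_values(response_text: str) -> list:
    """Return the sample values from an instant-query (vector) response."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    # Each result carries a [timestamp, "value-as-string"] pair.
    return [float(item["value"][1]) for item in body["data"]["result"]]


# Illustrative response body (made-up labels and values).
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"job": "demo"}, "value": [1700000000.0, "0.4"]}],
    },
})

print(build_query_url("http://localhost:9090", "up"))
print(extract_values(SAMPLE_RESPONSE))
```

To query the running stack, fetch the built URL with any HTTP client and pass the response text to `extract_values`.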
|
||||||
### Grafana | ||||||
|
||||||
#### Login | ||||||
|
||||||
|
||||||
Visit `localhost:3000` to launch Grafana. Log in with the default username (`admin`) and password (`admin`). | ||||||
|
||||||
#### Add Prometheus Data Source | ||||||
|
||||||
Set up the Prometheus data source (`Add your first data source` -> `Prometheus`). On this page, we only | ||||||
need to update the `url` section. Since Grafana and Prometheus are running in separate Docker containers, | ||||||
we need to enter the IP address of the Prometheus container. | ||||||
|
||||||
|
||||||
Run the following to look up the `name` of your Prometheus container: | ||||||
|
||||||
``` | ||||||
docker container ls | ||||||
>>> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES | ||||||
>>> 997521854d84 grafana/grafana:latest "/run.sh" About an hour ago Up About an hour 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp docker_grafana_1 | ||||||
>>> c611c80ae05e prom/prometheus "/bin/prometheus --c…" About an hour ago Up About an hour 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp docker_prometheus_1 | ||||||
``` | ||||||
|
||||||
Run the following to look up the IP address (replace `docker_prometheus_1` with your container's name): | ||||||
|
||||||
``` | ||||||
docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' docker_prometheus_1 | ||||||
>>> 172.18.0.2 | ||||||
``` | ||||||
|
||||||
So, in our case, the `url` section should be: `http://172.18.0.2:9090`. | ||||||
|
||||||
|
||||||
Click `Save & Test`. We should get a green check saying "Data Source Is Working". | ||||||
|
||||||
|
||||||
#### Import A Dashboard | ||||||
|
||||||
|
||||||
Now you should be ready to create/import your dashboard. | ||||||
|
||||||
Grafana's interface for adding metrics is intuitive (and you can use PromQL), | ||||||
but we have provided a simple pre-made dashboard for this use case. | ||||||
|
||||||
|
||||||
Click `Dashboard` -> `Import` in the left-hand sidebar. You should see an option to upload a file. | ||||||
|
||||||
Upload `grafana/dashboard.json` and save. Then, you should see the following dashboard: | ||||||
|
||||||
![img.png](image/grafana-dashboard.png) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
import argparse | ||
import random | ||
import time | ||
|
||
import requests | ||
|
||
parser = argparse.ArgumentParser() | ||
parser.add_argument("--url", type=str, default="http://0.0.0.0:5543/image_classification/predict/from_files") | ||
parser.add_argument("--img1_path", type=str, default="client/goldfish.jpeg") | ||
parser.add_argument("--img2_path", type=str, default="client/all_black.jpeg") | ||
parser.add_argument("--num_iters", type=int, default=25) | ||
parser.add_argument("--prob_incr", type=float, default=0.1) | ||
|
||
def send_random_img(url, img1_path, img2_path, prob_img2): | ||
    # send img2 with probability prob_img2, otherwise img1 | ||
    if random.uniform(0, 1) < prob_img2: | ||
        img_path = img2_path | ||
    else: | ||
        img_path = img1_path | ||
|
||
    # close the file handle once the request has been sent | ||
    with open(img_path, "rb") as img_file: | ||
        resp = requests.post(url=url, files=[("request", img_file)]) | ||
    print(f"Sent File: {img_path} (status {resp.status_code})") | ||
|
||
def main(url, img1_path, img2_path, num_iters, prob_incr): | ||
    prob_img2 = 0.0 | ||
    iters = 0 | ||
    increasing = True | ||
|
||
    # ramp the probability of sending img2 up towards 1.0, then back down to 0.0 | ||
    while increasing or prob_img2 > 0.0: | ||
        send_random_img(url, img1_path, img2_path, prob_img2) | ||
|
||
        # adjust the probability once every num_iters requests | ||
        if iters % num_iters == 0 and increasing: | ||
            prob_img2 += prob_incr | ||
        elif iters % num_iters == 0: | ||
            prob_img2 -= prob_incr | ||
        iters += 1 | ||
|
||
        # once the peak is reached, switch to decreasing | ||
        if prob_img2 >= 1.0: | ||
            increasing = False | ||
            prob_img2 -= prob_incr | ||
|
||
        time.sleep(0.25) | ||
|
||
if __name__ == "__main__": | ||
    args = vars(parser.parse_args()) | ||
    main(args["url"], args["img1_path"], args["img2_path"], args["num_iters"], args["prob_incr"]) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
import numpy as np | ||
|
||
def mean_pixel_red(img: np.ndarray): | ||
    # assumes an image array of shape (height, width, channels) with red at channel index 0 | ||
    return np.mean(img[:, :, 0]) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# docker-compose.yaml | ||
|
||
version: "3" | ||
|
||
services: | ||
  prometheus: | ||
    image: prom/prometheus | ||
    extra_hosts: | ||
      - "host.docker.internal:host-gateway" # allow a direct connection from container to the local machine | ||
    ports: | ||
      - "9090:9090" # the default port used by Prometheus | ||
    volumes: | ||
      - ${PWD}/prometheus.yaml:/etc/prometheus/prometheus.yml # mount Prometheus config file | ||
|
||
  grafana: | ||
    image: grafana/grafana:latest | ||
    depends_on: | ||
      - prometheus | ||
    ports: | ||
      - "3000:3000" # the default port used by Grafana |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# prometheus.yaml | ||
|
||
global: | ||
  scrape_interval: 15s # how often to scrape from endpoint | ||
  evaluation_interval: 30s # time between each evaluation of Prometheus' alerting rules | ||
|
||
scrape_configs: | ||
  - job_name: deepsparse_img_classification # your project name | ||
    static_configs: | ||
      - targets: | ||
          - 'host.docker.internal:6100' # should match the port exposed by the PrometheusLogger in the DeepSparse Server config file |