docs: Restructure image folder and add alt for SEO (#5150)
Restructure image folder and add alt for SEO

Signed-off-by: Sherlock113 <[email protected]>
Sherlock113 authored Dec 25, 2024
1 parent 7f0862e commit 749f49e
Showing 32 changed files with 36 additions and 18 deletions.
9 changes: 6 additions & 3 deletions docs/source/build-with-bentoml/asgi.rst
@@ -131,17 +131,20 @@ The following is a more practical example of mounting FastAPI onto the Summariza
After you start the BentoML Service, which is accessible at `http://localhost:3000 <http://localhost:3000/>`_, you can find two additional endpoints ``hello-inside`` and ``hello-outside`` exposed.

-.. image:: ../../_static/img/guides/asgi/two-asgi-fastapi-routes.png
+.. image:: ../../_static/img/build-with-bentoml/asgi/two-asgi-fastapi-routes.png
+   :alt: Two API endpoints defined in BentoML

By sending a ``GET`` request, you can receive the corresponding output from both endpoints.

FastAPI route inside the Service class:

-.. image:: ../../_static/img/guides/asgi/inside-the-class.png
+.. image:: ../../_static/img/build-with-bentoml/asgi/inside-the-class.png
+   :alt: FastAPI route inside the BentoML Service class

FastAPI route outside the Service class:

-.. image:: ../../_static/img/guides/asgi/outside-the-class.png
+.. image:: ../../_static/img/build-with-bentoml/asgi/outside-the-class.png
+   :alt: FastAPI route outside the BentoML Service class
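The mounting pattern behind these two endpoints looks roughly like the sketch below. This is a hedged reconstruction rather than the file's full example: ``bentoml.mount_asgi_app`` and the ``Summarization`` name follow the surrounding docs, while the route bodies are illustrative only.

.. code-block:: python

   from fastapi import FastAPI

   import bentoml

   app = FastAPI()

   @app.get("/hello-outside")
   def hello_outside() -> dict:
       # Route defined outside the Service class
       return {"message": "Hello from outside the Service class"}

   @bentoml.mount_asgi_app(app, path="/")
   @bentoml.service
   class Summarization:
       # Routes can also be declared on methods inside the class
       @app.get("/hello-inside")
       def hello_inside(self) -> dict:
           return {"message": "Hello from inside the Service class"}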

Quart
^^^^^
3 changes: 2 additions & 1 deletion docs/source/build-with-bentoml/gpu-inference.rst
@@ -42,9 +42,10 @@ To use a specific GPU:
This image explains how different models use the GPUs assigned to them.

-.. image:: ../../_static/img/guides/gpu-inference/gpu-inference-architecture.png
+.. image:: ../../_static/img/build-with-bentoml/gpu-inference/gpu-inference-architecture.png
    :width: 400px
    :align: center
+   :alt: BentoML GPU inference architecture
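As a minimal sketch of the pattern the figure describes, pinning two models to separate devices (the placeholder ``torch.nn.Linear`` models and the service name are assumptions, not the docs' own example):

.. code-block:: python

   import torch

   import bentoml

   @bentoml.service(resources={"gpu": 2})
   class MultiModelService:
       def __init__(self) -> None:
           # Pin each model to its own device so the two can
           # serve inference requests independently.
           self.model_a = torch.nn.Linear(10, 2).to("cuda:0")
           self.model_b = torch.nn.Linear(10, 2).to("cuda:1")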

.. note::

3 changes: 2 additions & 1 deletion docs/source/build-with-bentoml/gradio.rst
@@ -96,6 +96,7 @@ Follow the steps below to integrate Gradio with a BentoML Service.
bentoml serve service:Summarization
-.. image:: ../../_static/img/guides/gradio/gradio-ui-bentoml.png
+.. image:: ../../_static/img/build-with-bentoml/gradio/gradio-ui-bentoml.png
+   :alt: Gradio UI for a BentoML Service

Visit this `example <https://github.com/bentoml/BentoML/tree/main/examples>`_ to view the full demo code.
docs/source/build-with-bentoml/model-loading-and-management.rst
@@ -83,7 +83,8 @@ BentoML provides an efficient mechanism for loading AI models to accelerate mode

If you deploy the HF model to BentoCloud, you can view and verify it within your Bento on the details page. It is indicated with the HF icon. Clicking it redirects you to the model page on HF.

-.. image:: ../../_static/img/guides/model-loading-and-management/hf-model-on-bentocloud.png
+.. image:: ../../_static/img/build-with-bentoml/model-loading-and-management/hf-model-on-bentocloud.png
+   :alt: Hugging Face model marked with an icon on BentoCloud console
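A rough sketch of how such a model would be declared, assuming the ``bentoml.models.HuggingFaceModel`` API described in these docs (the model ID is illustrative):

.. code-block:: python

   import bentoml
   from bentoml.models import HuggingFaceModel

   @bentoml.service
   class Summarization:
       # Declaring the model this way is what marks it as an HF model
       model_path = HuggingFaceModel("sshleifer/distilbart-cnn-12-6")

       def __init__(self) -> None:
           # model_path resolves to a local download directory at runtime
           ...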

.. tab-item:: From the Model Store or BentoCloud

9 changes: 6 additions & 3 deletions docs/source/build-with-bentoml/observability/metrics.rst
@@ -394,7 +394,8 @@ You can integrate Prometheus to scrape and visualize both default and custom met
histogram_quantile(0.99, rate(bentoml_service_request_duration_seconds_bucket{endpoint="/encode"}[1m]))
-.. image:: ../../_static/img/guides/observability/metrics/prome-ui-bentoml.png
+.. image:: ../../_static/img/build-with-bentoml/observability/metrics/prome-ui-bentoml.png
+   :alt: Prometheus UI for BentoML metrics
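Custom metrics such as the histogram queried above can be defined in the Service itself. A sketch under the assumption that ``bentoml.metrics`` mirrors the ``prometheus_client`` API; the metric name, labels, and the trivial endpoint body are illustrative only:

.. code-block:: python

   import bentoml

   # Assumed API: bentoml.metrics mirrors prometheus_client
   inference_duration = bentoml.metrics.Histogram(
       name="inference_duration_seconds",
       documentation="Time spent running model inference",
       labelnames=["endpoint"],
   )

   @bentoml.service
   class Summarization:
       @bentoml.api
       def encode(self, text: str) -> str:
           # Observe the latency of each call under the "encode" label
           with inference_duration.labels(endpoint="encode").time():
               return text.upper()  # placeholder for real inference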

Create a Grafana dashboard
--------------------------
@@ -424,10 +425,12 @@ Grafana is an analytics platform that allows you to create dynamic and informati
4. Access the Grafana web UI at ``http://localhost:4000/`` (use your own port). Log in with the default credentials (``admin``/``admin``).
5. In the Grafana search box at the top, enter ``Data sources`` and add Prometheus as an available option. In **Connection**, set the URL to the address of your running Prometheus instance, such as ``http://localhost:9090``. Save the configuration and test the connection to ensure Grafana can retrieve data from Prometheus.

-.. image:: ../../_static/img/guides/observability/metrics/grafana-bentoml-1.png
+.. image:: ../../_static/img/build-with-bentoml/observability/metrics/grafana-bentoml-1.png
+   :alt: Add Prometheus in Grafana

6. With Prometheus configured as a data source, you can create a new dashboard. Start by adding a panel and selecting a metric to visualize, such as ``bentoml_service_request_duration_seconds_bucket``. Grafana offers a wide array of visualization options, from simple line graphs to more complex representations like heatmaps or gauges.

-.. image:: ../../_static/img/guides/observability/metrics/grafana-bentoml-2.png
+.. image:: ../../_static/img/build-with-bentoml/observability/metrics/grafana-bentoml-2.png
+   :alt: Grafana UI for BentoML metrics

For detailed instructions on dashboard creation and customization, read the `Grafana documentation <https://grafana.com/docs/grafana/latest/dashboards/>`_.
6 changes: 4 additions & 2 deletions docs/source/build-with-bentoml/observability/tracing.rst
@@ -126,7 +126,8 @@ With your BentoML Service configured, `run Zipkin <https://zipkin.io/pages/quick
Start your BentoML Service and send some requests to it. You can then visit the Zipkin UI at ``http://localhost:9411/`` to view the traces:
-.. image:: ../../_static/img/guides/observability/tracing/zipkin-ui-tracing-bentoml.png
+.. image:: ../../_static/img/build-with-bentoml/observability/tracing/zipkin-ui-tracing-bentoml.png
+   :alt: Zipkin UI for BentoML traces
Jaeger
^^^^^^
@@ -197,7 +198,8 @@ With your BentoML Service configured, run Jaeger before starting the Service. Fo
Start your BentoML Service and send some requests to it. You can then visit the Jaeger UI at ``http://localhost:16686/`` to view the traces:
-.. image:: ../../_static/img/guides/observability/tracing/jaeger-ui-tracing-bentoml.png
+.. image:: ../../_static/img/build-with-bentoml/observability/tracing/jaeger-ui-tracing-bentoml.png
+   :alt: Jaeger UI for BentoML traces
OTLP exporter
^^^^^^^^^^^^^
3 changes: 2 additions & 1 deletion docs/source/build-with-bentoml/parallelize-requests.rst
@@ -71,9 +71,10 @@ Here is an example:
This Service dynamically determines the GPU device to use for the model by creating a ``torch.device`` object. The device ID is set by ``bentoml.server_context.worker_index - 1`` to allocate a specific GPU to each worker process. Worker 1 (``worker_index = 1``) uses GPU 0 and worker 2 (``worker_index = 2``) uses GPU 1. See the figure below for details.

-.. image:: ../../_static/img/guides/workers/workers-models-gpus.png
+.. image:: ../../_static/img/build-with-bentoml/workers/workers-models-gpus.png
    :width: 400px
    :align: center
+   :alt: GPUs allocated to different BentoML workers for serving models

When determining which device ID to assign to each worker for tasks such as loading models onto GPUs, this 1-indexing approach means you need to subtract 1 from the ``worker_index`` to get the 0-based device ID. This is because hardware devices like GPUs are usually indexed starting from 0. For more information, see :doc:`/build-with-bentoml/gpu-inference`.
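Put together, the worker-to-GPU mapping described above looks roughly like this sketch (the service name and placeholder model are assumptions; the ``worker_index`` expression is the one quoted in the text):

.. code-block:: python

   import torch

   import bentoml

   @bentoml.service(workers=2, resources={"gpu": 2})
   class MyService:
       def __init__(self) -> None:
           # worker_index is 1-based while GPU device IDs are 0-based,
           # so worker 1 loads onto cuda:0 and worker 2 onto cuda:1.
           device_id = bentoml.server_context.worker_index - 1
           self.model = torch.nn.Linear(10, 2).to(f"cuda:{device_id}")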

6 changes: 4 additions & 2 deletions docs/source/get-started/adaptive-batching.rst
@@ -18,15 +18,17 @@ Adaptive batching is implemented on the server side. This is advantageous as opp

Specifically, there is a dispatcher within a BentoML Service that oversees collecting requests into a batch until the conditions of the batch window or batch size are met, at which point the batch is sent to the model for inference.

-.. image:: ../../_static/img/guides/adaptive-batching/single-service-batching.png
+.. image:: ../../_static/img/get-started/adaptive-batching/single-service-batching.png
    :width: 65%
    :align: center
+   :alt: Adaptive batching in a single BentoML Service

For multiple Services, the Service responsible for running model inference (``ServiceTwo`` in the diagram below) collects requests from the intermediary Service (``ServiceOne``) and forms batches based on optimal latency.

-.. image:: ../../_static/img/guides/adaptive-batching/multi-service-batching.png
+.. image:: ../../_static/img/get-started/adaptive-batching/multi-service-batching.png
    :width: 100%
    :align: center
+   :alt: Adaptive batching in multiple BentoML Services
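Enabling the dispatcher is a matter of marking an API as batchable. A minimal sketch (the size and latency thresholds are illustrative rather than recommended values, and the doubling stands in for real model inference):

.. code-block:: python

   import numpy as np

   import bentoml

   @bentoml.service
   class ServiceTwo:
       @bentoml.api(batchable=True, max_batch_size=32, max_latency_ms=500)
       def encode(self, inputs: list[np.ndarray]) -> list[np.ndarray]:
           # The dispatcher hands over a whole batch at once: `inputs`
           # holds the payloads of several queued requests.
           batch = np.stack(inputs)
           return list(batch * 2)  # placeholder for real model inference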

.. note::

3 changes: 2 additions & 1 deletion docs/source/get-started/async-task-queues.rst
@@ -18,9 +18,10 @@ Waiting synchronously for such tasks could lead to inefficiencies, with the call

Here is the general workflow of using BentoML tasks:

-.. image:: ../../_static/img/guides/tasks/task-workflow.png
+.. image:: ../../_static/img/get-started/tasks/task-workflow.png
    :width: 400px
    :align: center
+   :alt: BentoML task workflow
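In code, that workflow looks roughly like the following client-side sketch (the ``generate`` endpoint name and its argument are assumptions, not part of these docs):

.. code-block:: python

   import bentoml

   # Submit a task, continue other work, then poll for the result.
   with bentoml.SyncHTTPClient("http://localhost:3000") as client:
       task = client.generate.submit(prompt="a cat wearing a hat")
       print(task.get_status())  # check progress without blocking
       result = task.get()       # blocks until the task completes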

Define a task endpoint
----------------------
3 changes: 2 additions & 1 deletion docs/source/get-started/hello-world.rst
@@ -131,7 +131,8 @@ Serve the model locally

Visit `http://localhost:3000 <http://localhost:3000/>`_, scroll down to **Service APIs**, and click **Try it out**. In the **Request body** box, enter your prompt and click **Execute**.

-.. image:: ../_static/img/get-started/quickstart/service-ui.png
+.. image:: ../_static/img/get-started/hello-world/service-ui.png
+   :alt: BentoML hello world example Swagger UI
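You can also call the endpoint programmatically. A sketch using BentoML's Python client, assuming the ``summarize`` endpoint from this quickstart (the prompt text is illustrative); either way, the endpoint returns the same response:

.. code-block:: python

   import bentoml

   # Call the same endpoint without going through the Swagger UI.
   with bentoml.SyncHTTPClient("http://localhost:3000") as client:
       summary = client.summarize(text="Breaking News: ...")
       print(summary)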

Expected output:

3 changes: 2 additions & 1 deletion docs/source/reference/bentoml/sdk.rst
@@ -29,7 +29,8 @@ Note that when you enable batching, ``batch_dim`` can be a tuple or a single val
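As a quick illustration before the figure, a hypothetical batchable endpoint that concatenates inputs and splits outputs along dimension 0 (the service name, endpoint, and placeholder computation are made up):

.. code-block:: python

   import numpy as np

   import bentoml

   @bentoml.service
   class Classifier:
       # batch_dim=(0, 0): inputs are concatenated along dimension 0,
       # and the batched output is split back along dimension 0.
       @bentoml.api(batchable=True, batch_dim=(0, 0))
       def predict(self, inputs: np.ndarray) -> np.ndarray:
           return inputs.sum(axis=1, keepdims=True)  # placeholder inference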

This image illustrates the concept of ``batch_dim`` in the context of processing 2-D arrays.

-.. image:: ../../_static/img/guides/adaptive-batching/batch-dim-example.png
+.. image:: ../../_static/img/reference/bentoml/sdk/batch-dim-example.png
+   :alt: Batching dimension explanation

On the left side, there are two 2-D arrays of size 5x2, represented by blue and green boxes. The arrows show two different paths that these arrays can take depending on the ``batch_dim`` configuration:

3 changes: 2 additions & 1 deletion docs/source/scale-with-bentocloud/scaling/autoscaling.rst
@@ -51,7 +51,8 @@ For instance, consider a scenario where ``concurrency`` is set to 32 and the Ser

In general, the autoscaler will scale the number of replicas based on the following formula, permitted by the ``min_replicas`` and ``max_replicas`` settings in the deployment:

-.. image:: ../../_static/img/guides/autoscaling/hpa.png
+.. image:: ../../_static/img/bentocloud/autoscaling/hpa.png
+   :alt: HPA algorithm
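The figure shows the autoscaler's formula; assuming it depicts the standard Kubernetes HPA rule, it reads:

.. math::

   \text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil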

Key points about concurrency:

