docs: Fix links (#5122)
Fix links

Signed-off-by: Sherlock113 <[email protected]>
Sherlock113 authored Dec 11, 2024
1 parent 09c7e59 commit 430328a
Showing 9 changed files with 9 additions and 9 deletions.
2 changes: 1 addition & 1 deletion docs/source/build-with-bentoml/distributed-services.rst
@@ -232,7 +232,7 @@ Deploying a project with distributed Services to BentoCloud is similar to deploy

 To set custom configurations for each, we recommend you use a separate configuration file and reference it in the BentoML CLI command or Python API for deployment.

-The following is an example file that defines some custom configurations for the above two Services. You set configurations of each Service in the ``services`` field. Refer to :doc:`/bentocloud/how-tos/configure-deployments` to see the available configuration fields.
+The following is an example file that defines some custom configurations for the above two Services. You set configurations of each Service in the ``services`` field. Refer to :doc:`/scale-with-bentocloud/deployment/configure-deployments` to see the available configuration fields.

 .. code-block:: yaml
2 changes: 1 addition & 1 deletion docs/source/build-with-bentoml/observability/metrics.rst
@@ -44,7 +44,7 @@ BentoML automatically collects a set of default metrics for each Service. These
 - ``request_in_progress``: The number of requests that are currently being processed by a Service.
 - ``request_total``: The total number of requests that a Service has processed.
 - ``request_duration_seconds``: The time taken to process requests, including the total sum of request processing time, count of requests processed, and distribution across specified duration buckets.
-- ``adaptive_batch_size``: The adaptive batch sizes used during Service execution, which is relevant for optimizing performance in batch processing scenarios. You need to enable :doc:`adaptive batching </guides/adaptive-batching>` to collect this metric.
+- ``adaptive_batch_size``: The adaptive batch sizes used during Service execution, which is relevant for optimizing performance in batch processing scenarios. You need to enable :doc:`adaptive batching </get-started/adaptive-batching>` to collect this metric.

 Metric types
 ^^^^^^^^^^^^
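
For context, the ``adaptive_batch_size`` metric in the hunk above is only collected once adaptive batching is enabled. A minimal sketch of what enabling it can look like, assuming the ``batchable`` option of ``@bentoml.api`` in the BentoML 1.2+ Python SDK (the Service name and batch limits here are hypothetical, not part of this commit):

.. code-block:: python

    import bentoml
    import numpy as np

    @bentoml.service
    class BatchingService:
        # batchable=True turns on adaptive batching, which is what
        # populates the adaptive_batch_size metric described above.
        @bentoml.api(batchable=True, max_batch_size=64, max_latency_ms=50)
        def predict(self, inputs: np.ndarray) -> np.ndarray:
            # Placeholder: batched requests arrive stacked along dim 0.
            return inputs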
2 changes: 1 addition & 1 deletion docs/source/build-with-bentoml/parallelize-requests.rst
@@ -19,7 +19,7 @@ When you define a BentoML Service, use the ``workers`` parameter to set the numb
     class MyService:
         # Service implementation

-The number of workers isn't necessarily equivalent to the number of concurrent requests a BentoML Service can serve in parallel. With optimizations like :doc:`adaptable batching </guides/adaptive-batching>` and continuous batching, each worker can potentially handle many requests simultaneously to enhance the throughput of your Service. To specify the ideal number of concurrent requests for a Service (namely, all workers within the Service), you can configure :doc:`concurrency </scale-with-bentocloud/scaling/autoscaling>`.
+The number of workers isn't necessarily equivalent to the number of concurrent requests a BentoML Service can serve in parallel. With optimizations like :doc:`adaptable batching </get-started/adaptive-batching>` and continuous batching, each worker can potentially handle many requests simultaneously to enhance the throughput of your Service. To specify the ideal number of concurrent requests for a Service (namely, all workers within the Service), you can configure :doc:`concurrency </scale-with-bentocloud/scaling/autoscaling>`.

 Use cases
 ---------
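
To make the ``workers`` setting in the hunk above concrete, here is a minimal sketch assuming the BentoML 1.2+ Python SDK (the class body is a placeholder, not taken from this commit):

.. code-block:: python

    import bentoml

    # Spawn two worker processes for this Service. Each worker can still
    # handle multiple requests at once when batching is enabled.
    @bentoml.service(workers=2)
    class MyService:
        @bentoml.api
        def echo(self, text: str) -> str:
            # Placeholder implementation
            return text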
2 changes: 1 addition & 1 deletion docs/source/examples/controlnet.rst
@@ -120,7 +120,7 @@ Create BentoML :doc:`Services </build-with-bentoml/services>` in a ``service.py`
         controlnet_conditioning_scale: float = 0.5
         num_inference_steps: int = 25

-This file defines a BentoML Service ``ControlNet`` with custom :doc:`configurations </guides/configurations>` in timeout, worker count, and resources.
+This file defines a BentoML Service ``ControlNet`` with custom :doc:`configurations </reference/bentoml/configurations>` in timeout, worker count, and resources.

 - It loads the three pre-trained models and configures them to use GPU if available. The main pipeline (``StableDiffusionXLControlNetPipeline``) integrates these models.
 - It defines an asynchronous API endpoint ``generate``, which takes an image and a set of parameters as input. The parameters for the generation process are extracted from a ``Params`` instance, a Pydantic model that provides automatic data validation.
2 changes: 1 addition & 1 deletion docs/source/examples/function-calling.rst
@@ -70,7 +70,7 @@ The ``service.py`` file outlines the logic of the two required BentoML Services.
 2. Create a Python class (``Llama`` in the example) to initialize the model and tokenizer, and use the following decorators to add BentoML functionalities.

-- ``@bentoml.service``: Converts this class into a BentoML Service. You can optionally set :doc:`configurations </guides/configurations>` like timeout and GPU resources to use on BentoCloud. We recommend you use an NVIDIA A100 GPU of 80 GB for optimal performance.
+- ``@bentoml.service``: Converts this class into a BentoML Service. You can optionally set :doc:`configurations </reference/bentoml/configurations>` like timeout and GPU resources to use on BentoCloud. We recommend you use an NVIDIA A100 GPU of 80 GB for optimal performance.
 - ``@bentoml.mount_asgi_app``: Mounts an `existing ASGI application <https://github.com/bentoml/BentoFunctionCalling/blob/main/openai_endpoints.py>`_ defined in the ``openai_endpoints.py`` file to this class. It sets the base path to ``/v1``, making it accessible via HTTP requests. The mounted ASGI application provides OpenAI-compatible APIs and can be served side-by-side with the LLM Service. For more information, see :doc:`/build-with-bentoml/asgi`.

 .. code-block:: python
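
As a rough illustration of how the two decorators above are typically stacked, here is a hedged sketch (the ``app`` import stands in for the ASGI application defined in ``openai_endpoints.py``; the timeout and resource values are assumptions, not from this commit):

.. code-block:: python

    import bentoml
    from openai_endpoints import app  # existing ASGI app served under /v1

    @bentoml.mount_asgi_app(app, path="/v1")
    @bentoml.service(traffic={"timeout": 300}, resources={"gpu": 1})
    class Llama:
        def __init__(self) -> None:
            # Load the model and tokenizer here (omitted in this sketch).
            ...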
2 changes: 1 addition & 1 deletion docs/source/examples/langgraph.rst
@@ -83,7 +83,7 @@ service.py

 The ``service.py`` file defines the ``SearchAgentService``, a BentoML Service that wraps around the LangGraph agent and calls the ``MistralService``.

-1. Create a Python class and decorate it with ``@bentoml.service``, which transforms it into a BentoML Service. You can optionally set :doc:`configurations </guides/configurations>` like :doc:`workers </build-with-bentoml/parallelize-requests>` and :doc:`concurrency </scale-with-bentocloud/scaling/autoscaling>`.
+1. Create a Python class and decorate it with ``@bentoml.service``, which transforms it into a BentoML Service. You can optionally set :doc:`configurations </reference/bentoml/configurations>` like :doc:`workers </build-with-bentoml/parallelize-requests>` and :doc:`concurrency </scale-with-bentocloud/scaling/autoscaling>`.

 .. code-block:: python
2 changes: 1 addition & 1 deletion docs/source/examples/mlflow.rst
@@ -110,7 +110,7 @@ Create a separate ``service.py`` file where you define a BentoML :doc:`Service <

 The Service code:

-- Uses the ``@bentoml.service`` decorator to define a BentoML Service. Optionally, you can set additional :doc:`configurations </guides/configurations>` like resource allocation and traffic timeout.
+- Uses the ``@bentoml.service`` decorator to define a BentoML Service. Optionally, you can set additional :doc:`configurations </reference/bentoml/configurations>` like resource allocation and traffic timeout.
 - Retrieves the model from the Model Store and defines it as a class variable.
 - Uses the ``@bentoml.api`` decorator to expose the ``predict`` function as an API endpoint, which :doc:`takes a NumPy array as input and returns a NumPy array </build-with-bentoml/iotypes>`.
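
The three bullets above map onto a small amount of code. A hedged sketch, assuming an MLflow model saved to the Model Store under the hypothetical tag ``iris_clf:latest`` (resource values are illustrative only):

.. code-block:: python

    import bentoml
    import numpy as np

    @bentoml.service(resources={"cpu": "2"}, traffic={"timeout": 10})
    class IrisClassifier:
        # Retrieve the model reference from the Model Store as a class variable.
        bento_model = bentoml.models.get("iris_clf:latest")

        def __init__(self) -> None:
            self.model = bentoml.mlflow.load_model(self.bento_model)

        @bentoml.api
        def predict(self, input_data: np.ndarray) -> np.ndarray:
            # NumPy array in, NumPy array out, as the bullets above describe.
            return self.model.predict(input_data)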
2 changes: 1 addition & 1 deletion docs/source/examples/shieldgemma.rst
@@ -67,7 +67,7 @@ The ``service.py`` file outlines the logic of the two required BentoML Services.
 2. Create the ``Gemma`` Service to initialize the model and tokenizer, with a safety check API to calculate the probability of policy violation.

-- The ``Gemma`` class is decorated with ``@bentoml.service``, which converts it into a BentoML Service. You can optionally set :doc:`configurations </guides/configurations>` like timeout, :doc:`concurrency </scale-with-bentocloud/scaling/autoscaling>` and GPU resources to use on BentoCloud. We recommend you use an NVIDIA T4 GPU to host ShieldGemma 2B.
+- The ``Gemma`` class is decorated with ``@bentoml.service``, which converts it into a BentoML Service. You can optionally set :doc:`configurations </reference/bentoml/configurations>` like timeout, :doc:`concurrency </scale-with-bentocloud/scaling/autoscaling>` and GPU resources to use on BentoCloud. We recommend you use an NVIDIA T4 GPU to host ShieldGemma 2B.
 - The API ``check``, decorated with ``@bentoml.api``, functions as a web API endpoint. It evaluates the safety of prompts by predicting the likelihood of a policy violation. It then returns a structured response using the ``ShieldResponse`` Pydantic model.

 .. code-block:: python
2 changes: 1 addition & 1 deletion docs/source/scale-with-bentocloud/scaling/autoscaling.rst
@@ -56,7 +56,7 @@ In general, the autoscaler will scale the number of replicas based on the follow

 Key points about concurrency:

 - By default, BentoML does not impose a limit on ``concurrency`` to avoid bottlenecks. To determine the optimal value for ``concurrency``, we recommend conducting a stress test on your Service using a load generation tool such as `Locust <https://locust.io/>`_ either locally or on BentoCloud. The purpose of the stress test is to identify the maximum number of concurrent requests your Service can manage. After identifying this maximum, set the concurrency parameter to a value slightly below this threshold, ensuring that the Service has adequate headroom to handle traffic fluctuations.
-- If your Service supports :doc:`adaptive batching </guides/adaptive-batching>` or continuous batching, set ``concurrency`` to match the batch size. This aligns processing capacity with batch requirements, optimizing throughput.
+- If your Service supports :doc:`adaptive batching </get-started/adaptive-batching>` or continuous batching, set ``concurrency`` to match the batch size. This aligns processing capacity with batch requirements, optimizing throughput.
 - For Services designed to handle one request at a time, set ``concurrency`` to ``1``, ensuring that requests are processed sequentially without overlap.

 External queue
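
To show where the ``concurrency`` value from the list above is set, here is a minimal sketch assuming the BentoML 1.2+ ``traffic`` configuration (the value 32 is illustrative; derive yours from a stress test as recommended above):

.. code-block:: python

    import bentoml

    # Allow up to 32 in-flight requests across all workers of this Service.
    # On BentoCloud, the autoscaler compares observed load against this
    # value when deciding how many replicas to run.
    @bentoml.service(traffic={"concurrency": 32})
    class MyService:
        @bentoml.api
        def predict(self, text: str) -> str:
            # Placeholder implementation
            return text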
