docs: Fix links (#5122)
Fix links

Signed-off-by: Sherlock113 <[email protected]>
Sherlock113 authored Dec 11, 2024
1 parent 09c7e59 commit 430328a
Showing 9 changed files with 9 additions and 9 deletions.
2 changes: 1 addition & 1 deletion docs/source/build-with-bentoml/distributed-services.rst
@@ -232,7 +232,7 @@ Deploying a project with distributed Services to BentoCloud is similar to deploy

 To set custom configurations for each, we recommend you use a separate configuration file and reference it in the BentoML CLI command or Python API for deployment.

-The following is an example file that defines some custom configurations for the above two Services. You set configurations of each Service in the ``services`` field. Refer to :doc:`/bentocloud/how-tos/configure-deployments` to see the available configuration fields.
+The following is an example file that defines some custom configurations for the above two Services. You set configurations of each Service in the ``services`` field. Refer to :doc:`/scale-with-bentocloud/deployment/configure-deployments` to see the available configuration fields.

 .. code-block:: yaml
2 changes: 1 addition & 1 deletion docs/source/build-with-bentoml/observability/metrics.rst
@@ -44,7 +44,7 @@ BentoML automatically collects a set of default metrics for each Service. These
 - ``request_in_progress``: The number of requests that are currently being processed by a Service.
 - ``request_total``: The total number of requests that a Service has processed.
 - ``request_duration_seconds``: The time taken to process requests, including the total sum of request processing time, count of requests processed, and distribution across specified duration buckets.
-- ``adaptive_batch_size``: The adaptive batch sizes used during Service execution, which is relevant for optimizing performance in batch processing scenarios. You need to enable :doc:`adaptive batching </guides/adaptive-batching>` to collect this metric.
+- ``adaptive_batch_size``: The adaptive batch sizes used during Service execution, which is relevant for optimizing performance in batch processing scenarios. You need to enable :doc:`adaptive batching </get-started/adaptive-batching>` to collect this metric.

 Metric types
 ^^^^^^^^^^^^
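
For context, the ``adaptive_batch_size`` metric in the hunk above is only collected once adaptive batching is enabled. A minimal sketch of what enabling it can look like, assuming the ``batchable`` option of ``@bentoml.api`` in the BentoML 1.2+ Python SDK (the Service name and batch limits here are hypothetical, not part of this commit):

.. code-block:: python

    import bentoml
    import numpy as np

    @bentoml.service
    class BatchingService:
        # batchable=True turns on adaptive batching, which is what
        # populates the adaptive_batch_size metric described above.
        @bentoml.api(batchable=True, max_batch_size=64, max_latency_ms=50)
        def predict(self, inputs: np.ndarray) -> np.ndarray:
            # Placeholder: batched requests arrive stacked along dim 0.
            return inputs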
2 changes: 1 addition & 1 deletion docs/source/build-with-bentoml/parallelize-requests.rst
@@ -19,7 +19,7 @@ When you define a BentoML Service, use the ``workers`` parameter to set the numb
     class MyService:
         # Service implementation

-The number of workers isn't necessarily equivalent to the number of concurrent requests a BentoML Service can serve in parallel. With optimizations like :doc:`adaptable batching </guides/adaptive-batching>` and continuous batching, each worker can potentially handle many requests simultaneously to enhance the throughput of your Service. To specify the ideal number of concurrent requests for a Service (namely, all workers within the Service), you can configure :doc:`concurrency </scale-with-bentocloud/scaling/autoscaling>`.
+The number of workers isn't necessarily equivalent to the number of concurrent requests a BentoML Service can serve in parallel. With optimizations like :doc:`adaptable batching </get-started/adaptive-batching>` and continuous batching, each worker can potentially handle many requests simultaneously to enhance the throughput of your Service. To specify the ideal number of concurrent requests for a Service (namely, all workers within the Service), you can configure :doc:`concurrency </scale-with-bentocloud/scaling/autoscaling>`.

 Use cases
 ---------
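
To make the ``workers`` setting in the hunk above concrete, here is a minimal sketch assuming the BentoML 1.2+ Python SDK (the class body is a placeholder, not taken from this commit):

.. code-block:: python

    import bentoml

    # Spawn two worker processes for this Service. Each worker can still
    # handle multiple requests at once when batching is enabled.
    @bentoml.service(workers=2)
    class MyService:
        @bentoml.api
        def echo(self, text: str) -> str:
            # Placeholder implementation
            return text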
2 changes: 1 addition & 1 deletion docs/source/examples/controlnet.rst
@@ -120,7 +120,7 @@ Create BentoML :doc:`Services </build-with-bentoml/services>` in a ``service.py`
         controlnet_conditioning_scale: float = 0.5
         num_inference_steps: int = 25

-This file defines a BentoML Service ``ControlNet`` with custom :doc:`configurations </guides/configurations>` in timeout, worker count, and resources.
+This file defines a BentoML Service ``ControlNet`` with custom :doc:`configurations </reference/bentoml/configurations>` in timeout, worker count, and resources.

 - It loads the three pre-trained models and configures them to use GPU if available. The main pipeline (``StableDiffusionXLControlNetPipeline``) integrates these models.
 - It defines an asynchronous API endpoint ``generate``, which takes an image and a set of parameters as input. The parameters for the generation process are extracted from a ``Params`` instance, a Pydantic model that provides automatic data validation.
2 changes: 1 addition & 1 deletion docs/source/examples/function-calling.rst
@@ -70,7 +70,7 @@ The ``service.py`` file outlines the logic of the two required BentoML Services.
 2. Create a Python class (``Llama`` in the example) to initialize the model and tokenizer, and use the following decorators to add BentoML functionalities.

-- ``@bentoml.service``: Converts this class into a BentoML Service. You can optionally set :doc:`configurations </guides/configurations>` like timeout and GPU resources to use on BentoCloud. We recommend you use an NVIDIA A100 GPU of 80 GB for optimal performance.
+- ``@bentoml.service``: Converts this class into a BentoML Service. You can optionally set :doc:`configurations </reference/bentoml/configurations>` like timeout and GPU resources to use on BentoCloud. We recommend you use an NVIDIA A100 GPU of 80 GB for optimal performance.
 - ``@bentoml.mount_asgi_app``: Mounts an `existing ASGI application <https://github.com/bentoml/BentoFunctionCalling/blob/main/openai_endpoints.py>`_ defined in the ``openai_endpoints.py`` file to this class. It sets the base path to ``/v1``, making it accessible via HTTP requests. The mounted ASGI application provides OpenAI-compatible APIs and can be served side-by-side with the LLM Service. For more information, see :doc:`/build-with-bentoml/asgi`.

 .. code-block:: python
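
As a rough illustration of how the two decorators above are typically stacked, here is a hedged sketch (the ``app`` import stands in for the ASGI application defined in ``openai_endpoints.py``; the timeout and resource values are assumptions, not from this commit):

.. code-block:: python

    import bentoml
    from openai_endpoints import app  # existing ASGI app served under /v1

    @bentoml.mount_asgi_app(app, path="/v1")
    @bentoml.service(traffic={"timeout": 300}, resources={"gpu": 1})
    class Llama:
        def __init__(self) -> None:
            # Load the model and tokenizer here (omitted in this sketch).
            ...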
2 changes: 1 addition & 1 deletion docs/source/examples/langgraph.rst
@@ -83,7 +83,7 @@ service.py

 The ``service.py`` file defines the ``SearchAgentService``, a BentoML Service that wraps around the LangGraph agent and calls the ``MistralService``.

-1. Create a Python class and decorate it with ``@bentoml.service``, which transforms it into a BentoML Service. You can optionally set :doc:`configurations </guides/configurations>` like :doc:`workers </build-with-bentoml/parallelize-requests>` and :doc:`concurrency </scale-with-bentocloud/scaling/autoscaling>`.
+1. Create a Python class and decorate it with ``@bentoml.service``, which transforms it into a BentoML Service. You can optionally set :doc:`configurations </reference/bentoml/configurations>` like :doc:`workers </build-with-bentoml/parallelize-requests>` and :doc:`concurrency </scale-with-bentocloud/scaling/autoscaling>`.

 .. code-block:: python
2 changes: 1 addition & 1 deletion docs/source/examples/mlflow.rst
@@ -110,7 +110,7 @@ Create a separate ``service.py`` file where you define a BentoML :doc:`Service <

 The Service code:

-- Uses the ``@bentoml.service`` decorator to define a BentoML Service. Optionally, you can set additional :doc:`configurations </guides/configurations>` like resource allocation and traffic timeout.
+- Uses the ``@bentoml.service`` decorator to define a BentoML Service. Optionally, you can set additional :doc:`configurations </reference/bentoml/configurations>` like resource allocation and traffic timeout.
 - Retrieves the model from the Model Store and defines it as a class variable.
 - Uses the ``@bentoml.api`` decorator to expose the ``predict`` function as an API endpoint, which :doc:`takes a NumPy array as input and returns a NumPy array </build-with-bentoml/iotypes>`.
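
The three bullets above map onto a small amount of code. A hedged sketch, assuming an MLflow model saved to the Model Store under the hypothetical tag ``iris_clf:latest`` (resource values are illustrative only):

.. code-block:: python

    import bentoml
    import numpy as np

    @bentoml.service(resources={"cpu": "2"}, traffic={"timeout": 10})
    class IrisClassifier:
        # Retrieve the model reference from the Model Store as a class variable.
        bento_model = bentoml.models.get("iris_clf:latest")

        def __init__(self) -> None:
            self.model = bentoml.mlflow.load_model(self.bento_model)

        @bentoml.api
        def predict(self, input_data: np.ndarray) -> np.ndarray:
            # NumPy array in, NumPy array out, as the bullets above describe.
            return self.model.predict(input_data)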
2 changes: 1 addition & 1 deletion docs/source/examples/shieldgemma.rst
@@ -67,7 +67,7 @@ The ``service.py`` file outlines the logic of the two required BentoML Services.
 2. Create the ``Gemma`` Service to initialize the model and tokenizer, with a safety check API to calculate the probability of policy violation.

-- The ``Gemma`` class is decorated with ``@bentoml.service``, which converts it into a BentoML Service. You can optionally set :doc:`configurations </guides/configurations>` like timeout, :doc:`concurrency </scale-with-bentocloud/scaling/autoscaling>` and GPU resources to use on BentoCloud. We recommend you use an NVIDIA T4 GPU to host ShieldGemma 2B.
+- The ``Gemma`` class is decorated with ``@bentoml.service``, which converts it into a BentoML Service. You can optionally set :doc:`configurations </reference/bentoml/configurations>` like timeout, :doc:`concurrency </scale-with-bentocloud/scaling/autoscaling>` and GPU resources to use on BentoCloud. We recommend you use an NVIDIA T4 GPU to host ShieldGemma 2B.
 - The API ``check``, decorated with ``@bentoml.api``, functions as a web API endpoint. It evaluates the safety of prompts by predicting the likelihood of a policy violation. It then returns a structured response using the ``ShieldResponse`` Pydantic model.

 .. code-block:: python
2 changes: 1 addition & 1 deletion docs/source/scale-with-bentocloud/scaling/autoscaling.rst
@@ -56,7 +56,7 @@ In general, the autoscaler will scale the number of replicas based on the follow

 Key points about concurrency:

 - By default, BentoML does not impose a limit on ``concurrency`` to avoid bottlenecks. To determine the optimal value for ``concurrency``, we recommend conducting a stress test on your Service using a load generation tool such as `Locust <https://locust.io/>`_ either locally or on BentoCloud. The purpose of the stress test is to identify the maximum number of concurrent requests your Service can manage. After identifying this maximum, set the concurrency parameter to a value slightly below this threshold, ensuring that the Service has adequate headroom to handle traffic fluctuations.
-- If your Service supports :doc:`adaptive batching </guides/adaptive-batching>` or continuous batching, set ``concurrency`` to match the batch size. This aligns processing capacity with batch requirements, optimizing throughput.
+- If your Service supports :doc:`adaptive batching </get-started/adaptive-batching>` or continuous batching, set ``concurrency`` to match the batch size. This aligns processing capacity with batch requirements, optimizing throughput.
 - For Services designed to handle one request at a time, set ``concurrency`` to ``1``, ensuring that requests are processed sequentially without overlap.

 External queue
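
To show where the ``concurrency`` value from the list above is set, here is a minimal sketch assuming the BentoML 1.2+ ``traffic`` configuration (the value 32 is illustrative; derive yours from a stress test as recommended above):

.. code-block:: python

    import bentoml

    # Allow up to 32 in-flight requests across all workers of this Service.
    # On BentoCloud, the autoscaler compares observed load against this
    # value when deciding how many replicas to run.
    @bentoml.service(traffic={"concurrency": 32})
    class MyService:
        @bentoml.api
        def predict(self, text: str) -> str:
            # Placeholder implementation
            return text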
