diff --git a/azure/ConsiderationsForServiceDesign.md b/azure/ConsiderationsForServiceDesign.md index c9da5f42..c2d48501 100644 --- a/azure/ConsiderationsForServiceDesign.md +++ b/azure/ConsiderationsForServiceDesign.md @@ -6,6 +6,7 @@ | Date | Notes | | ----------- | -------------------------------------------------------------- | +| 2024-Mar-17 | Updated LRO guidelines | | 2024-Jan-17 | Added guidelines on returning string offsets & lengths | | 2022-Jul-15 | Update guidance on long-running operations | | 2022-Feb-01 | Updated error guidance | @@ -207,7 +208,7 @@ It is good practice to define the path for action operations that is easily dist 2) use a special character not in the set of valid characters for resource names to distinguish the "action" in the path. In Azure we recommend distinguishing action operations by appending a ':' followed by an action verb to the final path segment. E.g. -```http +```text https://...//:? ``` @@ -216,7 +217,7 @@ cannot collide with a resource path that contains user-specified resource ids. ## Long-Running Operations -Long-running operations are an API design pattern that should be used when the processing of +Long-running operations (LROs) are an API design pattern that should be used when the processing of an operation may take a significant amount of time -- longer than a client will want to block waiting for the result. @@ -225,16 +226,95 @@ a _status monitor_, which is an ephemeral resource that will track the status an The status monitor resource is distinct from the target resource (if any) and specific to the individual operation request. -A POST or DELETE operation returns a `202 Accepted` response with the status monitor in the response body. -A long-running POST should not be used for resource create -- use PUT as described below. -PATCH must never be used for long-running operations -- it should be reserved for simple resource updates. -If a long-running update is required it should be implemented with POST. 
+There are four types of LROs allowed in Azure REST APIs:
+
+1. An LRO to create or replace a resource that involves additional long-running processing.
+2. An LRO to delete a resource.
+3. An LRO to perform an action on or with an existing resource (or resource collection).
+4. An LRO to perform an action not related to an existing resource (or resource collection).
+
+The following sections describe these patterns in detail.
+
+### Create or replace a resource requiring additional long-running processing
+
+
+A special case of long-running operations that occurs often is a PUT operation to create or replace a resource
+that involves some additional long-running processing.
+One example is a resource that requires physical resources (e.g. servers) to be "provisioned" to make the resource functional.
+
+In this case:
+- The operation must use the PUT method (NOTE: PATCH is never allowed here).
+- The URL identifies the resource being created or replaced.
+- The request and response body have identical schemas & represent the resource.
+- The request may contain an `Operation-Id` header that the service will use as
+the ID of the status monitor created for the operation.
+- If the `Operation-Id` matches an existing operation and the request content is the same,
+treat as a retry and return the same response as the earlier request.
+Otherwise fail the request with a `409-Conflict`.
+
+```text
+PUT /items/FooBar?api-version=2022-05-01
+Operation-Id: 22
+
+{
+  "prop1": 555,
+  "prop2": "something"
+}
+```
 
-There is a special form of long-running operation initiated with PUT that is described
-in [Create (PUT) with additional long-running processing](./Guidelines.md#put-operation-with-additional-long-running-processing).
-The remainder of this section describes the pattern for long-running POST and DELETE operations.
+In this case the response to the initial request is a `201 Created` to indicate that +the resource has been created or `200 OK` when the resource was replaced. +The response body should be a representation of the resource that was created, +and should include a `status` field indicating the current status of the resource. +A status monitor is created to track the additional processing and the ID of the status monitor +is returned in the `Operation-Id` header of the response. +The response must also include an `Operation-Location` header for backward compatibility. +If the resource supports ETags, the response may contain an `etag` header and possibly an `etag` property in the resource. + +```text +HTTP/1.1 201 Created +Operation-Id: 22 +Operation-Location: https://items/operations/22 +etag: "123abc" + +{ + "id": "FooBar", + "status": "Provisioning", + "prop1": 555, + "prop2": "something", + "etag": "123abc" +} +``` -This diagram illustrates how a long-running operation with a status monitor is initiated and then how the client +The client will issue a GET to the status monitor to obtain the status of the operation performing the additional processing. + +```text +GET https://items/operations/22?api-version=2022-05-01 +``` + +When the additional processing completes, the status monitor indicates if it succeeded or failed. + +```text +HTTP/1.1 200 OK + +{ + "id": "22", + "status": "Succeeded" +} +``` + +If the additional processing failed, the service may delete the original resource if it is not usable in this state, +but should clearly document this behavior. + +### Long-running delete operation + +A long-running delete operation returns a `202 Accepted` with a status monitor which the client uses to determine the outcome of the delete. + +The resource being deleted should remain visible (returned from a GET) until the delete operation completes successfully. 
+
+When the delete operation completes successfully, a client must be able to create a new resource with the same name without conflicts.
+
+This diagram illustrates how a long-running DELETE operation is initiated and then how the client
 determines it has completed and obtains its results:

 ```mermaid
@@ -242,7 +322,7 @@ sequenceDiagram
     participant Client
     participant API Endpoint
     participant Status Monitor
-    Client->>API Endpoint: POST/DELETE
+    Client->>API Endpoint: DELETE
     API Endpoint->>Client: HTTP/1.1 202 Accepted<br/>{ "id": "22", "status": "NotStarted" }
     Client->>Status Monitor: GET
     Status Monitor->>Client: HTTP/1.1 200 OK<br/>Retry-After: 5<br/>{ "id": "22", "status": "Running" }
@@ -250,8 +330,7 @@ sequenceDiagram
     Status Monitor->>Client: HTTP/1.1 200 OK<br/>{ "id": "22", "status": "Succeeded" }
 ```

-1. The client sends the request to initiate the long-running operation.
-The initial request could be a POST or DELETE method.
+1. The client sends the request to initiate the long-running DELETE operation.
 The request may contain an `Operation-Id` header that the service uses as the ID of the status monitor created for the operation.

 2. The service validates the request and initiates the operation processing.
@@ -260,8 +339,8 @@ Otherwise the service responds with a `202-Accepted` HTTP status code.
 The response body is the status monitor for the operation including the ID, either from the request header or generated by the service.
 When returning a status monitor whose status is not in a terminal state, the response must also include a `retry-after` header indicating the minimum number of seconds the client should wait
 before polling (GETing) the status monitor URL again for an update.
-For backward compatibility, the response may also include an `Operation-Location` header containing the absolute URL
-of the status monitor resource (without an api-version query parameter).
+For backward compatibility, the response must also include an `Operation-Location` header containing the absolute URL
+of the status monitor resource, including an api-version query parameter.

 3. After waiting at least the amount of time specified by the previous response's `Retry-after` header,
 the client issues a GET request to the status monitor using the ID in the body of the initial response.
@@ -274,14 +353,11 @@ If the operation is still being processed, the status field will contain a "non-

 5. After the operation processing completes, a GET request to the status monitor returns the status monitor with a status field set to a terminal value -- `Succeeded`, `Failed`, or `Canceled` -- that indicates the result of the operation.
If the status is `Failed`, the status monitor resource contains an `error` field with a `code` and `message` that describes the failure. -If the status is `Succeeded` and the LRO is an Action operation, the operation results will be returned in the `result` field of the status monitor. -If the status is `Succeeded` and the LRO is an operation on a resource, the client can perform a GET on the resource -to observe the result of the operation if desired. -6. There may be some cases where a long-running operation can be completed before the response to the initial request. +6. There may be some cases where a long-running DELETE operation can be completed before the response to the initial request. In these cases, the operation should still return a `202 Accepted` with the `status` property set to the appropriate terminal state. -7. The service is responsible for purging the status-monitor resource. +7. The service is responsible for purging the status monitor resource. It should auto-purge the status monitor resource after completion (at least 24 hours). The service may offer DELETE of the status monitor resource due to GDPR/privacy. @@ -291,6 +367,9 @@ An action operation that is also long-running combines the [Action Operations](# with the [Long Running Operations](#long-running-operations) pattern. The operation is initiated with a POST operation and the operation path ends in `:`. +A long-running POST should not be used for resource create: use PUT as described above. +PATCH must never be used for long-running operations: it should be reserved for simple resource updates. +If a long-running update is required it should be implemented with POST. ```text POST /:?api-version=2022-05-01 @@ -302,7 +381,7 @@ Operation-Id: 22 } ``` -The response is a `202 Accepted` as described above. +A long-running action operation returns a `202 Accepted` response with the status monitor in the response body. 
 ```text
 HTTP/1.1 202 Accepted
@@ -332,74 +411,87 @@ HTTP/1.1 200 OK
 }
 ```

-### PUT with additional long-running processing
+This diagram illustrates how a long-running action operation is initiated and then how the client
+determines it has completed and obtains its results:

-A special case of long-running operation that occurs often is a PUT operation to create or replace a resource
-that involves some additional long-running processing.
-One example is a resource requires physical resources (e.g. servers) to be "provisioned" to make the resource functional.
-In this case, the request may contain an `Operation-Id` header that the service will use as
-the ID of the status monitor created for the operation.
+```mermaid
+sequenceDiagram
+    participant Client
+    participant API Endpoint
+    participant Status Monitor
+    Client->>API Endpoint: POST
+    API Endpoint->>Client: HTTP/1.1 202 Accepted<br/>{ "id": "22", "status": "NotStarted" }
+    Client->>Status Monitor: GET
+    Status Monitor->>Client: HTTP/1.1 200 OK<br/>Retry-After: 5<br/>{ "id": "22", "status": "Running" }
+    Client->>Status Monitor: GET
+    Status Monitor->>Client: HTTP/1.1 200 OK<br/>{ "id": "22", "status": "Succeeded", "result": { ... } }
+```

-```text
-PUT /items/FooBar&api-version=2022-05-01
-Operation-Id: 22
+1. The client sends the request to initiate the long-running action operation.
+The request may contain an `Operation-Id` header that the service uses as the ID of the status monitor created for the operation.

-{
-  "prop1": 555,
-  "prop2": "something"
-}
-```
+2. The service validates the request and initiates the operation processing.
+If there are any problems with the request, the service responds with a `4xx` status code and error response body.
+Otherwise the service responds with a `202-Accepted` HTTP status code.
+The response body is the status monitor for the operation including the ID, either from the request header or generated by the service.
+When returning a status monitor whose status is not in a terminal state, the response must also include a `retry-after` header indicating the minimum number of seconds the client should wait
+before polling (GETing) the status monitor URL again for an update.
+For backward compatibility, the response may also include an `Operation-Location` header containing the absolute URL
+of the status monitor resource, including an api-version query parameter.

-In this case the response to the initial request is a `201 Created` to indicate that the resource has been created
-or `200 OK` when the resource was replaced.
-The response body contains a representation of the created resource, which is the standard pattern for a create operation.
-A status monitor is created to track the additional processing and the ID of the status monitor
-is returned in the `Operation-Id` header of the response.
-The response may also include an `Operation-Location` header for backward compatibility.
-If the resource supports ETags, the response may contain an `etag` header and possibly an `etag` property in the resource.
+3. After waiting at least the amount of time specified by the previous response's `Retry-after` header,
+the client issues a GET request to the status monitor using the ID in the body of the initial response.
+The GET operation for the status monitor is documented in the REST API definition and the ID
+is the last URL path segment.

-```text
-HTTP/1.1 201 Created
-Operation-Id: 22
-Operation-Location: https://items/operations/22
-etag: "123abc"
+4. The status monitor responds with information about the operation including its current status,
+which should be represented as one of a fixed set of string values in a field named `status`.
+If the operation is still being processed, the status field will contain a "non-terminal" value, like `NotStarted` or `Running`.

-{
-  "id": "FooBar",
-  "etag": "123abc",
-  "prop1": 555,
-  "prop2": "something"
-}
-```
+5. After the operation processing completes, a GET request to the status monitor returns the status monitor with a status field set to a terminal value -- `Succeeded`, `Failed`, or `Canceled` -- that indicates the result of the operation.
+If the status is `Failed`, the status monitor resource contains an `error` field with a `code` and `message` that describes the failure.
+If the status is `Succeeded`, the operation results (if any) are returned in the `result` field of the status monitor.

-The client will issue a GET to the status monitor to obtain the status of the operation performing the additional processing.
+6. There may be some cases where a long-running action operation can be completed before the response to the initial request.
+In these cases, the operation should still return a `202 Accepted` with the `status` property set to the appropriate terminal state.

-```text
-GET https://items/operations/22?api-version=2022-05-01
-```
+7. The service is responsible for purging the status monitor resource.
+It should auto-purge the status monitor resource after completion (at least 24 hours).
+The service may offer DELETE of the status monitor resource due to GDPR/privacy. -When the additional processing completes, the status monitor will indicate if it succeeded or failed. +### Long-running action operation not related to a resource -```text -HTTP/1.1 200 OK +When a long-running action operation is not related to a specific resource (a batch operation is one example), +another approach is needed. -{ - "id": "22", - "status": "Succeeded" -} -``` +This type of LRO should be initiated with a PUT method on a URL that represents the operation to be performed, +and includes a final path parameter for the user-specified operation ID. +The response of the PUT includes a response body containing a representation of the status monitor for the operation +and an `Operation-Location` response header that contains the absolute URL of the status monitor. +In this type of LRO, the status monitor should include any information from the request used to initiate the operation, +so that a failed operation could be reissued if necessary. -If the additional processing failed, the service may delete the original resource if it is not usable in this state, -but would have to clearly document this behavior. +Clients will use a GET on the status monitor URL to obtain the status and results of the operation. +Since the HTTP semantic for PUT is to create a resource, the same schema should be used for the PUT request body, +the PUT response body, and the response body of the GET for the status monitor for the operation. +For this type of LRO, the status monitor URL should be the same URL as the PUT operation. -### Long-running delete operation +The following examples illustrate this pattern. -A long-running delete operation follows the general pattern of a long-running operation -- -it returns a `202 Accepted` with a status monitor which the client uses to determine the outcome of the delete. 
+```text +PUT /translate-operations/?api-version=2022-05-01 -The resource being deleted should remain visible (returned from a GET) until the delete operation completes successfully. + +``` + +Note that the client specifies the operation id in the URL path. -When the delete operation completes successfully, a client must be able to create new resource with same name without conflicts. +A successful response to the PUT operation should have a `201 Created` status and response body +that contains a representation of the status monitor _and_ any information from the request used to initiate the operation. + +The service is responsible for purging the status monitor after some period of time, +but no earlier than 24 hours after the completion of the operation. +The service may offer DELETE of the status monitor resource due to GDPR/privacy. ### Controlling a long-running operation @@ -407,7 +499,7 @@ It might be necessary to support some control action on a long-running operation This is implemented as a POST on the status monitor endpoint with `:` added. ```text -POST /:cancel?api-version=2022-05-01 +POST /:cancel?api-version=2022-05-01 ``` A successful response to a control operation should be a `200 OK` with a representation of the status monitor. diff --git a/azure/Guidelines.md b/azure/Guidelines.md index 70c99a43..8785ff89 100644 --- a/azure/Guidelines.md +++ b/azure/Guidelines.md @@ -1,7 +1,7 @@ # Microsoft Azure REST API Guidelines - + + +:white_check_mark: **DO** use the following pattern when implementing an LRO action operating on an existing resource: + +```text +POST /UrlToExistingResource:?api-version=& +operation-id: ` + + +``` + +The response must look like this: + +```text +202 Accepted +operation-id: +operation-location: https://operations/ + + +``` + +The request body contains information to be used to execute the action. 
+ +For an idempotent POST (same `operation-id` and request body within some short time window), the service should return the same response as the initial request. + +For a non-idempotent POST, the service can treat the POST operation as idempotent (if performed within a short time window) or can treat the POST operation as initiating a brand new LRO action operation. + +:no_entry: **DO NOT** use a long-running POST to create a resource -- use PUT as described above. + +:white_check_mark: **DO** allow the client to pass an `Operation-Id` header with an ID for the operation's status monitor. + +:white_check_mark: **DO** generate an ID (typically a GUID) for the status monitor if the `Operation-Id` header was not passed by the client. + +:white_check_mark: **DO** fail a request with a `409-Conflict` if the `Operation-Id` header matches an existing operation unless the request is identical to the prior request (a retry scenario). + +:white_check_mark: **DO** return a `202-Accepted` status code from the request that initiates an LRO action on a resource if the processing of the operation was successfully initiated. + +:warning: **YOU SHOULD NOT** return any other `2xx` status code from the initial request of an LRO -- return `202-Accepted` and a status monitor even if processing was completed before the initiating request returns. + +:white_check_mark: **DO** return a status monitor in the response body as described in [Obtaining status and results of long-running operations](#obtaining-status-and-results-of-long-running-operations). 
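The `Operation-Id` rules above (accept a client-supplied ID, generate one otherwise, treat an identical repeat as a retry, reject a mismatched repeat with `409-Conflict`) can be sketched as follows. This is a minimal illustration, not part of the guidelines: the in-memory `operations` dict, the `start_action` and `_headers` helpers, the `example.invalid` host, the `Conflict` error code, and the 5-second `Retry-After` value are all assumptions made for the example.

```python
import uuid


def start_action(operations, request_body, operation_id=None):
    """Start an LRO action; return (status_code, headers, response_body).

    `operations` is a hypothetical in-memory store keyed by operation ID.
    """
    # Generate an ID (typically a GUID) when the client sent no Operation-Id.
    op_id = operation_id or str(uuid.uuid4())

    existing = operations.get(op_id)
    if existing is not None:
        if existing["request"] == request_body:
            # Same ID and same body: treat as a retry, repeat the response.
            return 202, _headers(op_id), existing["monitor"]
        # Same ID but a different body: reject with 409 Conflict.
        error = {"error": {"code": "Conflict",
                           "message": "Operation-Id is already in use."}}
        return 409, {}, error

    # New operation: create the status monitor in a non-terminal state.
    monitor = {"id": op_id, "status": "NotStarted"}
    operations[op_id] = {"request": request_body, "monitor": monitor}
    return 202, _headers(op_id), monitor


def _headers(op_id):
    # Hypothetical host and polling interval (seconds), for illustration only.
    return {
        "Operation-Id": op_id,
        "Operation-Location": f"https://example.invalid/operations/{op_id}",
        "Retry-After": "5",
    }
```

A real service would also need to bound how long retries are recognized (the "short time window" mentioned above) and persist the store durably; both are left out of this sketch.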
+
+#### LRO action with no related resource pattern

-**OperationStatus** : Object
+:white_check_mark: **DO** use the following pattern when implementing an LRO action not related to a specific resource (such as a batch operation):
+
+```text
+PUT /?api-version=
+
+>
+```
+
+The response must look like this:
+
+```text
+201 Created
+operation-location:
+
+
+```
+
+:ballot_box_with_check: **YOU SHOULD**
+define a unique operation endpoint for each LRO action with no related resource.
+
+:white_check_mark: **DO** require the
+`Operation-Id` as the final path segment in the URL.
+
+Note: The `operation-id` URL segment (not header) is *required*: it forces the client to specify the status monitor's resource ID
+and is also used for retries/idempotency.
+
+:white_check_mark: **DO** return a `201 Created` status code
+with an `operation-location` response header if the LRO Action operation was accepted for processing.
+
+:white_check_mark: **DO** return a
+status monitor in the response body that contains the operation status, the request parameters, and, when the operation completes, either
+the operation result or the error.
+
+Note: Since all request parameters must be present in the status monitor,
+the request and response body of the PUT can be defined with a single schema.
+
+:ballot_box_with_check: **YOU SHOULD**
+return the status monitor for an operation in response to a subsequent GET on the URL that initiates the LRO, and use this endpoint as
+the status monitor URL returned in the `operation-location` response header.
+
+#### The Status Monitor Resource
+
+All patterns that initiate an LRO either implicitly or explicitly create a [Status Monitor resource](https://datatracker.ietf.org/doc/html/rfc7231#section-6.3.3) in the service's `operations` collection.
+
+:white_check_mark: **DO** return a status monitor in the response body that conforms with the following structure:

 Property | Type        | Required | Description
 -------- | ----------- | :------: | -----------
 `id`     | string      | true     | The unique id of the operation
-`status` | string      | true     | enum that includes values "NotStarted", "Running", "Succeeded", "Failed", and "Canceled"
-`error`  | ErrorDetail |          | Error object that describes the error when status is "Failed"
-`result` | object      |          | Only for POST action-type LRO, the results of the operation when completed successfully
-additional<br/>properties | |     | Additional named or dynamic properties of the operation
+`kind`   | string enum | true(*)  | The kind of operation
+`status` | string enum | true     | The operation's current status: "NotStarted", "Running", "Succeeded", "Failed", and "Canceled"
+`error`  | ErrorDetail |          | If `status`=="Failed", contains reason for failure
+`result` | object      |          | If `status`=="Succeeded" && Action LRO (POST or PUT), contains success result if needed
+additional<br/>properties | |     | Additional named or dynamic properties of the operation

-:white_check_mark: **DO** include the `id` of the operation and any other values needed for the client to form a GET request to the status monitor (e.g. a `location` path parameter).
+(*): When a status monitor endpoint supports multiple operations with different result structures or additional properties,
+the status monitor **must** be polymorphic -- it **must** contain a required `kind` property that indicates the kind of long-running operation.
+
+#### Obtaining status and results of long-running operations

-:white_check_mark: **DO** include a `Retry-After` header in the response to GET requests to the status monitor if the operation is not complete. The value of this header should be an integer number of seconds to wait before making the next request to the status monitor.
+:white_check_mark: **DO** use the following pattern to allow clients to poll the current state of a Status Monitor resource:
+
+```text
+GET /?api-version=
+```
+
+The response must look like this:
+
+```text
+200 OK
+retry-after: (if status not terminal)
+
+
+```
+
+:white_check_mark: **DO** support the GET method on the status monitor endpoint that returns a `200-OK` response with the current state of the status monitor.
+
+:ballot_box_with_check: **YOU SHOULD** allow any valid value of the `api-version` query parameter to be used in the GET operation on the status monitor.
+
+- Note: Clients may replace the value of `api-version` in the `operation-location` URL with a value appropriate for their application. Remember that the client initiating the LRO may not be the same client polling the LRO's status.
+
+:white_check_mark: **DO** include the `id` of the operation and any other values needed for the client to form a GET request to the status monitor (e.g. a `location` path parameter).
:white_check_mark: **DO** include the `result` property (if any) in the status monitor for a POST action-type long-running operation when the operation completes successfully. -:no_entry: **DO NOT** include a `result` property in the status monitor for a long-running operation that is not a POST action-type long-running operation. +:no_entry: **DO NOT** include a `result` property in the status monitor for a long-running operation that is not an action-type long-running operation. + +:white_check_mark: **DO** include a `retry-after` header in the response if the operation is not complete. The value of this header should be an integer number of seconds that the client should wait before polling the status monitor again. :white_check_mark: **DO** retain the status monitor resource for some publicly documented period of time (at least 24 hours) after the operation completes. +#### Pattern to List Status Monitors + +Use the following patterns to allow clients to list Status Monitor resources. + +:ballot_box_with_check: +**YOU MAY** support a GET method on any status monitor collection URL that returns a list of the status monitors in that collection. + +:ballot_box_with_check: +**YOU SHOULD** support a list operation for any status monitor collection that includes status monitors for LRO Actions with no related resource. + +:ballot_box_with_check: +**YOU SHOULD** support the `filter` query parameter on the list operation for any polymorphic status monitor collection and support filtering on the `kind` value of the status monitor. + +For example, the following request should return all status monitor resources whose `kind` is either "VMInitializing" *or* "VMRebooting" +and whose status is "NotStarted" *or* "Succeeded". + +```text +GET /operations?filter=(kind eq 'VMInitializing' or kind eq 'VMRebooting') and (status eq 'NotStarted' or status eq 'Succeeded') +``` + ### Bring your own Storage (BYOS) + Many services need to store and retrieve data files. 
For this scenario, the service should not implement its own storage APIs and should instead leverage the existing Azure Storage service. When doing this, the customer "owns" the storage account and just tells your service to use it. Colloquially, we call this Bring Your Own Storage as the customer is bringing their storage account to another service. BYOS provides significant benefits to service implementors: security, performance, uptime, etc. And, of course, most Azure customers are already familiar with the Azure Storage service. @@ -958,7 +1108,7 @@ While Azure Managed Storage may be easier to get started with, as your service e :white_check_mark: **DO** use the Bring Your Own Storage pattern. -:white_check_mark: **DO** use a blob prefix for a logical folder (avoid terms such as ```directory```, ```folder```, or ```path```). +:white_check_mark: **DO** use a blob prefix for a logical folder (avoid terms such as `directory`, `folder`, or `path`). :no_entry: **DO NOT** require a fresh container per operation. @@ -1120,6 +1270,7 @@ See the [Returning String Offsets & Lengths] section in Considerations for Servi ### Distributed Tracing & Telemetry + Azure SDK client guidelines specify that client libraries must send telemetry data through the `User-Agent` header, `X-MS-UserAgent` header, and Open Telemetry. Client libraries are required to send telemetry and distributed tracing information on every request. Telemetry information is vital to the effective operation of your service and should be a consideration from the outset of design and implementation efforts.