Add guidelines on returning string offsets & lengths #521
"offset": {
"utf8": 12,
"utf16": 10,
"codePoint": 4

nit: it seems we have 2 spaces here: "codePoint": 4
@@ -515,6 +515,61 @@ For example, the client can specify an `If-Match` header with the last ETag valu
The service processes the update only if the ETag value in the header matches the ETag of the current resource on the server.
By computing and returning ETags for your resources, you enable clients to avoid using a strategy where the "last write always wins."

## Returning String Offsets & Lengths (Substrings)

Some Azure services return substring offset & length values within a string. For example, a service might return the offset & length of a name, email address, or phone # within a larger string.
nit: "phone #" seems too informal. Just "phone number"?

A few suggestions, but otherwise LGTM.
| UTF-16 | JavaScript, Java, C# |
| CodePoint (UTF-32) | Python |
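To make the table concrete, here is a minimal Go sketch (Go only because the `name :=` example in this section uses Go syntax) that measures one string in each of the three units. The function name `lengths` is hypothetical and for illustration only; the measurements rely solely on the standard `unicode/utf16` and `unicode/utf8` packages.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// lengths (hypothetical helper) measures s in the three units from the table:
// UTF-8 bytes, UTF-16 code units, and Unicode code points.
func lengths(s string) (utf8Len, utf16Len, codePointLen int) {
	utf8Len = len(s)                         // a Go string is a sequence of UTF-8 bytes
	utf16Len = len(utf16.Encode([]rune(s)))  // code points above U+FFFF become surrogate pairs
	codePointLen = utf8.RuneCountInString(s) // one rune per code point
	return
}

func main() {
	// ASCII, a precomposed "é" (U+00E9), and an emoji outside the BMP (U+1F600).
	for _, s := range []string{"abc", "\u00e9", "\U0001F600"} {
		u8, u16, cp := lengths(s)
		fmt.Printf("%q: utf8=%d utf16=%d codePoint=%d\n", s, u8, u16, cp)
	}
}
```

For the emoji, the three numbers are 4 UTF-8 bytes, 2 UTF-16 code units (a surrogate pair), and 1 code point, so no single offset value can serve clients in all three language families.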
Because the service doesn't know what language a client is written in and what string encoding that language uses, the service can't return UTF-agnostic offset and length values that the client can use to index within the string. To address this, the service response must include offset & length values for all 3 possible encodings, and then the client code must select the encoding required by its language's internal string encoding.

Grammar nit:

Because the service doesn't know what language a client is written in and what string encoding that language uses, the service can't return UTF-agnostic offset and length values that the client can use to index within the string. To address this, the service response must include offset & length values for all 3 possible encodings and then the client code must select the encoding it required by its language's internal string encoding.
Because the service doesn't know in what language a client is written and what string encoding that language uses, the service can't return UTF-agnostic offset and length values that the client can use to index within the string. To address this, the service response must include offset & length values for all 3 possible encodings and then the client code must select the encoding required by its language's internal string encoding.
```
name := response.fullString[response.name.offset.utf8 : response.name.offset.utf8+response.name.length.utf8]
```
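A slightly fuller, runnable sketch of the same client-side step, assuming a hypothetical response in which the substring "Zoë" has offset/length 9/4 in UTF-8, 7/3 in UTF-16, and 6/3 in code points (it follows an emoji, so every encoding gives a different offset). Each helper name is hypothetical; each slices with the values for one encoding, the way a client in the corresponding language family would, and all three return the same substring.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// sliceUTF8 (hypothetical) uses byte offsets; Go indexes strings by UTF-8 byte.
func sliceUTF8(s string, offset, length int) string {
	return s[offset : offset+length]
}

// sliceUTF16 (hypothetical) uses UTF-16 code-unit offsets, the units
// JavaScript, Java, and C# index by. Splitting a surrogate pair yields U+FFFD.
func sliceUTF16(s string, offset, length int) string {
	units := utf16.Encode([]rune(s))
	return string(utf16.Decode(units[offset : offset+length]))
}

// sliceCodePoint (hypothetical) uses code-point offsets, the units Python indexes by.
func sliceCodePoint(s string, offset, length int) string {
	runes := []rune(s)
	return string(runes[offset : offset+length])
}

func main() {
	full := "Hi \U0001F600, Zo\u00eb!"
	fmt.Println(sliceUTF8(full, 9, 4))      // utf8 offset 9, length 4
	fmt.Println(sliceUTF16(full, 7, 3))     // utf16 offset 7, length 3
	fmt.Println(sliceCodePoint(full, 6, 3)) // codePoint offset 6, length 3
}
```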
The service must calculate the offset & length for all 3 encodings and return them because clients find it difficult to work with Unicode encodings and to convert from one encoding to another. In other words, we do this to simplify client development and ensure customer success when isolating a substring.
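On the service side, the calculation can be sketched as follows. This assumes, purely for illustration, that the service locates the substring with Go's `strings.Index` (which returns a UTF-8 byte index) and then re-measures the text before the match, and the match itself, in UTF-16 code units and code points. The `Encodings` type and `spanOf` function are hypothetical names, not part of any Azure SDK.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
	"unicode/utf8"
)

// Encodings (hypothetical) holds one offset or length expressed in all three units.
type Encodings struct {
	UTF8, UTF16, CodePoint int
}

// measure expresses the size of s in UTF-8 bytes, UTF-16 code units, and code points.
func measure(s string) Encodings {
	return Encodings{
		UTF8:      len(s),
		UTF16:     len(utf16.Encode([]rune(s))),
		CodePoint: utf8.RuneCountInString(s),
	}
}

// spanOf (hypothetical) locates the first occurrence of sub in full and returns
// its offset and length in all three encodings; ok is false when sub is absent.
func spanOf(full, sub string) (offset, length Encodings, ok bool) {
	i := strings.Index(full, sub) // byte (UTF-8) index of the match
	if i < 0 {
		return Encodings{}, Encodings{}, false
	}
	// The offset is the size of everything before the match.
	return measure(full[:i]), measure(sub), true
}

func main() {
	offset, length, ok := spanOf("Hi \U0001F600, Zo\u00eb!", "Zo\u00eb")
	fmt.Println(ok, offset, length) // true {9 7 6} {4 3 3}
}
```

Doing this once on the service keeps every client from having to re-encode the full string just to convert an offset.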
Should we also mention that it makes pass-through requests easier? That was the thing that really won me over. I think the same was true for @JeffreyRichter, IIRC.

All string values in JSON are inherently Unicode and UTF-8 encoded, but clients written in a high-level programming language must work with strings in that language's string encoding, which may be UTF-8, UTF-16, or CodePoints (UTF-32).
When a service response includes a string offset or length value, it should specify these values in all 3 encodings to simplify client development and ensure customer success when isolating a substring.

<a href="#substrings-return-value-for-each-encoding" name="substrings-return-value-for-each-encoding">:white_check_mark:</a> **DO** include all 3 encodings (UTF-8, UTF-16, and CodePoint) for every string offset or length value in a service response.
I think we should document the exact format we want in this doc, e.g., `{"utf8": 2, "utf16": 1, "codePoint": 1}`. We document formats for LROs, pageables, and errors. How you expanded on that in "Considerations" is perfect, but you should also link to that section, e.g.,

<a href="#substrings-return-value-for-each-encoding" name="substrings-return-value-for-each-encoding">:white_check_mark:</a> **DO** include all 3 encodings (UTF-8, UTF-16, and CodePoint) for every string offset or length value in a service response.
<a href="#substrings-return-value-for-each-encoding" name="substrings-return-value-for-each-encoding">:white_check_mark:</a> **DO** include all 3 encodings (UTF-8, UTF-16, and CodePoint) for every string offset or length value in a service response using the schema below. See [considerations](ConsiderationsForServiceDesign.md#{actual-stub-here}) for more information.
```json
{
  "length": {
    "utf8": 2,
    "utf16": 1,
    "codePoint": 1
  }
}
```
This PR splits out the update for string offset and length from #517. I also reworked things a bit by moving the explanatory content over to ConsiderationsForServiceDesign.
It looks like my editor also trimmed some trailing whitespace from otherwise unchanged lines.