Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SearchRequest/Response proto and schema proto #15411

Open
wants to merge 3 commits into
base: 2.16
Choose a base branch
from

Conversation

amberzsy
Copy link

@amberzsy amberzsy commented Aug 26, 2024

Description

Generate protobuf for SearchRequest, Response and related schemas from https://github.com/andrross/opensearch-api-specification/blob/main/spec/namespaces/_core.yaml using https://github.com/OpenAPITools/openapi-generator/tree/master.

Related Issues

Resolves #15190

Check List

  • DSL coverage report - ~85.7%
    image
  • Response/Request Coverage - ~ 85%
    image

Issue:

  1. proto3 remove the optional/required, we need to figure out how to check if field/property is set.
    proposal:
    a) use optional.

with the optional keyword, proto3 generates code that allows you to check whether a field has been set, which can be useful for validating fields’ presence. requires 3.15+ version.

b) use oneOf
c) use customized wrapper

  1. Gaps between https://opensearch.org/docs/latest/api-reference/search/ and https://github.com/andrross/opensearch-api-specification/tree/main.
    a. Numeric type: number defined in specs but Integer in public docs. we might need to further specify numeric in api-specs / protobuf
    b. Request body has bunch of new properties but not mentioned in https://opensearch.org/docs/latest/api-reference/search/.

  2. We need to have criteria on muti-types as below.

e.g.  type: 
        - boolean
        - 'null'
        - number
        - object
        - string
and 
type: ['null', number, string]

in this PR, i use ... (open to any suggestions)

message MultiTypeValue{
  oneof value{
    GeneralNumber number = 1;
    string string_value = 2;
    bytes bytes_value = 3;
    bool bool_value = 4;
  }
}
  1. there are many places defined with oneOf Object and Array of Object.
    e.g
          knn:
                description: Defines the approximate kNN search to run.
                oneOf:
                  - $ref: '../schemas/_common.yaml#/components/schemas/KnnQuery'
                  - type: array
                    items:
                      $ref: '../schemas/_common.yaml#/components/schemas/KnnQuery'
  

In this PR, all consolidated to

- type: array
                    items:
                      $ref: '../schemas/_common.yaml#/components/schemas/XYZ'

openapi-generator Tooling

  1. Does not recognize characters like ::, ‘@’
  2. proto number doesn’t start from 1; only enum follow format from 1;
  3. can not handle multiple property types.
  4. incomplete handling on AllOf, it will have some missing properties.
  5. It would generates individual protobuf files for each properties. Manually merge required. Esp, for DSL which has recursive definition, it cannot automatically combine the protobuf files into one.

Overall, openapi-generator can relative accurately generate 70% of probuf but manual validation and adjustment is required which is time-consuming based on this scale of API specs.

TODO: we need to further enhance the tooling to be able to support api-specs to protobuf translation.

@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request RFC Issues requesting major changes Roadmap:Search Project-wide roadmap label Search Search query, autocomplete ...etc Search:Performance labels Aug 26, 2024
Signed-off-by: Shuyi Zhang <[email protected]>
Signed-off-by: amberzsy <[email protected]>
@amberzsy amberzsy force-pushed the 15190-grpc-search-api branch from bcf6693 to e60462e Compare August 26, 2024 06:36
Copy link
Contributor

❌ Gradle check result for bcf6693: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for e60462e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: amberzsy <[email protected]>
Signed-off-by: amberzsy <[email protected]>
Copy link
Contributor

❌ Gradle check result for 2bbdb8d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for bd9c7c2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@dblock
Copy link
Member

dblock commented Aug 26, 2024

@amberzsy

Couple of thoughts:

  1. I think we want to generate these from the complete spec, not a partial namespace one, e.g. https://github.com/opensearch-project/opensearch-api-specification/releases/download/main-latest/opensearch-openapi.yaml (a release).
  2. It might make sense to add the workflow that generates proto files as a pipeline in https://github.com/opensearch-project/opensearch-api-specification so that we can "release" them the same way as the complete spec. I opened [PROPOSAL] Publish protobuf files. opensearch-api-specification#532.
  3. IMO to merge to OpenSearch core we want to have a gradle task that pulls the latest spec and generates these files, or fetches them if we do (2).

WDYT?

@amberzsy
Copy link
Author

2. It might make sense to add the workflow that generates proto files as a pipeline in https://github.com/opensearch-project/opensearch-api-specification so that we can "release" them the same way as the complete spec.

yes, technically, we definitely should have this automated and have it as part of build process. i can maybe have simple version matchAll query and related schema to unblock on grpc poc and remove all the rest of proto def.
However, as i mentioned in description. there's some ambiguity for specs -> proto. which i think we need to align on certain criteria.
Another question is should we do feature development on top of https://github.com/OpenAPITools/openapi-generator (e.g have RFC and contribute) or we want to have it and develop new/similar in opensearch-build

@dblock
Copy link
Member

dblock commented Aug 26, 2024

yes, technically, we definitely should have this automated and have it as part of build process. i can maybe have simple version matchAll query and related schema to unblock on grpc poc and remove all the rest of proto def.

Probably best.

However, as i mentioned in description. there's some ambiguity for specs -> proto. which i think we need to align on certain criteria.

That's why I'd start automating it right away. We likely will need to incrementally fix the spec and the pipeline, add linters to avoid common mistakes in the spec that lead to something we can't convert, etc.

Another question is should we do feature development on top of https://github.com/OpenAPITools/openapi-generator (e.g have RFC and contribute) or we want to have it and develop new/similar in opensearch-build.

I personally don't have a preference. You could write a brand new converter in https://github.com/opensearch-project/opensearch-api-specification, we already parse the spec and do a lot of things to it.

@andrross
Copy link
Member

Thanks @amberzsy!

proto3 remove the optional/required, we need to figure out how to check if field/property is set

Your suggestion of using optional makes sense to me.

Gaps between https://opensearch.org/docs/latest/api-reference/search/ and https://github.com/andrross/opensearch-api-specification/tree/main.

Ultimately the code in this repository that implements the API is the source of truth. If the documentation and the api spec do not agree, then whatever is actually implemented is correct. The goal is to make the api spec match what is actually implemented (this is an ongoing manual process to build it out), then add tests that verify and enforce the actual API matches the spec. The documentation is built out in separate manual process as well so its not surprising that there are subtle differences. The goal is to also be able to generate the API documentation from the specification as well, but we're not there yet.

We need to have criteria on muti-types as below.

A custom message type (MultiTypeValue) wrapping a oneof construct like you showed makes sense to me.

there are many places defined with oneOf Object and Array of Object.

This seems like a quirk of JSON? Like "field": "value" and "field": ["value"] are different JSON but both just represent a single value. I'm guessing we do this so that the brackets are not always required in JSON when sending a single value. It seems like we could flatten this to a single repeated field in protobuf, or else do a oneof wrapping a single value and a repeated value. I don't have a strong opinion here. We should probably pick the simplest approach.

@amberzsy
Copy link
Author

sounds good. i'll update the pr for more grpc poc purpose (instead of proto defs) and we can have separate discussion for the proto/specs etc. I'll provide a doc with list of gaps and plans which will need opinion from you for clarification/alignment by end of tmr.

@dblock
Copy link
Member

dblock commented Aug 27, 2024

2. Gaps between https://opensearch.org/docs/latest/api-reference/search/ and https://github.com/andrross/opensearch-api-specification/tree/main.

The doc is incorrect/is missing information. We have opensearch-project/documentation-website#7700 that we plan to take on soon.

a. Numeric type: number defined in specs but Integer in public docs. we might need to further specify numeric in api-specs / protobuf

If the spec needs changes, please contribute/open issues in https://github.com/opensearch-project/opensearch-api-specification.

b. Request body has bunch of new properties but not mentioned in https://opensearch.org/docs/latest/api-reference/search/.

See above.

@amberzsy
Copy link
Author

This seems like a quirk of JSON? Like "field": "value" and "field": ["value"] are different JSON but both just represent a single value. I'm guessing we do this so that the brackets are not always required in JSON when sending a single value. It seems like we could flatten this to a single repeated field in protobuf, or else do a oneof wrapping a single value and a repeated value. I don't have a strong opinion here. We should probably pick the simplest approach.

probably go with single repeated field.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Sep 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request RFC Issues requesting major changes Roadmap:Search Project-wide roadmap label Search:Performance Search Search query, autocomplete ...etc stalled Issues that have stalled
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants