Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 77 - Add Translation Feature #82

Merged

Conversation

tahmidefaz
Copy link
Contributor

Fixes #77.

Changes proposed in this PR:

  • Introduces a new parameter called resolve_method
  • When resolve_method is set to TRANSLATE_EXTERNAL, it copies the external dereferenced object over to components/schemas

Example

OpenAPI specification

{
  "openapi": "3.0.0",
  "info": {
    "title": "Test API",
    "version": "1"
  },
  "paths": {
    "/endpoint": {
      "post": {
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {"$ref": "_schemas.json#/$defs/Parameter"}
            }
          }
        },
        "responses": {}
      }
    }
  }
}

External file with schema definitions.

   {
     "$id": "_schemas.json",
     "$schemas": "https://json-schema.org/draft/2019-09/schema#",
     "$defs": {
       "Something": {
         "type": "object"
       },
       "Parameter": {
         "properties": {
           "something": {
             "$ref": "#/$defs/Something"
           }
         }
       }
     }
   }

Using RESOLVE_FILE | RESOLVE_INTERNAL and resolve_method=TRANSLATE_EXTERNAL this becomes:

{
  "openapi":"3.0.0",
  "info":{
    "title":"Test API",
    "version":"1"
  },
  "paths":{
    "/endpoint":{
      "post":{
        "requestBody":{
          "$ref":"#/components/schemas/_schemas.json_Body"
        },
        "responses":{
          
        }
      }
    }
  },
  "components":{
    "schemas":{
      "_schemas.json_Something":{
        "type":"object"
      },
      "_schemas.json_Body":{
        "content":{
          "application/json":{
            "$ref":"#/components/schemas/_schemas.json_Something"
          }
        }
      }
    }
  }
}

@jfinkhaeuser

@tahmidefaz tahmidefaz changed the title Issue 77 Issue 77 - Add Translation Feature Nov 21, 2020
@jfinkhaeuser
Copy link
Collaborator

I like it! I had a slightly different idea that would also deal with #76... but I can adjust that. /me thinks I may get to that today.

@tahmidefaz
Copy link
Contributor Author

Great! Let me know if you want me to update this PR in any way.

@Glutexo
Copy link
Collaborator

Glutexo commented Nov 23, 2020

I see some possible issues with the current implementation. Will post more in a review.

@tahmidefaz
Copy link
Contributor Author

@Glutexo do you see any possible issues with the current implementation?

@Glutexo
Copy link
Collaborator

Glutexo commented Nov 30, 2020

I did. I just didn’t find the necessary cycles to do a full proper review.

One thing that bothers me and which I think needs care is that if resolve method is set to EXTERNAL, the resolve types are not considered at all. If you disable RESOLVE_FILES, but set TRANSLATE_EXTERNAL, the external files are still dereferenced by translation. I think that in this case they need to be skipped altogether.

translate and skip_references() are independent entities.
They should not belong in the same conditional statement. 🤦‍♂️
@tahmidefaz
Copy link
Contributor Author

tahmidefaz commented Dec 4, 2020

@jfinkhaeuser Anything I should improve on here? It looks pretty straight forwards to me at the moment. What do you think?

@jfinkhaeuser
Copy link
Collaborator

Thanks, guys! I'll spend some time playing around with this.

@jfinkhaeuser jfinkhaeuser changed the base branch from master to issue-77-integration December 7, 2020 08:48
@jfinkhaeuser jfinkhaeuser merged commit 5b08370 into RonnyPfannschmidt:issue-77-integration Dec 7, 2020
Copy link
Collaborator

@Glutexo Glutexo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main problem of this solution is that it requires RESOLVE_INTERNAL to be enabled, but itself targets only a subset of internal references. As a result, internal references in the external files are properly translated, but those in the root document are still inlined in the original way.

This behavior is inconsistent a I don’t see a use case for it. The original problem we were trying to solve was to avoid inlining by translating the references instead. But this is only partially achieved.

I think the cause of the confusion here is the “skipping” causing the reference not to be processed at all regardless of the newly introduced translation setting. To keep some of the references intact and translate the other ones, we need them not to be skipped and add one more bit of logic that differentiates between the references for translation and those for keeping.

Comment on lines +19 to +21
TRANSLATE_EXTERNAL = 0
#: Replace the reference with inlined schema.
TRANSLATE_DEFAULT = 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names are strange, but I know, naming is hard. This switches the handling of external references. One value is to TRANSLATE and the other is to INLINE.

@@ -111,6 +128,8 @@ def _dereferencing_iterator(self, base_url, partial, path, recursions):
# Split the reference string into parsed URL and object path
ref_url, obj_path = _url.split_url_reference(base_url, refstring)

translate = self.__resolve_method == TRANSLATE_EXTERNAL and self.parsed_url.path != ref_url.path
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The translate variable is initialized here, but used only down there in the yield condition. I know this is a remnant from the original approach, but now it is only confusing.

@jfinkhaeuser
Copy link
Collaborator

Well, forgive me for bringing #76 into this again, but I think that's a broader topic than deciding on translate vs inline.

Originally, I spoke about resolving which is probably identical to inlining in its concern, though the words have slightly different meaning. If I'm being nitpicky, resolving just looks up what's referenced, and inlining puts it in place of the reference.

What is happening here is more akin to a translation, and also more closely related to what #76 wants: instead of inlining, rewrite the reference from a remote reference of some sort to a local one. I think that is a very valuable feature in the absolutely generic sense. It may be that inlining references to objects in other local files is not necessary, but inlining references to objects at remote URLs is what is required. And what type of URLs do we want to apply this to? Should it be possible to distinguish between URLs at third-party domains, and ones on our own domain?

After following this PR (thank you!) and #76 a little, and unfortunately having too little time to do much about either, it seems to me that a good future solution should follow a different path than what is implemented either before or after this PR. Hindsight is 20/20.

a) I think we need a better mechanism for filtering what kind of references we want to treat. I'm currently going by typing them, but the example with the third-party domains above would go beyond typing. Programming languages and Python in particular have a very flexible concept here in the form of Callables; I think I would like to see a system where a user-supplied callable decides what kind of action gets triggered for a reference. We can supply a standard one that does the same thing as now.
b) We similarly need a Callable that performs the action, instead of limiting things to what's implemented now. I dabbled with this a bit in order to unify your PR with something that would solve #76, because each case is about rewriting some references to a different location. But I ran into problems with the current resolver interface.

If you agree with this general assessment, then I would very much like to discuss what we can do in a next step. There is a good reason prance is still at a 0.x version, which is that I haven't had this kind of feedback from people like yourself yet. We have the ability to make fairly deep changes before a version 1, so I would like to invite you to participate in this.

For your information, the problem I'm having with the resolver is that it's a depth-first approach, and each referenced file gets its own resolver instance. For solving #76, a inner resolver would need the ability to make changes to the specs of an outer resolver. But the specs are not there yet, because we are currently in an inner resolver precisely to build the outer specs.

I can imagine some kind of two-pass approach, and/or we could give resolvers the ability to push "patches" back to the parent, to be applied when the spec is built. I'm not sure how other parsers do this, and it sometimes doesn't help that the actual parsing isn't even part of prance - but I also don't see much value in sinking time into doing that part as well. If you have suggestions, they're more than welcome at this point.

In the meantime, I'm thinking that this PR is solving your current issue just fine. But if you have bandwidth for looking beyond that at a more generic solution to related problems, I'd be very happy to hear your feedback.

Either way, thank you very much for this!

@Glutexo
Copy link
Collaborator

Glutexo commented Jan 19, 2021

Hey, @jfinkhaeuser! Thank you for your detailed response!

This PR solves only one half of our issue and it’s only matter of luck that we didn’t get hit by the other one. If you are interested by our use case, I elaborated on it here: RedHatInsights/insights-host-inventory#771 (comment) It involves handling of recursive references, which lead us to the idea of translation to avoid inlining.

We would like to ultimately fix this for the future and even have will and some bandwidth to do it properly in your upstream. I’m going to take a closer look at the code and #76 to see what are the possible next steps.

From the first glance I am not convinced that a custom handler is necessarily bundled to a more reliable dereferencing, although it is definitely a step into the right direction. I like however the idea of two-pass processing; I am thinking about gathering all the references first and building the final document then. But that’s just a wild uneducated guess for the time being.

Do you think it may be worth it to schedule a more live meeting, I mean a call? So we can go through the issue, think out loud about the possible steps?

cc @gmcculloug

@jfinkhaeuser
Copy link
Collaborator

Absolutely! These days between two jobs (of sorts) and kids, it's hard to sink time into development for prance. But I'll very gladly take the time to hash out an approach that best serves everyone. Looks like we should both be in a EU timezone; that makes things easier. My email is in my profile, I think - send a doodle or some such?

@Glutexo
Copy link
Collaborator

Glutexo commented Jan 20, 2021

Sent a Doodle to you, @tahmidefaz, my co-worker, and @gmcculloug, our manager. Those two guys are in the EST timezone, so I scheduled for our afternoon, their morning. If this is too late for you, we can have just 1:1 in our time zone virtually any day any time in the morning / before noon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Internal references dereferenced in external files
3 participants