Enhance HTTP lookup join to support N:M relationships #118

Open
MarekMaj opened this issue Aug 20, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@MarekMaj
Contributor

The current flink-http-connector lookup join implementation supports an N:1 relationship: it returns at most one value for a given lookup key. However, some scenarios require an N:M relationship.

The LookupFunction interface used by the lookup join allows multiple values to be returned for a given lookup key. When multiple values are retrieved from the right-hand side of the join, the lookup join produces one event for each of them. The JDBC connector already implements this behavior and can serve as a reference.
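To make the N:M semantics concrete, here is a minimal plain-Java sketch of the fan-out an inner lookup join performs. It deliberately avoids Flink: `SimpleLookup` is a hypothetical stand-in for `LookupFunction`, `String` keys and values replace `RowData`, and the table contents are made up. Each input key is paired with every value the lookup returns, so a key with two matches yields two output records and a key with no match yields none.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

public class LookupJoinFanOut {

    // Hypothetical, simplified stand-in for Flink's LookupFunction:
    // one lookup key may map to zero, one, or many rows.
    interface SimpleLookup {
        Collection<String> lookup(String key);
    }

    // Emits one joined record per (input key, looked-up value) pair,
    // mirroring how an inner lookup join fans out when lookup() returns several rows.
    static List<String> join(List<String> inputKeys, SimpleLookup lookup) {
        List<String> out = new ArrayList<>();
        for (String key : inputKeys) {
            for (String value : lookup.lookup(key)) {
                out.add(key + "->" + value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up right-hand side: "a" has two matches, "b" has one, "c" has none.
        Map<String, List<String>> table = Map.of(
                "a", List.of("a1", "a2"),
                "b", List.of("b1"));
        SimpleLookup lookup = key -> table.getOrDefault(key, List.of());
        System.out.println(join(List.of("a", "b", "c"), lookup)); // prints [a->a1, a->a2, b->b1]
    }
}
```

With the real `LookupFunction`, this fan-out happens inside Flink's lookup join operator; a left lookup join would emit the unmatched key padded with nulls instead of dropping it.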

To implement this feature in the HTTP connector, the following considerations must be addressed:

  1. The current assumption that only one event is returned in the response body should be revisited. The connector must support returning a collection of values.
  2. The response format should support pagination. A HATEOAS-compatible interface can be assumed, utilizing links included in the response to navigate through the list of pages. Furthermore, we could consider support for providing API doc
  3. Since the new format is incompatible with the existing one, a configuration flag gid.connector.http.source.lookup.response.unwrap-multiple-values, defaulting to false, should be added for backward compatibility. This default can be changed in the future.
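A minimal sketch of how the proposed flag could change response handling. It assumes the body has already been parsed into generic Java objects (a `Map` per JSON object, a `List` per JSON array, as a generic JSON mapper might produce) and that, with the flag enabled, the top-level body is a JSON array. `ResponseUnwrapper`, `toRows` and `asRow` are hypothetical names, not part of the connector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ResponseUnwrapper {

    // Hypothetical helper: checks that a parsed JSON node is an object and treats it as one row.
    @SuppressWarnings("unchecked")
    private static Map<String, Object> asRow(Object node) {
        if (!(node instanceof Map)) {
            throw new IllegalArgumentException("Expected a JSON object, got: " + node);
        }
        return (Map<String, Object>) node;
    }

    // unwrapMultipleValues models the proposed
    // gid.connector.http.source.lookup.response.unwrap-multiple-values flag (default false).
    static List<Map<String, Object>> toRows(Object parsedBody, boolean unwrapMultipleValues) {
        List<Map<String, Object>> rows = new ArrayList<>();
        if (!unwrapMultipleValues) {
            // Current behavior: the whole body becomes exactly one row.
            rows.add(asRow(parsedBody));
            return rows;
        }
        if (!(parsedBody instanceof List)) {
            throw new IllegalArgumentException("Expected a JSON array when unwrapping multiple values");
        }
        // New behavior: every array element becomes its own row.
        for (Object item : (List<?>) parsedBody) {
            rows.add(asRow(item));
        }
        return rows;
    }

    public static void main(String[] args) {
        Object single = Map.of("id", 1);
        Object many = List.of(Map.of("id", 1), Map.of("id", 2));
        System.out.println(toRows(single, false).size()); // prints 1
        System.out.println(toRows(many, true).size());    // prints 2
    }
}
```

Keeping the flag off preserves the existing one-body-one-row contract, so existing table definitions behave exactly as before.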
MarekMaj added the enhancement (New feature or request) label on Aug 20, 2024
@davidradl
Contributor

@MarekMaj
An interesting idea. Is this something you are interested in implementing?

It seems to me that the current design implicitly assumes one REST call retrieves one item, whereas your suggestion implies something closer to a search API. It would be great to see a use case and an example of what this would look like in SQL, and how it would map to the config, the lookup keys, the REST request, and how the response would be mapped to multiple items. I assume the idea is that an array of objects would be returned.

  1. Does your use case require pagination? I don't think Flink supports pagination on joins, so I wonder why we would need it here. HATEOAS-compatible interfaces are conceptually nice, but they are very chatty because of the references, and I am not sure how widely used they are these days. I had assumed the idea would be to call existing search REST APIs, which are probably not HATEOAS-compatible.

@MarekMaj
Contributor Author

MarekMaj commented Sep 2, 2024

@davidradl
Thank you for the comment!

> I assume the idea is that an array of objects would be returned.

Exactly, that’s the idea. The lookup keys won’t be affected, but we do need to adapt the configuration slightly. Right now, the implementation assumes that the entire response body is transformed into a single RowData. With the configuration flag I mentioned earlier, we can change that assumption and instead expect a list of values for the specified table. The change stays backward compatible: the schema of a single RowData entry won’t change, and neither will the SQL. The mapping from the request to the response entries doesn’t change either, since each result is still joined with the input event.

> I don't think Flink supports pagination on joins

That depends on the connector's underlying implementation; it won't affect the high-level Flink API.

> HATEOAS-compatible interfaces are conceptually nice, but are very chatty

That's a good point. For this simple API, I don't think full HATEOAS support is necessary. The main requirement I'm emphasizing is pagination, which is essential for a well-designed REST API that returns a list of objects. We can discuss how to implement it in a way that is both simple and efficient.
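A minimal sketch of the simpler pagination discussed here, assuming each page carries its items plus an optional link to the next page rather than a full HATEOAS link set. `PaginatedLookup`, `Page`, `fetchAll` and the URLs are hypothetical, and the HTTP call is replaced by an in-memory map so the example runs standalone. The `maxPages` cap is an added safeguard so a misbehaving server cannot keep a single lookup running indefinitely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PaginatedLookup {

    // Hypothetical page shape: the items on this page plus an optional link to the next one.
    static final class Page {
        final List<String> items;
        final String nextLink; // null when this is the last page

        Page(List<String> items, String nextLink) {
            this.items = items;
            this.nextLink = nextLink;
        }
    }

    // Follows "next" links until none remain, collecting every item into one list.
    // maxPages guards against cyclic or endless chains of links.
    static List<String> fetchAll(String firstLink, Function<String, Page> fetchPage, int maxPages) {
        List<String> all = new ArrayList<>();
        String link = firstLink;
        int pages = 0;
        while (link != null) {
            if (++pages > maxPages) {
                throw new IllegalStateException("Exceeded " + maxPages + " pages");
            }
            Page page = fetchPage.apply(link);
            all.addAll(page.items);
            link = page.nextLink;
        }
        return all;
    }

    public static void main(String[] args) {
        // In-memory stand-in for an HTTP endpoint; paths and links are made up.
        Map<String, Page> server = Map.of(
                "/orders?customer=42&page=1", new Page(List.of("o1", "o2"), "/orders?customer=42&page=2"),
                "/orders?customer=42&page=2", new Page(List.of("o3"), null));
        System.out.println(fetchAll("/orders?customer=42&page=1", server::get, 10)); // prints [o1, o2, o3]
    }
}
```

Because a lookup returns only after all pages are collected, page size and the page cap trade the number of HTTP requests against latency per lookup key.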

Currently, our interface is a simplified version of Flink's lookup table API. In a nutshell, this change would extend its capabilities and fully align it with the LookupFunction interface, whose lookup method can return multiple values:

```java
public abstract Collection<RowData> lookup(RowData keyRow) throws IOException;
```
