Support lambdas in json_merge function #17284

Open
abhishekagarwal87 opened this issue Oct 8, 2024 · 7 comments · May be fixed by #17339

Comments

@abhishekagarwal87
Contributor

abhishekagarwal87 commented Oct 8, 2024

Description

The json_merge function retains the rightmost value when leaf elements share the same key. However, in some cases you may want to aggregate the two values instead, e.g. by adding or multiplying them. If json_merge accepted a lambda as an argument, we would not need to write custom JSON functions for such aggregations.
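To make the request concrete, here is a minimal Python sketch of a merge that takes a conflict-resolving function. It is not Druid code: `merge_with` and `combine` are hypothetical names, and the inputs are assumed to be already-parsed JSON objects. Passing a "keep the right value" lambda mimics the rightmost-wins rule described above, while passing an adding lambda aggregates instead.

```python
from typing import Any, Callable


def merge_with(left: Any, right: Any, combine: Callable[[Any, Any], Any]) -> Any:
    """Deep-merge two parsed JSON values, resolving leaf conflicts with `combine`."""
    if isinstance(left, dict) and isinstance(right, dict):
        merged = dict(left)
        for key, value in right.items():
            # Keys present on both sides are merged recursively;
            # keys present only on the right are copied over.
            merged[key] = merge_with(left[key], value, combine) if key in left else value
        return merged
    # At least one side is not an object: treat the pair as conflicting leaves.
    return combine(left, right)


rightmost = lambda a, b: b  # mimics the rightmost-wins rule
add = lambda a, b: a + b    # aggregates the two leaves instead

print(merge_with({"a": 1, "b": 2}, {"b": 3, "c": 4}, rightmost))  # {'a': 1, 'b': 3, 'c': 4}
print(merge_with({"a": 1, "b": 2}, {"b": 3, "c": 4}, add))        # {'a': 1, 'b': 5, 'c': 4}
```

The only thing that changes between the two calls is the function argument, which is the point of the proposal: one merge routine, many aggregations.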

@shigarg1
Contributor

shigarg1 commented Oct 8, 2024

Hi Abhishek,
I would like to work on this, but I have a few concerns:

  1. The data types may differ for leaf elements with the same key, e.g. one is a string while the other is an integer.
  2. It should aggregate only when the values are numbers.
  3. Do we perform aggregation for nested data as well?
  4. It might be better to use a different function name rather than adding this functionality to the same function, to keep json_merge consistent with the wider database community.

Any advice on getting started on this would be helpful. Thanks.

@abhishekagarwal87
Contributor Author

Thanks, Shivam.

  1. Yes, that is possible, and we should handle mixed data types.
  2. That depends on the kind of aggregation.
  3. Can you elaborate?
  4. Sure. Maybe it could be an aggregator with a delegate aggregator that operates on leaf elements.
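Points 1 and 2 can be illustrated with two leaf-level functions, sketched here in Python. Both names are hypothetical, and the fallback policy in the first one (keep the rightmost value when the leaves are not both numbers) is an assumption made for illustration, not something decided in this thread. The second function shows an aggregation that does not care about types at all, which is why "numbers only" depends on the kind of aggregation.

```python
from numbers import Number


def add_numbers_else_rightmost(left, right):
    """Add two numeric leaves; for any other combination keep the rightmost value.

    The fallback is an assumed policy; raising an error or returning null
    would be equally valid choices.
    """
    both_numeric = all(
        # bool is a subclass of int in Python, so exclude it explicitly.
        isinstance(v, Number) and not isinstance(v, bool)
        for v in (left, right)
    )
    return left + right if both_numeric else right


def append_leaves(left, right):
    """Collect conflicting leaves of any type into a list."""
    return [left, right]


print(add_numbers_else_rightmost(1, 2))      # 3
print(add_numbers_else_rightmost("x", 5))    # 5
print(append_leaves("x", 5))                 # ['x', 5]
```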

@clintropolis @kgyrtkirk In case you have any suggestions.

@shigarg1
Contributor

shigarg1 commented Oct 9, 2024

Elaborating on the third point:
When merging these two JSON values, '{"a":{"x":1}}' and '{"a":{"x":2}}', should we handle adding 1 and 2?

@abhishekagarwal87
Contributor Author

Yes. We should. 1 and 2 are leaf elements and the lambda would specify how those two leaf elements get merged/aggregated.

@shigarg1
Contributor

shigarg1 commented Oct 9, 2024

I am planning to create a function like
JSON_MERGE_AGGR(<AGGREGATOR>, JSON1, JSON2, ...), where the aggregator can be ADD, MULTIPLY, or APPEND (more can be added once the framework is ready).

Let me know if you have any suggestions.
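One way to picture the named-aggregator design is a lookup table from aggregator names to two-argument functions, sketched below in Python. This is only a model of the proposed SQL function, not its implementation: `json_merge_aggr`, `AGGREGATORS`, and the helpers are hypothetical, and the APPEND behavior (flattening into a single list) is an assumption.

```python
import json
import operator


def _append(left, right):
    # Assumed APPEND semantics: flatten both sides into a single list.
    as_list = lambda v: v if isinstance(v, list) else [v]
    return as_list(left) + as_list(right)


# Hypothetical registry; new aggregators are added by registering a function.
AGGREGATORS = {"ADD": operator.add, "MULTIPLY": operator.mul, "APPEND": _append}


def _merge(left, right, combine):
    if isinstance(left, dict) and isinstance(right, dict):
        out = dict(left)
        for key, value in right.items():
            out[key] = _merge(left[key], value, combine) if key in left else value
        return out
    return combine(left, right)  # conflicting leaves


def json_merge_aggr(aggregator, first, *rest):
    """Merge JSON strings left to right, aggregating conflicting leaves."""
    combine = AGGREGATORS[aggregator.upper()]
    merged = json.loads(first)
    for document in rest:
        merged = _merge(merged, json.loads(document), combine)
    return json.dumps(merged)


print(json_merge_aggr("ADD", '{"a":{"x":1}}', '{"a":{"x":2}}'))  # {"a": {"x": 3}}
```

A fixed registry like this is simple to validate, but every new operation needs a code change, which is the trade-off against accepting an arbitrary lambda.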

@shigarg1
Contributor

shigarg1 commented Oct 9, 2024

Hi @abhishekagarwal87,
I am facing an issue with arrays as inputs:
SELECT JSON_MERGE_AGGR('ADD', '{"c": [2,3]}', '{"c": [4,5]}')

What should the result be for this?
There are 3 options:

  1. '{"c": [4,5]}' // take the rightmost value
  2. '{"c": [2,3,4,5]}' // this is what json_merge currently does
  3. '{"c": [6,8]}' // add element by element

If we go with option 3 and the array sizes differ, do we copy the extra values?
e.g. ADD([2,3,7], [2,3]) -> [4,6,7]

I also want to check whether the current json_merge behavior is expected for array values. The docs say "Preserves the rightmost value when there are key overlaps". In that case it should choose option 1, not option 2.
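The three options, and the unequal-length question, can be compared side by side with a short Python sketch. The function names are hypothetical, and copying unmatched trailing elements unchanged is the assumption taken from the ADD([2,3,7], [2,3]) example above, not a settled decision.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking "no element at this position"


def rightmost(left, right):
    """Option 1: keep the rightmost array."""
    return right


def concatenate(left, right):
    """Option 2: concatenate the arrays."""
    return left + right


def elementwise(left, right, combine):
    """Option 3: combine position by position, copying unmatched extras as-is."""
    result = []
    for a, b in zip_longest(left, right, fillvalue=_MISSING):
        if a is _MISSING:
            result.append(b)
        elif b is _MISSING:
            result.append(a)
        else:
            result.append(combine(a, b))
    return result


add = lambda a, b: a + b

print(rightmost([2, 3], [4, 5]))            # [4, 5]
print(concatenate([2, 3], [4, 5]))          # [2, 3, 4, 5]
print(elementwise([2, 3], [4, 5], add))     # [6, 8]
print(elementwise([2, 3, 7], [2, 3], add))  # [4, 6, 7]
```

Using a sentinel rather than a fill value of 0 keeps the "copy extras" rule independent of the aggregation, so the same helper would also behave sensibly for multiplication.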

@shigarg1 shigarg1 linked a pull request Oct 14, 2024 that will close this issue
@LakshSingla
Contributor

Thanks for raising the PR.

If I am interpreting @abhishekagarwal87's proposal correctly, the merge aggregator shouldn't be limited to ADD, MULT, and similar operations. Users should be able to pass in a lambda function, and the merge should apply it to the conflicting values. Ideally this would be implemented as a higher-order function, i.e. a function that takes another function as an argument.

You can look at https://druid.apache.org/docs/latest/querying/math-expr/#apply-functions for existing examples of such methods.

3 participants