Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBL-5170: Support Inner Unnest Query in SQL++ #2133

Merged
merged 2 commits into from
Sep 16, 2024
Merged

Conversation

jianminzhao
Copy link
Contributor

Introduced a mew keyword, UNNEST.

Also, moved definition of reserved words from n1ql_parser_internal.h to n1ql.leg, to keep the list from out-of-sync from the LEG grammar, because the definitions in leg file will be compiled by LEG. We found, in n1ql_parser_internal.h, that MISSING is duplicaed, NATURAL and USING are not used, and EXISTS and SOME are missing in the list of kReservedWords

Introduced a mew keyword, UNNEST.

Also, moved definition of reserved words from n1ql_parser_internal.h to n1ql.leg, to keep the list from out-of-sync from the LEG grammar, because the definitions in leg file will be compiled by LEG. We found, in n1ql_parser_internal.h, that MISSING is duplicaed, NATURAL and USING are not used, and EXISTS and SOME are missing in the list of kReservedWords
@cbl-bot
Copy link

cbl-bot commented Sep 9, 2024

Code Coverage Results:

Type Percentage
branches 67.68
functions 79.07
instantiations 35.12
lines 79.03
regions 74.91

VALUED = "VALUED"i WB
WHEN = "WHEN"i WB
WHERE = "WHERE"i WB

reservedWord = ALL | AND | ANY | AS | ASC | BETWEEN | BY | CASE | COLLATE | CROSS | DESC | DISTINCT | ELSE | END | EVERY | EXISTS | FALSE | FROM | GROUP | HAVING | IN | INNER | IS | JOIN | LEFT | LIKE | LIMIT | MISSING | NOT | NULL | OFFSET | ON | OR | ORDER | OUTER | SATISFIES | SELECT | SOME | THEN | TRUE | UNNEST | VALUED | WHEN | WHERE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this cause any performance impact to the parser?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be performant because, I think, LEG compiler will create a regex state machine. In any case, the performance impact will be minuscule, unless dealing with large n1ql file with a lot of identifiers.

Copy link
Member

@borrrden borrrden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are going to need to discuss how to handle this because it will break a potential query like the following:

SELECT * FROM unnest where unnest is the name of the collection

Both the act of making it a reserved word, and using it as a keyword is a change that is probably not suitable for a patch release. That being said, we should add in any words that we think may be used in the future to reserved words ASAP the way that server does (in fact we might just copy their list)

@jianminzhao
Copy link
Contributor Author

If this is a constraint, I may be able to get away without having it a reserved word. That means, work harder making grammar more stringent so that unnest means unnest only in second appearance of "unnest" of "from x unnest unnest". but this is not very appealing.

Copy link
Collaborator

@pasin pasin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the discussion that we will make unnest a reserved keyword and the input from @jianminzhao that the change to move the reserved keyword check to the parser itself should have minior performance impact, I'm approving this PR.

…commit, except for the additional new reserved word UNNEST.
@jianminzhao
Copy link
Contributor Author

I reviewed the handling of reserved words in this PR, motivated by the sensitivity of new reserved words to API compatibility.
I tried to ensure that, with the refactoring, I don't change the set of reserved words except for the new one, UNNEST. During the effort, I found 3 words that the grammar apparently takes as keywords but not in the list of reservedWords in n1ql_parser_internal.hh, which I removed in this PR.
The upshot is, ALL, EXISTS, and SOME, happens to be parsed in intended grammatical positions. Because they are not keywords, I can have
SELECT some.exists AS all FROM _ AS some
parsed successfully. If they were reserved words, the above query won't parse.
My intent is to keep the above behavior.

@borrrden
Copy link
Member

borrrden commented Sep 13, 2024

That reasoning is correct. I believe in 4.0 we should make all of them reserved to get more parity. Though I thought we had agreed that UNNEST would behave as these three exceptions do now in order to minimize the potential breakage, and then make it an actual reserved word in 4.0.

@jianminzhao jianminzhao merged commit 5184267 into release/3.2 Sep 16, 2024
9 checks passed
@jianminzhao jianminzhao deleted the cbl-5170 branch September 16, 2024 21:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants