fix: regexp_extract returns match in mismatched group #12109
@kgpai @mbasmanova Could you kindly help review this PR? Thanks a lot. Any suggestion is welcome.
Sounds like the implementation is buggy and the test expectations are wrong as well. In this case we need to fix both the implementation and the test. Would you check Presto Java to see if Velox behavior matches it?
Looks like Presto Java returns NULL:
@HolyLow Would you create a GitHub issue to describe this problem? Then, reference the issue in the PR description.
Thank you for the fix!
@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
The implementation of Re2Extract has a bug: it may treat a mismatched group as a MATCHED empty string "" rather than as MISMATCHED (std::nullopt).
For example, consider the call regexp_extract("rat cat\nbat dog", "ra(.)|blah(.)(.)", 2). Here the result for group 2 must be std::nullopt, because no substring matches the pattern blah(.). But the current implementation mistakes the match of group 1, ra(.), for an empty match of group 2, and thus returns an empty string, which is wrong.
This PR fixes this bug in the Re2Extract implementation.
Also note that the same buggy behavior exists in Re2ExtractAll, but this PR doesn't modify Re2ExtractAll because the existing UTs of Re2ExtractAll already rely on this behavior.
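To illustrate the distinction described above between a group that matched an empty string and a group that did not participate in the match, here is a minimal sketch. It is not the Velox code and does not use RE2: it uses std::regex from the C++ standard library as a stand-in, and the helper name extractGroup is hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper for illustration only; not the Re2Extract implementation.
// Returns std::nullopt when the pattern does not match, or when the requested
// group did not participate in the match. Returns "" only when the group
// really matched an empty string.
std::optional<std::string> extractGroup(
    const std::string& input,
    const std::string& pattern,
    std::size_t groupId) {
  const std::regex re(pattern);
  std::smatch match;
  if (!std::regex_search(input, match, re) || groupId >= match.size()) {
    return std::nullopt;
  }
  // sub_match::matched is false for a group in an alternative that was not
  // taken, which is the MISMATCHED case from the description.
  if (!match[groupId].matched) {
    return std::nullopt;
  }
  return match[groupId].str();
}
```

With this sketch, extractGroup("rat cat\nbat dog", "ra(.)|blah(.)(.)", 2) yields std::nullopt, while group 1 yields "t". A pattern such as "ra(x*)t" applied to "rat" yields an engaged optional holding "" for group 1, which is the case that must stay distinct from a mismatch.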