Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unix-ffi: re: convert to PCRE2 #737

Merged
merged 1 commit into from
Oct 31, 2023
Merged

Conversation

Ansuel
Copy link
Contributor

@Ansuel Ansuel commented Sep 28, 2023

PCRE is marked as EOL and won't receive any new security update.

Convert the re module to PCRE2 API to enforce security. Additional dependency is now needed with uctypes due to changes in how PCRE2 return the match_data in a pointer and require special handling.

The converted module is tested with the test_re.py with no regression.

@Ansuel
Copy link
Contributor Author

Ansuel commented Sep 28, 2023

(this is needed as we are deprecating pcre in openwrt and we are converting any user to pcre2)

@dpgeorge
Copy link
Member

dpgeorge commented Oct 4, 2023

Thanks for the contribution.

What's the difference between libpcre2-posix, libpcre2-8, libpcre2-16 and libpcre2-32?

It would be possible to support both the old PCRE and PCRE2, but I don't think it's worth the effort.

@Ansuel
Copy link
Contributor Author

Ansuel commented Oct 4, 2023

The POSIX variant just provide wrapper to have POSIX interface for pcre2 API.
Pcre had the same variant and wasn't used here.

The -16 and -32 variant are used to support 16bit and 32bit string (UTF-16 and UTF-32)
UTF-8 should be enough and what is commonly used when migrating from pcre to pcre2

Supporting both versions would bloat the code with lots of ifdef and IMHO it's not worth it and pcre won't receive any update and would pose security risks.

@dpgeorge
Copy link
Member

dpgeorge commented Oct 4, 2023

The -16 and -32 variant are used to support 16bit and 32bit string (UTF-16 and UTF-32)
UTF-8 should be enough and what is commonly used when migrating from pcre to pcre2

Aah, I see, thanks for the info. Indeed we should use the -8 variant.

I see that the test_re.py provided in this re directory doesn't pass anymore. It needs to pass.

@Ansuel
Copy link
Contributor Author

Ansuel commented Oct 4, 2023

@dpgeorge i remember it did pass correctly. Maybe i'm running it wrong? my test was compiling the lib and run the test using the compiled library. Something wrong in this?

@dpgeorge
Copy link
Member

dpgeorge commented Oct 5, 2023

I'm simply running the following in the unix-ffi/re directory with this PR checked out:

$ micropython
MicroPython v1.20.0-557-g5aec051f9 on 2023-10-05; linux [GCC 13.2.1] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> import re
>>> re.search(r"a+", "caaab").group(0)
''

That's not correct output.

@Ansuel
Copy link
Contributor Author

Ansuel commented Oct 5, 2023

@dpgeorge mhh i'm confused

root@OpenWrt:~# micropython
MicroPython v1.20.0 on 2023-10-04; linux [GCC 12.3.0] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> import re
>>> re.search(r"a+", "caaab").group(0)
'aaa'

@Ansuel
Copy link
Contributor Author

Ansuel commented Oct 5, 2023

Also by putting the uncompiled re.py in /usr/lib/micropython/unix produce the same output

@Ansuel
Copy link
Contributor Author

Ansuel commented Oct 9, 2023

@dpgeorge I finally manage to repro the problem... It seems to be a problem with PCRE2_SIZE that is size_t and this change between 32bit and 64bit. Using ULONG handled both systems. (my test were done on a 32bit system)

@Ansuel Ansuel force-pushed the pcre2-conversion branch 2 times, most recently from bbfc3f8 to c35e738 Compare October 9, 2023 11:22
# pcre2_get_ovector_pointer return PCRE2_SIZE that is of type
# size_t. Use ULONG as type to support both 32bit and 64bit.
ov_buf = uctypes.bytearray_at(
ov_ptr, uctypes.sizeof({"field": 0 | uctypes.ULONG}) * (cap_count + 1) * 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this call to sizeof to get the size be done at global module level, so it only has to be done once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better now?

@dpgeorge
Copy link
Member

Thanks, I can confirm that it now works on my (64-bit) machine.

@Ansuel Ansuel force-pushed the pcre2-conversion branch 3 times, most recently from 54e92a9 to 1cbe8c4 Compare October 12, 2023 12:10
@BKPepe
Copy link

BKPepe commented Oct 16, 2023

So, @dpgeorge it can be merged, right? :)

Copy link
Contributor

@projectgus projectgus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM, thanks @Ansuel!

PCRE is marked as EOL and won't receive any new security update.

Convert the re module to PCRE2 API to enforce security.  Additional
dependency is now needed with uctypes due to changes in how PCRE2 return
the match_data in a pointer and require special handling.

The converted module is tested with the test_re.py with no regression.

Signed-off-by: Christian Marangi <[email protected]>
@dpgeorge dpgeorge merged commit d8e163b into micropython:master Oct 31, 2023
3 checks passed
@dpgeorge
Copy link
Member

Thanks for updating, this looks good now.

Merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants