Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack overflows with certain patterns containing nested repetitions. #11

Open
nbtrap opened this issue Jan 3, 2014 · 5 comments
Open

Comments

@nbtrap
Copy link
Contributor

nbtrap commented Jan 3, 2014

Tests 636 and 638 from test/perltestdata fail with stack overflows. Here are the relevant calls:

(ppcre:scan "((a{0,5}){0,5}){0,5}[c]" "aaaaaaaaaa")
(ppcre:scan "((a{0,5}){0,5})*[c]" "aaaaaaaaaa")
@hanshuebner
Copy link
Member

The problem is with the [c], which causes the stack overflow. If it is replaced by just c, the problem does not occur. At this point, there is no fix for this issue.

@nbtrap
Copy link
Contributor Author

nbtrap commented Jan 5, 2014

I don't think that's right. The reason it works with 'c' instead of '[c]' is that the optimizer causes the scanner to fail ab initio if it sees that the string being matched doesn't contain a 'c'. We could (and should) make the optimizer simplify single-character classes, but that wouldn't really solve the problem here.

The problem seems to lie with the nested repetitions, but I haven't been able to pin it down.

@hanshuebner
Copy link
Member

I think a fix would be welcome, but we don't have one right now.

-Hans

2014/1/5 nbtrap [email protected]

I don't think that's right. The reason it works with 'c' instead of '[c]'
is that the optimizer causes the scanner to fail ab initio if it sees that
the string being matched doesn't contain a 'c'. We could (and should) make
the optimizer simplify single-character classes, but that wouldn't really
solve the problem here.

The problem seems to lie with the nested repetitions, but I haven't been
able to pin it down.


Reply to this email directly or view it on GitHubhttps://github.com//issues/11#issuecomment-31615143
.

@nhabedi
Copy link
Member

nhabedi commented Jan 5, 2014

The optimizer already simplified single-character classes up to version 1.4.1. In my opinion that was a pretty useless micro-optimization and thus I removed it for 2.0.0. Even if you had that back, you'd run into the same problem if you replaced '[c]' with, say, '[cd]'.

As for pinning down the problem with nested repetitions - good luck. You can kill every regex engine with a suitable regex if you want - no matter how optimized it is... :)

There are simply some problems for which regexes aren't the right tool. And then there are stupid regexes a regex engine shouldn't be optimized for. IMHO, YMMV, etc.

@nbtrap
Copy link
Contributor Author

nbtrap commented Jan 5, 2014

nhabedi [email protected] writes:

Even if you had that back, you'd run into the same problem if you
replaced '[c]' with, say, '[cd]'.

Yes, that's what I was trying to say.

As for pinning down the problem with nested repetitions - good luck.
You can kill every regex engine with a suitable regex if you want - no
matter how optimized it is...

But in this case, it's failing because of a bug in the backtracking
logic, not because the pattern used just happens to require more stack
than is available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants