Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Latest release can cause segmentation faults #74

Closed
danepowell opened this issue Mar 22, 2018 · 9 comments
Closed

Latest release can cause segmentation faults #74

danepowell opened this issue Mar 22, 2018 · 9 comments

Comments

@danepowell
Copy link
Contributor

A project I work on recently upgraded its dependency on caxy/php-htmldiff from 0.1.6 to 0.1.7. After the update, we are seeing segmentation faults when calling php-htmldiff. The fault actually occurs in the c standard library, not in the php binary.

The only change in the 0.1.7 release is this PR: #72

This might be a little hard to reproduce, because I'm actually not sure how php-htmldiff is being invoked. It's used by a dependency of a dependency of a dependency of a dependency on our project 😄

But I'd still appreciate any input you might have on what could be causing segfaults with this change. Most likely a problem with the mbstring extension I'd imagine.

@jschroed91
Copy link
Member

Hmm... very interesting @danepowell.

Is it happening every time? Are you able to share examples of the HTML inputs that are going into htmldiff? Or is it happening for any input?

@SavageTiger
Copy link
Contributor

SavageTiger commented Mar 23, 2018

Also it might be good to know what version of PHP this is happening on.

@danepowell
Copy link
Contributor Author

It happens every time I run a certain command. Like I mentioned, it's hard to know what input is being passed to the library since I'm not calling it directly, but I'll try to find out.

I've reproduced it on PHP 7.1 and 7.2. Haven't tried other versions.

My hunch is that this is a out of memory error. I think I've seen this happen before with excessively long strings being passed to mbstring. But i really can't know for sure.

@danepowell
Copy link
Contributor Author

danepowell commented Mar 26, 2018

I've narrowed it down quite a bit.

This is the line generating the segfault:

return preg_match("/^[a-zA-Z0-9\pL]+$/u", $str);

The latest release works fine if I simply change that line back to mimic the old version:
return ctype_alnum($str);

I'm still having trouble figuring out if or where that line is being called. Even if I put print('hello world'); or exit; statements in there, nothing seems to happen. It's like the line isn't being called at all, and yet it's generating a segfault. It's also possible the output is getting swallowed by SSH/TTY or something.

Edit: now it's segfaulting even if I have the old line in there, so it's possible I'm chasing a red herring here. I'll see what else I can find. Sorry for the noise...

@SavageTiger
Copy link
Contributor

Maybe this is relevant: https://bugs.php.net/bug.php?id=65009&edit=1

If this is the case, best solution is to reimplement this method in another hopefully also unicode compatible way.

@danepowell
Copy link
Contributor Author

I've refined my theory about what's happening here: I think this is actually a bug in xhprof that's being triggered by php-htmldiff. I'm in pretty well over my head on this though; can you review the evidence and let me know if you agree, and then I can try to open an upstream ticket with xhprof?

Here is a core dump and backtrace that I managed to pull: https://pastebin.com/4iVHND1G

The germane / proximal lines:

#0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:38
38 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
(gdb) bt
#0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:38
#1 0x00007f04b302b743 in memcpy (__len=39, __src=, __dest=0x7f04b0ae8358) at /usr/include/x86_64-linux-gnu/bits/string3.h:53
#2 zend_string_init (persistent=0, len=, str=) at /usr/include/php/20160303/Zend/zend_string.h:160
#3 hp_compile_file (file_handle=0x7ffc213c11a0, type=8) at /beetbox/workspace/7.1/xhprof-php7/extension/xhprof.c:1594

It appears that xhprof performs some sort of compilation or hooks into the opcode compilation. My theory is that xhprof is causing this segfault when it tries to read/compile that preg_match line in php-htmldiff. That would explain why php-htmldiff is involved despite apparently never being called.

To verify this, I tried repeatedly enabling/disabling the xhprof extension. Whenever xhprof is enabled, PHP segfaults. Whenever xhprof is disabled, no segfaults. So xhprof definitely seems involved (unless this is a memory problem or something more exotic).

@SavageTiger
Copy link
Contributor

As far as I can tell the segterm happens while memory is allocated, and I imagon you came to the same conclusion. Not sure if this gdb dump has any value to us or the xhprof devs.

Best thing might be to try and create a standalone php file with a huge string in it and apply the same preg_match to it. If you get a segfault, you can supply that file as a reproducible situation to the xhprof maintainers.

@danepowell
Copy link
Contributor Author

danepowell commented Mar 28, 2018

Whatever this is, it seems to not be a problem with this library so I can close this issue and declutter your queue. If you have any other insight I'd love to hear it though.

I'm 99% sure php-htmldiff is not even being called when the segfault occurs so I doubt this has anything to do with input to preg_match. I think this library just happened to trigger some sort of memory bug elsewhere in the stack. I opened an xhprof issue just in case that's the culprit: phacility/xhprof#102

@danepowell
Copy link
Contributor Author

Just closing the loop on this... I still have no idea what caused the segfaults, but a colleague started experiencing them even with the older version of caxy/php-htmldiff, so I'm pretty sure this library is completely innocent :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants