Crashing with Double free or corruption... #184
Comments
This looks like a bug in a C binding. Are you using any other library?
I am. I am attempting to use strace to get more details. I am using OCaml
bindings for mysql and postgres, in a separately compiled and linked OCaml
project. That project also uses Core, and Core.Unix specifically, to perform
some basic file manipulations.
I suppose I could also ditch Core.Unix for Pervasives and see if that
helps. It could also be the database bindings, although I have
been using those far longer without issue. Therefore I suspect that Core.Unix,
or my use of it, is the trouble.
On Mon, Feb 24, 2020, 10:28 Jerome Vouillon wrote:
This looks like a bug in a C binding. Are you using any other library?
It's hard to debug this kind of issue since the crash happens during garbage collection, well after the buggy code has been executed.
Perhaps it's me.
I need to serialize requests from a web interface to do some task. I
accomplish this by using a lock file, specifically Core.Unix.flock. I also
was writing to the file some details of what was going on. Originally I
wrote this within a with_file function. That was a mistake. My first
backtrace showed the error in there, specifically on Core_Unix__with_close.
I use the db bindings for maintaining a queue and a history.
I eliminated the use of with_file after I realized I really should create a
file descriptor, use that file descriptor for locking, writing, and
unlocking. There's not much passing around of this descriptor. And I
realize how fraught with race conditions is the use of lock files. I am no
expert at it. But my use is very simple: if the file is already locked,
just push to the queue and exit, else process the queue.
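
A minimal sketch of the single-descriptor pattern described here, not the actual code from this project. It swaps in the standard library's Unix.lockf for Core.Unix.flock so that it depends only on the unix library, and it assumes the job is always enqueued before the lock is tried. The names try_lock, unlock, run, enqueue, and process_queue are hypothetical.

```ocaml
(* Requires the unix library. Uses Unix.lockf (POSIX record locks) as a
   stand-in for Core.Unix.flock; this substitution is an assumption. *)

(* Try to take an exclusive, non-blocking lock on [path].
   Returns [Some fd] when the lock was acquired, [None] when another
   process already holds it. *)
let try_lock path =
  let fd = Unix.openfile path [ Unix.O_RDWR; Unix.O_CREAT ] 0o644 in
  match Unix.lockf fd Unix.F_TLOCK 0 with
  | () -> Some fd
  | exception Unix.Unix_error ((Unix.EAGAIN | Unix.EACCES), _, _) ->
    Unix.close fd;
    None

(* Closing the descriptor releases the lock held through it. *)
let unlock fd = Unix.close fd

(* [enqueue] and [process_queue] are hypothetical callbacks standing in
   for the database-backed queue. The same descriptor is used for
   locking, writing the status note, and unlocking. *)
let run ~lock_path ~enqueue ~process_queue job =
  enqueue job;
  match try_lock lock_path with
  | None -> `Queued
  | Some fd ->
    let note = "processing\n" in
    ignore (Unix.write_substring fd note 0 (String.length note));
    (match process_queue () with
     | () -> unlock fd
     | exception e -> unlock fd; raise e);
    `Processed
```

Because lockf locks are held per process, this only serializes separate processes, which matches the web-request use case described.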
So that brought me to the latest backtrace that makes no mention of
anything other than some C binding without a clue as to where. Although my
use of flock is now suspect. I also use Core.Unix.fork_exec but that
appears to be ok, or at least I have no reason to suspect it or my use of
it.
Thanks for the pointers.
On Mon, Feb 24, 2020, 11:35 Jerome Vouillon wrote:
It's hard to debug this kind of issue since the crash happens during
garbage collection, well after the buggy code has been executed.
You may be able to narrow down the issue by forcing garbage collections by
inserting calls to Gc.minor () in your code.
Core.Unix and Lwt are rather solid. I would not expect a bug there.
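
As a rough illustration of the Gc.minor () suggestion, a suspect call can be wrapped so that a minor collection runs immediately after it. If the call corrupted the heap, the crash should then happen close to the culprit rather than much later. The helper name checked and the stand-in suspect_step are hypothetical.

```ocaml
(* Hypothetical helper: run [f], then force a minor collection so that
   heap corruption introduced by [f] is more likely to surface here. *)
let checked label f =
  let result = f () in
  Gc.minor ();
  prerr_endline ("survived minor GC after: " ^ label);
  result

(* Stand-in for a call into a C binding; replace it with the real
   suspect call. *)
let suspect_step () = String.concat "," [ "a"; "b"; "c" ]

let () =
  let s = checked "suspect_step" suspect_step in
  print_endline s
```

The last label printed to stderr before a crash indicates which wrapped call completed most recently.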
It's been over a year, but I wanted to leave an update: I discovered that sprinkling Gc.full_major () liberally throughout just one of my dependencies (a module that uses Postgresql) appears to have fixed this issue. I added the garbage collection calls almost exclusively within functions where I actually use Postgresql, such as new Postgresql.connection, #exec, #status, etc. I cannot imagine I am the only person who sees this. I must be abusing the Postgresql package somehow, but I don't see how. This workaround is ugly and was discovered purely by guessing. I have no idea where the problem lies or what it is; I only know I can ward it off with lots of garbage collection. I may have some time in the future to narrow it down by removing some of those collections, and possibly using only minor collections, to see whether anything makes a difference and where.
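
One possible way to do that narrowing without editing every call site each time is to route the collections through a single switchable helper. This is only a sketch: the GC_PROBE environment variable and the names probe, collect, and with_probe are hypothetical, and no Postgresql calls are shown.

```ocaml
(* Which collection, if any, to force after a wrapped call. *)
type probe = No_gc | Minor | Full_major

let probe_of_string = function
  | "minor" -> Minor
  | "full" -> Full_major
  | _ -> No_gc

(* GC_PROBE is a hypothetical environment variable: "minor", "full",
   or anything else (including unset) for no forced collection. *)
let probe =
  match Sys.getenv "GC_PROBE" with
  | s -> probe_of_string s
  | exception Not_found -> No_gc

let collect = function
  | No_gc -> ()
  | Minor -> Gc.minor ()
  | Full_major -> Gc.full_major ()

(* Run [f], then force the selected collection. [mode] can be overridden
   per call site to test one location at a time. *)
let with_probe ?(mode = probe) f =
  let result = f () in
  collect mode;
  result
```

Each database call would then be wrapped as with_probe (fun () -> ...), and the crash behaviour compared across runs with different settings.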
Below is a backtrace. I can also provide the memory map if that would help. I have an instance of ocsigenserver running a small eliom project, and it consistently crashes after running for a short while. This is very disappointing; I have never seen anything like this. This is running on OCaml 4.06.1, eliom 6.7.0, js_of_ocaml 3.4.0, lwt 4.2.1, and ocsigenserver 2.14.0.
Could I have caused this failure through my use of lwt, or in some other way? Or is this a bug in ocsigenserver?
*** Error in `/home/admin/.opam/4.06.1/bin/ocamlrun': double free or corruption (fasttop): 0x00007fdf78009860 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7fdfa4730bfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7fdfa4736fc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7780e)[0x7fdfa473780e]
/home/admin/.opam/4.06.1/bin/ocamlrun(caml_empty_minor_heap+0x126)[0x5605e5373a56]
/home/admin/.opam/4.06.1/bin/ocamlrun(caml_gc_dispatch+0x4b)[0x5605e5373eab]
/home/admin/.opam/4.06.1/bin/ocamlrun(caml_interprete+0x22a4)[0x5605e538dd34]
/home/admin/.opam/4.06.1/bin/ocamlrun(caml_callbackN_exn+0xb9)[0x5605e5385bc9]
/home/admin/.opam/4.06.1/bin/ocamlrun(caml_callback_exn+0x18)[0x5605e5385c28]
/home/admin/.opam/4.06.1/lib/ocaml/stublibs/dllthreads.so(+0x2b09)[0x7fdfa3a42b09]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7fdfa4a664a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fdfa47a8d0f]