Core dump in gearman_worker_free with the latest gearmand (1.1.20) #368

Open
mrbejka opened this issue May 22, 2023 · 6 comments

mrbejka commented May 22, 2023

Hi,

In my usage, I found a segmentation fault that is the same as #155. I tried upgrading to 1.1.20; that crash no longer happens, but we get a new segmentation fault. The core dump stack can be seen below:

[Screenshot: gearman error debug gearmand]

After I added a call to reset_recv_packet, my program no longer segfaults in gearman.

[Screenshot: resolve gearmand]

I don't know whether my modification is a robust way to resolve this problem, as I know very little about the actual mechanism of gearman. It seems that _recv_packet is often set to the address of Job::assigned, and connection.cc should have no permission to free _recv_packet, since _recv_packet is always borrowed from the Job, in my opinion.

esabol commented May 28, 2023

Screenshots are not the best way to report issues or suggest changes in my opinion.

To clarify, you added reset_recv_packet(); after line 1037 here:

size_t recv_size= recv_socket(recv_buffer +recv_buffer_size, GEARMAN_RECV_BUFFER_SIZE -recv_buffer_size, ret);
if (gearman_failed(ret))
{
  if (ret != GEARMAN_IO_WAIT) {
    recv_state= GEARMAN_CON_RECV_UNIVERSAL_NONE;
  }
  return NULL;
}
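
If I'm reading the screenshots correctly, the patched block would presumably look something like this (my guess at the placement based on your description, not a confirmed diff):

size_t recv_size= recv_socket(recv_buffer +recv_buffer_size, GEARMAN_RECV_BUFFER_SIZE -recv_buffer_size, ret);
if (gearman_failed(ret))
{
  if (ret != GEARMAN_IO_WAIT) {
    recv_state= GEARMAN_CON_RECV_UNIVERSAL_NONE;
    // Drop the borrowed pointer (e.g. to Job::assigned) so a later
    // gearman_worker_free() doesn't touch a stale packet.
    reset_recv_packet();
  }
  return NULL;
}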

I can't see any harm in doing so, but I'm not sure why it would be needed.

If this does solve a problem, I wonder if it might also be necessary before line 1080:

  if (gearman_failed(ret))
  {
    return NULL;
  }
}
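
That is, something along these lines (again, just a sketch; the exact placement is speculation on my part):

  if (gearman_failed(ret))
  {
    // Same idea: clear the borrowed packet pointer before bailing out.
    reset_recv_packet();
    return NULL;
  }
}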

Any thoughts on this, @SpamapS ?

mrbejka commented May 29, 2023

Sorry for reporting the issue in an unpleasant way; this is my first issue and my first contact with GitHub developers.

Here is the flow of my program: it initializes a gearman_worker_st called worker at the start and finalizes and frees this worker at the end. In between, my program calls gearman_worker_work with a timeout set and checks its return value. When gearman_worker_work returns GEARMAN_NOT_CONNECTED or GEARMAN_COULD_NOT_CONNECT, it calls gearman_worker_free to finalize the old gearman_worker_st and then initializes a new gearman_worker_st; the program prints some log lines before calling gearman_worker_free.
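
A simplified sketch of that flow (not my real program; the server address, function name, and timeout are placeholders):

#include <libgearman/gearman.h>
#include <stdio.h>

// Trivial job callback; the real worker does application-specific work.
static void *my_worker_fn(gearman_job_st *, void *,
                          size_t *result_size, gearman_return_t *ret_ptr)
{
  *result_size= 0;
  *ret_ptr= GEARMAN_SUCCESS;
  return NULL;
}

static gearman_worker_st *make_worker(void)
{
  gearman_worker_st *worker= gearman_worker_create(NULL);
  gearman_worker_add_server(worker, "127.0.0.1", 4730);   // placeholder address
  gearman_worker_set_timeout(worker, 5000);                // milliseconds
  gearman_worker_add_function(worker, "my_function", 0, my_worker_fn, NULL);
  return worker;
}

int main(void)
{
  gearman_worker_st *worker= make_worker();

  for (int i= 0; i < 100000; ++i)
  {
    gearman_return_t ret= gearman_worker_work(worker);   // blocking, with timeout

    if (ret == GEARMAN_NOT_CONNECTED || ret == GEARMAN_COULD_NOT_CONNECT)
    {
      fprintf(stderr, "reconnecting: %s\n", gearman_worker_error(worker));
      gearman_worker_free(worker);   // this is where the new segfault shows up
      worker= make_worker();
    }
  }

  gearman_worker_free(worker);
  return 0;
}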

After I upgraded gearman to 1.1.20, I no longer got the segfault in gearman_worker_work, but I got a new segfault in gearman_worker_free. I don't know why gearman_worker_free was called, because I didn't find any of the log lines about the return value, but my program does call gearman_worker_free and that causes the segfault. The core dump info can be seen in the previous screenshots 1-2.

As the screenshot shows, _recv_packet is not a regular gearman_packet_st packet, so I think this may be caused by _recv_packet being borrowed from a foreign object while reset_recv_packet is not called after recv_state is set to GEARMAN_CON_RECV_UNIVERSAL_NONE.

Does this help you understand what's happening better?

esabol commented May 29, 2023

Yes, that was very helpful. Thank you for the additional information, @mrbejka.

SpamapS commented May 30, 2023 via email

esabol commented Jun 16, 2023

@mrbejka , can you provide us with worker code which reproduces the core dump? It would really help!

mrbejka commented Jun 25, 2023

Hi esabol and SpamapS,

I am pleased to offer you a case, but I'm sorry that I can't give you the original one. I will try to design a new case to help reproduce it. This will take time, as I'm busy.

This happens on CentOS 7, and only the blocking APIs of gearman are used in my case. In my opinion, the key factors that lead to this phenomenon are the large number of worker threads, the occasional large data transfers, and the blocking API.
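
For reference, this is the kind of reproduction scaffold I have in mind (a sketch only, with placeholder values; it assumes a separate client occasionally submitting multi-megabyte payloads to "big_payload"):

#include <libgearman/gearman.h>
#include <thread>
#include <vector>

// Discard the (possibly multi-megabyte) workload; no result is returned.
static void *drain_fn(gearman_job_st *job, void *,
                      size_t *result_size, gearman_return_t *ret_ptr)
{
  (void)gearman_job_workload(job);
  (void)gearman_job_workload_size(job);
  *result_size= 0;
  *ret_ptr= GEARMAN_SUCCESS;
  return NULL;
}

static gearman_worker_st *new_worker(void)
{
  gearman_worker_st *worker= gearman_worker_create(NULL);
  gearman_worker_add_server(worker, "127.0.0.1", 4730);   // placeholder address
  gearman_worker_set_timeout(worker, 5000);                // milliseconds, blocking API
  gearman_worker_add_function(worker, "big_payload", 0, drain_fn, NULL);
  return worker;
}

int main(void)
{
  const int kWorkerThreads= 64;   // placeholder for "a large number of worker threads"
  std::vector<std::thread> threads;

  for (int i= 0; i < kWorkerThreads; ++i)
  {
    threads.emplace_back([]()
    {
      gearman_worker_st *worker= new_worker();
      for (int j= 0; j < 100000; ++j)
      {
        gearman_return_t ret= gearman_worker_work(worker);
        if (ret == GEARMAN_NOT_CONNECTED || ret == GEARMAN_COULD_NOT_CONNECT)
        {
          gearman_worker_free(worker);   // crash site in my report
          worker= new_worker();
        }
      }
      gearman_worker_free(worker);
    });
  }

  for (std::thread &t : threads)
  {
    t.join();
  }
  return 0;
}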
