Do a trace and pprof run on Gaia #205
Comments
Actually, I know there are more immediate issues out there that need attention, but man, I ran into a race condition with the tests again today. So I'm afraid we should optimise the routines a bit. |
Yeah, sounds like a good idea. The scheduler in particular is critical and could potentially have some leaking routines. |
Aye. I agree. I'll get into it once the worker rebuild is finished. I'm really eager to trace things. :) |
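(For reference, a minimal way to expose pprof and trace data from a running Go service like Gaia is the standard net/http/pprof package. The port and wiring below are assumptions for the sake of the sketch, not Gaia's actual setup.)

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the profiling endpoints on a separate local port so they
	// don't interfere with Gaia's own HTTP server.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... start Gaia as usual; blocking here only for the sake of the example.
	select {}
}
```

CPU profiles can then be pulled with `go tool pprof http://localhost:6060/debug/pprof/profile`, an execution trace fetched from `/debug/pprof/trace` and viewed with `go tool trace`, and data races themselves are easiest to surface with `go test -race ./...`.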
Alright, found my first data race already. Going to work to fix this one first as it's the easiest to find and reproduce.
|
@michelvocks Off the top of your head, do you know the reason for implementing Plugin with pointers? I saw no particular reason. Plugins aren't shared as far as I can see. The Flush operation is causing a race condition through the scheduler; apparently more than one routine can't flush at the same time. This is the most obvious race in Gaia. Using values would fix it, since plugins wouldn't be shared throughout Gaia. I don't know yet what that would mean in terms of memory usage, though. I haven't tested it yet. I plan on doing so. :) |
@Skarlso Yes. One plugin instance is always one plugin run. We need to have an instance since we need to run multiple jobs synchronously via this instance. If we don't have a plugin instance, we would have to start the plugin for every pipeline job, which is very inefficient. Additionally, a user potentially wants to share data/information between jobs within a pipeline run. If we create a new plugin instance for every job, you couldn't share in-memory data between jobs and you would literally be forced to write information to disk or to a remote cache/database, which is more work. |
That could be achieved by giving back a modified version of the struct instead of passing it around as a pointer, no? |
I don't know how, but maybe we are talking past each other. Could you elaborate a bit more on the problem/solution? In my opinion, we could easily add a mutex to the Plugin struct and add a lock to the FlushLogs method? |
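(For illustration, the mutex variant discussed here would look roughly like the sketch below; the field and method layout is an assumption based on this thread, not Gaia's actual Plugin struct.)

```go
package plugin

import (
	"bufio"
	"sync"
)

// Plugin is a sketch, not the real Gaia type; only the parts relevant
// to flushing are shown.
type Plugin struct {
	mu     sync.Mutex
	writer *bufio.Writer // buffered log output for this plugin run
}

// FlushLogs flushes the buffered logs. The lock keeps concurrent
// scheduler goroutines from flushing the same buffer at once.
func (p *Plugin) FlushLogs() error {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.writer.Flush()
}
```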
Adding a mutex is a patch. :) That's like using a buffered channel instead of a channel because that appears to solve the problem.
I'm trying to eliminate the possibility of the race rather than putting a mutex on it.
Of course a mutex would be easy. But that's like putting a patch on the leak of a leaking boat instead of trying to fix the boat, I think.
|
That said, it may turn out that a mutex IS the solution. But I would like to make sure I'm not just patching a sinking boat. :) |
I'm gonna show you what I meant in a draft PR, and I'd like your opinion on it. :) |
Ah, yes. It's way too much work and ends up being ugly. Mutex it is I guess. :) |
Hah no, I succeeded, but get this. Because of other things, like Job and such, I still got a data race at flush! But this is a good thing. Because now I verified that indeed a mutex is a good approach at this point. :) Awesome. |
Get this! :) Just to ease my mind, I went further and rewrote a large chunk to pass the plugin around by value instead of by pointer. When there is a mutation, I pass the mutated value back instead of relying on pointer semantics. I got rid of the race. :) However, I don't like the resulting code that much. It's not intuitive enough: sometimes it passes the value back and sometimes it doesn't. So I'm not going to go with that. Not to mention that I don't have data on the memory allocation increase caused by the copies. Ah well. 🤷♂ |
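(A rough sketch of the value-semantics variant described above, with hypothetical types, just to show why the call sites get awkward.)

```go
package main

// Plugin and Job are placeholders; the real Gaia types differ.
type Job struct{ Title string }

type Plugin struct {
	CurrentJob Job
}

// withJob returns a modified copy instead of mutating shared state,
// so no two goroutines ever touch the same Plugin value.
func withJob(p Plugin, j Job) Plugin {
	p.CurrentJob = j
	return p
}

func main() {
	p := Plugin{}
	// Every caller has to remember to capture the returned value — the
	// "sometimes it passes it back, sometimes it doesn't" problem.
	p = withJob(p, Job{Title: "compile"})
	_ = p
}
```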
Huh, so guess what... The mutex didn't solve the data race. |
Oh I know why. |
Well this is problematic. Gaia is racing with go-plugin's logStderr. |
So the flush happens so that the logs are displayed in the log window, if I see this correctly. But the plugin is also trying to write to / read from the buffer we are providing it. This is problematic, because even if we lock on our end, the plugin will of course not lock, since it's using a plain Write. Which means a data race on the buffer. |
Okay, so the solution must be that we implement our own writer with an underlying buffer and io.Writer, where Write is basically a bufio Write but with a lock in it. And we then pass that to go-plugin. |
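(A minimal sketch of such a writer, assuming go-plugin only needs something that satisfies io.Writer for copying the plugin's output; names are illustrative.)

```go
package plugin

import (
	"bufio"
	"io"
	"sync"
)

// lockedWriter wraps a bufio.Writer so that go-plugin's output copying
// and Gaia's own flush calls can't race on the underlying buffer.
type lockedWriter struct {
	mu  sync.Mutex
	buf *bufio.Writer
}

func newLockedWriter(w io.Writer) *lockedWriter {
	return &lockedWriter{buf: bufio.NewWriter(w)}
}

// Write is what go-plugin ends up calling when it copies plugin output.
func (l *lockedWriter) Write(p []byte) (int, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.buf.Write(p)
}

// Flush is what Gaia calls when it wants the buffered logs pushed out.
func (l *lockedWriter) Flush() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.buf.Flush()
}
```

The instance would then be handed to go-plugin wherever the client configuration currently receives its stderr writer.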