Deadlock if fewer threads (<args.nthrds) started #28
Comments
Hi, does the issue still happen with the "-s" flag, which disables the SCHED_FIFO setting?
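For reference, this is roughly the kind of per-thread SCHED_FIFO setup the "-s" flag is described as disabling; a minimal sketch, assuming a pthread-based worker (the priority value is purely illustrative):

```c
/* Minimal sketch, assuming a pthread worker: roughly the kind of
 * SCHED_FIFO setup that "-s" is described as disabling. The priority
 * value is purely illustrative. */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static int set_fifo(pthread_t t, int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    /* SCHED_FIFO threads are not time-sliced: they keep the CPU until
     * they block, yield, or a higher-priority RT thread preempts them,
     * which is what keeps other runnable threads from being scheduled. */
    int ret = pthread_setschedparam(t, SCHED_FIFO, &sp);
    if (ret != 0)
        fprintf(stderr, "pthread_setschedparam: error %d\n", ret);
    return ret;
}
```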
Hi Manish, which workload did you run for this deadlock case? Thanks
I am running ./runall.sh.
When running with -s, what sort of effective parallelism do you see? It should be close to the number of requested cores. If it's significantly lower, then the improvement may be due to -s mode not being able to recreate the high-contention case and not directly a problem with FIFO mode itself.
With -s, I haven't seen the number of threads created fall below nthrds at any time.
The number of threads created will be the same, but "effective parallelism" (output by the tool) tells you how many of the actual threads are running at the same time. So you could have 200 cores and 200 threads, but if each one runs to completion on one core before the next core starts, you can theoretically have an effective parallelism of only 1 thread even though thread creation equals requested threads.
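For illustration, one way an "effective parallelism"-style metric could be computed (this is a hypothetical sketch, not necessarily how lockhammer derives its number): track the high-water mark of workers that are inside the measured region at the same time.

```c
/* Hypothetical sketch of an "effective parallelism" style measurement:
 * count how many workers are in the measured region at once and keep
 * the high-water mark. Not necessarily how the tool computes it. */
#include <stdatomic.h>

static atomic_int running_now;   /* workers currently in the region */
static atomic_int max_running;   /* high-water mark observed        */

static void measured_region(void)
{
    int now = atomic_fetch_add(&running_now, 1) + 1;

    /* Update the high-water mark of concurrently running workers. */
    int prev = atomic_load(&max_running);
    while (now > prev &&
           !atomic_compare_exchange_weak(&max_running, &prev, now))
        ;

    /* ... lock acquire/release work being benchmarked goes here ... */

    atomic_fetch_sub(&running_now, 1);
}
```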
AFAIK,
It's more likely that the child threads get starved, but the scheduler should be waking up cores to steal and run the child threads, since there will be balance problems otherwise (one core with two runnable FIFO processes and one core with nothing). One thing I've been thinking about trying is spawning a bunch of threads to make the balance issue look worse and cause the scheduler to step in sooner, and then affining threads to whichever unloaded core they end up on first (or exiting if the core they end up on already has a waiting lockhammer process).

Anyway, that's not relevant to the question I'm asking, which is "does safemode successfully achieve the requested contention level?" I'm guessing not, since FIFO mode was added in to avoid this exact problem in the first place, which is why I'm asking. In other words, safemode might "solve" the issue you're seeing, but it probably does so by making the test a useless measure of performance in the high-core-count contention case (because it likely fails to achieve it). How does the "effective parallelism" metric compare to the requested thread counts for the high thread counts where you were previously seeing the scheduling issue?

Edit: slight change, the main thread should be free to run anywhere, not just hw thread 0 (if that's not the case it's a bug).
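A minimal sketch of the "affine threads to whichever unloaded core they end up on first" idea mentioned above (a hypothetical helper, not existing lockhammer code; it assumes glibc's non-portable affinity API):

```c
/* Hypothetical helper: once a child thread actually gets scheduled
 * somewhere, pin it to that core so it stays there. Not existing
 * lockhammer code. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static int pin_to_current_cpu(void)
{
    int cpu = sched_getcpu();      /* core this thread is running on now */
    cpu_set_t set;

    if (cpu < 0)
        return -1;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* glibc-specific; keeps the thread on the core it first landed on. */
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```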
I created a test branch which sched_yields the thread on core 0 if all child threads are not ready yet. Unfortunately I cannot replicate this issue on systems to which I have access so please try this branch and see if it helps: https://github.com/codeauroraforum/synchronization-benchmarks/tree/lh-yieldwait |
Tried this, and replaced below as well. I think I missed one point: the affinity of the main thread is all cores, so wherever it is rescheduled, if there is contention, not all threads will start. So I believe we need to put sched_yield in all atomic functions.
If we yield the other threads, then we need to add another sync step without a yield to make sure everyone is actually both started and running. E.g., the current scheme is:
If we yield the startup threads it should be:
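For illustration, a minimal sketch of a two-phase startup along those lines (the names are hypothetical and do not correspond to lockhammer's actual ready_lock/sync_lock logic): children yield while waiting for everyone to be created, then spin without yielding on the final release so the requested contention level is actually in place when measurement starts.

```c
/* Hypothetical two-phase startup barrier; names are illustrative and
 * do not correspond to lockhammer's actual ready_lock/sync_lock code. */
#include <sched.h>
#include <stdatomic.h>

static atomic_int threads_arrived;  /* children that have started        */
static atomic_int go;               /* set by main() to release everyone */
static int nthrds;                  /* requested thread count            */

static void child_startup_wait(void)
{
    /* Phase 1: announce arrival, then yield so siblings that have not
     * started yet can get CPU time even on an oversubscribed core.     */
    atomic_fetch_add(&threads_arrived, 1);
    while (atomic_load(&threads_arrived) < nthrds)
        sched_yield();

    /* Phase 2: no yield here, so every thread is genuinely running and
     * spinning when main() releases them, preserving the intended
     * contention level at the start of the measurement.                */
    while (!atomic_load(&go))
        ;
}

/* Main thread, after creating all children: */
static void release_children(void)
{
    while (atomic_load(&threads_arrived) < nthrds)
        sched_yield();              /* don't hog a core while waiting    */
    atomic_store(&go, 1);
}
```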
That said, I still think this is more of a scheduler balance problem: at high core counts, a single core with an extra runnable-but-not-running process (i.e., the main thread) doesn't look like too bad of a balance problem, so sleeping hardware threads are not woken up to execute the main software thread for a long time, in the hopes that one of the many low-utilization hardware threads already running can take care of it in a short amount of time (but of course they can't, because they're all running FIFO threads that are busy spinning).
This is similar to an earlier issue I posted some time back.
After a run of about 20 minutes, a deadlock is observed when not all of the 'n' threads (args.nthrds) could be started by main(). All cores on which threads are started are at 100%. The first child thread is waiting for ready_lock, while the others are waiting for sync_lock.
This behaviour is observed when the number of cores (with 4 threads per core) is 200+.
Not sure why all nthrds are not starting; it could be an RT throttling issue.
Comments/suggestions...?
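Regarding the RT throttling guess: by default the kernel limits SCHED_FIFO/SCHED_RR tasks to sched_rt_runtime_us out of every sched_rt_period_us (typically 950000 / 1000000, i.e. about 95% of each period), which can matter when every core is spinning in a FIFO thread. A quick, hypothetical way to check the current limits:

```c
/* Hypothetical check of the RT throttling sysctls; these proc files are
 * standard on Linux, but the helper itself is just for illustration.   */
#include <stdio.h>

static long read_long(const char *path)
{
    long v = -1;
    FILE *f = fopen(path, "r");

    if (f) {
        if (fscanf(f, "%ld", &v) != 1)
            v = -1;
        fclose(f);
    }
    return v;
}

int main(void)
{
    printf("sched_rt_runtime_us = %ld\n",
           read_long("/proc/sys/kernel/sched_rt_runtime_us"));
    printf("sched_rt_period_us  = %ld\n",
           read_long("/proc/sys/kernel/sched_rt_period_us"));
    return 0;
}
```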