-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The use of TBB spawn vs TBB enqueue in MPQC version of Madness code #357
Comments
@victor-anisimov ideally we do need to have the ability to control task priority. PThread-based thread-pool implementation certainly does. The reason P.S. I'm surprised that MPQC uses |
@evaleev Thank you for quick response! It is possible that I'm using not the latest version of MPQC/TiledArray. What puzzles me is that running tasks, which have high-priority attribute on, by using function "tbb::task::enqueue(*task)", which ignores priority, gives correct numerical results in MPQC; however, using "tbb::task::enqueue(*task, tbb::priority_high)" to run tasks of high priority breaks the numerical correctness of CCSD job. The function "tbb::task::enqueue(*task)" should be interchangeable with function "tbb::task::enqueue(*task, tbb::priority_high)" but it does not appear so. Could it be that the attribute tbb::priority_high is not correctly set everywhere or function "tbb::task::enqueue(*task, tbb::priority_high)" simply fails to run a task when high-priority attribute is on? What would be the right approach to track the execution of high-priority tasks? |
Inside madness the priority is just a hint and there is no guarantee of
ordering based on priority so I share your concern over an implied race
condition or lack of queue or barrier to impose ordering.
…On Mon, Aug 17, 2020, 5:55 PM Victor Anisimov ***@***.***> wrote:
@evaleev <https://github.com/evaleev> Thank you for quick response! It is
possible that I'm using not the latest version of MPQC/TiledArray.
What puzzles me is that running tasks, which have high-priority attribute
on, by using function "tbb::task::enqueue(*task)", which ignores priority,
gives correct numerical results in MPQC; however, using
"tbb::task::enqueue(*task, tbb::priority_high)" to run tasks of high
priority breaks the numerical correctness of CCSD job. The function
"tbb::task::enqueue(*task)" should be interchangeable with function
"tbb::task::enqueue(*task, tbb::priority_high)" but it does not appear so.
Could it be that the attribute tbb::priority_high is not correctly set
everywhere or function "tbb::task::enqueue(*task, tbb::priority_high)"
simply fails to run a task when high-priority attribute is on? What would
be the right approach to track the execution of high-priority tasks?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#357 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABZSAPNDSY2AMRNQFTMR67LSBGRNVANCNFSM4P75RYFQ>
.
|
@victor-anisimov High priority tasks in TiledArray are almost exclusively communication tasks. Everything else should be normal priority. There may be some changes that have been introduced since I worked on it, but that was my general approach. There may also be other components in MPQC that use high priority tasks. As for the differences in numerical precision, TA was designed to eliminate barriers in tensor contractions. As a result, the order of the tile level contractions is not guaranteed. Thus you can get some numerical noise in your output from run to run. This is an unfortunate side effect of the break in order dependencies that were required to achieve high performance. This is often disconcerting to some people as they expect the output to be 100% consistent from run to run. However, this is no different than the effect you might see from changing the order of evaluation in a deterministic system. It can also be quite useful in showing you the degree that numerical noise plays in your result. Changing the task submission so that everything goes through the shared queue is probably making the order of evaluation more stable, thus your numerical results are more stable. Having said that, it is entirely possible that there is a race condition that is causing the error in your result. Here again, submitting all tasks though the shared queue could be mask a race condition. @evaleev It is possible to introduce a race condition with TA expressions. Specifically when using the in-place updates (for example += operations). There is some measure of protection against this for a series of in place updates where the tensor receiving the in place updates is on the left-hand side of the assignment operator. However, you can introduce a race condition if the tensor receiving the in place updates appears on the right hand side of the assignment. There needs to be a barrier in that scenario. As for the deadlock problem, the most likely cause is that Future::get() is called inside a task function. Submitting tasks through the shared queue can certainly mask this kind of error. |
@victor-anisimov could you clarify what you mean when you say "MPQC start producing numerically incorrect results" with high-priority TBB tasks? please provide more details. it could be just the effects of change in task schedule, but typically we see such results to be negligibly small. |
I would defer testing until the fut-fix PR is included since that could
introduce ordering bugs.
…On Tue, Aug 18, 2020, 12:05 PM Eduard Valeyev ***@***.***> wrote:
@victor-anisimov <https://github.com/victor-anisimov> could you clarify
what you mean when you say "MPQC start producing numerically incorrect
results" with high-priority TBB tasks? please provide more details. it
could be just the effects of change in task schedule, but typically we see
such results to be negligibly small.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#357 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABZSAPM7WTVS2MYMGPWQF33SBKRDNANCNFSM4P75RYFQ>
.
|
Thank you for your input @justusc @evaleev @robertjharrison! I was wrong referring to numerical errors that I thought I had after enabling task priority
What happens is that MPQC CCSD becomes very slow and the execution timeouts, but the CCSD energies are actually unaffected. I mistakenly attributed the code crash to numerical errors. It appears that "tbb::task::enqueue(*task, tbb::priority_high)" does what it is designed for but it cannot be used with the current state of TiledArray/MPQC code. Do we need anything from Intel TBB developers to improve handling of high-priority tasks? If so, supplying them with a stand-alone test to illustrate the issue would be helpful. |
While analyzing the use of TBB in MPQC version of Madness (commit c6ec0e7) code I noticed a difference between that code and the master code in thread.h file. Since MADNESS_CAN_USE_TBB_PRIORITY is undefined in MPQC build the latest (master) copy of Madness has the following piece of code that deals with tasks of different priority in thread.h
The MPQC version of Madness carries a slightly different code in that part of thread.h
This code obviously works with MPQC. Interestingly, If I replace the above 5 lines with just one line
MPQC still gives numerically correct results and even runs a bit faster.
What concerns me is the following. If I put
instead of the original piece of code (5 lines) present in MPQC version of Madness, MPQC start producing numerically incorrect results. This is tested on ALCF Theta. I checked. There is nearly equal number of tasks marked with high priority as those not carrying that attribute (hence, having regular priority). Using "tbb::task::enqueue(*task, tbb::priority_high);" appears to be the correct way of handling tasks with high priority. Why use " tbb::task::spawn(*task);" instead of "tbb::task::enqueue(*task, tbb::priority_high);" if the task carries high-priority attribute? The command "tbb::task::spawn(*task);" appears to be not much different from the regular "tbb::task::enqueue(*task);" since using only the latter single line in place of 5 lines works. Perhaps my question should be rephrased - do we have to honor high priority attribute in this part of thread.h code? Would using just one line "tbb::task::enqueue(*task);" and deleting "tbb::task::spawn(*task);" be correct in MPQC version of Madness?
The text was updated successfully, but these errors were encountered: