Tensorflow deadlock in ARM after exception in job #1077
Labels
Phase-1
Pertains to phase-1 development
Software Error
Bug causing segfaults, memory leaks, run time errors, or compilation problems
What
Taken from cms-sw#40996
TensorFlow jobs can hang and throw out-of-memory exceptions in some relval matrix jobs.
Further investigation shows that `L1TMuonEndCapTrackProducer::produce()` may be creating extraneous TensorFlow sessions, and the module is not creating its own TensorFlow session at module level. See cms-sw#40996 (comment).
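The concern above can be illustrated with a minimal sketch. It is not the actual module code, and it uses neither the TensorFlow nor the CMSSW API: `FakeSession`, `PerCallProducer`, `ModuleLevelProducer` and `sessionsCreatedFor` are hypothetical names introduced only to contrast creating a session inside every `produce()` call with holding one session at module level.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an expensive inference session.
// This is NOT the TensorFlow or CMSSW API; it only counts constructions.
struct FakeSession {
  static int created;
  FakeSession() { ++created; }
  float run(float x) const { return 2.0f * x; }
};
int FakeSession::created = 0;

// Pattern suspected in the report: a new session on every produce() call.
struct PerCallProducer {
  float produce(float x) {
    FakeSession session;  // constructed again for each event
    return session.run(x);
  }
};

// Alternative: one session owned by the module, built once and reused.
class ModuleLevelProducer {
public:
  ModuleLevelProducer() : session_(std::make_unique<FakeSession>()) {}
  float produce(float x) { return session_->run(x); }

private:
  std::unique_ptr<FakeSession> session_;
};

// Runs nEvents produce() calls and reports how many sessions were constructed.
template <typename Producer>
int sessionsCreatedFor(int nEvents) {
  FakeSession::created = 0;
  Producer producer;
  for (int i = 0; i < nEvents; ++i) {
    producer.produce(static_cast<float>(i));
  }
  return FakeSession::created;
}
```

In this sketch the per-call variant constructs one session per event, while the module-level variant constructs a single session regardless of the number of events. Whether the real module behaves like the first variant still needs to be confirmed against the code and the relval log.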
Recipe to Recreate
TBD. Check the log of the failing relval job.