Non-deterministic failure of test_kraken_multi teardown #839
Comments
i've seen this, have fixes, will check in soon. |
Is it a race condition of multiple threads trying to remove the same shared object file? |
Oh... maybe the Picard pipes aren't being closed properly before end-of-method and the test fixture blows it away before Picard was done? |
i think it has to do with the tmpdir_function and tmpdir_module fixtures in conftest.py. reading the docs of tmpdir_factory, that way of creating tempdirs is less robust than with mkdtemp. also, the rmtree calls don't check if the tree still exists before trying to remove it. |
Maybe we should add a |
maybe there are two separate issues here :) |
Yes, it could be all of the above. The reason I'm thinking @tomkinsc's highlighted line of code might be at play is that if you google around for libgkl_compression.so, it's all Picard/GATK-related. So I think the |
simplest fix is to just check if the tree exists before shutil.rmtree(new_tempdir), but i think it's better to rewrite it to use mkdtemp. |
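A minimal sketch of the mkdtemp-plus-tolerant-rmtree idea, in case it helps the discussion. The name `scratch_dir` is hypothetical and is not the fixture in conftest.py; it only assumes the standard library's `tempfile.mkdtemp` and `shutil.rmtree`. Note that `ignore_errors=True` also hides genuine cleanup failures, so it trades one kind of noise for another.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def scratch_dir(prefix="test-"):
    """Create a private temp dir with mkdtemp and remove it on exit.

    Hypothetical helper for illustration; not the project's fixture.
    """
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    finally:
        # ignore_errors=True tolerates a tree (or files inside it) that
        # something else has already removed by the time we get here.
        shutil.rmtree(path, ignore_errors=True)
```

The same body could sit inside a pytest fixture that yields `path`; the point is only that teardown no longer raises when the tree is already gone.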
I bet both bits of code have issues that only really present as noticeable problems when in combination. I like using mkdtemp but I think it's still good to rmtree at the end of a test fixture because as the test suite increases in size, you run out of local tmp space if you don't clean up after big tests. Can't we just run BTW I'm not sure if the problem is that the directory doesn't exist vs. a file that was originally in the directory tree (when |
maybe util.file.fifo() should poll the pipes after yield, before unlinking them? |
and/or add a small time delay before unlinking pipes and returning. |
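One way to read the "wait before unlinking" suggestion, as a rough sketch: have the context manager wait on the processes that use the pipes before it removes them. This is not the actual `util.file.fifo()` implementation; `fifo_waiting`, its `procs` list, and `demo` are hypothetical names, and `os.mkfifo` makes it POSIX-only.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def fifo_waiting(num_pipes=1, timeout=30):
    """Yield (paths, procs); wait on registered procs before unlinking."""
    pipe_dir = tempfile.mkdtemp()
    paths = [os.path.join(pipe_dir, f"{i}.pipe") for i in range(num_pipes)]
    for path in paths:
        os.mkfifo(path)
    procs = []  # callers append the Popen objects that touch the pipes
    try:
        yield paths, procs
    finally:
        try:
            for proc in procs:
                try:
                    proc.wait(timeout=timeout)
                except subprocess.TimeoutExpired:
                    # Give up on a stuck process rather than hang teardown.
                    proc.kill()
                    proc.wait()
        finally:
            # Only now remove the pipes and their directory.
            shutil.rmtree(pipe_dir, ignore_errors=True)


def demo():
    """Writer subprocess feeds one fifo; this process reads it."""
    with fifo_waiting() as (paths, procs):
        writer = subprocess.Popen(
            [sys.executable, "-c",
             "import sys; open(sys.argv[1], 'w').write('hello')",
             paths[0]])
        procs.append(writer)
        with open(paths[0]) as reader:
            data = reader.read()
    return data, writer.returncode, os.path.exists(paths[0])
```

Waiting on the process is deterministic where a fixed sleep is not, but it only helps if every process holding a pipe open is actually registered.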
Actually I just realized that the kraken code was using The Picard side of the pipe can happily play with standard python pipe-fitting (it can interleave the fastqs and is often used that way to pipe to bwa). It's the kraken side that has issues. Maybe we can revert the code here to use temp files again instead of fifos until we eventually tackle the larger question of how to clean up the pipe-fitting while also adding unpaired read support (#820). Then again, I'm not quite sure why reverting to temp files would solve the race condition... hm. Or time delay... |
adding a 2s delay at the end of metagenomics.diamond() does fix one transient bug in my experiments on AWS. |
Wow, 2s is much longer than I would've tried (maybe 100ms). Oh wait! We're calling |
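If a delay does turn out to be needed, a bounded poll is a possible alternative to a fixed sleep: it returns as soon as the condition holds and gives up after a deadline. This is only a sketch; `wait_until` is a hypothetical helper, and the thread has not established which condition would be the right one to poll.

```python
import time


def wait_until(predicate, timeout=2.0, interval=0.05):
    """Poll predicate() until it is truthy or timeout seconds elapse.

    Returns True if the predicate became truthy, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until(lambda: proc.poll() is not None)` would wait for a `subprocess.Popen` object named `proc` to exit, up to the 2s that was tried above.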
I’ve had nondeterministic test failures in the past but never this specific one. I think it’s highly likely to be related to some combination of xdist, temp directory removal, and having background processes. I don’t think the named fifos should be causing problems, because they are essentially real files to a program as long as it doesn’t try to seek or tell. I’ve tried in the past to add ps.wait commands to finish up all the background processes, but it didn’t seem to have any effect. Regardless, adding it in should not cause any harm. Gluing together the std streams of subprocesses through Python is what seemed to cause the most issues, so I try to avoid that and use named fifos instead. |
Sometimes Travis fails, sometimes it succeeds, on this particular part of the test suite. Probably some race condition as xdist parallelizes the tests.
Mysteries include:
libgkl_compression2251932993111688424.so
non-deterministic?
os.unlink trying to use a compression library to delete files?