sporadic test-waic failure (possibly only on MacOS) #1368
Comments
@paciorek I have spent a while digging into this. I'm as baffled as you are. From a shell I am using Also I did reproduce this problem on 1.0.1 (current release). I am wondering if it has to do with changes in R or OS X tool chains, unlikely as that seems. I do have ccache running, but we are seeing this on your system and @danielturek's system and the CI build systems, so unless they all have ccache running, that doesn't seem likely to be the culprit. I like the "--preclean" option. It would be easy to insert. It would be a big change in that it would touch every compilation call we make. But it also makes sense. What do you think?
Yeah, I think your debugging/digging steps were similar to mine. I'm a bit wary of making the One thought is that we set up a nimble option to cause
Here is a sporadically reproducible nearly-minimal example entirely outside of nimble. As file1.cpp:
As file1.h:
As file2.cpp:
As file2.h:
Then from R, run:
The .h and .cpp files might not be fully minimal yet, but close.
Nice - extracting this out of the nimble context is a big step forward. One thought here is that I could run this by our SCF sysadmin (Ryan Lovett). I'm also going to try running on my new M2 Mac and see what happens.
OK, I've reproduced this on my M2 Mac. If I insert a 3-second sleep before the file2.cpp compilation, everything seems fine. So it feels like a timestamp kind of thing.
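A minimal sketch of the timestamp hypothesis, not a diagnosis: `make` rebuilds a target only when a prerequisite is strictly newer than it, so if both files carry the same timestamp the recompilation is skipped. The helpers `mtime_seconds` and `needs_rebuild` are hypothetical, and comparing at whole-second resolution is an assumption about the toolchain that has not been confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helpers illustrating a make-style staleness check.
# Assumption: modification times are compared at whole-second resolution.

# Print a file's modification time in whole seconds since the epoch.
# GNU stat uses -c %Y; BSD/macOS stat uses -f %m.
mtime_seconds() {
    stat -c %Y "$1" 2>/dev/null || stat -f %m "$1"
}

# Exit status 0 if obj is missing or src is strictly newer than obj.
needs_rebuild() {
    obj=$1
    src=$2
    if [ ! -e "$obj" ]; then
        return 0
    fi
    [ "$(mtime_seconds "$src")" -gt "$(mtime_seconds "$obj")" ]
}
```

Under that assumption, an object file and a source file stamped within the same second make `needs_rebuild file2.o file2.cpp` report that nothing needs doing, which would be consistent with a sleep before the second compilation hiding the problem.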
I think the following reproduces things using make. Here's the Makefile:
Here's the reprex:
Actually, I no longer think the Ryan is also flummoxed.
And one more thing -- I can reproduce the problem on Linux.
I think I've changed my mind and agree with the idea of using
Here's more experimental evidence that the issue has to do with timestamps. The following code runs 100 iterations of the copy-compile scheme I set up above. For each one, it records whether the second compilation was properly done (the .so contains foo_2) or was not done (the .so still contains foo_1). If the When I extract timestamps using When I am somewhat baffled that the timing mechanisms used for builds can have this kind of problem/phenomenon. It does seem that
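The bookkeeping in that experiment can be sketched roughly as follows. This is not the script that was run: `so_has_symbol` and `count_stale` are hypothetical helpers, and a plain `grep` for the symbol name stands in for a stricter check with a tool such as `nm`.

```shell
#!/bin/sh
# Hypothetical helpers; not the script used for the experiment above.

# Exit status 0 if the file contains the given name as a fixed string.
# grep -F matches literally, so the name is not treated as a regex.
# Note: this is a substring match, so foo_1 would also match foo_10.
so_has_symbol() {
    grep -q -F -- "$2" "$1"
}

# Print how many of the given libraries still contain the old symbol,
# i.e. how many times the second compilation apparently did not happen.
count_stale() {
    old_symbol=$1
    shift
    stale=0
    for so in "$@"; do
        if so_has_symbol "$so" "$old_symbol"; then
            stale=$((stale + 1))
        fi
    done
    echo "$stale"
}
```

With one .so kept per iteration, `count_stale foo_1 *.so` would give the number of iterations in which the library still held foo_1 instead of foo_2.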
I'm preparing a branch with
Just a note that at least for Laplace, this adds an extra, unneeded compilation of nimbleCppADbaseClass.cpp that, for the crossed random effects example in
So at some point we might want to look more carefully at managing what is recompiled more directly.
We're seeing this test failure, possibly only on MacOS:
I was able to reproduce this (fairly reliably but not every time I ran the specific test) on my laptop. Here's what seems to be going on:
- The failure occurs in a call to `calculateWAIC` in lines 562-572, often line 567 or line 572.
- In a failing case, a label such as `172` is in the .cpp/.h file. It is also `172` in the interface function in R. However, the symbol in the .so file is an "old" one, with a label like `170` from one of the previous compilations involved in the repeated calls to `calculateWAIC`.
- When `calculateWAIC` fails, the `P_<ID>_waicClass_offline.o` file is not being regenerated. Only the `clang++` call that creates the .so file from the .o file is invoked, not the `clang++` call that creates the .o file from the .cpp file. As a result the old `170` symbol is in the .so file. [see below for more details]
- Compilation goes through `R CMD SHLIB`, which apparently uses `make` (I haven't looked into details).
- Something about `make` causes it to think that the .cpp/.h files haven't changed and therefore that the .o file doesn't need to be regenerated, even though `ls -l` clearly shows that the .o file is older than the .cpp/.h files.

Here's what we expect our `system2` call to `R CMD SHLIB` to do:

As indicated above, only the second clang++ call occurs in the cases when the error occurs, despite the `SHLIBargs` in `cppProjectClass$compileFile` being the same.

Not sure we want to go this route, but using `R CMD SHLIB --preclean` seems to avoid the issue, though since it is sporadic, I can't 100% guarantee it. But in many repeated tries, it no longer had the error. Or perhaps we would generally want to use `R CMD SHLIB --clean` with the thought that we don't need the various .o files anyway, so cleaning them up could make good sense generally.

I'm hoping that because this seems to occur with repeated compilations in this specific situation where the same .cpp/.h/.o file names are used repeatedly that it is not something users would encounter. I don't think we've gotten similar reports, though I have a vague feeling we've seen "no such symbol" report(s) before.
Before I do anything more with this, I'll plan to discuss with @perrydv .
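For reference, a rough manual analogue of the `--preclean` idea: remove the object file for each source before compiling, so nothing stale can be reused. `remove_stale_objects` is a hypothetical helper, and it is an assumption that deleting the .o files is enough to force recompilation; `R CMD SHLIB --preclean` itself may remove more than this.

```shell
#!/bin/sh
# Hypothetical helper: delete the .o file that corresponds to each .cpp source,
# so a later build cannot reuse an object from an earlier compilation.
remove_stale_objects() {
    for src in "$@"; do
        # file1.cpp -> file1.o; a name without a .cpp suffix just gets ".o"
        # appended, so the source file itself is never the one removed.
        obj="${src%.cpp}.o"
        rm -f -- "$obj"
    done
}
```

Usage would be `remove_stale_objects file1.cpp file2.cpp` before the usual `R CMD SHLIB` call; with the built-in flag the equivalent is `R CMD SHLIB --preclean file1.cpp file2.cpp`.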