Mutation score is changing and not stable enough #62
…and the drop and no code/test were changes in that module). I've now raised STAMP-project/pitest-descartes#62
One guess is that timeouts are in play here, and when there's a timeout, it's counted as a bad score. We could increase the timeout, I guess, but that would also increase the build time a lot (we build all of XWiki with descartes). |
When PIT performs the mutation analysis, it derives the timeout by multiplying the original execution time by a factor. By default this factor is 1.25. This timeout could be configured to be used only in the mutation analysis phase instead of changing the original timeout. You can check the |
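As a rough illustration of the timeout rule described above, the sketch below computes how long a test may run against a mutant, given its original run time. The 1.25 factor is the default quoted above; the additive constant (the `timeoutConstant` setting, assumed here to default to 4000 ms) and the class and method names are assumptions for illustration, not PIT's own code.

```java
// Sketch of the timeout rule: allowed = original * factor + constant.
// Names are hypothetical; this is not PIT's implementation.
final class MutationTimeoutSketch {
    static final double DEFAULT_FACTOR = 1.25;     // default factor quoted in the thread
    static final long DEFAULT_CONSTANT_MS = 4000;  // assumed default constant

    /** Time in ms a test may run against a mutant before it is flagged as timed out. */
    static long allowedMillis(long originalMillis, double factor, long constantMillis) {
        return Math.round(originalMillis * factor) + constantMillis;
    }

    /** True when the run against the mutant exceeded the allowed time. */
    static boolean timesOut(long mutantRunMillis, long originalMillis,
                            double factor, long constantMillis) {
        return mutantRunMillis > allowedMillis(originalMillis, factor, constantMillis);
    }
}
```

Under these assumed values, a test that takes 1000 ms on the unmutated code would be allowed 5250 ms against a mutant. For short tests the constant dominates, so raising only the factor may change little.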
Hi @oscarlvp we recently configured some threshold to detect mutation score regression, but we got really stuck with those timeouts even when setting the For example, our build spotted a mutation score of 80 against 88, that I was not able to reproduce on my own computer. When setting the It's really hard to stabilize a threshold in this case. Any advice how we could improve that? Is there a limit value for the timeoutFactor? Should we use a timeoutConstant instead? |
FTR here's what we did with the timeoutFactor: xwiki/xwiki-commons@10dd48f |
Recent case of timeout: https://github.com/xwiki/xwiki-platform/pull/946/files |
Report for xwiki/xwiki-commons@88f8b64 included here |
Report for xwiki/xwiki-commons@85c9c98 included here |
I guess you meant xwiki/xwiki-commons@85c9c98 (I got a 404 with your link). AFAIK we computed the score by running the following command line and going to the pitest report:
|
@surli This is the same way I'm computing the score to reproduce the issues. However, there is no change for this particular commit and module over 100 executions. So I'm wondering, if by any chance, there was a mistake with the first score. Is the issue still happening on your side? |
I don't know about this module; maybe @vmassol has more information about that one. What I do know is that for https://github.com/xwiki/xwiki-platform/pull/946/files, even after executing it dozens of times on the same machine, I didn't get a change. It was when executing it on another machine with different specs (less memory / processor) that I spotted the difference. |
As for xwiki/xwiki-commons@2032022 the analysis produces a 0 mutation score as no mutant is reported as covered. There is a configuration problem. |
@vmassol @surli There is indeed a difference in how the initial scores were computed. PIT was configured in xwiki/xwiki-commons@cb0ff74 to use JUnit 4. In xwiki/xwiki-commons@0a7d7ee the configuration switched to JUnit 5. This was done on 22-05-2018. Significant differences in the score were reported the same day; they are the ones listed in the initial description of this issue.
As hinted by the configuration issue for xwiki/xwiki-commons@2032022, the drastic score change could be explained by the change of PIT test plugin from JUnit 4 to JUnit 5. The JUnit 5 plugin is not as stable as the other and may be missing some tests. |
Thanks for this report @oscarlvp ! To continue on this topic, we got this weekend a report from our pit-descartes CI job with this:
The job is now back to normal. What's interesting is that nothing was committed this weekend on master in any of our repos, and we didn't change anything in our config. So it does prove that we are still experiencing this timeout issue. You can find the job report here on our CI |
You're welcome @surli. |
@surli From the details in the console output it can be seen that the variation concerns two mutants: a |
I don't think it is we run it with a |
You can get the last build's workspace from https://ci.xwiki.org/job/xwiki-platform_pitest/ws/xwiki-platform-core/xwiki-platform-webjars/xwiki-platform-webjars-api/target/pit-reports/ |
Thanks @vmassol |
@oscarlvp Hi, we got another example of the mutation score changing. This time it was on module xwiki-platform-observation-remote: the threshold is set at 80 and we got a build that reached only 75, see there. This time I saved the reports before re-triggering the build. You can find them attached: build 160 was failing, 161 was ok. |
Note that today (2020-02-12) we had an instability again with no code change AFAICS:
I updated the mutation score to the current value at xwiki/xwiki-commons@ad888d7#diff-4e06773273323f2703b4c3a54f9afd47R36 and it passed on the CI yesterday and the days before and today, suddenly, it failed in https://ci.xwiki.org/job/xwiki-commons_pitest/724/console |
And another one found today: `[ERROR] Failed to execute goal org.pitest:pitest-maven:1.4.10:mutationCoverage (pitest-check) on project xwiki-platform-mailsender: Mutation score of 26 is below threshold of 44 -> [Help 1]` I tried it locally and I get a different mutation score every time I run it. Sometimes 9%, sometimes 34%, sometimes 18%, etc. In the logs I see plenty of the following:
However this happens only when pitest/descartes executes. Note: I've tried to check the generated mutations but it doesn't work. When using the EXPORT feature, I get an error:
I've also made sure that the mail SMTP socket is closed after each test. Didn't help. FTR here's what I did in MailSenderApiTest:
Any idea is most welcome. |
Another one: |
@vmassol the case of the |
Executing the following goal on
ends in the following error:
|
@oscarlvp that's weird, it works fine on our CI + locally (just retested now). Maybe try with "-U" to make sure you have recent deps? I ran it from What JDK are you using? I'm on:
|
I'm using JDK 13 |
yes could well be. It's possible that they remove some stuff or that it's optional, etc. On our side we build with java 8. |
Got a new one (from https://ci.xwiki.org/job/xwiki-platform_pitest/558/console):
Same as #62 (comment) |
Another flicker from today:
|
Another flicker today:
Unless the following change could cause the mutation score to change but then I'd be curious to know why: xwiki/xwiki-platform@9ebbeb6 Thanks! |
New flicker today:
|
Is it possible to mention this issue in all the commits you make lowering the mutation score, so we have a direct link to the related information? It has been done in some cases above. |
Indeed, I don't do it systematically. I'll try to remember it. However, I reference the issue in all the commit content (I've been doing it for some time now; I wasn't doing it initially when this issue was created, though). So for example you can see all cases here:
|
@vmassol Thanks a lot! |
@vmassol I get the following error while trying to build
I used the following command
I have the XWiki repositories configured in |
@oscarlvp on which branch are you? master? Did you |
@vmassol Yes I'm on master after |
@oscarlvp Just tried it and it worked fine. I've used |
@oscarlvp maybe you're running from a different directory? The |
@vmassol Indeed I made a mistake while mounting the working directory in docker, I realised that from your comment. The project is building OK now, so I should be able to see what is happening. |
@vmassol
So, for example The following are transformations always reported as
Given that the outcome of the transformations of the first table above is erratic, I wouldn't be surprised if some of the cases in the second table have also an erratic behaviour, and some of them could be reported as The tests related to methods above use mocks and also deal with files. Both things might be related to this kind of erratic outcome. Do these tests affect the content of external files? |
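One way to spot erratic transformations like those described above is to compare the status of each mutant across two runs. The sketch below assumes the statuses have already been extracted from two PIT reports into maps keyed by a mutant identifier; the identifier format, the `MISSING` marker and the class name are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: lists mutants whose reported status differs between two runs.
final class FlickerDiffSketch {
    /** Maps each mutant with a changed status to "statusInRunA -> statusInRunB". */
    static Map<String, String> diff(Map<String, String> runA, Map<String, String> runB) {
        Map<String, String> changed = new TreeMap<>();
        for (Map.Entry<String, String> e : runA.entrySet()) {
            String other = runB.getOrDefault(e.getKey(), "MISSING");
            if (!e.getValue().equals(other)) {
                changed.put(e.getKey(), e.getValue() + " -> " + other);
            }
        }
        // Mutants that appear only in the second run.
        for (Map.Entry<String, String> e : runB.entrySet()) {
            if (!runA.containsKey(e.getKey())) {
                changed.put(e.getKey(), "MISSING -> " + e.getValue());
            }
        }
        return changed;
    }
}
```

Running this over the reports of a failing and a passing build of the same commit would narrow a score drop down to the individual mutants that flipped.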
Since the nature of these score drops may be quite different from one to the other, I propose to open a separate issue for each one of them. That way it is easier for me to track the number of times they happen and check whether they get solved. |
Noted, I'll start providing the PIT reports when I see flickers. Now it'll only be for new flickers since we already reduced the thresholds for the ones reported.
I see that @surli put the following in our pom:
Does it mean that
Not sure what you mean by "content of external files". You mean do these test call java code located in classes other than the test class, that use the File API (or the NIO API) directly or indirectly?
Ok I can start doing that if you prefer. I wanted to have everything in the same place since the topic is the same (and the cause might be too) and it's harder to relate the issues together. Maybe introduce some label for that? |
@oscarlvp many thanks for looking into this! :) |
This feature is totally on PIT's side. Its main purpose is to detect those mutants that may cause an infinite loop. Say for example that a method that should return
Yes, maybe that. What I simply meant was if they touch files and if the mutated code could make those files corrupt and affect other tests.
A label could be nice. My point is that if these score drops keep piling up it will be harder to keep track of them in the same issue.
Don't mention it :) |
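To make the infinite-loop case mentioned above concrete, here is a small hypothetical sketch: a condition that should eventually become `false` is replaced by one that always returns `true`, so the loop depending on it never ends on its own, and the caller gives up after a time limit. The class, the method names and the time limits are invented for illustration; this is not how PIT implements its timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

final class InfiniteLoopMutantSketch {
    /** Counts iterations while hasMore is true; also stops if the thread is interrupted. */
    static int drain(BooleanSupplier hasMore) {
        int n = 0;
        while (hasMore.getAsBoolean() && !Thread.currentThread().isInterrupted()) {
            n++;
        }
        return n;
    }

    /** Runs drain under a time limit and reports "FINISHED" or "TIMED_OUT". */
    static String runWithTimeout(BooleanSupplier hasMore, long limitMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<Integer> task = () -> drain(hasMore);
        Future<Integer> result = pool.submit(task);
        try {
            result.get(limitMillis, TimeUnit.MILLISECONDS);
            return "FINISHED";
        } catch (TimeoutException e) {
            return "TIMED_OUT";
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupts the worker so the sketch can exit
        }
    }
}
```

Passing a supplier that eventually returns `false` stands in for the original code, while `() -> true` stands in for the mutated method; only the latter should hit the time limit.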
@oscarlvp Hi. Hope you're doing well. I have some bad news: on the xwiki project we've decided to remove pitest/descartes from our build FTM. We had too many false positives and the developers did not have faith in the execution anymore. The cost of maintaining it was outweighing the perceived benefits. And unfortunately we were not ready to take ownership of the development of pitest/descartes ourselves. I feel that without the false positives, we would have continued using it. But if in the future there's a version of pitest/descartes that fixes the issue, we'll be able to revisit it and put back all that we had set up. I've been careful to list all places and to make it easy to roll back, see https://jira.xwiki.org/browse/XCOMMONS-1960 So I apologize, because I know this is your baby and you may feel that we/I are letting you down. I personally believe in the mutation testing concept. There's probably not that much work remaining to use it the way we wanted to use it. I acknowledge that there are other ways of using it and we'll continue using pitest/descartes in these ways, namely using it as an aid inside your IDE when you're writing the tests. Let's keep in touch. Feel free to contact me if you wish to discuss more. Just taking the occasion to thank you again for your great work and support all along. |
@vmassol Bad news indeed. IMHO you could remove it only from xwiki-platform. Most of the score issues come from that project. The mutation testing strategy definitely conflicts with the way you test the platform modules. |
Hi, I've updated pitest/descartes to use descartes 1.2, pitest 1.4.0 and pitest-junit5-plugin 0.5 and I've found today that plenty of mutation scores are changing (without any change to the sources). I don't know why but it's looking bad.
Somehow I have the feeling it could be related to the introduction of pitest-junit5-plugin in xwiki/xwiki-commons@0a7d7ee
Some examples:
Any idea?