My result is the same as your benchmark data (and differs from your baseline data). In addition, the inference results were reproducible across multiple rounds of testing.
**Describe the bug**
We tried to benchmark VISTA-3D for accuracy (Dice score). We ran it once locally to create baselines and once in our CI pipeline to create benchmarks. However, we cannot reproduce these metrics: the baseline and benchmark results differ:
Here are the test cases that we used:
and here are the test cases for speed:
**Environment**
The baseline and benchmark are run on different machines but in the same container.
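To make the mismatch between the two runs easier to pin down, a per-case comparison with an explicit tolerance may help. The following is only a minimal sketch in plain Python: `dice_score`, `find_mismatches`, the case names, and the `abs_tol` value are hypothetical and are not part of VISTA-3D or MONAI. It assumes flat binary masks and per-case Dice scores stored in dictionaries.

```python
import math


def dice_score(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def find_mismatches(baseline, benchmark, abs_tol=1e-4):
    """Return the sorted case names whose scores differ by more than abs_tol.

    Cases present in only one of the two dictionaries are also reported.
    """
    mismatches = []
    for case in sorted(set(baseline) | set(benchmark)):
        if case not in baseline or case not in benchmark:
            mismatches.append(case)
        elif not math.isclose(
            baseline[case], benchmark[case], rel_tol=0.0, abs_tol=abs_tol
        ):
            mismatches.append(case)
    return mismatches


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    local_run = {"case_a": 0.90, "case_b": 0.80}
    ci_run = {"case_a": 0.90001, "case_b": 0.70}
    print(find_mismatches(local_run, ci_run))
```

Differences well below the tolerance can come from floating-point nondeterminism across GPUs, while larger gaps usually point to a real difference in inputs, weights, or preprocessing, so the tolerance should be chosen with that distinction in mind.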