Extend benchmarks #74
base: main
Conversation
If the data had been created by one job in this repo and then used by another job, then the upload-artifact and download-artifact actions would be useful. But since you're downloading data from elsewhere, you don't want artifacts... you want the actions/cache action. I do a similar thing for the nd2 repo; you could use it as a model. I'd suggest making a separate script (
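A minimal sketch of what such a download script might look like, assuming CTC-style zip hosting (the script name, URL, and paths here are placeholders, not from this PR); the data directory can then be keyed by `actions/cache` so the fetch is skipped on a cache hit:

```python
"""Hypothetical scripts/download_data.py for fetching benchmark datasets."""
import urllib.request
import zipfile
from pathlib import Path

DATA_DIR = Path("downloads")
# Placeholder URL: the Cell Tracking Challenge distributes datasets as zips.
URL = "http://data.celltrackingchallenge.net/training-datasets/Fluo-N3DH-CE.zip"


def main() -> None:
    DATA_DIR.mkdir(exist_ok=True)
    archive = DATA_DIR / Path(URL).name
    if not archive.exists():  # no-op when the CI cache already has the file
        urllib.request.urlretrieve(URL, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(DATA_DIR)


if __name__ == "__main__":
    main()
```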
I'm taking a look at the code locally now, but based on my experience when I was getting the benchmarking running initially, I wouldn't be surprised if the

I don't have any personal experience with parametrize and fixtures, but could you set up some sort of dictionary where you pass the key into parametrize and then construct the dictionary with the fixture data inside the test function?

Re: caching, I also have an example of using caching for the benchmarking results (although maybe I should have used artifacts?):

traccuracy/.github/workflows/ci.yml, lines 73 to 91 in ed2b7b1
Happy to help you get that part of the action worked out if you need it, but I think Talley has the right idea with the download script.
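The dictionary-keyed pattern suggested above might look something like this minimal sketch (fixture names and the benchmarked call are stand-ins, not from this PR):

```python
import pytest


@pytest.fixture
def gt_2d():
    return {"name": "2d"}  # stand-in for loading the real 2d ground truth


@pytest.fixture
def gt_3d():
    return {"name": "3d"}  # stand-in for loading the real 3d ground truth


# Build the dict inside a fixture so its values come from other fixtures;
# the test itself is parametrized only over plain string keys.
@pytest.fixture
def datasets(gt_2d, gt_3d):
    return {"2d": gt_2d, "3d": gt_3d}


@pytest.mark.parametrize("key", ["2d", "3d"])
def test_match(key, datasets, benchmark):
    data = datasets[key]
    benchmark(len, data)  # wrap the real matcher call here
```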
One thing that you might consider looking into is https://codspeed.io
... eh... probably not worth it; the benchmark-action seems plenty good. codspeed is probably an unnecessary complication at the moment
OK, after looking into it a bit more, it seems like parametrize works just fine with benchmark, which is great. Looks like this could be a workaround for the fixtures-in-parametrize issue. @tlambert03 I noticed in the
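(For reference: one common workaround for consuming fixtures inside `parametrize` is pytest's `request.getfixturevalue`; the linked workaround isn't quoted here, so this is just a minimal sketch with stand-in fixture names.)

```python
import pytest


@pytest.fixture
def gt_data_2d():
    return {"name": "2d"}  # stand-in fixture


@pytest.fixture
def gt_data_3d():
    return {"name": "3d"}  # stand-in fixture


@pytest.mark.parametrize("fixture_name", ["gt_data_2d", "gt_data_3d"])
def test_matcher(fixture_name, request, benchmark):
    # Resolve the fixture by name at runtime; pytest instantiates it lazily.
    data = request.getfixturevalue(fixture_name)
    benchmark(len, data)  # wrap the real matcher call here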
Thanks @tlambert03 @msschwartz21 for the suggestions. I'll come back with a second iteration of the PR.
Yeah, only use the decorator if you want to time the entire test from start to finish with no setup or teardown (good for simple tests). But I don't always use it. See https://github.com/pyapp-kit/psygnal/blob/main/tests/test_bench.py for a file that uses both forms.
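A minimal sketch of the two forms; whether a bare `@pytest.mark.benchmark` decorator times the whole test body depends on the plugin (pytest-codspeed behaves this way), so treat that half as an assumption:

```python
import time

import pytest


def expensive_setup():
    time.sleep(0.01)  # stand-in for loading a dataset
    return list(range(1000))


def operation(data):
    return sum(data)  # stand-in for the code being measured


# Decorator/marker form: the whole test body is timed, setup included,
# so only use it when there is nothing to exclude from the measurement.
@pytest.mark.benchmark
def test_whole_body():
    operation(expensive_setup())


# Fixture form: setup runs outside the measured region and only the
# wrapped callable is timed.
def test_operation_only(benchmark):
    data = expensive_setup()
    benchmark(operation, data)
```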
Hi @msschwartz21 @tlambert03, I finally got to this, and did the following:
Even though the benchmark runs fine locally, its GitHub action currently fails with an unknown error, to be fixed. Left to do: caching the datasets in GitHub actions.
@bentaculum just wanted to give you a heads up that I'm in crunch mode finishing up writing my thesis. Hopefully next week I'll have time to review and help figure out what's causing the benchmarking action to fail.
No worries, and good luck!
Hi @msschwartz21, thanks for looking over this one and fixing it up. I would suggest not limiting the benchmark to a few frames but instead running it on the entire video, to catch possible increases in metric runtime.
If we want to speed up the benchmark, we can consider creating a subset of the 3D dataset on disk, or tweaking
I still don't know why the benchmark runs locally but fails on GH actions, even with the increased time limit that you put ...
Sometimes the action is failing with a 143 error code, which I think means the VM is being terminated, probably due to resources maxing out. It seems to always fail on the CTC matcher on 3D. I was going to try limiting the volume to a smaller ROI to reduce the number of nodes. If that works, then we can keep the full length of the video.
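Cropping to a smaller ROI might look like this minimal sketch (the axis layout and bounds are assumptions, not from this PR):

```python
import numpy as np


def crop_roi(segmentation: np.ndarray) -> np.ndarray:
    """Keep every frame but only a small spatial window, shrinking the
    number of segmented objects (and therefore graph nodes) per frame."""
    # Assumes a (t, z, y, x) labeled volume; the bounds are placeholders.
    return segmentation[:, :, 100:356, 100:356]
```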
Everything that I could find about GitHub actions failing pointed to a possible memory issue. I also remembered that we decided to err on the conservative side in terms of creating copies of data before possibly modifying a graph. I got rid of the copy in the initializer of the
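The change described is roughly this shape, as a simplified sketch; the class and attribute names are illustrative, not traccuracy's actual API:

```python
class Matcher:
    """Simplified sketch; not traccuracy's actual class."""

    def __init__(self, gt_graph, pred_graph):
        # Before: defensive deep copies roughly doubled peak memory on
        # large graphs:
        # self.gt_graph = copy.deepcopy(gt_graph)
        # After: hold references instead; callers can no longer assume the
        # inputs stay untouched if matching mutates the graphs.
        self.gt_graph = gt_graph
        self.pred_graph = pred_graph
```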
Just wanna chime in on this, as it would be great to not have to copy the data. I have my local traccuracy install always with graph copying disabled, because in a lot of my pipelines copying the graph just instantly hits my RAM limit, which is annoying for comparatively small datasets. Will also note that I think our
Have you noticed any weird behavior when having the copy disabled? But generally I agree that we should eliminate the copying. It was an easy way to get unblocked when we were getting started, but it's a pretty obvious limitation on dataset size.
No, none. The caveat would be that I don't have any multithreaded/distributed code. But I do feel that if someone does have that, controlling edits and object access is on them, not on us. I think we should definitely figure out what's going on in that failing test though, and whether it's likely to affect users, whether it's an artifact of how we wrote the test... etc.
This reverts commit d8c595a.
It turns out the memory problem that we were encountering in

I rescoped all of the benchmarking fixtures back to function scope and eliminated

@bentaculum Can you take a look and confirm that I haven't eliminated anything that you wanted in there in the process of trying to get it to work on GitHub?
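Rescoping back to function scope looks roughly like this sketch (names are illustrative): a module-scoped fixture keeps its value alive across all tests in the file, while a function-scoped one can be garbage-collected between tests.

```python
import pytest


def load_ctc_dataset(name):
    return {"name": name}  # stand-in for the real (memory-heavy) loader


# Before: scope="module" kept every dataset loaded this way alive for the
# whole file's test run:
# @pytest.fixture(scope="module")

# After: the default function scope reloads per test but lets the garbage
# collector reclaim the data between tests.
@pytest.fixture
def gt_data_3d():
    return load_ctc_dataset("Fluo-N3DH-CE")
```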
Hi @msschwartz21, thanks for looking into this once more. Given #166 and the memory issues on GH actions, it seems that loading data separately is the way to go at the moment.
Dang it, I didn't notice that the action failed after I merged in the latest changes 🤯 I'll take another look today and see if I can get it to pass.
Tried to move this change into a different PR, but that caused the action to start failing

This reverts commit c1ba1d5.
@bentaculum Looks like I messed up the action when I reverted the commit that eliminated copying of the graph in the matcher. I wanted to move it into another memory-related PR, but clearly it mattered for this one. I reverted the revert commit, and I think we're now actually good to go!
Awesome 🎉! I suggest merging this as is. Benchmarking takes a while now (~200s) due to loading datasets repeatedly, but on the plus side it now measures performance on a full standard CTC 3D dataset, which is a common use case, and it can be sped up significantly once the

@DragaDoncila @cmalinmayor any major concerns?
This is an initial attempt to extend the existing 2D benchmarks to a 3D C. elegans dataset from the Cell Tracking Challenge. I'm not sure about many things in here, for example:

- how to get `pytest.mark.parametrize` to consume fixtures