Find potential matches faster #61

robinmessage · 2023-09-22T11:32:03Z

Run time for Gola with 18 processes:

real 2m38.122s
user 39m43.770s
sys 0m29.981s

Which is a great improvement.
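
As a rough reading of those numbers, dividing the CPU time (user + sys) by the wall-clock time gives the average number of cores kept busy. The sketch below does that arithmetic; the helper name `parse_duration` is illustrative and is not part of this PR.

```python
import re


def parse_duration(text: str) -> float:
    """Convert a bash `time` value such as '2m38.122s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the timing output above.
real = parse_duration("2m38.122s")
user = parse_duration("39m43.770s")
sys_time = parse_duration("0m29.981s")
processes = 18

# CPU seconds consumed per wall-clock second = average number of busy cores.
busy_cores = (user + sys_time) / real
# Fraction of the 18 worker processes that were doing work on average.
utilisation = busy_cores / processes

print(f"average busy cores: {busy_cores:.1f}")
print(f"utilisation of {processes} processes: {utilisation:.0%}")
```

That comes out at roughly 15 busy cores on average, or about 85% utilisation of the 18 processes.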

CPC alignment is off, and I think the matching to k might be slightly broken; I need to test the DRangedTree with the new CPC data to check the relative widths and zero widths are handled correctly.
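
One way such a check could be done is to compare the tree's answers against a brute-force oracle on small inputs. The sketch below is only that oracle, under the assumption that a match means "within plus or minus the width in every dimension" and that a zero width demands exact equality. The function names are hypothetical and do not reflect the DRangedTree API.

```python
from typing import List, Sequence


def within_widths(
    point: Sequence[int], candidate: Sequence[int], widths: Sequence[int]
) -> bool:
    """True when candidate is within +/- width of point in every dimension.

    A width of zero means the two values must be exactly equal.
    """
    return all(abs(p - c) <= w for p, c, w in zip(point, candidate, widths))


def brute_force_matches(
    points: Sequence[Sequence[int]],
    candidates: Sequence[Sequence[int]],
    widths: Sequence[int],
) -> List[List[int]]:
    """For each point, list the indices of every candidate that matches it."""
    return [
        [j for j, c in enumerate(candidates) if within_widths(p, c, widths)]
        for p in points
    ]


# First dimension has width 3; the other two have zero width (exact match).
points = [(10, 5, 3)]
candidates = [(12, 5, 3), (10, 6, 3), (7, 5, 3)]
widths = (3, 0, 0)
print(brute_force_matches(points, candidates, widths))
```

Here candidates 0 and 2 match, while candidate 1 is rejected because it differs in a zero-width dimension. Any faster structure should return the same index sets for the same inputs.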


mdales

I did a review pass and it looks good. There are just a few things I'd like to see changed before we merge; most of the "we could do this" comments could become issues if you think they're right, Robin.

I must confess that whilst I did get the general idea of how DRangedTree works externally, I didn't review the implementation in detail, only how it was used.

methods/matching/find_potential_matches_fast.py

methods/utils/dranged_tree.py

mdales · 2023-10-24T13:47:43Z

methods/matching/find_potential_matches_fast.py

@@ -0,0 +1,401 @@
+import argparse


This file should replace the old find_potential_matches.py, not sit alongside it.

Agree, but holding off until tmf-pipeline is updated to pick this up.

I've asked @patricoferris to comment on what he thinks the best plan here is in terms of commit ordering

Thanks for the thought, but just go for it: make the change, commit! I can pull arbitrary commits into arbitrary parts of the pipeline, so if I have to do SHA gymnastics to get hold of the original potential matches code, that's OK (I already do that for other things). I'd rather the merged code have some order to it, and the disorder can live in the pipeline should we need that :))

Agreed, merging once CI agrees.

rerunner.py


robinmessage · 2023-10-26T14:32:55Z

@mdales I think I have fixed everything or made an issue for it, if you wouldn't mind taking another look please?

mdales

Thanks Robin, LGTM. I'd still prefer that we replace the old script instead of having two side by side, so I've asked Patrick to comment on how much that would break his plans, and if necessary we can make an issue to update things as soon as we can.

mdales · 2023-10-26T14:42:18Z

methods/matching/find_potential_matches_fast.py

@@ -0,0 +1,401 @@
+import argparse


I've asked @patricoferris to comment on what he thinks the best plan here is in terms of commit ordering

patricoferris and others added 17 commits September 19, 2023 10:09

Filter potential matches with CPCs too

90a5a0d

Fix CPC ordering

8b3cad2

Upgrade rerunner to understand rescaled_cpcs and few other tweaks

bec9e71

Add DRangedTree utility for fast finding of potential matches

3b926b4

WIP: noddling around to make tree work fast with new structure and columns

c8d06dc

WIP: trying to eliminate overlaps faster but currently broken

6b7e91f

WIP: working enough version of DRangedTree

57455fe

WIP: crashing in yigracheffe trying to retrieve cpc pixels

4f758f0

Kind of working CPC matching, but CPC appears offset from Patrick's calculation

dd28d1c

Tweaking CPC offsets

4d811ef

Not perfectly aligned, but seems pretty good

WIP: use coarse CPC; work around Yigracheffe issue

69ed5cf

Pull out number of divisions into a constant and increase to 40x40

69f69fd

WIP: manually realign Gola CPCs

5ce1171

Improved test suite to account for new CPC columns and fraction of points of S that match

c28e567

Use expected_fraction to cut branches in DRangedTree early

23cc3d0

Fix DRangedTree bounding on left-hand side of values

99a7b93

Tweak DRangedTree self test

8cda030

robinmessage force-pushed the rhm31-cpc-10-faster branch from 4d51188 to 8cda030 Compare October 10, 2023 14:01

patricoferris changed the title ~~[DRAFT] Find potential matches faster~~ Find potential matches faster Oct 19, 2023

patricoferris mentioned this pull request Oct 21, 2023

The Great Convergence #69

Open

mdales requested changes Oct 24, 2023


Code review fixes, most specifically horizontal striping

3072cf7

Horizontal striping roughly doubled performance, so well worth it. Other changes minor.
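
The conversation does not show what the striping looks like in code. As a minimal sketch, assuming "horizontal striping" means working through the raster in full-width bands of rows, the band boundaries could be generated like this. The function name and parameters are hypothetical, not taken from the PR.

```python
from typing import Iterator, Tuple


def horizontal_stripes(height: int, stripe_height: int) -> Iterator[Tuple[int, int]]:
    """Yield (y_offset, rows) pairs covering `height` rows in order.

    Every stripe spans the full width of the raster; the last stripe is
    shorter when `height` is not a multiple of `stripe_height`.
    """
    if height < 0 or stripe_height <= 0:
        raise ValueError("height must be >= 0 and stripe_height must be > 0")
    for y_offset in range(0, height, stripe_height):
        yield y_offset, min(stripe_height, height - y_offset)


# A 10-row raster read in stripes of 4 rows.
stripes = list(horizontal_stripes(10, 4))
print(stripes)
```

Each pair can then be used to read one band of rows at a time, so the per-stripe work is done on a whole block of data in one go.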

mdales approved these changes Oct 26, 2023


robinmessage added 4 commits October 26, 2023 16:43

Fix command line description of find_potential_matches

db1f4d8

Replace find_potential_matches with fast version

13edeb4

Lint

72a0395

Types

a641f21

robinmessage merged commit 07a9079 into main Oct 26, 2023
1 check passed

robinmessage deleted the rhm31-cpc-10-faster branch October 26, 2023 16:30
