
Imageruler revamp #30

Merged
merged 36 commits on Mar 28, 2024

Conversation

mfschubert
Contributor

As discussed in #25 and agreed upon during the discussion on 3/21, it would be beneficial to simplify the imageruler API and extend testing, so that e.g. automated usage becomes viable. This PR is the follow-up to that discussion.

The changes in this PR include:

With these changes, #25, #24, #23, and #22 can be considered resolved.

For #22 specifically, I created a notebook which shows how the new ignore scheme addresses the issue.

@oskooi @stevengj @mawc2019 @ianwilliamson

mfschubert and others added 28 commits February 28, 2024 17:15
* Redo package

* Fix version number v0.1.0

* Formatting

* Docstring darglint

* Update imports

* Formatting

---------

Co-authored-by: Martin Schubert <[email protected]>
Switch to tometrics algorithm
Add TO designs and bump version
Notebook for ignore schemes
@ianwilliamson
Collaborator

Nice work! Is there any way to mark the moved / renamed files as such? This would help condense the diffs.

@mfschubert
Contributor Author

Nice work! Is there any way to mark the moved / renamed files as such? This would help condense the diffs.

Unfortunately, this may be difficult to do at this point, but I can list the files which have not changed substantially.

  • regular_shapes is basically unchanged
  • test_regular_shapes is a slightly changed version of the original imageruler_test; it contains tests based on shapes from the regular_shapes module. The main difference is that we no longer check for duality between binary opening and closing, since closing is not implemented in the new imageruler module (it is not required).
  • simple_shapes notebook is renamed from the original examples notebook, and only updated as needed for the new API.



# ------------------------------------------------------------------------------
# Array-manipulating functions backed by `cv2`.
Collaborator


Since we are doing a giant reorg, can we move some of the functions below this line into their own modules (e.g. morphology.py and/or array_utils.py)?

src/imageruler/imageruler.py: two resolved review threads (outdated)
@mawc2019
Collaborator

This PR does not contain the function for computing the minimum lengthscale of void regions and the function for computing the overall minimum lengthscale. In the current main branch, the two functions are minimum_length_void() and minimum_length(). I do not think these two functions should be omitted, especially the second one, which generally has lower computational cost compared with computing both solid and void minimum lengthscales and then taking the minimum. The corresponding lengthscale violation functions are also not included in this PR.

@mfschubert
Contributor Author

This PR does not contain the function for computing the minimum lengthscale of void regions and the function for computing the overall minimum lengthscale. In the current main branch, the two functions are minimum_length_void() and minimum_length(). I do not think these two functions should be omitted, especially the second one, which generally has lower computational cost compared with computing both solid and void minimum lengthscales and then taking the minimum. The corresponding lengthscale violation functions are also not included in this PR.

Yes, this was a judgment call: the aim is to eliminate redundant functionality in service of a simpler API whose consistency and correctness are easier to ensure.

The overall minimum length scale can be computed by min(minimum_length_scale(x)), and, e.g., length scale violations for void can be computed by length_scale_violations_solid(~x, length_scale). We could add functions with these implementations (i.e. trivial wrappers for existing functions), but I think it would be counter to the objective to add new functions that have different implementations.
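
To make the equivalence concrete, here is a minimal sketch of what such trivial wrappers could look like. The helper names (minimum_length_scale_overall, length_scale_violations_void) are hypothetical; the sketch assumes the minimum_length_scale and length_scale_violations_solid functions from this PR, with minimum_length_scale returning a (solid, void) pair.

import numpy as onp

import imageruler


def minimum_length_scale_overall(x: onp.ndarray) -> float:
    # The overall minimum length scale is the smaller of the solid and void values.
    return min(imageruler.minimum_length_scale(x))


def length_scale_violations_void(x: onp.ndarray, length_scale: int) -> onp.ndarray:
    # Void violations of `x` are solid violations of the inverted design.
    return imageruler.length_scale_violations_solid(~x, length_scale)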

@mawc2019
Collaborator

mawc2019 commented Mar 26, 2024

The overall minimum length scale can be computed by min(minimum_length_scale(x)), and e.g. length scale violations for void can be computed by length_scale_violations_solid(~x, length_scale). We could add functions with these implementations (i.e. trivial wrappers for existing functions),

Yes, I understand these two functions can be trivially implemented in this way, and I do not insist on adding the trivial wrapper for void. However, I prefer not to add the trivial wrapper for the overall minimum lengthscale as min(minimum_length_scale(x)). Instead, I still prefer a function for the overall minimum lengthscale based on the previous open-close approach, which has lower computational cost than computing the two lengthscales and taking the minimum. This advantage is not obvious in our test examples, which only involve design patterns with small arrays, but for design patterns with large arrays the difference may be significant, especially in cases where the solid and void minimum lengthscales require very different numbers of binary-search iterations.

but I think it is counter to the objective if we were to add new functions that have different implementations.

I think a new function with a different implementation is not something that should be avoided if that implementation has an advantage.

@mfschubert
Contributor Author

mfschubert commented Mar 27, 2024

I think a new function with a different implementation is not something that should be avoided if that implementation has an advantage.

My philosophy here is that the additional code and complexity constitute a disadvantage, and we have to balance the pros against the cons. I think we may have an opportunity to discuss this on Thursday.

In any case, with regard to performance, there are some improvements with the new code. Here is a benchmarking snippet:

import numpy as onp

import imageruler


def separated_circles(separation_distance: int) -> onp.ndarray:
    # Two circles (diameters 80 and 60) from `imageruler.get_kernel`, placed
    # side by side with `separation_distance` pixels of void between them and
    # a 10-pixel void border around the whole pattern.
    left_circle = imageruler.get_kernel(80)
    right_circle = imageruler.get_kernel(60)
    right_circle = onp.pad(right_circle, ((10, 10), (0, 0)))

    circles = onp.concatenate(
        [left_circle, onp.zeros((80, separation_distance)), right_circle],
        axis=1,
    )
    circles = onp.pad(circles, ((10, 10), (10, 10))).astype(bool)
    return circles

print("Using `minimum_length`")
%timeit imageruler.minimum_length(separated_circles(1))
%timeit imageruler.minimum_length(separated_circles(5))
%timeit imageruler.minimum_length(separated_circles(10))
%timeit imageruler.minimum_length(separated_circles(20))

print("Using `minimum_length_solid_void`")
%timeit imageruler.minimum_length_solid_void(separated_circles(1))
%timeit imageruler.minimum_length_solid_void(separated_circles(5))
%timeit imageruler.minimum_length_solid_void(separated_circles(10))
%timeit imageruler.minimum_length_solid_void(separated_circles(20))

print("")
for dim in [1, 5, 10, 20]:
  print(f"Separation distance = {dim}")
  print(f"  Absolute minimum length scale: {imageruler.minimum_length(separated_circles(dim))}")
  print(f"  Minimum (solid, void): {imageruler.minimum_length_solid_void(separated_circles(dim))}")

With the existing implementation (using a colab CPU instance), this prints

Using `minimum_length`
308 ms ± 90.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
225 ms ± 4.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
204 ms ± 20.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
319 ms ± 73.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Using `minimum_length_solid_void`
408 ms ± 5.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
448 ms ± 8.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
398 ms ± 80.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
504 ms ± 103 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Separation distance = 1
  Absolute minimum length scale: 3.513671875
  Minimum (solid, void): (59.974609375, 3.513671875)
Separation distance = 5
  Absolute minimum length scale: 5.447265625
  Minimum (solid, void): (59.974609375, 5.447265625)
Separation distance = 10
  Absolute minimum length scale: 10.474609375
  Minimum (solid, void): (59.974609375, 10.474609375)
Separation distance = 20
  Absolute minimum length scale: 20.529296875
  Minimum (solid, void): (59.974609375, 20.529296875)

As you said, there is some performance benefit for minimum_length (approaching 2x). However, running the same code using the new implementation, we get

Using `minimum_length_scale`
12.3 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
7.65 ms ± 179 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
8.92 ms ± 239 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
17.8 ms ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Separation distance = 1
  Minimum (solid, void): (60, 1)
Separation distance = 5
  Minimum (solid, void): (60, 5)
Separation distance = 10
  Minimum (solid, void): (60, 10)
Separation distance = 20
  Minimum (solid, void): (60, 20)

So, there is a more than 10x performance improvement with the new implementation.

@mawc2019
Collaborator

there is a more than 10x performance improvement with the new implementation.

Nice test and great improvement! The new implementation indeed runs much faster than the old implementation, but the advantage of the open-close approach over the min(minimum_length_scale(x)) approach needs to be gauged within the same implementation, as you did for the old implementation. I believe that, just as with their counterparts in the old implementation, if the open-close approach were added to this new implementation, it would be faster than that implementation's own min(minimum_length_scale(x)) approach.

@mfschubert
Contributor Author

there is a more than 10x performance improvement with the new implementation.

Nice test and great improvement! The new implementation indeed runs much faster than the old implementation, but the advantage of the open-close approach over the min(minimum_length_scale(x)) approach needs to be gauged within the same implementation, as you did for the old implementation. I believe that, just as with their counterparts in the old implementation, if the open-close approach were added to this new implementation, it would be faster than that implementation's own min(minimum_length_scale(x)) approach.

Yes, I agree with this.

@stevengj
Contributor

If performance is not too critical, then the extra code complexity of adding the open–close algorithm might not be worth it for a factor-of-two improvement, at least at the current scales where we expect to apply this code. So, the simplicity of leaving it out might be better.

However, if we leave out this optimization, it would be worth adding a comment to the code mentioning this possibility if we want to improve performance in the future.
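
For illustration, such a note could be as simple as a comment in the source, e.g. near the length-scale functions (a sketch only; the placement and exact wording are hypothetical):

# NOTE: The overall minimum length scale is currently obtained as
# min(minimum_length_scale(x)), i.e. via separate binary searches for the
# solid and void length scales. If this ever becomes a performance
# bottleneck, a single binary search using the open-close approach from the
# original imageruler implementation would give roughly a factor-of-two
# savings.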

@mawc2019
Collaborator

I just talked with Steven and now agree with removing the open-close approach.

mfschubert and others added 2 commits March 28, 2024 10:06
Improve docstring and variable naming
@oskooi (Collaborator) left a comment

LGTM.

I noticed that the three notebooks (to_designs.ipynb, simple_shapes.ipynb, and advanced.ipynb) do not include the outputs from actually running them. Perhaps this was intentional?

tests/test_imageruler.py: resolved review thread (outdated)
@mfschubert
Contributor Author

LGTM.

I noticed that the three notebooks (to_designs.ipynb, simple_shapes.ipynb, and advanced.ipynb) do not include the outputs from actually running them. Perhaps this was intentional?

Correct. To view the outputs, one should look at the docs page, which you can preview here: https://mfschubert.github.io/imageruler/readme.html

Docs are automatically generated when there is a push to the main branch, and stripping of notebook outputs is enforced by one of the pre-commit rules. This way, we avoid having large files (notebooks with embedded outputs) in the repo.

@mawc2019 (Collaborator) left a comment

Typos.

src/imageruler/imageruler.py: three resolved review threads (outdated)