
Imageruler revamp #30

Merged
merged 36 commits on Mar 28, 2024

Conversation

mfschubert
Contributor

As discussed in #25 and agreed upon during the discussion on 3/21, it would be beneficial to simplify the imageruler API and extend testing, so that e.g. automated usage becomes viable. This PR is the follow-up to that discussion.

The changes in this PR include:

With these changes, #25, #24, #23, and #22 can be considered resolved.

For #22 specifically, I created a notebook which shows how the new ignore scheme addresses the issue.

@oskooi @stevengj @mawc2019 @ianwilliamson

mfschubert and others added 28 commits February 28, 2024 17:15
* Redo package

* Fix version number v0.1.0

* Formatting

* Docstring darglint

* Update imports

* Formatting

---------

Co-authored-by: Martin Schubert <[email protected]>
Switch to tometrics algorithm
Add TO designs and bump version
Notebook for ignore schemes
@ianwilliamson
Collaborator

Nice work! Is there any way to mark the moved / renamed files as such? This would help condense the diffs.

@mfschubert
Contributor Author

Nice work! Is there any way to mark the moved / renamed files as such? This would help condense the diffs.

Unfortunately, this may be difficult to do at this point, but I can list the files which have not changed substantially.

  • regular_shapes is basically unchanged
  • test_regular_shapes is a slightly changed version of the original imageruler_test; it contains tests based on shapes from the regular_shapes module. The main difference is that we no longer check for duality between binary opening and closing, since closing is not implemented in the new imageruler module (it is not required).
  • simple_shapes notebook is renamed from the original examples notebook, and only updated as needed for the new API.



# ------------------------------------------------------------------------------
# Array-manipulating functions backed by `cv2`.
Collaborator


Since we are doing a giant reorg, can we move some of the functions below this line into their own modules (e.g. morphology.py and/or array_utils.py)?

src/imageruler/imageruler.py: two resolved review threads (outdated)
@mawc2019
Collaborator

This PR does not contain the function for computing the minimum lengthscale of void regions and the function for computing the overall minimum lengthscale. In the current main branch, the two functions are minimum_length_void() and minimum_length(). I do not think these two functions should be omitted, especially the second one, which generally has lower computational cost compared with computing both solid and void minimum lengthscales and then taking the minimum. The corresponding lengthscale violation functions are also not included in this PR.

@mfschubert
Contributor Author

This PR does not contain the function for computing the minimum lengthscale of void regions and the function for computing the overall minimum lengthscale. In the current main branch, the two functions are minimum_length_void() and minimum_length(). I do not think these two functions should be omitted, especially the second one, which generally has lower computational cost compared with computing both solid and void minimum lengthscales and then taking the minimum. The corresponding lengthscale violation functions are also not included in this PR.

Yes, this was a judgment call: the aim is to eliminate redundant functionality in service of a simpler API whose consistency and correctness are easier to ensure.

The overall minimum length scale can be computed by min(minimum_length_scale(x)), and, e.g., length scale violations for void can be computed by length_scale_violations_solid(~x, length_scale). We could add functions with these implementations (i.e. trivial wrappers for existing functions), but I think it would be counter to the objective to add new functions that have different implementations.
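
To make the equivalence concrete, here is a minimal sketch of what such trivial wrappers could look like. The helper names (minimum_length_scale_overall, length_scale_violations_void) are hypothetical; the sketch assumes the minimum_length_scale and length_scale_violations_solid functions from this PR, with minimum_length_scale returning a (solid, void) pair.

import numpy as onp

import imageruler


def minimum_length_scale_overall(x: onp.ndarray) -> float:
    # The overall minimum length scale is the smaller of the solid and void values.
    return min(imageruler.minimum_length_scale(x))


def length_scale_violations_void(x: onp.ndarray, length_scale: int) -> onp.ndarray:
    # Void violations of `x` are solid violations of the inverted design.
    return imageruler.length_scale_violations_solid(~x, length_scale)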

@mawc2019
Collaborator

mawc2019 commented Mar 26, 2024

The overall minimum length scale can be computed by min(minimum_length_scale(x)), and e.g. length scale violations for void can be computed by length_scale_violations_solid(~x, length_scale). We could add functions with these implementations (i.e. trivial wrappers for existing functions),

Yes, I understand these two functions can be trivially implemented in this way, and I do not insist on adding the trivial wrapper for void. However, I prefer not to add the trivial wrapper for the overall minimum lengthscale as min(minimum_length_scale(x)). Instead, I still prefer a function for the overall minimum lengthscale based on the previous open-close approach, which has lower computational cost than computing the two lengthscales and taking the minimum. This advantage is not obvious in our test examples, which only involve design patterns with small arrays, but for design patterns with large arrays the difference may be significant, especially in cases where the solid and void minimum lengthscales require very different numbers of binary-search iterations.

but I think it is counter to the objective if we were to add new functions that have different implementations.

I think a new function with a different implementation is not something that should be avoided if that implementation has an advantage.

@mfschubert
Contributor Author

mfschubert commented Mar 27, 2024

I think a new function with a different implementation is not something that should be avoided if that implementation has an advantage.

My philosophy here is that the additional code and complexity constitute a disadvantage, and we have to balance the pros against the cons. I think we may have an opportunity to discuss this on Thursday.

In any case, with regard to performance, there are some improvements with the new code. Here is a benchmarking snippet:

import numpy as onp

import imageruler


def separated_circles(separation_distance: int) -> onp.ndarray:
    # Two circles (diameters 80 and 60) from `imageruler.get_kernel`, placed
    # side by side with `separation_distance` pixels of void between them and
    # a 10-pixel void border around the whole pattern.
    left_circle = imageruler.get_kernel(80)
    right_circle = imageruler.get_kernel(60)
    right_circle = onp.pad(right_circle, ((10, 10), (0, 0)))

    circles = onp.concatenate(
        [left_circle, onp.zeros((80, separation_distance)), right_circle],
        axis=1,
    )
    circles = onp.pad(circles, ((10, 10), (10, 10))).astype(bool)
    return circles

print("Using `minimum_length`")
%timeit imageruler.minimum_length(separated_circles(1))
%timeit imageruler.minimum_length(separated_circles(5))
%timeit imageruler.minimum_length(separated_circles(10))
%timeit imageruler.minimum_length(separated_circles(20))

print("Using `minimum_length_solid_void`")
%timeit imageruler.minimum_length_solid_void(separated_circles(1))
%timeit imageruler.minimum_length_solid_void(separated_circles(5))
%timeit imageruler.minimum_length_solid_void(separated_circles(10))
%timeit imageruler.minimum_length_solid_void(separated_circles(20))

print("")
for dim in [1, 5, 10, 20]:
  print(f"Separation distance = {dim}")
  print(f"  Absolute minimum length scale: {imageruler.minimum_length(separated_circles(dim))}")
  print(f"  Minimum (solid, void): {imageruler.minimum_length_solid_void(separated_circles(dim))}")

With the existing implementation (using a colab CPU instance), this prints

Using `minimum_length`
308 ms ± 90.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
225 ms ± 4.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
204 ms ± 20.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
319 ms ± 73.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Using `minimum_length_solid_void`
408 ms ± 5.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
448 ms ± 8.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
398 ms ± 80.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
504 ms ± 103 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Separation distance = 1
  Absolute minimum length scale: 3.513671875
  Minimum (solid, void): (59.974609375, 3.513671875)
Separation distance = 5
  Absolute minimum length scale: 5.447265625
  Minimum (solid, void): (59.974609375, 5.447265625)
Separation distance = 10
  Absolute minimum length scale: 10.474609375
  Minimum (solid, void): (59.974609375, 10.474609375)
Separation distance = 20
  Absolute minimum length scale: 20.529296875
  Minimum (solid, void): (59.974609375, 20.529296875)

As you said, there is some performance benefit for minimum_length (approaching 2x). However, running the same code using the new implementation, we get

Using `minimum_length_scale`
12.3 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
7.65 ms ± 179 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
8.92 ms ± 239 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
17.8 ms ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Separation distance = 1
  Minimum (solid, void): (60, 1)
Separation distance = 5
  Minimum (solid, void): (60, 5)
Separation distance = 10
  Minimum (solid, void): (60, 10)
Separation distance = 20
  Minimum (solid, void): (60, 20)

So, there is a more than 10x performance improvement with the new implementation.

@mawc2019
Collaborator

there is a more than 10x performance improvement with the new implementation.

Nice test and great improvement! The new implementation indeed runs much faster than the old implementation, but the advantage of the open-close approach over the min(minimum_length_scale(x)) approach needs to be gauged within the same implementation, as you did for the old implementation. I believe that, just as with their counterparts in the old implementation, if the open-close approach were added to this new implementation, it would be faster than that implementation's own min(minimum_length_scale(x)) approach.

@mfschubert
Contributor Author

there is a more than 10x performance improvement with the new implementation.

Nice test and great improvement! The new implementation indeed runs much faster than the old implementation, but the advantage of the open-close approach over the min(minimum_length_scale(x)) approach needs to be gauged within the same implementation, as you did for the old implementation. I believe that, just as with their counterparts in the old implementation, if the open-close approach were added to this new implementation, it would be faster than that implementation's own min(minimum_length_scale(x)) approach.

Yes, I agree with this.

@stevengj
Contributor

If performance is not too critical, then the extra code complexity of adding the open–close algorithm might not be worth it for a factor-of-two improvement, at least at the current scales where we expect to apply this code. So, the simplicity of leaving it out might be better.

However, if we leave out this optimization, it would be worth adding a comment to the code mentioning this possibility if we want to improve performance in the future.
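
For illustration, such a note could be as simple as a comment in the source, e.g. near the length-scale functions (a sketch only; the placement and exact wording are hypothetical):

# NOTE: The overall minimum length scale is currently obtained as
# min(minimum_length_scale(x)), i.e. via separate binary searches for the
# solid and void length scales. If this ever becomes a performance
# bottleneck, a single binary search using the open-close approach from the
# original imageruler implementation would give roughly a factor-of-two
# savings.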

@mawc2019
Collaborator

I just talked with Steven and now agree with removing the open-close approach.

mfschubert and others added 2 commits March 28, 2024 10:06
Improve docstring and variable naming
@oskooi (Collaborator) left a comment

LGTM.

I noticed that the three notebooks (to_designs.ipynb, simple_shapes.ipynb, and advanced.ipynb) do not include the outputs from actually running them. Perhaps this was intentional?

tests/test_imageruler.py: resolved review thread (outdated)
@mfschubert
Contributor Author

LGTM.

I noticed that the three notebooks (to_designs.ipynb, simple_shapes.ipynb, and advanced.ipynb) do not include the outputs from actually running them. Perhaps this was intentional?

Correct. To view the outputs, one should look at the docs page, which you can preview here: https://mfschubert.github.io/imageruler/readme.html

Docs are automatically generated when there is a push to the main branch, and stripping of notebook outputs is enforced by one of the pre-commit rules. This way, we avoid having large files (notebooks with embedded outputs) in the repo.

@mawc2019 (Collaborator) left a comment

Typos.

src/imageruler/imageruler.py: three resolved review threads (outdated)