Porting over jump finding stuff #180

skhrg · 2024-05-20T16:57:49Z

Been a few years since I've written C++ for use by other humans, so some sanity checks that I don't implement things in a dumb way are appreciated.

mhasself

Thanks. A few things:

These functions assume the arrays (buffers) or C-ordered. But nothing enforces that. So, segfault city. It's pretty easy to generalize these for arbitrary strides; see other uses of BufferWrapper. Or else check that it's C-ordered and throw an informative error. But do not allow for segfaults.
Add docstrings to the bp::def entries.
Add tests.
The rest of so3g formats if and for blocks like this:

if (condition) {
    block;
}

So please add that space and remove that extra new line.

skhrg · 2024-06-26T12:36:17Z

@mhasself I think this is good to go, could you take another look? I'd like this get this faster jumpfinder out in the wild soon.

mhasself

Some comments inline.

I have some reservations about the jump finding algorithm. My immediate reactions were (a) this relies on "blocks" too much and (b) I don't understand what kind of jumps this will detect / not detect.

Adapting your unit test case, I loop over several signals, where the jump is shifting to the right. Note the threshold for detection is substantially higher than the true jump size -- 1.2.

for offset in range(20):
    a = np.zeros((1, 100), dtype="float32", order="C")
    a[0, 50+offset:] += 1
    b = np.zeros((1, 100), dtype="int32", order="C")
    min_size = np.ones(1, dtype="float32", order="C")*1.2
    so3g.matched_jumps(a, b, min_size, 10)
    flagged = np.where(b)[1]
    print(flagged)

What I get is:

[48 49 50]
[49 50]
[51]
[52]
[53 54]
[53 54 55]
[54 55]
[56]
[57]
[58 59]
[58 59 60]
[59 60]
[61]
[62]
[63 64]
[63 64 65]
[64 65]
[66]
[67]
[68 69]

So the sensitivity of the finder varies depending on where the jump is relative to the window edge. I think we can cook up versions of this that will have a more consistent kernel. It will probably involve a boxcar filter -- this is easy to implement and is the continuous limit of your block-moment based approach.

mhasself · 2024-06-26T13:20:52Z

test/test_array_ops.py

+            flagged,
+            np.array(
+                [
+                    0,


Please collapse this to ~2 lines. It would make sense to have a line break between 7 and 50, and 59 and 95. It might be clearer to construct this as an hstack of aranges.

This is the reason some people don't like automatic formatters!

mhasself · 2024-06-26T13:25:26Z

src/array_ops.cxx

+            "  atol: how close to the multiple of scale a value needs to be to be a jump\n"
+            "        should be an array (float32) with shape (ndet,)\n"


Is atol in units of the signal, or a fraction of the "scale"?

mhasself · 2024-06-26T13:26:23Z

src/array_ops.cxx

+}
+
+template <typename T>
+void scale_jumps(const bp::object & tod, const bp::object & out, const bp::object & atol, int win_size, T scale)


Please rename to find_scale_jumps. Or find_quantized_jumps. "Scale" is a verb and a noun so the name is currently pretty confusing.

mhasself · 2024-06-26T13:36:08Z

src/array_ops.cxx

+    for(int di = 0; di < ndet; di++) {
+        int ioff = di*nsamp;
+        for(int si = 0; si < nsamp; si++) {
+            int i = ioff + si;


The contents of this loop are not implemented correctly, to isolate detectors. For example, you set idx=0 and then refer to tod_data[idx]. That's always the first detector's first sample.

There are patterns to reduce the risk of out-of-bounds accesses. In this case, you should make pointers for just this detector's data and output (T* det_data = tod_data + ioff; T* det_out = output + ioff;) and then only use si and quantities based on si to access det_data and det_out, in the loop.

Better yet, use T* det_data = tod_buf->ptr_1d(di);, and similarly for det_out. Then you never need tod_data at all. And you can relax the requirements on strides[0], because ptr_1d has handled that for you!

Would I be asking you to do all this, if this inner loop were right already? No, probably not. Lesson learned I hope. :)

Ah yes I see I should have done idx = ioff below to do what I want, good catch.

Yeah I can change things to grab a 1d pointer to make that better.

mhasself · 2024-06-26T14:06:33Z

src/array_ops.cxx

+            "  win_size: size of window to use as buffer when differencing\n"
+            "  scale: the scale of jumps to look for\n");
+    bp::def("scale_jumps64", scale_jumps<double>,
+            "scale_jumps64(tod, out, bsize)\n"


signature doesn't have the right args.

mhasself · 2024-06-26T14:35:52Z

src/array_ops.cxx

+            int i = ioff + si;
+            buffer[i] = flag_data[i];
+            int comp = 0;
+            if(i>=width) {


This is incorrect -- see my long comment in "scale_jumps". Solution is the same.

mhasself · 2024-06-26T14:49:02Z

src/array_ops.cxx

+    int bsize = stop - start;
+    // Could replace the loops with boost accumulators?
+    T center = 0.0;
+    if(central == 1 | moment == 1) {


Given the types, this should be if (central || moment == 1).

skhrg · 2024-06-26T21:51:30Z

On the sensitivity of the jumpfinder caring about the location relative to the edge of the window. This is known with calculalable effects. The shifted window scheme here was chosen in such a way that there is some garunteed minumum sensitivity. See https://simonsobs.atlassian.net/wiki/spaces/PRO/pages/389939204/Matched+Filter+Jump+Finding for an explanation (page is slightly out of date but the math for this is still accurate).

A boxcar would indeed give us maximum consistency and sensitivity since you will always have a case when the jump is in the center of the kernal which maximizes the sensitivity. But I dont think that is necessary since we pad anyways.

Also the jumps getting caught with min size greater than the jump size I beleive is becuase I correct for a samples worth of slop in min size. That can be removed, I dont recall the exact edge case I was worried about when I added that.

skhrg · 2024-07-03T21:27:00Z

Ah actually, the correct thing to do is to determine the sensitivity as a function of distance from the window edge and compare to min_size*sensitivity . What I do now is use the sensitivity at the minimum gaurunteed distance from the edge given the phased block approach.

I'll work out the math but I think with that (and adding a cutoff where we just skip and don't bother checking since the other block will always do better) we should get a consistent and well behaved kernel.

While yes I know that taking the continuous limit solves this problem as well, one of the reasons for going with the phased blocks like I did here was to reduce the computational complexity since making it continuous will make it block_size/2 times slower.

@mhasself let me know if that sounds sane.

mhasself · 2024-07-11T15:45:49Z

src/array_ops.cxx

@@ -624,26 +624,26 @@ void block_moment(const bp::object & tod, const bp::object & out, int bsize, int

 void _clean_flag(int* flag_data, int width, int ndet, int nsamp)
 {
-    int* buffer = new int[ndet * nsamp];
+    int* buffer = new int[nsamp];


You need an independent buffer per thread. This I will insist on.

src/array_ops.cxx

mhasself · 2024-07-11T20:07:02Z

While yes I know that taking the continuous limit solves this problem as well, one of the reasons for going with the phased blocks like I did here was to reduce the computational complexity since making it continuous will make it block_size/2 times slower.

I am skeptical that you'd suffer a performance cost of this magnitude. But I admit I haven't thought about it super carefully.

@mhasself let me know if that sounds sane.

Let's go with this for now, sure, but I will welcome something with less structured behavior in the future...

skhrg · 2024-07-16T16:17:55Z

Behavior is more consistent now. If you turn off samp_correction then it gives you exactly what I calculate with pen and paper. Leaving that on for now but I could be convinced its unneeded paranoia.

mhasself

Minor comments. Then rebase to make it mergeable and prepare for a final review...

src/array_ops.cxx

mhasself · 2024-07-16T18:22:29Z

src/array_ops.cxx

+                if(si==12){
+                    std::cout << thresh << "\t" << std::flush;
+                }


Remove debug

skhrg · 2024-07-16T22:21:24Z

Sounds good. Any opposition to me squashing the whole PR into one commit?

mhasself · 2024-07-17T13:41:58Z

Sounds good. Any opposition to me squashing the whole PR into one commit?

Rebase + squash into one commit would be perfect. Thanks!

* Matched filter jump finder * Quantized size jump finder * Compute n'th moment is blocks * Clean flag by removing small segments

mhasself

Thanks!

skhrg requested a review from mhasself May 20, 2024 16:57

skhrg force-pushed the block_moment branch from 23f5270 to 22044b0 Compare May 23, 2024 21:42

skhrg changed the title ~~Block moment~~ Porting over jump finding stuff May 23, 2024

skhrg requested a review from tskisner May 23, 2024 21:48

skhrg force-pushed the block_moment branch from 22044b0 to 86b0f22 Compare May 24, 2024 17:09

mhasself requested changes Jun 3, 2024

View reviewed changes

skhrg requested a review from mhasself June 9, 2024 00:06

skhrg force-pushed the block_moment branch 2 times, most recently from e0642d3 to 382966e Compare June 26, 2024 12:34

mhasself requested changes Jun 26, 2024

View reviewed changes

skhrg force-pushed the block_moment branch from 48ea835 to 73370cc Compare July 4, 2024 02:04

mhasself requested changes Jul 11, 2024

View reviewed changes

skhrg requested a review from mhasself July 16, 2024 16:15

skhrg force-pushed the block_moment branch from 43753a8 to c54abfa Compare July 16, 2024 16:42

mhasself requested changes Jul 16, 2024

View reviewed changes

skhrg force-pushed the block_moment branch from c54abfa to f902253 Compare July 17, 2024 21:06

feat: tools for jump finding and some generically useful helpers

4fd920d

* Matched filter jump finder * Quantized size jump finder * Compute n'th moment is blocks * Clean flag by removing small segments

skhrg force-pushed the block_moment branch from f902253 to 4fd920d Compare July 17, 2024 21:40

mhasself approved these changes Jul 18, 2024

View reviewed changes

mhasself merged commit 34781f0 into master Jul 18, 2024
4 checks passed

mhasself deleted the block_moment branch July 18, 2024 16:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Porting over jump finding stuff #180

Porting over jump finding stuff #180

skhrg commented May 20, 2024 •

edited

Loading

mhasself left a comment

skhrg commented Jun 26, 2024

mhasself left a comment

mhasself Jun 26, 2024

mhasself Jun 26, 2024

mhasself Jun 26, 2024

mhasself Jun 26, 2024

skhrg Jun 26, 2024

mhasself Jun 26, 2024

mhasself Jun 26, 2024

mhasself Jun 26, 2024

skhrg commented Jun 26, 2024

skhrg commented Jul 3, 2024 •

edited

Loading

mhasself Jul 11, 2024

mhasself commented Jul 11, 2024

skhrg commented Jul 16, 2024

mhasself left a comment

mhasself Jul 16, 2024

skhrg commented Jul 16, 2024

mhasself commented Jul 17, 2024

mhasself left a comment

		" atol: how close to the multiple of scale a value needs to be to be a jump\n"
		" should be an array (float32) with shape (ndet,)\n"

Porting over jump finding stuff #180

Porting over jump finding stuff #180

Conversation

skhrg commented May 20, 2024 • edited Loading

mhasself left a comment

Choose a reason for hiding this comment

skhrg commented Jun 26, 2024

mhasself left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

skhrg commented Jun 26, 2024

skhrg commented Jul 3, 2024 • edited Loading

Choose a reason for hiding this comment

mhasself commented Jul 11, 2024

skhrg commented Jul 16, 2024

mhasself left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

skhrg commented Jul 16, 2024

mhasself commented Jul 17, 2024

mhasself left a comment

Choose a reason for hiding this comment

skhrg commented May 20, 2024 •

edited

Loading

skhrg commented Jul 3, 2024 •

edited

Loading