-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix reconstitute interleaving features #886
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #886 +/- ##
==========================================
+ Coverage 97.14% 97.18% +0.04%
==========================================
Files 21 21
Lines 9354 9392 +38
==========================================
+ Hits 9087 9128 +41
+ Misses 267 264 -3
... and 5 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good. Just a few questions and clarifications.
@@ -2023,7 +2022,9 @@ def time_chunk_from_baseline_chunks(time_chunk_template, baseline_chunk_files, o | |||
output will trim the extra frequencies in the time_chunk and write out trimmed freqs. The same is true | |||
for polarizations. | |||
baseline_chunk_files : list of strings | |||
list of paths to baseline-chunk files to select time-chunk file from. | |||
list of paths to baseline-chunk files to select time-chunk file from. If the files have "_interleave" in their title then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this any concern that this will be triggered accidentally? Do we want to have some kind of boolean trigger (default false) for this behavior or do you think that's too rare of an edge case to be worried about?
"interleave" in fname and not interleave_mode: | ||
raise ValueError("must not have a subset of files with 'interleave' in name.") | ||
if interleave_mode: | ||
interleave_indices = np.unique([int(re.findall("interleave_[0-9]{1,10}", fname)[0][11:]) for fname in baseline_chunk_files]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels rather hard-coded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At minimum, the precise expectation for the file name format should be documented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I actually think a much more flexible and future-proof way of doing this would be to have a new argument to the function, like interleave_regex
, which if provided, will do the above (it could be the above by default, but probably better to define it with a capture group instead).
"interleave" in fname and not interleave_mode: | ||
raise ValueError("must not have a subset of files with 'interleave' in name.") | ||
if interleave_mode: | ||
interleave_indices = np.unique([int(re.findall("interleave_[0-9]{1,10}", fname)[0][11:]) for fname in baseline_chunk_files]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a function that should probably be broken out for future use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aewallwi -- I have a few comments that I think will make this a little more future-robust, but otherwise it seems good.
tmp_path = tmpdir.strpath | ||
cdir = tmp_path + "/cache_temp" | ||
os.mkdir(cdir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In new code, I'd heartily suggest using the pathlib
module for path operations. You can use the pytest fixture tmp_path_factory
, then do cdir = tmp_path_factory.mkdir('test_time_chunk_from_baseline_chunks')
.
# reconstitute the filtered data | ||
for filenum, file in enumerate(datafiles): | ||
# reconstitute | ||
# AEW -- 5-10-2023 -- I AM HERE! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know!
@@ -794,8 +794,7 @@ def vis_clean(self, keys=None, x=None, data=None, flags=None, wgts=None, | |||
# get filter properties | |||
mfrate = max_frate[k] if max_frate is not None else None | |||
filter_centers, filter_half_widths = gen_filter_properties(ax=ax, horizon=horizon, | |||
standoff=standoff, min_dly=min_dly, | |||
bl_len=self.bllens[k[:2]], max_frate=mfrate) | |||
standoff=standoff, min_dly=min_dly, bl_len=self.bllens[k[:2]], max_frate=mfrate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't tell what's happening on this line, but it seems weird
list of paths to baseline-chunk files to select time-chunk file from. | ||
list of paths to baseline-chunk files to select time-chunk file from. If the files have "_interleave" in their title then | ||
the method will automatically identify the number of unique interleaves, chunk the file list up into interleaved sets and | ||
retrieve integrations based on the time in the first file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, just having read this docstring, I'm still not clear on exactly what's happening here. Can you maybe provide a more clear example in the docstring?
@@ -2040,11 +2041,38 @@ def time_chunk_from_baseline_chunks(time_chunk_template, baseline_chunk_files, o | |||
------- | |||
Nothing | |||
""" | |||
# check whether "interleave" is in baseline chunk filenames. If so, make sure that they all have "interleave", | |||
# split them into sets, and make sure that the sets all have the same number of files. | |||
interleave_mode = "interleave" in baseline_chunk_files[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not consistent with the docstring, which states that if "_interleave" is in the filename, it'll do the interleave mode
for fname in baseline_chunk_files: | ||
if "interleave" not in fname and interleave_mode or \ | ||
"interleave" in fname and not interleave_mode: | ||
raise ValueError("must not have a subset of files with 'interleave' in name.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a nicer error here would be "Cannot have some baseline_chunk_files with "_interleave" in their name, while other's don't"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, add this info to the docstring (i.e. say that all or none of the files must have "_interleave"
in the name)
"interleave" in fname and not interleave_mode: | ||
raise ValueError("must not have a subset of files with 'interleave' in name.") | ||
if interleave_mode: | ||
interleave_indices = np.unique([int(re.findall("interleave_[0-9]{1,10}", fname)[0][11:]) for fname in baseline_chunk_files]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I actually think a much more flexible and future-proof way of doing this would be to have a new argument to the function, like interleave_regex
, which if provided, will do the above (it could be the above by default, but probably better to define it with a capture group instead).
Modify
vis_clean.time_chunk_from_baseline_chunks
code so that all interleaves will fall into the same final labeled time bin in makeflow which are set by the LSTs everystride_length
. Without this fix, the time averages for different "interleaves" can fall into different makeflow file indices when we undo the cornerturn after performing time averaging.