-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-45484: Initial commit of pipetask to run RAIL p(z) estimation stages. #1
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly fine, seeing as it has been tested, but there are a few potential issues that are best sorted out now before the tasks start getting used regularly.
# cls.estimator_class, cls.estimator_module | ||
# ) | ||
stage_class = cls.estimator_class | ||
for key, val in stage_class.config_options.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dynamically adding fields like this is going to make it a bit difficult to configure this in pipelines. They'll need remain easily default-constructed so users can read the list of fields and docs.
Another issue that comes to mind is that the config fields may change with different versions of Ceci, which may lead to opaque errors when persisting to the Butler. Perhaps @TallJimbo has thoughts on this. Of course update we do update config classes in the pipelines but it's easier to keep versioning consistent with stack packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think that getting the param from rail is better than duplicating them. I guess if @TallJimbo could give some input on the constraints as to what happens when we update config classes that would help understand how we manage this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the model will have to be that the whoever is curating the photo-z selection algorithms provides the correct settings / parameters, i.e., they basically manage the pipeline files. If it is helpful to add some doc pointing to rail so that people can find the underlying code / parameters / behavior, that is pretty easy to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think disruption from changing config definitions will be tolerable here; I think the worst-case scenario is that we'd have to render some old configs in old processing runs unreadable.
Once we settle on a production photo-z algorithm I think it may be worthwhile to duplicate the configuration for that one out explicitly, but we can live with this in the meantime.
tests/pz_pipeline_hsc.yaml
Outdated
Photo-z madness | ||
tasks: | ||
pz_trainz: | ||
class: lsst.meas.pz.estimate_pz_task.EstimatePZTask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fairly common practice to make convenient subclasses of tasks that override setDefaults
with everything you've put into the Python block. You may end up needing to do that if you want to set obs package overrides anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I see a number of cases of overriding setDefaults in config classes. Are you saying that I should make additional Config / Pipetask class pairs for each algorithm? So that each algorithm has four classes: a config / task pair to do the algorithm and config Pipetask pair to select the particular Task. is that correct?
tests/pz_pipeline_hsc.yaml
Outdated
python: | | ||
from lsst.meas.pz.estimate_pz_task_trainz import EstimatePZTrainZTask | ||
config.pz_algo.retarget(EstimatePZTrainZTask) | ||
config.pz_algo.stage_name='trainz' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI you can put the string and other plain old data overrides outside of the python block since the python block always runs first. It's just the import and retarget call that need to go here.
return butler | ||
|
||
def test_hsc_pz_pipeline(self): | ||
butler = self.makeButler(writeable=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks fine to me assuming it runs, but if you haven't already you may want to ask someone with more pipeline unit test expertise if you get to the stage of moving this repo to lsst
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks.
tests/test_estimate_pz_task.py
Outdated
butler = Butler( | ||
"/repo/dc2", | ||
collections=[ | ||
"2.2i/runs/test-med-1/w_2024_16/DM-43972/step3/group1/w00_000" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's neat that you can run unit tests against existing repo collections but this makes me very nervous. We're not planning to remove collections any time soon but there's a variety of ways this can either break or run very slowly. @TallJimbo , any thoughts on this?
Either way, the collection should just be "2.2i/runs/test-med-1/w_2024_16/DM-43972"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only runs if you explicitly run py.test at s3df. We could imagine having a specific set of test collections or a test repo at s3df to avoid these sorts of issues. For now it is just nice to be able to do this as part of running py.test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also worth noting that I think this doesn't put anything into the DB, so we can run it multiple times safely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See individual comments.
self.pzModel_dimension_group, | ||
("HSC",), | ||
), | ||
run="u/testing/pz_models", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pinging @TallJimbo on whether there's an existing precedent for writing unit tests to a testing
user collection (and if so, what the username is).
Also, while in practice we don't usually share CI repos between users, it's certainly something you can do and which might break this test if users don't clean up after building.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I put in the cleanup on success. Happy to just write stuff to the u/$USER.. or u/$USER/pz_testing collection if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there might be some precedent in @mfisherlevine's summit package tests.
No description provided.