Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-45484: Initial commit of pipetask to run RAIL p(z) estimation stages. #1

Open
wants to merge 44 commits into
base: main
Choose a base branch
from

Conversation

eacharles
Copy link
Collaborator

No description provided.

@yalsayyad yalsayyad changed the title Tickets/dm 45484 DM-45484: Jul 30, 2024
@eacharles eacharles changed the title DM-45484: DM-45484: Initial commit of pipetask to run RAIL p(z) estimation stages. Jul 30, 2024
Copy link

@taranu taranu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly fine, seeing as it has been tested, but there are a few potential issues that are best sorted out now before the tasks start getting used regularly.

python/lsst/meas/pz/estimate_pz_task.py Outdated Show resolved Hide resolved
python/lsst/meas/pz/estimate_pz_task.py Outdated Show resolved Hide resolved
python/lsst/meas/pz/estimate_pz_task.py Outdated Show resolved Hide resolved
python/lsst/meas/pz/estimate_pz_task.py Outdated Show resolved Hide resolved
# cls.estimator_class, cls.estimator_module
# )
stage_class = cls.estimator_class
for key, val in stage_class.config_options.items():
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dynamically adding fields like this is going to make it a bit difficult to configure this in pipelines. They'll need remain easily default-constructed so users can read the list of fields and docs.

Another issue that comes to mind is that the config fields may change with different versions of Ceci, which may lead to opaque errors when persisting to the Butler. Perhaps @TallJimbo has thoughts on this. Of course update we do update config classes in the pipelines but it's easier to keep versioning consistent with stack packages.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think that getting the param from rail is better than duplicating them. I guess if @TallJimbo could give some input on the constraints as to what happens when we update config classes that would help understand how we manage this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the model will have to be that the whoever is curating the photo-z selection algorithms provides the correct settings / parameters, i.e., they basically manage the pipeline files. If it is helpful to add some doc pointing to rail so that people can find the underlying code / parameters / behavior, that is pretty easy to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think disruption from changing config definitions will be tolerable here; I think the worst-case scenario is that we'd have to render some old configs in old processing runs unreadable.

Once we settle on a production photo-z algorithm I think it may be worthwhile to duplicate the configuration for that one out explicitly, but we can live with this in the meantime.

Photo-z madness
tasks:
pz_trainz:
class: lsst.meas.pz.estimate_pz_task.EstimatePZTask
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fairly common practice to make convenient subclasses of tasks that override setDefaults with everything you've put into the Python block. You may end up needing to do that if you want to set obs package overrides anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I see a number of cases of overriding setDefaults in config classes. Are you saying that I should make additional Config / Pipetask class pairs for each algorithm? So that each algorithm has four classes: a config / task pair to do the algorithm and config Pipetask pair to select the particular Task. is that correct?

python: |
from lsst.meas.pz.estimate_pz_task_trainz import EstimatePZTrainZTask
config.pz_algo.retarget(EstimatePZTrainZTask)
config.pz_algo.stage_name='trainz'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI you can put the string and other plain old data overrides outside of the python block since the python block always runs first. It's just the import and retarget call that need to go here.

return butler

def test_hsc_pz_pipeline(self):
butler = self.makeButler(writeable=True)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks fine to me assuming it runs, but if you haven't already you may want to ask someone with more pipeline unit test expertise if you get to the stage of moving this repo to lsst.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks.

butler = Butler(
"/repo/dc2",
collections=[
"2.2i/runs/test-med-1/w_2024_16/DM-43972/step3/group1/w00_000"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's neat that you can run unit tests against existing repo collections but this makes me very nervous. We're not planning to remove collections any time soon but there's a variety of ways this can either break or run very slowly. @TallJimbo , any thoughts on this?

Either way, the collection should just be "2.2i/runs/test-med-1/w_2024_16/DM-43972".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only runs if you explicitly run py.test at s3df. We could imagine having a specific set of test collections or a test repo at s3df to avoid these sorts of issues. For now it is just nice to be able to do this as part of running py.test

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also worth noting that I think this doesn't put anything into the DB, so we can run it multiple times safely.

tests/test_estimate_pz_task.py Show resolved Hide resolved
Copy link

@taranu taranu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See individual comments.

python/lsst/meas/pz/estimate_pz_task.py Outdated Show resolved Hide resolved
python/lsst/meas/pz/estimate_pz_task.py Outdated Show resolved Hide resolved
tests/data/pz_pipeline_hsc.yaml Outdated Show resolved Hide resolved
tests/data/pz_pipeline_hsc.yaml Outdated Show resolved Hide resolved
tests/test_estimate_pz_task.py Outdated Show resolved Hide resolved
tests/test_estimate_pz_task.py Show resolved Hide resolved
self.pzModel_dimension_group,
("HSC",),
),
run="u/testing/pz_models",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinging @TallJimbo on whether there's an existing precedent for writing unit tests to a testing user collection (and if so, what the username is).

Also, while in practice we don't usually share CI repos between users, it's certainly something you can do and which might break this test if users don't clean up after building.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I put in the cleanup on success. Happy to just write stuff to the u/$USER.. or u/$USER/pz_testing collection if you prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be some precedent in @mfisherlevine's summit package tests.

python/lsst/meas/pz/estimate_pz_task.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants