DM-45484: Initial commit of pipetask to run RAIL p(z) estimation stages. #1

eacharles · 2024-07-30T17:37:52Z

No description provided.

taranu

This is mostly fine, seeing as it has been tested, but there are a few potential issues that are best sorted out now before the tasks start getting used regularly.

taranu · 2024-09-06T19:10:05Z

python/lsst/meas/pz/estimate_pz_task.py

+        #    cls.estimator_class, cls.estimator_module
+        # )
+        stage_class = cls.estimator_class
+        for key, val in stage_class.config_options.items():


Dynamically adding fields like this is going to make it a bit difficult to configure this in pipelines. They'll need to remain easily default-constructed so users can read the list of fields and docs.

Another issue that comes to mind is that the config fields may change with different versions of Ceci, which may lead to opaque errors when persisting to the Butler. Perhaps @TallJimbo has thoughts on this. Of course, we do update config classes in the pipelines, but it's easier to keep versioning consistent with stack packages.

Yes. I think that getting the params from RAIL is better than duplicating them. If @TallJimbo could give some input on the constraints around what happens when we update config classes, that would help us understand how to manage this.

I think the model will have to be that whoever is curating the photo-z selection algorithms provides the correct settings / parameters, i.e., they basically manage the pipeline files. If it is helpful to add some docs pointing to RAIL so that people can find the underlying code / parameters / behavior, that is pretty easy to do.

I think disruption from changing config definitions will be tolerable here; I think the worst-case scenario is that we'd have to render some old configs in old processing runs unreadable.

Once we settle on a production photo-z algorithm I think it may be worthwhile to duplicate the configuration for that one out explicitly, but we can live with this in the meantime.
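For readers following this thread, here is a minimal sketch of the trade-off being discussed: fields are generated from a stage's `config_options`, but the resulting class can still be default-constructed and inspected. It uses plain-Python stand-ins rather than `lsst.pex.config` or RAIL. `Option`, `TrainZStage`, `StageConfig`, `make_config_class`, and the option names and defaults are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """Hypothetical stand-in for one entry of a stage's ``config_options``."""

    default: object
    doc: str


class TrainZStage:
    """Hypothetical stand-in for a RAIL stage class (names are made up)."""

    config_options = {
        "zmin": Option(0.0, "Lower edge of the redshift grid"),
        "zmax": Option(3.0, "Upper edge of the redshift grid"),
        "nzbins": Option(301, "Number of redshift grid points"),
    }


class StageConfig:
    """Toy config base: field definitions live on the class, values on the instance."""

    fields: dict = {}

    def __init__(self):
        # Default construction takes no arguments and fills in every default.
        for name, opt in self.fields.items():
            setattr(self, name, opt.default)

    @classmethod
    def describe(cls):
        """Return a {field name: doc string} mapping without building an instance."""
        return {name: opt.doc for name, opt in cls.fields.items()}


def make_config_class(stage_class):
    """Build a config class whose fields mirror ``stage_class.config_options``."""
    return type(
        f"{stage_class.__name__}Config",
        (StageConfig,),
        {"fields": dict(stage_class.config_options)},
    )


TrainZConfig = make_config_class(TrainZStage)
config = TrainZConfig()
```

Because the fields are copied at class-creation time, a change in the upstream options changes the generated class, which is the versioning concern raised above.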

taranu · 2024-09-06T19:48:19Z

tests/pz_pipeline_hsc.yaml

+  Photo-z madness
+tasks:
+  pz_trainz:
+    class: lsst.meas.pz.estimate_pz_task.EstimatePZTask


It's fairly common practice to make convenient subclasses of tasks that override setDefaults with everything you've put into the Python block. You may end up needing to do that if you want to set obs package overrides anyway.

So, I see a number of cases of overriding setDefaults in config classes. Are you saying that I should make additional Config / Pipetask class pairs for each algorithm, so that each algorithm has four classes: a Config / Task pair to do the algorithm and a Config / Pipetask pair to select the particular Task? Is that correct?
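As a rough illustration of the `setDefaults` pattern the review comment describes, the sketch below uses plain-Python stand-ins instead of `lsst.pex.config`. The class names, the `pz_algo_target` string (standing in for a `retarget` call), and `EstimatePZAlgoTask` are assumptions made for the example, not the package's actual classes.

```python
class EstimatePZTaskConfig:
    """Stand-in for the generic task config; not the real pex_config class."""

    def __init__(self):
        # Generic defaults: no specific algorithm selected yet.
        self.pz_algo_target = "EstimatePZAlgoTask"  # hypothetical name
        self.stage_name = None
        # pex_config invokes setDefaults during construction; mimic that here.
        self.setDefaults()

    def setDefaults(self):
        """Hook for subclasses; the base class has nothing to add."""


class EstimatePZTrainZTaskConfig(EstimatePZTaskConfig):
    """Convenience config that bakes in the TrainZ choices."""

    def setDefaults(self):
        super().setDefaults()
        # These two lines replace what the pipeline's python block did.
        self.pz_algo_target = "EstimatePZTrainZTask"
        self.stage_name = "trainz"


generic = EstimatePZTaskConfig()
trainz = EstimatePZTrainZTaskConfig()
```

With a subclass like this, a pipeline file would only need to name the convenience task class, at the cost of one extra config / task pair per algorithm.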

taranu · 2024-09-06T19:49:21Z

tests/pz_pipeline_hsc.yaml

+      python: |
+         from lsst.meas.pz.estimate_pz_task_trainz import EstimatePZTrainZTask
+         config.pz_algo.retarget(EstimatePZTrainZTask)
+         config.pz_algo.stage_name='trainz'


FYI you can put the string and other plain old data overrides outside of the python block since the python block always runs first. It's just the import and retarget call that need to go here.

taranu · 2024-09-06T19:52:04Z

tests/test_estimate_pz_pipeline.py

+        return butler
+
+    def test_hsc_pz_pipeline(self):
+        butler = self.makeButler(writeable=True)


This all looks fine to me assuming it runs, but if you haven't already you may want to ask someone with more pipeline unit test expertise if you get to the stage of moving this repo to lsst.

Ok, thanks.

taranu · 2024-09-06T19:59:59Z

tests/test_estimate_pz_task.py

+        butler = Butler(
+            "/repo/dc2",
+            collections=[
+                "2.2i/runs/test-med-1/w_2024_16/DM-43972/step3/group1/w00_000"


It's neat that you can run unit tests against existing repo collections, but this makes me very nervous. We're not planning to remove collections any time soon, but there are a variety of ways this can either break or run very slowly. @TallJimbo, any thoughts on this?

Either way, the collection should just be "2.2i/runs/test-med-1/w_2024_16/DM-43972".

This only runs if you explicitly run py.test at S3DF. We could imagine having a specific set of test collections or a test repo at S3DF to avoid these sorts of issues. For now it is just nice to be able to do this as part of running py.test.

Also worth noting that I think this doesn't put anything into the DB, so we can run it multiple times safely.
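One way to keep a repo-dependent test from failing elsewhere is a skip guard on the test case. The sketch below assumes the check is simply whether the `/repo/dc2` directory quoted in the diff exists on the host. The test class and method names are hypothetical, and the body is a placeholder rather than the PR's actual test.

```python
import os
import unittest

# Repo path quoted in the diff above; only expected to exist at S3DF.
S3DF_REPO = "/repo/dc2"


@unittest.skipUnless(
    os.path.isdir(S3DF_REPO),
    f"{S3DF_REPO} is not available on this host",
)
class EstimatePZRepoTestCase(unittest.TestCase):
    """Hypothetical test case that only runs where the shared repo is mounted."""

    def test_repo_is_reachable(self):
        # Placeholder body; a real test would construct a Butler here.
        self.assertTrue(os.path.isdir(S3DF_REPO))
```

On hosts without the repo the test is reported as skipped rather than failed, so the suite can still pass.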

taranu

See individual comments.

python/lsst/meas/pz/estimate_pz_task.py

tests/data/pz_pipeline_hsc.yaml

tests/test_estimate_pz_task.py

taranu · 2024-09-24T00:24:03Z

tests/test_estimate_pz_task.py

+                self.pzModel_dimension_group,
+                ("HSC",),
+            ),
+            run="u/testing/pz_models",


Pinging @TallJimbo on whether there's an existing precedent for writing unit tests to a testing user collection (and if so, what the username is).

Also, while in practice we don't usually share CI repos between users, it's certainly something you can do and which might break this test if users don't clean up after building.

Ok, I put in the cleanup on success. Happy to just write stuff to the u/$USER.. or u/$USER/pz_testing collection if you prefer.

I think there might be some precedent in @mfisherlevine's summit package tests.
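A per-user run name along the lines of `u/$USER/pz_testing` could be built as in the sketch below. The helper name `user_test_run` and the `"unknown"` fallback are assumptions for illustration. Whether the test should read `$USER` at all is the open question in this thread.

```python
import os


def user_test_run(suffix="pz_testing"):
    """Return a run collection name of the form ``u/<user>/<suffix>``."""
    # Fall back to a fixed label if USER is unset (e.g. in some containers).
    user = os.environ.get("USER", "unknown")
    return f"u/{user}/{suffix}"


run = user_test_run()
```

Using a per-user name avoids collisions when several people run the test against the same repo.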

python/lsst/meas/pz/estimate_pz_task.py

eacharles and others added 9 commits July 29, 2024 14:22

Added python/lsst/meas/pz/estimate_pz_task.py

b78181c

Working version of estimate_pz_task.py

b633d81

Fix typo in estimate_pz_task.py

a7adfe0

Deflaking

6a80313

added more docstrings

3f85ca9

Fix up docstring

e620dd0

fix up docstrings

f235622

fix up __all__ in estimate_pz_task.py

e9aed01

fix imports and run isort

75460b0

yalsayyad changed the title ~~Tickets/dm 45484~~ DM-45484: Jul 30, 2024

eacharles changed the title ~~DM-45484:~~ DM-45484: Initial commit of pipetask to run RAIL p(z) estimation stages. Jul 30, 2024

eacharles added 17 commits July 31, 2024 10:55

Swtich to explicitly using and import RAIL class

90b2ea7

Switch to using sub-tasks

460ddde

Added qp_formatter

f882c60

Fix connection types

3cc9676

simplify qp_formatter to remove redundant check

08fefdf

Remove other redundant check from QPFormatter

cb09cc1

Moved knn and trainz specific stuff to their own files

b550ead

Switch to custom class for PZModel and add ModelFormatter

dca93cd

tweaking estimate_pz_task_trainz.py and running black & isort

c711041

adding config options to deal with bands

57eec34

running black & flake8

cde1ab4

Fixes from writing unit tests

47a8ade

Added unit tests and related data files

40b9f25

Added unit tests using /repo/dc2

2d0edd9

Set default bands to ugrizy

9dda8b1

Added dereddening

34cad68

Clean up parameters to remove redundant ones

ff6e598

taranu requested changes Sep 6, 2024

View reviewed changes

WIP, simpler requested changes

e10dec3

eacharles and others added 10 commits September 6, 2024 16:41

switch to making estimator_class a classmethod

5b31127

WIP move test data to tests/data

50519d2

WIP, moving functionality from run to runQuantum

781d45a

Fix collection name in s3df test

b0355af

WIP, deliting

0e50a78

WIP, fixes to ci testing using tests/data directory

607e648

WIP, Fix paths for testing

80c94b7

WIP, fix runQuantum method in EstimatePZTask

f2222f3

Remove spurious print statement and add _initizalized flag to Estimat…

bad9322

…ePZTask

Switch to using ArrowAstropy instead of DataFrame

d185550

eacharles requested a review from taranu September 17, 2024 19:31

eacharles added 2 commits September 20, 2024 19:56

Fixes for unit tests

1ca7b57

Whitespace and linting

c8ede51

taranu reviewed Sep 24, 2024

View reviewed changes

eacharles added 5 commits September 23, 2024 18:42

Moved script to tests/cleanup.sh and made it executable

6c8b201

Added check on return code, and run cleanup.sh script on sucess

e59bb9f

remove spurious comments from pipeline file

65bab23

standardize astropy table import and remove dead comments

b3fe18e

removed moved script

fb5c8f2

eacharles commented Jul 30, 2024

taranu left a comment

taranu Sep 6, 2024

eacharles Sep 6, 2024

eacharles Sep 12, 2024

TallJimbo Sep 24, 2024

taranu Sep 6, 2024

eacharles Sep 17, 2024

taranu Sep 6, 2024

taranu Sep 6, 2024

eacharles Sep 12, 2024

taranu Sep 6, 2024

eacharles Sep 12, 2024

eacharles Sep 17, 2024

taranu left a comment

taranu Sep 24, 2024

eacharles Sep 24, 2024

TallJimbo Sep 24, 2024
