DM-45484: Initial commit of pipetask to run RAIL p(z) estimation stages. #1

Merged
merged 47 commits into from
Sep 24, 2024
Changes from 26 commits
Commits
b78181c
Added python/lsst/meas/pz/estimate_pz_task.py
eacharles Jul 29, 2024
b633d81
Working version of estimate_pz_task.py
eacharles Jul 30, 2024
a7adfe0
Fix typo in estimate_pz_task.py
eacharles Jul 30, 2024
6a80313
Deflaking
eacharles Jul 30, 2024
3f85ca9
added more docstrings
eacharles Jul 30, 2024
e620dd0
Fix up docstring
eacharles Jul 30, 2024
f235622
fix up docstrings
eacharles Jul 30, 2024
e9aed01
fix up __all__ in estimate_pz_task.py
eacharles Jul 30, 2024
75460b0
fix imports and run isort
eacharles Jul 30, 2024
90b2ea7
Swtich to explicitly using and import RAIL class
eacharles Jul 31, 2024
460ddde
Switch to using sub-tasks
eacharles Aug 1, 2024
f882c60
Added qp_formatter
eacharles Aug 1, 2024
3cc9676
Fix connection types
eacharles Aug 1, 2024
08fefdf
simplify qp_formatter to remove redundant check
eacharles Aug 1, 2024
cb09cc1
Remove other redundant check from QPFormatter
eacharles Aug 1, 2024
b550ead
Moved knn and trainz specific stuff to their own files
eacharles Aug 1, 2024
dca93cd
Switch to custom class for PZModel and add ModelFormatter
eacharles Aug 3, 2024
c711041
tweaking estimate_pz_task_trainz.py and running black & isort
eacharles Aug 5, 2024
57eec34
adding config options to deal with bands
eacharles Aug 5, 2024
cde1ab4
running black & flake8
eacharles Aug 5, 2024
47a8ade
Fixes from writing unit tests
eacharles Aug 8, 2024
40b9f25
Added unit tests and related data files
eacharles Aug 8, 2024
2d0edd9
Added unit tests using /repo/dc2
eacharles Aug 8, 2024
9dda8b1
Set default bands to ugrizy
eacharles Aug 8, 2024
34cad68
Added dereddening
eacharles Aug 14, 2024
ff6e598
Clean up parameters to remove redundant ones
eacharles Aug 15, 2024
e10dec3
WIP, simpler requested changes
eacharles Sep 6, 2024
5b31127
switch to making estimator_class a classmethod
eacharles Sep 6, 2024
50519d2
WIP move test data to tests/data
eacharles Sep 12, 2024
781d45a
WIP, moving functionality from run to runQuantum
eacharles Sep 12, 2024
b0355af
Fix collection name in s3df test
eacharles Sep 12, 2024
0e50a78
WIP, deliting
eacharles Sep 12, 2024
607e648
WIP, fixes to ci testing using tests/data directory
eacharles Sep 12, 2024
80c94b7
WIP, Fix paths for testing
eacharles Sep 12, 2024
f2222f3
WIP, fix runQuantum method in EstimatePZTask
eacharles Sep 12, 2024
bad9322
Remove spurious print statement and add _initizalized flag to Estimat…
eacharles Sep 12, 2024
d185550
Switch to using ArrowAstropy instead of DataFrame
eacharles Sep 17, 2024
1ca7b57
Fixes for unit tests
eacharles Sep 21, 2024
c8ede51
Whitespace and linting
eacharles Sep 21, 2024
6c8b201
Moved script to tests/cleanup.sh and made it executable
eacharles Sep 24, 2024
e59bb9f
Added check on return code, and run cleanup.sh script on sucess
eacharles Sep 24, 2024
65bab23
remove spurious comments from pipeline file
eacharles Sep 24, 2024
b3fe18e
standardize astropy table import and remove dead comments
eacharles Sep 24, 2024
fb5c8f2
removed moved script
eacharles Sep 24, 2024
e2fd9e6
Switch to using u/echarles collections for testing
eacharles Sep 24, 2024
5d6b86d
Evaluate echarles envvar and properly clean up
eacharles Sep 24, 2024
c6cf708
remove tests/test_script
eacharles Sep 24, 2024
481 changes: 481 additions & 0 deletions python/lsst/meas/pz/estimate_pz_task.py

Large diffs are not rendered by default.

60 changes: 60 additions & 0 deletions python/lsst/meas/pz/estimate_pz_task_knn.py
@@ -0,0 +1,60 @@
# This file is part of meas_pz.
#
# Developed for the LSST Data Management System.
# This product includes software developed by the LSST Project
# (https://www.lsst.org).
# See the COPYRIGHT file at the top-level directory of this distribution
# for details of code ownership.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.

__all__ = [
    "EstimatePZKNNTask",
    "EstimatePZKNNConfig",
]


from rail.estimation.algos.k_nearneigh import KNearNeighEstimator

from .estimate_pz_task import EstimatePZAlgoConfigBase, EstimatePZAlgoTask


class EstimatePZKNNConfig(EstimatePZAlgoConfigBase):
    """Config for EstimatePZKNNTask

    This will select and configure the KNearNeighEstimator p(z)
    estimation algorithm

    See https://github.com/LSSTDESC/rail_sklearn/blob/main/src/rail/estimation/algos/k_nearneigh.py # noqa
    for parameters and default values.
    """

    estimator_class = KNearNeighEstimator


EstimatePZKNNConfig._make_fields()


class EstimatePZKNNTask(EstimatePZAlgoTask):
    """SubTask that runs RAIL KNN algorithm for p(z) estimation

    See https://github.com/LSSTDESC/rail_sklearn/blob/main/src/rail/estimation/algos/k_nearneigh.py # noqa
    for algorithm implementation.

    KNN estimates the p(z) distribution by taking
    a weighted mixture of the nearest neighbors in
    color space.
    """

    ConfigClass = EstimatePZKNNConfig
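
To make the docstring's "weighted mixture of the nearest neighbors in color space" concrete, here is a minimal NumPy sketch. It is not RAIL's implementation: the function name `knn_pz`, the inverse-distance weighting, the Gaussian kernel, and the fixed width `sigma` are all assumptions chosen for illustration.

```python
import numpy as np


def knn_pz(train_colors, train_z, query_colors, z_grid, k=3, sigma=0.05):
    """Return p(z) on z_grid for each query object (illustrative only).

    Each p(z) is an inverse-distance-weighted mixture of Gaussians of
    width sigma centred on the redshifts of the k nearest training objects.
    """
    # Pairwise Euclidean distances in color space: (n_query, n_train)
    diff = query_colors[:, None, :] - train_colors[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))

    # Indices and distances of the k nearest training objects per query
    nearest = np.argsort(dist, axis=1)[:, :k]
    near_dist = np.take_along_axis(dist, nearest, axis=1)

    # Inverse-distance weights, normalised to sum to one per object
    weights = 1.0 / (near_dist + 1e-12)
    weights /= weights.sum(axis=1, keepdims=True)

    # Normalised Gaussian kernels at the neighbours' redshifts:
    # shape (n_query, k, n_grid)
    near_z = train_z[nearest]
    kernels = np.exp(-0.5 * ((z_grid[None, None, :] - near_z[:, :, None]) / sigma) ** 2)
    kernels /= sigma * np.sqrt(2.0 * np.pi)

    # Weighted mixture over the k neighbours: (n_query, n_grid)
    return (weights[:, :, None] * kernels).sum(axis=1)


# Synthetic data, purely for demonstration
rng = np.random.default_rng(0)
train_colors = rng.normal(size=(200, 4))
train_z = rng.uniform(0.0, 2.0, size=200)
query_colors = rng.normal(size=(5, 4))
z_grid = np.linspace(-0.5, 2.5, 601)
pdfs = knn_pz(train_colors, train_z, query_colors, z_grid)
```

Because the weights sum to one and each kernel is normalised, every row of `pdfs` integrates to approximately one over a grid that covers the training redshift range.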
80 changes: 80 additions & 0 deletions python/lsst/meas/pz/estimate_pz_task_trainz.py
@@ -0,0 +1,80 @@
# This file is part of meas_pz.
#
# Developed for the LSST Data Management System.
# This product includes software developed by the LSST Project
# (https://www.lsst.org).
# See the COPYRIGHT file at the top-level directory of this distribution
# for details of code ownership.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.

__all__ = [
    "EstimatePZTrainZTask",
    "EstimatePZTrainZConfig",
]

import numpy as np
from pandas import DataFrame
from rail.estimation.algos.train_z import TrainZEstimator

from .estimate_pz_task import EstimatePZAlgoConfigBase, EstimatePZAlgoTask


class EstimatePZTrainZConfig(EstimatePZAlgoConfigBase):
    """Config for EstimatePZTrainZTask

    This will select and configure the TrainZEstimator p(z)
    estimation algorithm

    See https://github.com/LSSTDESC/rail_base/blob/main/src/rail/estimation/algos/train_z.py # noqa
    for parameters and default values.
    """

    estimator_class = TrainZEstimator


EstimatePZTrainZConfig._make_fields()


class EstimatePZTrainZTask(EstimatePZAlgoTask):
    """SubTask that runs RAIL TrainZ algorithm for p(z) estimation

    See https://github.com/LSSTDESC/rail_base/blob/main/src/rail/estimation/algos/train_z.py # noqa
    for algorithm implementation.

    TrainZ is just a placeholder algorithm that assigns the same
    p(z) distribution (taken from the input model file) to every object.
    """

    ConfigClass = EstimatePZTrainZConfig

    def _get_mags_and_errs(
        self,
        fluxes: DataFrame,
        mag_offset: float,
    ) -> dict[str, np.ndarray]:

        flux_names = self._get_flux_names()
        mag_names = self._get_mag_names()

        mag_dict = {}
        # loop over bands, convert fluxes to mags and fill dict
        for band in flux_names.keys():
            fluxVals = fluxes[flux_names[band]]
            mag_dict[mag_names[band]] = self._flux_to_mag(
                fluxVals,
                mag_offset,
                99.0,
            )
        return mag_dict
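
The `_flux_to_mag` helper called above lives in `estimate_pz_task.py`, which this page does not render. As a rough guide to what such a conversion usually looks like, here is a standalone sketch. The function name `flux_to_mag`, the treatment of bad fluxes, the reading of the third argument (`99.0`) as a non-detection sentinel, and the nJy units are all assumptions, not taken from the PR.

```python
import numpy as np


def flux_to_mag(flux, mag_offset, nondetect_val):
    """Convert fluxes to magnitudes (hypothetical stand-in for _flux_to_mag).

    Non-positive or non-finite fluxes are mapped to nondetect_val.
    """
    flux = np.asarray(flux, dtype=float)
    mags = np.full(flux.shape, nondetect_val, dtype=float)
    good = np.isfinite(flux) & (flux > 0)
    mags[good] = -2.5 * np.log10(flux[good]) + mag_offset
    return mags


# 31.4 is the AB zero point for fluxes in nJy (units assumed here);
# 3631 Jy = 3631e9 nJy corresponds to AB magnitude ~0.
mags = flux_to_mag([3631.0e9, 1.0, -5.0, np.nan], 31.4, 99.0)
```

With these inputs the first entry is close to 0, the second equals the offset, and the negative and NaN fluxes fall back to the sentinel value.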
63 changes: 63 additions & 0 deletions python/lsst/meas/pz/model_formatter.py
@@ -0,0 +1,63 @@
# This file is part of meas_pz.
#
# Developed for the LSST Data Management System.
# This product includes software developed by the LSST Project
# (http://www.lsst.org).
# See the COPYRIGHT file at the top-level directory of this distribution
# for details of code ownership.
#
# This software is dual licensed under the GNU General Public License and also
# under a 3-clause BSD license. Recipients may choose which of these licenses
# to use; please see the files gpl-3.0.txt and/or bsd_license.txt,
# respectively. If you choose the GPL option then the following text applies
# (but note that there is still no warranty even if you opt for BSD instead):
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

__all__ = ("ModelFormatter",)

from typing import Any

from rail.core.model import Model as RailModel
from lsst.daf.butler import FormatterV2
from lsst.resources import ResourcePath


class ModelFormatter(FormatterV2):
    """Read and write `rail.core.model.Model` objects.

    Currently assumes only local file reads are possible.
    """

    supported_write_parameters = frozenset({"format"})
    supported_extensions = frozenset({".pickle"})
    can_read_from_local_file = True

    def get_write_extension(self) -> str:
        # Default to pickle but allow configuration via write parameter
        format = self.write_parameters.get("format", "pickle")
        if format == "pickle":
            return ".pickle"
        # Other supported formats can be added here
        raise RuntimeError(
            f"Requested file format '{format}' is not supported for PZModel"
        )

    def read_from_local_file(
        self, path: str, component: str | None = None, expected_size: int = -1
    ) -> Any:
        return RailModel.read(path)  # type: ignore

    def write_local_file(self, in_memory_dataset: Any, uri: ResourcePath) -> None:
        in_memory_dataset.write(uri.ospath)
65 changes: 65 additions & 0 deletions python/lsst/meas/pz/qp_formatter.py
@@ -0,0 +1,65 @@
# This file is part of meas_pz.
#
# Developed for the LSST Data Management System.
# This product includes software developed by the LSST Project
# (http://www.lsst.org).
# See the COPYRIGHT file at the top-level directory of this distribution
# for details of code ownership.
#
# This software is dual licensed under the GNU General Public License and also
# under a 3-clause BSD license. Recipients may choose which of these licenses
# to use; please see the files gpl-3.0.txt and/or bsd_license.txt,
# respectively. If you choose the GPL option then the following text applies
# (but note that there is still no warranty even if you opt for BSD instead):
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

__all__ = ("QPFormatter",)

from typing import Any

import qp
from lsst.daf.butler import FormatterV2
from lsst.resources import ResourcePath


class QPFormatter(FormatterV2):
    """Read and write `qp.Ensemble` objects.

    Currently assumes only local file reads are possible.
    """

    supported_write_parameters = frozenset({"format"})
    supported_extensions = frozenset({".hdf5", ".fits"})
    can_read_from_local_file = True

    def get_write_extension(self) -> str:
        # Default to hdf5 but allow configuration via write parameter
        format = self.write_parameters.get("format", "hdf5")
        if format == "hdf5":
            return ".hdf5"
        if format == "fits":
            return ".fits"
        # Other supported formats can be added here
        raise RuntimeError(
            f"Requested file format '{format}' is not supported for qp.Ensemble"
        )

    def read_from_local_file(
        self, path: str, component: str | None = None, expected_size: int = -1
    ) -> Any:
        return qp.read(path)  # type: ignore

    def write_local_file(self, in_memory_dataset: Any, uri: ResourcePath) -> None:
        in_memory_dataset.write_to(uri.ospath)
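
Both formatters pick the write extension with a chain of `if` statements. One possible alternative, shown here as a standalone sketch rather than a change proposed in this PR, is a lookup table, so that adding a format means adding one entry. The names `FORMAT_EXTENSIONS` and `get_write_extension` (as a free function taking a plain dict) are hypothetical and do not use the butler API.

```python
# Hypothetical mapping from the "format" write parameter to a file extension
FORMAT_EXTENSIONS = {"hdf5": ".hdf5", "fits": ".fits"}


def get_write_extension(write_parameters, default="hdf5"):
    """Return the extension for the requested format, or raise if unsupported."""
    fmt = write_parameters.get("format", default)
    try:
        return FORMAT_EXTENSIONS[fmt]
    except KeyError:
        raise RuntimeError(
            f"Requested file format '{fmt}' is not supported"
        ) from None


default_ext = get_write_extension({})
fits_ext = get_write_extension({"format": "fits"})
```

The same table could also be used to derive `supported_extensions`, which would keep the two from drifting apart.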
13 changes: 13 additions & 0 deletions tests/cleanup_script
@@ -0,0 +1,13 @@
butler remove-collections --no-confirm ../ci_hsc_gen3/DATA u/testing/pz_rail_testing
butler remove-runs --no-confirm ../ci_hsc_gen3/DATA "u/testing/pz_rail_testing*"
butler remove-runs --no-confirm ../ci_hsc_gen3/DATA "u/testing/pz_models"
butler remove-dataset-type ../ci_hsc_gen3/DATA/ pz_estimate_knn
butler remove-dataset-type ../ci_hsc_gen3/DATA/ pz_knn_config
butler remove-dataset-type ../ci_hsc_gen3/DATA/ pz_knn_log
butler remove-dataset-type ../ci_hsc_gen3/DATA/ pz_knn_metadata
butler remove-dataset-type ../ci_hsc_gen3/DATA/ pz_estimate_trainz
butler remove-dataset-type ../ci_hsc_gen3/DATA/ pz_trainz_config
butler remove-dataset-type ../ci_hsc_gen3/DATA/ pz_trainz_log
butler remove-dataset-type ../ci_hsc_gen3/DATA/ pz_trainz_metadata
butler remove-dataset-type ../ci_hsc_gen3/DATA/ pzModel_knn
butler remove-dataset-type ../ci_hsc_gen3/DATA/ pzModel_trainz
Binary file added tests/model_inform_train_z_wrap.pickle
Binary file not shown.
3 changes: 3 additions & 0 deletions tests/model_table_knn_hsc.csv
@@ -0,0 +1,3 @@
file,instrument
tests/model_inform_knn_hsc_wrap.pickle,HSC

3 changes: 3 additions & 0 deletions tests/model_table_knn_lsst.csv
@@ -0,0 +1,3 @@
file,instrument
tests/model_inform_knn_lsst_wrap.pickle,LSST

3 changes: 3 additions & 0 deletions tests/model_table_train_z.csv
@@ -0,0 +1,3 @@
file,instrument
tests/model_inform_train_z_wrap.pickle,HSC

41 changes: 41 additions & 0 deletions tests/pz_pipeline_hsc.yaml
@@ -0,0 +1,41 @@
description: |
  Photo-z madness
tasks:
  pz_trainz:
    class: lsst.meas.pz.estimate_pz_task.EstimatePZTask
It's fairly common practice to make convenient subclasses of tasks that override setDefaults with everything you've put into the Python block. You may end up needing to do that if you want to set obs package overrides anyway.

Collaborator Author:

So, I see a number of cases of overriding setDefaults in config classes. Are you saying that I should make additional Config / Pipetask class pairs for each algorithm, so that each algorithm has four classes: a config / task pair to run the algorithm and a config / Pipetask pair to select the particular Task? Is that correct?
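
The pattern under discussion (a convenience subclass whose `setDefaults` does what the pipeline's `python:` block does) can be sketched without the LSST stack. Everything below is a stand-in: the `Fake*` classes are hypothetical and only mimic two behaviours, a retargetable subtask field and `setDefaults` being called on construction. Real code would subclass the package's own config class, whose name is not visible in this diff.

```python
class FakeAlgoField:
    """Stand-in for a retargetable subtask field (hypothetical)."""

    def __init__(self, target=None):
        self.target = target
        self.stage_name = None
        self.output_mode = None

    def retarget(self, target):
        self.target = target


class FakeEstimatePZConfig:
    """Stand-in for the pipeline task's config class (name assumed)."""

    def __init__(self):
        self.pz_algo = FakeAlgoField()
        self.setDefaults()

    def setDefaults(self):
        # The generic config selects no algorithm by default
        pass


class FakeTrainZAlgoTask:
    """Stand-in for EstimatePZTrainZTask."""


class FakeEstimatePZTrainZConfig(FakeEstimatePZConfig):
    """Convenience config: bakes the TrainZ choices into the defaults."""

    def setDefaults(self):
        super().setDefaults()
        self.pz_algo.retarget(FakeTrainZAlgoTask)
        self.pz_algo.stage_name = "trainz"
        self.pz_algo.output_mode = "return"


config = FakeEstimatePZTrainZConfig()
```

With a subclass like this, a pipeline entry would only need to name the convenience class, and the import and retarget call would no longer have to live in the YAML.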

    config:
      connections.pzModel: 'pzModel_trainz'
      connections.pzEnsemble: 'pz_estimate_trainz'
      python: |
        from lsst.meas.pz.estimate_pz_task_trainz import EstimatePZTrainZTask
        config.pz_algo.retarget(EstimatePZTrainZTask)
        config.pz_algo.stage_name='trainz'
FYI you can put the string and other plain old data overrides outside of the python block since the python block always runs first. It's just the import and retarget call that need to go here.

        config.pz_algo.output_mode='return'
        config.pz_algo.band_a_env=dict(i=2.06)
  pz_knn:
    class: lsst.meas.pz.estimate_pz_task.EstimatePZTask
    config:
      connections.pzModel: 'pzModel_knn'
      connections.pzEnsemble: 'pz_estimate_knn'
      python: |
        from lsst.meas.pz.estimate_pz_task_knn import EstimatePZKNNTask
        config.pz_algo.retarget(EstimatePZKNNTask)
        config.pz_algo.stage_name='knn'
        config.pz_algo.output_mode='return'
        config.pz_algo.bands=['mag_g_lsst','mag_r_lsst','mag_i_lsst','mag_z_lsst','mag_y_lsst']
        config.pz_algo.ref_band='mag_i_lsst'
        config.pz_algo.band_a_env=dict(g=3.64,r=2.70,i=2.06,z=1.58,y=1.31)
subsets:
  all_pz:
    subset:
      - pz_trainz
      - pz_knn
    description: |
      All of the photoz algorithms
# run me with
# pipetask run
# -b $CI_HSC_GEN3_DIR/DATA
# -i HSC/runs/ci_hsc
# -o u/echarles/pz_rail_testing
# -p "${MEAS_PZ_DIR}/tests/pz_pipeline_hsc.yaml"
# -d "skymap='discrete/ci_hsc' AND tract=0 AND patch=69"