DM-34048: Add completeness/purity plots #277

taranu · 2024-08-02T15:59:51Z

No description provided.

sr525 · 2024-08-06T16:06:51Z

pipelines/coaddDiffMatchedInjectedCore.yaml

@@ -0,0 +1,201 @@
+description: |
+  Matched difference (measured vs reference) plots/metrics
+parameters:


I think more of this should be in the atool and less in the pipeline file.

These are here to make it easier to change settings for all of the relevant plots at once, which is something I thought users would be reasonably likely to want to do.

On the other hand, I don't really like pipeline parameters either. Doing it this way means they have to be the same for all of the plots; if anyone wants to change a few here or there, they'd have to copy the file and change them per-tool anyway. There's no easy way I can think of to keep these in sync with the actual config defaults either.

See the note below. I dropped the parameters entirely.

I could re-implement the ability to mass change limits etc. in reconfigure_diff_matched_defaults if needed.

This file also ends up doing the same thing as the original DC2 /coaddDiffMatchedQualityExtended.yaml pipeline, which I have not updated here. They could be consolidated at some point, or have one base that they each import.

sr525 · 2024-08-06T16:07:17Z

pipelines/coaddDiffMatchedInjectedCore.yaml

+      atools.matchedRefPsfMagChi.produce.plot.yLims: lims_mag_chi
+
+#      atools.matchedRefPositionXDiff: MatchedRefCoaddDiffPositionTool
+#      atools.matchedRefPositionXDiff.coord_meas: x


Is this meant to all still be in here?

It would need the x/y columns to be in the matched table to work. I'm debating whether to add them and make both x/y and ra/dec plots. Perhaps @cmsaunders has thoughts? I figure the difference plots will be uninteresting if the WCS is good (and if not, then it will show up elsewhere), but the chi plots might be a worthwhile sanity check for ra/dec errors.

I can understand leaving it for later if it is going to be added back in, but then maybe add a note and a TODO to say why it is there. Also, if it is staying, fix the yaml linter errors.

I didn't look through the whole ticket to fully get the context, but if this is going to show ra/dec for the pipeline position versus the injected position, that sounds interesting to me.

There will be ra/dec, the question is whether to also make plots of x/y (and particularly chi=(x_meas - x_ref)/x_err).
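
For reference, the chi statistic mentioned above can be sketched in a few lines. This is a minimal illustration, not the code on this ticket: the function name `chi` and the choice to return NaN for a non-positive or non-finite error are assumptions made for the example.

```python
import math


def chi(meas, ref, err):
    """Return (meas - ref) / err, or NaN if err is not a positive finite number.

    Hypothetical helper for illustration only; the NaN convention is an
    assumption, not something taken from the ticket.
    """
    if not (math.isfinite(err) and err > 0):
        return math.nan
    return (meas - ref) / err


# Toy x positions in pixels: measured, reference (injected), and quoted error.
x_meas = [100.5, 200.0, 299.0]
x_ref = [100.0, 200.0, 300.0]
x_err = [0.25, 0.5, 0.0]

chis = [chi(m, r, e) for m, r, e in zip(x_meas, x_ref, x_err)]
```

If the quoted errors are realistic, the chi values should scatter around zero with roughly unit width, which is why the chi plots can be a useful sanity check even when the plain difference plots look uninteresting.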

sr525 · 2024-08-06T16:14:27Z

pipelines/coaddDiffMatchedInjectedCore.yaml

+      atools.matchedRefPositionDecChi.produce.plot.yLims: lims_pos_chi
+      atools.matchedRefPositionDecChi.compute_chi: true
+
+      python: |


I also don't like all of this being in here but I think that is personal preference. I think it would look tidier if more of this was in the atool and less in the pipeline.

I did what I could to simplify this by consolidating necessary overrides into a single function call in the python block. That function will almost always need to be called by anything overriding the task - in obs_subaru, for example, it needs to set the bands for the colour diff plots. I might consider changing the default context from dc2 to injection.

The different bands, between obs_subaru and obs_decam for example, should be set in the obs specific config files.

sr525 · 2024-08-06T16:17:45Z

python/lsst/analysis/tools/actions/config/binning.py

+
+    mag_low_min = pexConfig.Field[int](
+        doc="Lower bound for the smallest magnitude bin in millimags",
+        default=15000,


This being in millimags seems silly.

It's meant to avoid having to deal with floating point math nonsense when setting up bins and such, assuming that nobody needs fractional mmags. I was considering naming it mmag_low_min etc. if that would help.

Does going from 15 to 30 really make the maths that much harder than going from 15000 to 30000? It is much more understandable to go from 15 to 30 at a glance than from 15000 to 30000.

I wanted to specify a min, max and width (rather than a number of bins with the width implicit) and not have rounding remove/add one bin because it ended on 23.9999999 instead of 24.0.
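
To make the rounding concern concrete, here is a small sketch comparing repeated floating-point addition with integer millimag edges. The helper names `accumulate` and `mmag_bin_edges` and the parameter names `mag_low_max` and `mag_width` are hypothetical and chosen for this example; only `mag_low_min` and the division by 1000.0 appear in the diff.

```python
def accumulate(start, step, n):
    """Add ``step`` to ``start`` n times, as a naive float binning loop would."""
    value = start
    for _ in range(n):
        value += step
    return value


def mmag_bin_edges(mag_low_min, mag_low_max, mag_width):
    """Return inclusive integer bin edges in millimags.

    Hypothetical helper: requires the range to be a whole number of bins,
    so the last edge is always exactly ``mag_low_max``.
    """
    if (mag_low_max - mag_low_min) % mag_width != 0:
        raise ValueError("range must be an integer multiple of the bin width")
    return list(range(mag_low_min, mag_low_max + mag_width, mag_width))


# Ten steps of 0.1 do not land exactly on 1.0 in binary floating point.
drifted = accumulate(0.0, 0.1, 10)

# Integer edges are exact; convert to magnitudes only at the point of use.
edges_mmag = mmag_bin_edges(15000, 24000, 100)
edges_mag = [edge / 1000.0 for edge in edges_mmag]
```

With integer edges the bin count is fixed by the arithmetic (91 edges here) and the final edge converts to exactly 24.0, whereas a float accumulation can end slightly short of its target and gain or lose a bin.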

sr525 · 2024-08-07T17:49:09Z

python/lsst/analysis/tools/actions/keyedData/calcBinnedCompleteness.py

+from ..vector.selectors import RangeSelector
+
+
+class CalcBinnedCompletenessAction(KeyedDataAction):


Somewhere this needs a decent doc string that tells you what it does and what it returns.
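
As an illustration of the kind of docstring being asked for, here is a standalone sketch with a numpydoc-style description of inputs and outputs. It is not the `CalcBinnedCompletenessAction` implementation: the function name, arguments, half-open bin convention, and NaN for empty bins are all assumptions made for the example.

```python
def binned_completeness(ref_mags, is_matched, edges):
    """Compute completeness in magnitude bins.

    Completeness is the fraction of reference objects in a bin that were
    matched to a measured object.

    Parameters
    ----------
    ref_mags : sequence of float
        Reference magnitudes, one per reference object.
    is_matched : sequence of bool
        Whether each reference object has a matched measurement.
    edges : sequence of float
        Ascending bin edges; bin ``i`` covers ``[edges[i], edges[i + 1])``.

    Returns
    -------
    completeness : list of float
        Matched fraction per bin, NaN where a bin holds no reference objects.
    """
    n_bins = len(edges) - 1
    total = [0] * n_bins
    matched = [0] * n_bins
    for mag, ok in zip(ref_mags, is_matched):
        for i in range(n_bins):
            if edges[i] <= mag < edges[i + 1]:
                total[i] += 1
                matched[i] += bool(ok)
                break
    return [m / t if t else float("nan") for m, t in zip(matched, total)]


# Toy data: the last object sits on the upper edge and so falls outside all bins.
completeness = binned_completeness(
    ref_mags=[20.1, 20.5, 21.2, 21.8, 23.0],
    is_matched=[True, True, True, False, False],
    edges=[20.0, 21.0, 22.0, 23.0],
)
```

The point is that a reader can tell from the docstring alone what goes in, what comes out, and how edge cases such as empty bins are reported.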

sr525 · 2024-08-07T17:51:08Z

python/lsst/analysis/tools/actions/keyedData/calcCompletenessHistogram.py

+
+    def __call__(self, data: KeyedData, **kwargs) -> KeyedData:
+        band = kwargs.get("band")
+        bins = tuple(x / 1000.0 for x in reversed(self.bins.get_bins()))


Should the number of bins be a config option?

The bins are returned by a config class.

sr525 · 2024-08-07T17:51:29Z

python/lsst/analysis/tools/actions/keyedData/calcCompletenessHistogram.py

+        doc="The action to compute completeness/purity",
+    )
+    bins = pexConfig.ConfigField[MagnitudeBinnedMetricsConfig](
+        doc="The bin configuration",


What does "The bin configuration" mean?

It is a config field. Having said that, I am renaming it to MagnitudeBinConfig because it doesn't define anything specific to metrics.

python/lsst/analysis/tools/actions/plot/completenessPlot.py

sr525 · 2024-08-07T18:52:19Z

python/lsst/analysis/tools/actions/scalar/scalarActions.py

+            range=(np.nanmin(values[matched]), np.nanmax(values[matched])),
+            bins=bins,
+        )
+        # Find bin where the fraction recovered first falls below 0.5


I think this comment is left over from before.

The whole action is.

python/lsst/analysis/tools/actions/vector/calcBinnedCompleteness.py

sr525 · 2024-08-07T19:13:07Z

python/lsst/analysis/tools/atools/diffMatched.py

@@ -115,86 +134,260 @@ def setDefaults(self):
        self.vectorKey = "refcat_is_pointsource"


-class MatchedRefCoaddToolBase(MagnitudeXTool):
+class InjectedObjectSelector(SelectorBase):
+    """A selector for injected objects."""


Why is this selector being defined in the atool and not in vector actions?

No particularly compelling reason.

sr525 · 2024-08-07T19:16:10Z

python/lsst/analysis/tools/atools/diffMatched.py

+        yield self.key_is_ref_star, Vector
+        yield self.key_is_target_galaxy, Vector
+        yield self.key_is_target_star, Vector
+


All of these actions should be defined in the appropriate actions file, or in a new one for injected stuff. Other people may want to use these things, so they should be in the actions folder where people will find them.

actions/vector/selectors.py is starting to get rather long. Any name suggestions? injectedSelectors.py is more readable but selectorsInjected.py sorts better.

Both are readable to me, but maybe I'm biased by familiarity.

sr525 · 2024-08-07T19:21:27Z

python/lsst/analysis/tools/atools/diffMatched.py

                action.selector_range = RangeSelector(
                    vectorKey=x_key,
-                    minimum=minimum,
-                    maximum=minimum + self._mag_interval,
+                    minimum=minimum / 1000.0,


It seems particularly silly for these to be defined in millimags and then /1000 everywhere.

We do tend to use mmag on the y axis.

sr525 · 2024-08-07T19:27:37Z

A general comment I have is that atools should be fairly human readable and easy to understand; someone unfamiliar with the code should be able to look at one and see what is going on. This is particularly true for plots with injected sources, where science users will, hopefully, see Rubin diagnostic plots and want to make them for their own injected data. These atools are structured differently from most others and are fairly opaque. Is there any way to make the atools more human readable to a newer user?

sr525 · 2024-08-07T19:30:55Z

python/lsst/analysis/tools/atools/genericBuild.py

@@ -106,9 +116,139 @@ class FluxesDefaultConfig(Config):
    ref_matched = ConfigField[FluxConfig](doc="Reference catalog magnitude")




Why are these things in genericBuild?

The selectors have been moved to the selectors file.

sr525 · 2024-08-07T22:04:27Z

python/lsst/analysis/tools/atools/sourceInjectionPlots.py

+    """
+
+    parameterizedBand = Field[bool](
+        doc="Does this AnalysisTool support band as a name parameter", default=True


This doesn't need to be here, just set parameterizedBand = True/False at the top.

These are all Trys' original plots using the atools matcher that are no longer being used, sorry for making you read them (again).

Are any of my plots getting used? I have them all saved on another PR so you can delete them if they're redundant.

Not here, no. I only kept what is still being used from your original commits.

sr525 · 2024-08-07T22:10:17Z

python/lsst/analysis/tools/tasks/injectedObjectAnalysis.py

+    AnalysisBaseConnections,
+    dimensions=("skymap", "tract"),
+    defaultTemplates={
+        "outputName": "matched_injected_deepCoadd_catalog_tract_injected_objectTable_tract",


Does injected need to be in this name twice? I am not sure where it is actually named and guess that it is not on this ticket.

I believe this comes from DM-41210. The format for the connection name is "matched_catalog1_catalog2". Here both catalogs being matched happen to have "injected_" in the name.

That's the default connection name format for matched tables. Attempting to shorten the name would just make it harder to figure out which tables were matched.
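
The naming convention described above can be written out as a one-line sketch. The helper `matched_connection_name` is hypothetical, and the split of the default name into these two catalog names is inferred from the "matched_catalog1_catalog2" format rather than confirmed by the diff.

```python
def matched_connection_name(catalog1, catalog2):
    """Build a matched-table name as "matched_<catalog1>_<catalog2>".

    Hypothetical helper for illustration; not a function in analysis_tools.
    """
    return f"matched_{catalog1}_{catalog2}"


# Assumed catalog names: both happen to start with "injected_".
name = matched_connection_name(
    "injected_deepCoadd_catalog_tract",
    "injected_objectTable_tract",
)
```

Read this way, the doubled "injected" is just a consequence of both input catalog names carrying the prefix, and dropping either one would hide which tables were matched.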

sr525 · 2024-08-07T22:10:57Z

python/lsst/analysis/tools/tasks/injectedObjectAnalysis.py

+from ..interfaces import AnalysisBaseConfig, AnalysisBaseConnections, AnalysisPipelineTask
+
+
+class InjectedObjectAnalysisConnections(


I do think that consolidating this with objectTableTractAnalysis would be better unless there is a compelling reason not to?

I also lean towards consolidation, but if they ever needed to diverge in the future, we'd have to add the class in again, editing every pipeline where it's used. Otherwise, the only benefit is having different defaults.

In retrospect we probably should have called ObjectTableTractAnalysis TractTableAnalysis since it's only the defaults that point to objectTable_tract.

python/lsst/analysis/tools/tasks/objectTableTractAnalysis.py

sr525 · 2024-08-07T22:11:43Z

tests/test_completenessPlot.py

+from lsst.analysis.tools.actions.plot import CompletenessHist
+from lsst.analysis.tools.actions.plot.plotUtils import get_and_remove_figure_text
+
+# matplotlib.use("Agg")


Actually this needs to be there or else pytesting the file will open the plot in a window, although weirdly that doesn't happen with scons so scons must also be setting that somewhere.

I added a boolean (default False) to disable this for debugging tests.

…values

sr525 reviewed Aug 6, 2024
