Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-34048: Add completeness/purity plots #277

Merged
merged 10 commits into from
Sep 25, 2024
Merged

DM-34048: Add completeness/purity plots #277

merged 10 commits into from
Sep 25, 2024

Conversation

taranu
Copy link
Contributor

@taranu taranu commented Aug 2, 2024

No description provided.

@@ -0,0 +1,201 @@
description: |
Matched difference (measured vs reference) plots/metrics
parameters:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think more of this should be in the atool and less in the pipeline file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are here to make it easier to change settings for all of the relevant plots at once, which is something I thought users would be reasonably likely to want to do.

On the other hand, I don't really like pipeline parameters either. Doing it this way means they have to be the same for all of the plots; if anyone wants to change a few here or there, they'd have to copy the file and change them per-tool anyway. There's no easy way I can think of to keep these in sync with the actual config defaults either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the note below. I dropped the parameters entirely.

I could re-implement the ability to mass change limits etc. in reconfigure_diff_matched_defaults if needed.

This file also ends up doing the same thing as the original DC2 /coaddDiffMatchedQualityExtended.yaml pipeline, which I have not updated here. They could be consolidated at some point, or have one base that they each import.

atools.matchedRefPsfMagChi.produce.plot.yLims: lims_mag_chi

# atools.matchedRefPositionXDiff: MatchedRefCoaddDiffPositionTool
# atools.matchedRefPositionXDiff.coord_meas: x
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to all still be in here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would need the x/y columns to be in the matched table to work. I'm debating whether to add them and make both x/y and ra/dec plots. Perhaps @cmsaunders has thoughts? I figure the difference plots will be uninteresting if the WCS is good (and if not, then it will show up elsewhere), but the chi plots might be a worthwhile sanity check for ra/dec errors.

Copy link
Collaborator

@sr525 sr525 Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand leaving it for later if it is going to be added back in but then maybe add a note and a TODO to say why it is there. Also if it is staying fix the yaml linter errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look through the whole ticket to fully get the context, but if this is going to show ra/dec for the pipeline position versus the injected position, that sounds interesting to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be ra/dec, the question is whether to also make plots of x/y (and particularly chi=(x_meas - x_ref)/x_err).

atools.matchedRefPositionDecChi.produce.plot.yLims: lims_pos_chi
atools.matchedRefPositionDecChi.compute_chi: true

python: |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't like all of this being in here but I think that is personal preference. I think it would look tidier if more of this was in the atool and less in the pipeline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did what I could to simplify this by consolidating necessary overrides into a single function call in the python block. That function will almost almost always need to be called by anything overriding the task - in obs_subaru, for example, it needs to set the bands for the colour diff plots. I might consider changing the default context from dc2 to injection.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The different bands, between obs_subaru and obs_decam for example, should be set in the obs specific config files.


mag_low_min = pexConfig.Field[int](
doc="Lower bound for the smallest magnitude bin in millimags",
default=15000,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This being in millimags seems silly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's meant to avoid having to deal with floating point math nonsense when setting up bins and such, assuming that nobody needs fractional mmags. I was considering naming it mmag_low_min etc. if that would help.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does going from 15 to 30 really make the maths than much harder than going from 15000 to 30000? It is much more understandable to go from 15 to 30 at a glance than 15000.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to specify a min, max and width (rather than a number of bins with the width implicit) and not have rounding remove/add one bin because it ended on 23.9999999 instead of 24.0.

from ..vector.selectors import RangeSelector


class CalcBinnedCompletenessAction(KeyedDataAction):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhere this needs a decent doc string that tells you what it does and what it returns.


def __call__(self, data: KeyedData, **kwargs) -> KeyedData:
band = kwargs.get("band")
bins = tuple(x / 1000.0 for x in reversed(self.bins.get_bins()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the number of bins be a config option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bins are returned by a config class.

doc="The action to compute completeness/purity",
)
bins = pexConfig.ConfigField[MagnitudeBinnedMetricsConfig](
doc="The bin configuration",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "The bin configuration" mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a config field. Having said that, I am renaming it to MagnitudeBinConfig because it doesn't define anything specific to metrics.

range=(np.nanmin(values[matched]), np.nanmax(values[matched])),
bins=bins,
)
# Find bin where the fraction recovered first falls below 0.5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is left over from before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole action is.

@@ -115,86 +134,260 @@ def setDefaults(self):
self.vectorKey = "refcat_is_pointsource"


class MatchedRefCoaddToolBase(MagnitudeXTool):
class InjectedObjectSelector(SelectorBase):
"""A selector for injected objects."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this selector being defined in the atool and not in vector actions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particularly compelling reason.

yield self.key_is_ref_star, Vector
yield self.key_is_target_galaxy, Vector
yield self.key_is_target_star, Vector

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these actions should be defined in the appropriate actions file. Or make a new one for injected stuff but other people may want to use these things and they should be in the actions folder where people will find them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actions/vector/selectors.py is starting to get rather long. Any name suggestions? injectedSelectors.py is more readable but selectorsInjected.py sorts better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are readable to me, but maybe I'm biased by familiarity.

action.selector_range = RangeSelector(
vectorKey=x_key,
minimum=minimum,
maximum=minimum + self._mag_interval,
minimum=minimum / 1000.0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems particularly silly for these to be defined in millimags and then /1000 everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do tend use mmag on the y axis.

@sr525
Copy link
Collaborator

sr525 commented Aug 7, 2024

A general comment I have is that atools should be fairly human readable and easy to understand, someone unfamiliar with the code should be able to look at it and see what is going on. This is particularly true for plots with injected sources where science users will, hopefully, see Rubin diagnostic plots and want to make them for their own injected data. These atools are structured differently from most others and fairly opaque. Is there anyway to make the atools more human readable to a newer user?

@@ -106,9 +116,139 @@ class FluxesDefaultConfig(Config):
ref_matched = ConfigField[FluxConfig](doc="Reference catalog magnitude")


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these things in genericBuild?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The selectors have been moved to the selectors file.

"""

parameterizedBand = Field[bool](
doc="Does this AnalysisTool support band as a name parameter", default=True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be here, just set parametrizeBand = True/False at the top.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all Trys' original plots using the atools matcher that are no longer being used, sorry for making you read them (again).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are any of my plots getting used? I have them all saved on another PR so you can delete them if they're redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not here, no. I only kept what is still being used from your original commits.

AnalysisBaseConnections,
dimensions=("skymap", "tract"),
defaultTemplates={
"outputName": "matched_injected_deepCoadd_catalog_tract_injected_objectTable_tract",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does injected need to be in this name twice? I am not sure where it is actually named and guess that it is not on this ticket.

Copy link
Contributor

@jtmccann jtmccann Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this comes from DM-41210. The format for the connection name is "matched_catalog1_catalog2". Here both catalogs being matched happen to have "injected_" in the name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the default connection name format for matched tables. Attempting to shorten the name would just make it harder to figure out which tables were matched.

from ..interfaces import AnalysisBaseConfig, AnalysisBaseConnections, AnalysisPipelineTask


class InjectedObjectAnalysisConnections(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think that consolidating this with objectTableTractAnalysis would be better unless there is a compelling reason not to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also lean towards consolidation, but if they ever needed to diverge in the future, we'd have to add the class in again, editing every pipeline where it's used. Otherwise, the only benefit is having different defaults.

In retrospect we probably should have called ObjectTableTractAnalysis TractTableAnalysis since it's only the defaults that point to objectTable_tract.

from lsst.analysis.tools.actions.plot import CompletenessHist
from lsst.analysis.tools.actions.plot.plotUtils import get_and_remove_figure_text

# matplotlib.use("Agg")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment

Copy link
Contributor Author

@taranu taranu Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this needs to be there or else pytesting the file will open the plot in a window, although weirdly that doesn't happen with scons so scons must also be setting that somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a boolean (default False) to disable this for debugging tests.

@taranu taranu force-pushed the tickets/DM-34048 branch 2 times, most recently from d101e68 to 9d6d163 Compare August 16, 2024 19:45
@taranu taranu force-pushed the tickets/DM-34048 branch 2 times, most recently from 8c97a9c to 5e3bb12 Compare August 26, 2024 21:52
@taranu taranu merged commit 164a520 into main Sep 25, 2024
8 checks passed
@taranu taranu deleted the tickets/DM-34048 branch September 25, 2024 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants