-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-34048: Add completeness/purity plots #277
Conversation
@@ -0,0 +1,201 @@ | |||
description: | | |||
Matched difference (measured vs reference) plots/metrics | |||
parameters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think more of this should be in the atool and less in the pipeline file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are here to make it easier to change settings for all of the relevant plots at once, which is something I thought users would be reasonably likely to want to do.
On the other hand, I don't really like pipeline parameters either. Doing it this way means they have to be the same for all of the plots; if anyone wants to change a few here or there, they'd have to copy the file and change them per-tool anyway. There's no easy way I can think of to keep these in sync with the actual config defaults either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the note below. I dropped the parameters entirely.
I could re-implement the ability to mass change limits etc. in reconfigure_diff_matched_defaults
if needed.
This file also ends up doing the same thing as the original DC2 /coaddDiffMatchedQualityExtended.yaml
pipeline, which I have not updated here. They could be consolidated at some point, or have one base that they each import.
atools.matchedRefPsfMagChi.produce.plot.yLims: lims_mag_chi | ||
|
||
# atools.matchedRefPositionXDiff: MatchedRefCoaddDiffPositionTool | ||
# atools.matchedRefPositionXDiff.coord_meas: x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to all still be in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would need the x/y columns to be in the matched table to work. I'm debating whether to add them and make both x/y and ra/dec plots. Perhaps @cmsaunders has thoughts? I figure the difference plots will be uninteresting if the WCS is good (and if not, then it will show up elsewhere), but the chi plots might be a worthwhile sanity check for ra/dec errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can understand leaving it for later if it is going to be added back in but then maybe add a note and a TODO to say why it is there. Also if it is staying fix the yaml linter errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look through the whole ticket to fully get the context, but if this is going to show ra/dec for the pipeline position versus the injected position, that sounds interesting to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be ra/dec, the question is whether to also make plots of x/y (and particularly chi=(x_meas - x_ref)/x_err).
atools.matchedRefPositionDecChi.produce.plot.yLims: lims_pos_chi | ||
atools.matchedRefPositionDecChi.compute_chi: true | ||
|
||
python: | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't like all of this being in here but I think that is personal preference. I think it would look tidier if more of this was in the atool and less in the pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did what I could to simplify this by consolidating necessary overrides into a single function call in the python block. That function will almost almost always need to be called by anything overriding the task - in obs_subaru, for example, it needs to set the bands for the colour diff plots. I might consider changing the default context from dc2
to injection
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The different bands, between obs_subaru and obs_decam for example, should be set in the obs specific config files.
|
||
mag_low_min = pexConfig.Field[int]( | ||
doc="Lower bound for the smallest magnitude bin in millimags", | ||
default=15000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This being in millimags seems silly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's meant to avoid having to deal with floating point math nonsense when setting up bins and such, assuming that nobody needs fractional mmags. I was considering naming it mmag_low_min
etc. if that would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does going from 15 to 30 really make the maths than much harder than going from 15000 to 30000? It is much more understandable to go from 15 to 30 at a glance than 15000.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to specify a min, max and width (rather than a number of bins with the width implicit) and not have rounding remove/add one bin because it ended on 23.9999999 instead of 24.0.
python/lsst/analysis/tools/actions/keyedData/calcBinnedCompleteness.py
Outdated
Show resolved
Hide resolved
from ..vector.selectors import RangeSelector | ||
|
||
|
||
class CalcBinnedCompletenessAction(KeyedDataAction): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere this needs a decent doc string that tells you what it does and what it returns.
|
||
def __call__(self, data: KeyedData, **kwargs) -> KeyedData: | ||
band = kwargs.get("band") | ||
bins = tuple(x / 1000.0 for x in reversed(self.bins.get_bins())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the number of bins be a config option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bins are returned by a config class.
doc="The action to compute completeness/purity", | ||
) | ||
bins = pexConfig.ConfigField[MagnitudeBinnedMetricsConfig]( | ||
doc="The bin configuration", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "The bin configuration" mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a config field. Having said that, I am renaming it to MagnitudeBinConfig
because it doesn't define anything specific to metrics.
range=(np.nanmin(values[matched]), np.nanmax(values[matched])), | ||
bins=bins, | ||
) | ||
# Find bin where the fraction recovered first falls below 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is left over from before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole action is.
python/lsst/analysis/tools/actions/vector/calcBinnedCompleteness.py
Outdated
Show resolved
Hide resolved
python/lsst/analysis/tools/actions/vector/calcBinnedCompleteness.py
Outdated
Show resolved
Hide resolved
@@ -115,86 +134,260 @@ def setDefaults(self): | |||
self.vectorKey = "refcat_is_pointsource" | |||
|
|||
|
|||
class MatchedRefCoaddToolBase(MagnitudeXTool): | |||
class InjectedObjectSelector(SelectorBase): | |||
"""A selector for injected objects.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this selector being defined in the atool and not in vector actions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No particularly compelling reason.
yield self.key_is_ref_star, Vector | ||
yield self.key_is_target_galaxy, Vector | ||
yield self.key_is_target_star, Vector | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these actions should be defined in the appropriate actions file. Or make a new one for injected stuff but other people may want to use these things and they should be in the actions folder where people will find them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actions/vector/selectors.py
is starting to get rather long. Any name suggestions? injectedSelectors.py
is more readable but selectorsInjected.py
sorts better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are readable to me, but maybe I'm biased by familiarity.
action.selector_range = RangeSelector( | ||
vectorKey=x_key, | ||
minimum=minimum, | ||
maximum=minimum + self._mag_interval, | ||
minimum=minimum / 1000.0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems particularly silly for these to be defined in millimags and then /1000 everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do tend use mmag on the y axis.
A general comment I have is that atools should be fairly human readable and easy to understand, someone unfamiliar with the code should be able to look at it and see what is going on. This is particularly true for plots with injected sources where science users will, hopefully, see Rubin diagnostic plots and want to make them for their own injected data. These atools are structured differently from most others and fairly opaque. Is there anyway to make the atools more human readable to a newer user? |
@@ -106,9 +116,139 @@ class FluxesDefaultConfig(Config): | |||
ref_matched = ConfigField[FluxConfig](doc="Reference catalog magnitude") | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these things in genericBuild?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The selectors have been moved to the selectors file.
""" | ||
|
||
parameterizedBand = Field[bool]( | ||
doc="Does this AnalysisTool support band as a name parameter", default=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need to be here, just set parametrizeBand = True/False at the top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all Trys' original plots using the atools matcher that are no longer being used, sorry for making you read them (again).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are any of my plots getting used? I have them all saved on another PR so you can delete them if they're redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not here, no. I only kept what is still being used from your original commits.
AnalysisBaseConnections, | ||
dimensions=("skymap", "tract"), | ||
defaultTemplates={ | ||
"outputName": "matched_injected_deepCoadd_catalog_tract_injected_objectTable_tract", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does injected need to be in this name twice? I am not sure where it is actually named and guess that it is not on this ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this comes from DM-41210. The format for the connection name is "matched_catalog1_catalog2". Here both catalogs being matched happen to have "injected_" in the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the default connection name format for matched tables. Attempting to shorten the name would just make it harder to figure out which tables were matched.
from ..interfaces import AnalysisBaseConfig, AnalysisBaseConnections, AnalysisPipelineTask | ||
|
||
|
||
class InjectedObjectAnalysisConnections( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think that consolidating this with objectTableTractAnalysis would be better unless there is a compelling reason not to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also lean towards consolidation, but if they ever needed to diverge in the future, we'd have to add the class in again, editing every pipeline where it's used. Otherwise, the only benefit is having different defaults.
In retrospect we probably should have called ObjectTableTractAnalysis
TractTableAnalysis
since it's only the defaults that point to objectTable_tract.
tests/test_completenessPlot.py
Outdated
from lsst.analysis.tools.actions.plot import CompletenessHist | ||
from lsst.analysis.tools.actions.plot.plotUtils import get_and_remove_figure_text | ||
|
||
# matplotlib.use("Agg") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this needs to be there or else pytesting the file will open the plot in a window, although weirdly that doesn't happen with scons so scons must also be setting that somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a boolean (default False
) to disable this for debugging tests.
d101e68
to
9d6d163
Compare
8c97a9c
to
5e3bb12
Compare
2b6edfa
to
10446e2
Compare
10446e2
to
c2aba92
Compare
f6369a4
to
345d8f0
Compare
67d6855
to
244db99
Compare
244db99
to
46eaa10
Compare
No description provided.