Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-38546: Implement new CalibrateImageTask #802

Merged
merged 4 commits into from
Jul 31, 2023
Merged

DM-38546: Implement new CalibrateImageTask #802

merged 4 commits into from
Jul 31, 2023

Conversation

parejkoj
Copy link
Contributor

No description provided.

@parejkoj parejkoj force-pushed the tickets/DM-38546 branch 4 times, most recently from 46d8a97 to 4620202 Compare June 30, 2023 00:05
@@ -417,6 +417,9 @@ def run(self, exposure, sourceCat, expId=0):
flux0 = 10**(0.4*r.zp) # Flux of mag=0 star
flux0err = 0.4*math.log(10)*flux0*r.sigma # Error in flux0
photoCalib = makePhotoCalibFromCalibZeroPoint(flux0, flux0err)
self.log.info("Photometric calibration factor (nJy/ADU): %f +/- %f",
photoCalib.getCalibrationMean(),
photoCalib.getCalibrationErr())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you roll this into the previous log message or set it to DEBUG? I don't think we usually want this task to be very verbose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or use self.log.verbose if this is not really a debug message but also not something you want people to see all the time. Pipeline batch execution runs at VERBOSE level rather than INFO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PhotoCalTask is very much not chatty:

  • Are color terms being applied (and if so, from where)?
  • Report result as a magnitude zero point.
  • Report result as calibration factor (this addition).

We still don't have clear guidance on what to output at what level, but two lines summarizing what the result is seems appropriate here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a very strong opinion. I just think people tend to look at task-level log messages in order to:

  • help diagnose a problem
  • get a quick sense of status or timing (there are better ways for both, but probably not easier ways)

and I don't see this one as helping with either.


astrometry_ref_cat = connectionTypes.PrerequisiteInput(
doc="Reference catalog to use for astrometric calibration.",
name="gaia_dr2_20200414",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep an eye on Clare's Gaia DR3 ticket in case it lands before this does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we'll have dr3 distributed across our various test datasets nor fully vetted soon enough. It's an easy change here once that happens.

storageClass="ExposureF",
dimensions=("instrument", "visit", "detector"),
)
# TODO: persist a parquet version of this!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this ticket, I assume?

Copy link
Contributor Author

@parejkoj parejkoj Jul 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure: we'll need the SourceCatalog versions to store the footprints, but Eli's your vision is to only store the footprints, none of the other results in that SourceCatalog. I'm not sure that's actually practical here.

In order to get this merged, I guess I'll put this on the "do it later" list, but it should probably be near the top of that list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed as DM-40061.

stars = connectionTypes.Output(
doc="Catalog of unresolved sources detected on the calibrated exposure; "
"includes source footprints.",
name="initial_stars", # TODO: what to name this?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO should definitely be done on this ticket, and that includes thinking about what to name it now if we want the long-term name to be the parquet variant, because adding new dataset types is a gazillion times easier than changing the meaning of old ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initial_stars_footprints_detector, in keeping with the "this is where you get the source footprints" thing, and also the "append _detector so full-visit things don't need a suffix"? And then the parquet version would be initial_stars_detector?

If instead we called this initial_stars_detector, what would we call the parquet version?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty happy with initial_stars_footprints_detector for FITS and initial_stars_detector for parquet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the _detector on the AFW tables? They're never aggregated, as far as I'm aware. If we want _detector on all detector-level catalogs, no matter their format, that's fine.

python/lsst/pipe/tasks/calibrateImage.py Show resolved Hide resolved
python/lsst/pipe/tasks/calibrateImage.py Outdated Show resolved Hide resolved
tests/test_calibrateImage.py Outdated Show resolved Hide resolved
tests/test_calibrateImage.py Outdated Show resolved Hide resolved
# We don't have many sources, so have to fit simpler models.
self.config.psf_detection.background.approxOrderX = 1
self.config.star_detection.background.approxOrderX = 1
# TODO: While debugging DM-32701, we're using PCA instead of psfex.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO a relic now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, no. DM-32701 fixed the "junk PSF produced" problem into a "raise error", but we still have the same problem of it thinking we don't have enough PSF sources (even when we should for the given fitting order). We don't have a followup ticket to explore that further, so I'm not sure what to say here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is just about the test, not any real data, let's just remove the TODO.

PSFEx seeming inconsistent about how many stars it needs is something I'd consider a low-priority problem (to the point where I'm not sure a ticket or TODO is merited), because it's easy to imagine that happening because our thinking of its algorithm is an oversimplification and it genuinely has more degrees of freedom.

If the problem is that PSFEx just isn't robust enough on some important class of data, that's different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is that we don't know which of those cases it is. This is basically the problem described on DM-40001. In this case, we can just use PCA since the tests don't really care, but I think we probably should investigate psfex's failure modes more fully here.

idx, _, _ = fitted.match_to_catalog_sky(truth)
# TODO: because the input variance image does not include contributions
# from the sources, the measured instFluxes do not reflect the greater
# uncertainty for faint sources; we can't use fluxErr as a bound on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean "greater uncertainty for bright sources"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworded it.

It's worth logging this in addition to the zero point logged above,
as this is the value that is saved in the PhotoCalib object.
gen3 pipetasks call setRefObjLoader; users in notebooks and tests may
still want to use the __init__ interface though.
This is necessary for writing the denormalized match catalog.
@parejkoj
Copy link
Contributor Author

I've also added in an "install simple PSF after initial PSF determination", per conversation on slack last week.

@@ -417,6 +417,9 @@ def run(self, exposure, sourceCat, expId=0):
flux0 = 10**(0.4*r.zp) # Flux of mag=0 star
flux0err = 0.4*math.log(10)*flux0*r.sigma # Error in flux0
photoCalib = makePhotoCalibFromCalibZeroPoint(flux0, flux0err)
self.log.info("Photometric calibration factor (nJy/ADU): %f +/- %f",
photoCalib.getCalibrationMean(),
photoCalib.getCalibrationErr())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a very strong opinion. I just think people tend to look at task-level log messages in order to:

  • help diagnose a problem
  • get a quick sense of status or timing (there are better ways for both, but probably not easier ways)

and I don't see this one as helping with either.

# We don't have many sources, so have to fit simpler models.
self.config.psf_detection.background.approxOrderX = 1
self.config.star_detection.background.approxOrderX = 1
# TODO: While debugging DM-32701, we're using PCA instead of psfex.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is just about the test, not any real data, let's just remove the TODO.

PSFEx seeming inconsistent about how many stars it needs is something I'd consider a low-priority problem (to the point where I'm not sure a ticket or TODO is merited), because it's easy to imagine that happening because our thinking of its algorithm is an oversimplification and it genuinely has more degrees of freedom.

If the problem is that PSFEx just isn't robust enough on some important class of data, that's different.

stars = connectionTypes.Output(
doc="Catalog of unresolved sources detected on the calibrated exposure; "
"includes source footprints.",
name="initial_stars", # TODO: what to name this?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty happy with initial_stars_footprints_detector for FITS and initial_stars_detector for parquet.

This task will replace CharacterizeImageTask and CalibrateTask.
@parejkoj parejkoj merged commit 2ebef2b into main Jul 31, 2023
2 checks passed
@parejkoj parejkoj deleted the tickets/DM-38546 branch July 31, 2023 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants