
Deep workflow #48

Open · wants to merge 4 commits into base: main
Conversation

@DinoBektesevic (Member) commented Nov 26, 2024

Implements the DEEP workflow used to analyze the single-CCD, single-night search, but it can be used to run on any set of image collections in a directory.

Additional log parsing and visualization functionality is provided.

Solution Description

Workflow

Running the Workflow is a multi-step process which includes additional preparation and cleanup work that executes at the
submit location:

  • Run prep
    • Load runtime config
    • Find all files in staging_directory that match ic_filename_pattern
    • Filter out unwanted files based on the name
  • Run KBMOD Search
  • Summarize the workflow execution
    • Parse logs, create a list of failed and successful logs
    • Create a workflow Gantt chart

The config section that gives the directory, pattern and exclusion can look like:

[workflow]
staging_directory = "collections"
ic_filename_pattern = "*collection"
skip_ccds = ["002", "031", "061"]
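The prep step (discover files in staging_directory, filter by name) can be sketched roughly as follows. The function name and the "skip if the CCD id appears in the filename" rule are illustrative assumptions, not the actual workflow code:

```python
from pathlib import Path


def find_collections(staging_directory, ic_filename_pattern, skip_ccds):
    """Find ImageCollection files, dropping CCDs we don't want to search.

    Hypothetical sketch: matches files by glob pattern, then excludes any
    file whose name contains a skipped CCD id such as "002".
    """
    candidates = sorted(Path(staging_directory).glob(ic_filename_pattern))
    return [p for p in candidates if not any(ccd in p.name for ccd in skip_ccds)]
```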

The workflow writes to the logs, resampled_wus, results and plots directories. There are two additional smaller workflows: one that runs only step2 and one that runs only the postscript. These are mostly useful for testing and for short-cutting the full workflow while debugging.

Running a KBMOD search is a 3 step process:

  • step 1 (executed on CPUs)
    • load ImageCollection
    • filter unwanted rows of data from it
      • for Rubin processed DEEP this should exclude all wcs_err > 1e-04
      • for Rubin processed DEEP probably should exclude zeropoint > 27 or similar
      • this should probably reflect back onto the original ImageCollection, but it doesn't
      • so they're not implemented atm, because it causes confusion down the line
    • load SearchConfiguration
    • update search config values based on the IC metadata
      • currently the n_obs is the biggest discriminator for the data that is kept
      • we set the n_obs to len(ic)//2
      • once the len(ic) gets below 50-60, this seems to be too strict of a requirement
      • then the n_obs is pinned to 15
    • materialize a WorkUnit
      • requires the Rubin Data Butler
    • resample a WorkUnit
      • targets the largest common footprint WCS
    • writes the WorkUnit to file
  • step 2 (executed on GPUs)
    • loads the WorkUnit
    • runs KBMOD search
    • adds relevant metadata to the Results Table
      • this is not a generalized solution
      • wcs, mjd_mid and uuid are guaranteed
      • visit and detector don't exist for every ImageCollection
    • writes Results to file
  • step 3 (executed on CPUs)
    • loads Results file
    • makes an analysis plot
    • this requires external resources, catalogs of fake and known objects
    • Known objects are SkyBot query results - so the plotting code is structured to work with that
    • this is not general code, though, because user-made catalogs have no agreed-upon standard
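The n_obs heuristic from step 1 above can be written out as a small function. This is a sketch under stated assumptions: the description says n_obs is pinned to 15 once len(ic) drops below 50-60, so a cutoff of 55 is assumed here as the midpoint of that range; the function name is illustrative:

```python
def choose_n_obs(ic_len, pin_threshold=55, pinned_value=15):
    """Pick n_obs for the search config from the ImageCollection length.

    Uses half the collection length, but pins n_obs to 15 once the
    collection becomes small (below the assumed 55-image threshold,
    standing in for the 50-60 range described above).
    """
    if ic_len < pin_threshold:
        return pinned_value
    return ic_len // 2
```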

The config section for the step1 and postscript external resources can look like:

[apps.step1]
search_config_filepath = "search_config.yaml"
butler_config_filepath = "/repo/root/butler.yaml"
n_workers = 8

[apps.postscript]
fake_object_catalog = "resources/fakes_detections_joined.minified.fits.bz2"
known_object_catalog = "resources/skybot_results_joined.minified.fits.bz2"

Logging

Logs can be quite complicated.

A single log file can contain several starting and ending messages because jobs can be resubmitted. Jobs are often preempted, which means their logs will have several initial messages and perhaps only one ending message.
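Because of resubmission, a parser first has to split one log file into individual attempts. A minimal sketch, assuming a hypothetical start_marker string (real logs use whatever initial message the workflow actually emits):

```python
def split_attempts(log_text, start_marker="Workflow started"):
    """Split a resubmitted job's log into per-attempt chunks of lines.

    Every occurrence of start_marker (an assumed sentinel, not the real
    log format) begins a new attempt; trailing lines belong to the last
    attempt, which may lack an ending message if it was preempted.
    """
    attempts, current = [], []
    for line in log_text.splitlines():
        if start_marker in line and current:
            attempts.append(current)
            current = []
        current.append(line)
    if current:
        attempts.append(current)
    return attempts
```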

A single task will output several log files - one per step.

Each Log can consist of multiple log files. For example the complete log of a Task 20210101_A can consist of multiple
steps called 20210101_A_step1, 20210101_A_step2 etc. thus making the full Log of the event a union of every step.
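Taking the union of per-step files into one Log per task can be sketched like this. The filename pattern is an assumption extrapolated from the 20210101_A_step1 example above; the real naming scheme may differ:

```python
import re
from collections import defaultdict


def group_logs(filenames):
    """Group per-step log files into one entry per task.

    Assumes names like '20210101_A_step1.log' (a hypothetical pattern
    based on the example task names); files that don't match are ignored.
    """
    pattern = re.compile(r"^(?P<task>.+)_step\d+\.log$")
    tasks = defaultdict(list)
    for name in filenames:
        m = pattern.match(name)
        if m:
            tasks[m.group("task")].append(name)
    return dict(tasks)
```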

Individual steps are parsed, extracting individual log entries and context; assigning to each log step a success, error and completion status:

  • a log is successful if all individual steps were successful.
  • a step is not completed if the log does not contain the expected final log entry.
  • a step is marked with an error if the log contains a traceback or an error report. A not-completed log step does not by itself indicate an error.
  • a step that is marked not-completed, but not-errored, may be safely re-run
  • a log is a failure if all steps were not completed or any of the steps has an error. A log may be safely re-run if no steps have an error, but not all steps have completed or not all steps have run (happens if the workflow times out and there were multiple retries).
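The status rules above can be expressed as a small classifier. The dict-based step records and the function name are illustrative; the real Log class presumably carries this state differently:

```python
def classify_log(steps):
    """Classify a parsed log from its per-step statuses.

    steps: list of dicts with boolean 'completed' and 'errored' flags
    (an assumed representation). Implements the rules above: success if
    all steps completed without error; failure if any step errored or no
    step completed at all; safe to re-run if nothing errored but some
    steps did not complete.
    """
    any_error = any(s["errored"] for s in steps)
    all_completed = bool(steps) and all(s["completed"] for s in steps)
    return {
        "success": all_completed and not any_error,
        "failure": any_error or not any(s["completed"] for s in steps),
        "safe_to_rerun": not any_error and not all_completed,
    }
```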

Some functionality is pretty general - parse_logfile, the Log class and its fmt string - but contextualization depends heavily on the log content. This means much of the plotting (colors etc.) is affected too. To change the default contextualization of the parsed logs, override the _parse_single method.

Code Quality

The code is still rough, but at least it has documentation now and isn't duplicated across tasks.

There will be some wrestling with the default values in the repo right now, but that is at least approachable compared to the draft PR.

@wilsonbb (Collaborator) left a comment:

Left several comments about cleaning up code that was commented out or adding some clarifications, but mainly approving.

import time


#"ckpt_2gb_2cpus", "ckpt_2gb_2cpus", "astro_2gb_2cpus"]),

Nit: we can remove this

import time


# "esci_48_8cpus" "astro_48_8cpus"

With comments like these, I would either remove them or just write out a quick one line comment about the executors

ic = ImageCollection.read(ic_file)

### Mask out images that we don't want or can not search through
# mask out poor weather images

Would probably fully remove these. Can add a comment explaining why we no longer do this if needed.

#mask_zp = np.logical_and(ic["zeroPoint"] > 27 , ic["zeroPoint"] < 32)
#ic = ic[np.logical_and(mask_zp, mask_wcs_err)]

# mask out images with WCS error more than 0.1 arcseconds because we

Since we no longer apply all of the masking, do we want to remove this whole section?

newx = result["x"]+dt*result["vx"]
newy = result["y"]+dt*result["vy"]
coord = wcs.pixel_to_world(newx, newy)
#pos.append(list(zip(coord.ra.deg, coord.dec.deg)))

Did this ever get fixed? If not, and we still need it, maybe replace it with a longer comment about what was wrong.

#pos.append(list(zip(coord.ra.deg, coord.dec.deg)))
#pos_valid.append(obs_valid) # NOTE TO SELF: FIX LATER

return coord, obs_valid #SkyCoord(pos), pos_valid

update return value.

@drewoldag (Collaborator) left a comment:

This all looks ok to me. I'm not seeing anything overly frightening in here.
