
Update logic to retrieve threshold-specific pellet delivery timestamps #398

Merged

Conversation


@ttngu207 commented Aug 22, 2024

Fix #382
Fix #403

High-level logic for block detection
On a per-chunk basis, check for the presence of a new block and insert it into the Block table.

  1. Find the 0s in pellet_ct (these are times when the pellet count reset, i.e. a new block)
  2. Remove any double 0s (0s within 1 second of each other), keeping the first 0 (see the sketch below)
  3. Calculate block end_times (use due_time) and durations
  4. Insert into Block table
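
A minimal sketch of steps 1-2, for illustration only (pellet_ct_df and its pellet_count column are placeholder names, not the pipeline's actual schema):

import pandas as pd

def find_block_starts(pellet_ct_df: pd.DataFrame) -> pd.DatetimeIndex:
    """Timestamps where the pellet counter resets to 0, de-duplicated within 1s."""
    # Step 1: 0s in the pellet count mark a counter reset, i.e. a new block
    zeros = pellet_ct_df.index[pellet_ct_df["pellet_count"] == 0]
    # Step 2: drop 0s within 1 second of a previous 0, keeping the first of each run
    spacing = zeros.to_series().diff().dt.total_seconds().fillna(float("inf"))
    return zeros[(spacing >= 1).to_numpy()]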

@ttngu207 marked this pull request as ready for review on August 28, 2024, 14:13
@jkbhagatio (Member)

We need to add checking the beambreaks after attempted pellet delivery, to account for cases where pellet delivery is attempted but not successful.

I need to look into the false-negative rate (cases where no beambreak occurs after a successful pellet delivery).
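
As a rough way to estimate that rate, one could count pellet deliveries with no beambreak inside the matching window (a hypothetical helper, not code from this PR; the 1.2s default matches the window used later in the thread):

import pandas as pd

def beambreak_false_negative_rate(
    pellet_times: pd.DatetimeIndex,
    beambreak_times: pd.DatetimeIndex,
    window: str = "1.2s",
) -> float:
    """Fraction of pellet deliveries with no beambreak within `window` after them."""
    # merge_asof requires both frames to be sorted on the merge key
    left = pd.DataFrame({"time": pellet_times.sort_values()})
    right = pd.DataFrame({
        "time": beambreak_times.sort_values(),
        "beam_break_timestamp": beambreak_times.sort_values(),
    })
    matched = pd.merge_asof(
        left, right, on="time", tolerance=pd.Timedelta(window), direction="forward"
    )
    return float(matched["beam_break_timestamp"].isna().mean())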

@ttngu207 (Contributor Author)

@jkbhagatio Removal of ManualDelivery events has been added. The block key I'm testing with is:

{'experiment_name': 'social0.3-aeon3',
 'block_start': '2024-06-26 10:52:10.001984',
 'block_end': '2024-06-26 12:57:18'}

aeon/dj_pipeline/analysis/block_analysis.py
@@ -188,37 +183,14 @@ def make(self, key):
)
patch_keys, patch_names = patch_query.fetch("KEY", "underground_feeder_name")
Contributor

Should we exclude Dummy patches here, or is this already handled elsewhere so we can assume dummy patches will never be fetched?

Contributor Author

Not excluded; Dummy patches will still be included in the analysis, but they will return 0 pellets. Wheel data will still be computed.

depletion_state_df = depletion_state_df[~invalid_rows]

# find pellet times with matching beam break times (within 500ms after pellet times)
pellet_beam_break_df = (
Contributor

Can any of delivered_pellet_df, beam_break_df, depletion_state_df be an empty df? If yes, pd.merge_asof will throw an error.
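
For illustration, the kind of early-return guard this question points at (a sketch only; the column list mirrors the empty-result call in the proposal below, and the check would need to run after all three streams are fetched):

import pandas as pd

def _empty_result() -> pd.DataFrame:
    """Empty frame with the columns downstream code expects (sketch only)."""
    return pd.DataFrame(
        columns=["threshold", "offset", "rate", "pellet_timestamp", "beam_break_timestamp"]
    )

# Inside the function, once the three streams have been fetched:
# if delivered_pellet_df.empty or beambreak_df.empty or depletion_state_df.empty:
#     return _empty_result()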

@jkbhagatio (Member)

@ttngu207

I get slightly different results with this approach: it returns more final matched pellet delivery triggers, beambreaks, and threshold values.

It's similar to yours; I just changed things to make it more readable for me. See the diffs and the results, and let me know what you think.

(If you want me to put it in a separate PR or something let me know, else you can just copy and paste and compare)


def get_threshold_associated_pellets(patch_key, start, end):
    """
    Retrieve the pellet delivery timestamps associated with each patch threshold update within the specified start-end time.
    1. Get all patch state update timestamps (DepletionState): let's call these events "A"
        - Remove all events within 1 second of each other
        - Remove all events without threshold value (NaN)
    2. Get all pellet delivery timestamps (DeliverPellet): let's call these events "B"
        - Find matching beam break timestamps within 1.2s after each pellet delivery
    3. For each event "A", find the nearest event "B" within 100ms before or after the event "A"
        - These are the pellet delivery events "B" associated with the previous threshold update event "A"
    4. Shift back the pellet delivery timestamps by 1 to match the pellet delivery with the previous threshold update
    5. Remove all threshold updates events "A" without a corresponding pellet delivery event "B"
    Args:
        patch_key (dict): primary key for the patch
        start (datetime): start timestamp
        end (datetime): end timestamp
    Returns:
        pd.DataFrame: DataFrame with the following columns:
        - threshold_update_timestamp (index)
        - pellet_timestamp
        - beam_break_timestamp
        - offset
        - rate
    """
    chunk_restriction = acquisition.create_chunk_restriction(patch_key["experiment_name"], start, end)

    # Get pellet delivery trigger data
    delivered_pellet_df = fetch_stream(
        streams.UndergroundFeederDeliverPellet & patch_key & chunk_restriction
    )[start:end]
    # Remove invalid rows where the time difference is less than 1.2 seconds
    invalid_rows = delivered_pellet_df.index.to_series().diff().dt.total_seconds() < 1.2
    delivered_pellet_df = delivered_pellet_df[~invalid_rows]

    # Get beambreak data
    beambreak_df = fetch_stream(streams.UndergroundFeederBeamBreak & patch_key & chunk_restriction)[
        start:end
    ]
    # Remove invalid rows where the time difference is less than 1 second
    invalid_rows = beambreak_df.index.to_series().diff().dt.total_seconds() < 1
    beambreak_df = beambreak_df[~invalid_rows]

    # Exclude manual deliveries
    manual_delivery_df = fetch_stream(
        streams.UndergroundFeederManualDelivery & patch_key & chunk_restriction
    )[start:end]
    delivered_pellet_df = delivered_pellet_df.loc[
        delivered_pellet_df.index.difference(manual_delivery_df.index)
    ]

    # Return empty if no pellets
    if delivered_pellet_df.empty or beambreak_df.empty:
        return acquisition.io_api._empty(
            ["threshold", "offset", "rate", "pellet_timestamp", "beam_break_timestamp"]
        )

    # Find pellet delivery triggers with matching beambreaks within 1.2s after each pellet delivery
    pellet_beam_break_df = (
        pd.merge_asof(
            delivered_pellet_df.reset_index(),
            beambreak_df.reset_index().rename(columns={"time": "beam_break_timestamp"}),
            left_on="time",
            right_on="beam_break_timestamp",
            tolerance=pd.Timedelta("1.2s"),
            direction="forward",
        )
        .set_index("time")
        .dropna(subset=["beam_break_timestamp"])
    )
    pellet_beam_break_df.drop_duplicates(subset="beam_break_timestamp", keep="last", inplace=True)

    # Get patch threshold data
    depletion_state_df = fetch_stream(
        streams.UndergroundFeederDepletionState & patch_key & chunk_restriction
    )[start:end]
    # Remove NaNs
    depletion_state_df = depletion_state_df.dropna(subset=["threshold"])
    # Remove invalid rows where the time difference is less than 1 second
    invalid_rows = depletion_state_df.index.to_series().diff().dt.total_seconds() < 1
    depletion_state_df = depletion_state_df[~invalid_rows]

    # Find pellet delivery triggers that approximately coincide with each threshold update
    # i.e. nearest pellet delivery within 100ms before or after threshold update
    pellet_ts_threshold_df = (
        pd.merge_asof(
            depletion_state_df.reset_index(),
            pellet_beam_break_df.reset_index().rename(columns={"time": "pellet_timestamp"}),
            left_on="time",
            right_on="pellet_timestamp",
            tolerance=pd.Timedelta("100ms"),
            direction="nearest",
        )
        .set_index("time")
        .dropna(subset=["pellet_timestamp"])
    )

    # Clean up the df
    pellet_ts_threshold_df = pellet_ts_threshold_df.drop(columns=["event_x", "event_y"])
    # Shift back the pellet_timestamp values by 1 to match with the previous threshold update
    pellet_ts_threshold_df.pellet_timestamp = pellet_ts_threshold_df.pellet_timestamp.shift(-1)
    pellet_ts_threshold_df.beam_break_timestamp = pellet_ts_threshold_df.beam_break_timestamp.shift(-1)
    pellet_ts_threshold_df = pellet_ts_threshold_df.dropna(subset=["pellet_timestamp", "beam_break_timestamp"])
    return pellet_ts_threshold_df
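
For context, a hypothetical call using the block key posted earlier in the thread (underground_feeder_name and its value are placeholders, not from this PR):

import pandas as pd

patch_key = {
    "experiment_name": "social0.3-aeon3",
    "underground_feeder_name": "Patch1",  # placeholder, not from this PR
}
pellets_df = get_threshold_associated_pellets(
    patch_key,
    start=pd.Timestamp("2024-06-26 10:52:10.001984"),
    end=pd.Timestamp("2024-06-26 12:57:18"),
)
# One row per threshold update that has an associated pellet delivery
print(pellets_df.head())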

@ttngu207 (Contributor Author)


Looks good @jkbhagatio
Yes, the logic is the same; the code is just reorganized to be more readable.

I've updated this PR to include the changes.

This PR is also up to date with main, so merging it will bring all recent changes on main into datajoint_pipeline.
Or, if it's not ready, we can review/merge #416.
(Tagging @MilagrosMarin here too: if we're merging this PR, #398, first, then there's no need for #416.)

@jkbhagatio (Member)


Cool, sounds good. I'm happy for this to merge then, feel free to go ahead @ttngu207!

@ttngu207 merged commit 07fe4c2 into SainsburyWellcomeCentre:datajoint_pipeline on Sep 28, 2024