Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix eager dtype conversion of value column #1353

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

drammock
Copy link
Member

Follow-up to #1349, see https://github.com/mne-tools/mne-bids/pull/1349/files#r1885104885

PR Description

Basically: use defensive programming when trying to dtype-convert the value column from events.tsv. I was a bit worried that a column called value could easily exist in real datasets and not conform to the way that column gets written by MNE-BIDS, and it didn't take long for @larsoner to find one :)

The bug was already caught by a downstream test in MNE-BIDS-Pipeline, so I didn't bother adding a new test here, but LMK if you want me to add a simple one.

Merge checklist

Maintainer, please confirm the following before merging.
If applicable:

  • All comments are resolved
  • This is not your own PR
  • All CIs are happy
  • PR title starts with [MRG]
  • whats_new.rst is updated
  • New contributors have been added to CITATION.cff
  • PR description includes phrase "closes <#issue-number>"

@drammock
Copy link
Member Author

circleCI failure is a bit mysterious (failure to cross-ref pathlib.Path in an attribute docstring). I thought it was sphinx-doc/sphinx#12317 but CircleCI here is running Sphinx 8.1.3 so that fix should definitely be in place. I'll try to dig deeper tomorrow. Also haven't looked into the Py3.10 build failure yet, beyond noting that it's a metadata-missing on twine --check error. anyone else should feel free to investigate if you have a debugging craving and a few minutes.

@drammock
Copy link
Member Author

OK, the cross-ref failure looks like sphinx-doc/sphinx#13178; autodoc is the cause but so far I can't really understand the root of the problem. Probably we just need to add a warning ignore :(

sadly I can't replicate the twine check --strict dist/* failure locally using a py3.10 environment :(

Copy link

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 60.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 97.39%. Comparing base (46f284b) to head (9b395b4).

Files with missing lines Patch % Lines
mne_bids/read.py 60.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1353      +/-   ##
==========================================
- Coverage   97.43%   97.39%   -0.05%     
==========================================
  Files          40       40              
  Lines        8903     8912       +9     
==========================================
+ Hits         8675     8680       +5     
- Misses        228      232       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@drammock
Copy link
Member Author

ok was able to reproduce the twine failure, and found a workaround here: astral-sh/rye#1446 (comment)

TL;DR some kind of mismatch between twine and hatchling, related to metadata versions.
Installing twine>=6 didn't fix it though, whereas pinning hatchling==1.26.3 did (latest is 1.27.0). TBH IDK what the problem is, both name = "mne-bids" and dynamic = ["version"] are in pyproject.toml 🤷🏻

@hoechenberger
Copy link
Member

Thanks for looking into this, @drammock

@scott-huberty
Copy link
Contributor

scott-huberty commented Dec 22, 2024

Hi @drammock

To understand the original issue I actually had to create a test myself that would be caught by the try-catches added in this PR, so maybe it is indeed worth adding a test?

Something like this to mne_bids/tests/test_read/test_handle_events_reading:

# Test with only a `value` column, but where the values in this `value` column don't
# meet our assumption that they can be coerced to integers/floats.
events = {
        "onset": [10, 15],
        "duration": [1, 1],
        "value": ["Condition 1", "Condition 2"],
}
events_fname = tmp_path / "bids6" / "sub-01_task-test_events.tsv"
events_fname.parent.mkdir()
_to_tsv(events, events_fname)

raw, event_id = _handle_events_reading(events_fname, raw)
ev_arr, ev_dict = mne.events_from_annotations(raw, event_id=event_id)

# EDIT: test below will pass after https://github.com/mne-tools/mne-bids/pull/1353#discussion_r1894777877

# In this edge case, there were no "event IDs" in the `value` column of the
# events.tsv, so we created event IDs based on the row index of those values.
# And The values of this `value` column were used as annotation descripitons.
assert {"Condition 1": 0, "Condition 2": 1} == event_id == ev_dict
assert raw.annotations.description.tolist() == ["Condition 1", "Condition 2"]

This test creates an events TSV file like:

onset	duration	value
10	1	Condition 1
15	1	Condition 2

Where _, event_id = _handle_events_reading(events_fname, raw) returns

{'Condition 1': 'Condition 1', 'Condition 2': 'Condition 2'}

And AFAICT, from the User's perspective:

  1. This event_id would only get returned to the user if they do read_raw_bids(..., return_event_dict=True)
  2. The condition 1 and condition 2 events are assigned to annotations in the raw object.

EDIT: I updated the proposed test after my suggestions in

@scott-huberty
Copy link
Contributor

On second thought...

{'Condition 1': 'Condition 1', 'Condition 2': 'Condition 2'} is strange.

In this case, maybe we should either

  • return None (and warn the user)
  • or use np.arange to create a "proper" event dict.

try:
culled_vals = np.asarray(culled_vals, dtype=float)
except ValueError:
pass
Copy link
Contributor

@scott-huberty scott-huberty Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pass
# If the values are not numeric, just use the row index as the event ID
culled_vals = np.arange(len(culled_vals))

@drammock drammock mentioned this pull request Dec 23, 2024
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants