Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new cluster test API #12663

Draft
wants to merge 99 commits into
base: main
Choose a base branch
from

Conversation

CarinaFo
Copy link
Contributor

@CarinaFo CarinaFo commented Jun 14, 2024

set up new cluster_test api that sets up design matrix based on Wilkinson formula (using formulaic package)

  • currently unable to continue working on this project due to moving continent (soon based in Australia), estimated back to normal by end of November

Copy link
Member

@drammock drammock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few comments to get you rolling. Maybe ping us when the formulaic bit is added? (or sooner if you have any questions in the meantime, of course!)

tutorials/stats-sensor-space/76_new_cluster_test_api.py Outdated Show resolved Hide resolved
tutorials/stats-sensor-space/76_new_cluster_test_api.py Outdated Show resolved Hide resolved
tutorials/stats-sensor-space/76_new_cluster_test_api.py Outdated Show resolved Hide resolved
tutorials/stats-sensor-space/76_new_cluster_test_api.py Outdated Show resolved Hide resolved
tutorials/stats-sensor-space/76_new_cluster_test_api.py Outdated Show resolved Hide resolved
tutorials/stats-sensor-space/76_new_cluster_test_api.py Outdated Show resolved Hide resolved
@hoechenberger
Copy link
Member

This might be related:

mne-tools/mne-incubator#31

cc @SophieHerbst

@cbrnr
Copy link
Contributor

cbrnr commented Jun 19, 2024

I think it is a great idea to revamp the current cluster permutation test API 🚀! Could you please share the reasoning behind choosing a pandas DataFrame as the container for evokeds and related meta info? If possible, I'd try to avoid using pandas (an optional dependency) when something similar could be achieved using, for example, a simple dictionary.

@CarinaFo
Copy link
Contributor Author

@cbrnr I think the main reason for a dataframe instead of a dictionary is that formulaic only allows for dataframes as input and we want to include Wilkinson formula support in the new cluster_test API.

@hoechenberger
Copy link
Member

hoechenberger commented Jun 19, 2024

@cbrnr I think the main reason for a dataframe instead of a dictionary is that formulaic only allows for dataframes as input and we want to include Wilkinson formula support in the new cluster_test API.

But this doesn't need to be user-facing, then, no? Just trying to understand. If a user passes in a list of TypedDicts or Dataclasses, you can internally create the DataFrames that need to be passed to formulaic. The user would then also get tab-completion assistance in their editor. But it's just a thought. Great work so far in any case!

@drammock
Copy link
Member

you can internally create the DataFrames that need to be passed to formulaic

This would still require pandas to be available though.

@hoechenberger
Copy link
Member

you can internally create the DataFrames that need to be passed to formulaic

This would still require pandas to be available though.

Sure
But it would provide a potentially more user-friendly API

@drammock
Copy link
Member

drammock commented Jun 19, 2024

you can internally create the DataFrames that need to be passed to formulaic

This would still require pandas to be available though.

Sure But it would provide a potentially more user-friendly API

For context, this is step 1 of a GSoC project. A later step involves (probably) creating helper functions that will create the necessary DataFrame for the user. That's not done here because:

  1. there are probably complicated use cases / designs that we won't foresee, so we want it to be possible for the user to pass in their own custom dataframe instead of our internally-generated one.
  2. we're actually not sure how helpful the helper function will be (at least in some cases): if what you need to pass in is a set of matched lists of subject IDs, evoked objects, and condition names, well then you might as well just call DataFrame(dict(subj=subj_list, cond=cond_list, data=evk_list)) yourself instead of calling the helper function. A helper function makes much more sense in other cases, like when dealing with Epochs objects where the conditions are intermixed within one object.

@larsoner
Copy link
Member

@CarinaFo I pushed a commit to add the dataset and add formulaic to our full and doc dependencies so that eventually CIs can use them properly. I added the tags [skip azp] [skip actions] to skip running those CIs. Feel free to git pull the changes back to your local machine!

If you look at the CI runs, you can see that CircleCI, which builds modified examples/tutorials in PRs, hit an error up on 9c8ec90:

sphinx.errors.ExtensionError: Could not find docstring in file "/home/circleci/project/tutorials/stats-sensor-space/76_new_cluster_test_api.py". A docstring is required by sphinx-gallery unless the file is ignored by "ignore_pattern"

Then I pushed a little commit to fix that in 47363b5, and now it hits a different error (which replicates what I saw locally when I tried to run the example):

../tutorials/stats-sensor-space/76_new_cluster_test_api.py unexpectedly failed to execute correctly:

    Traceback (most recent call last):
      File "/home/circleci/project/tutorials/stats-sensor-space/76_new_cluster_test_api.py", line 449, in <module>
        df_long = convert_wide_to_long(df)
      File "/home/circleci/project/tutorials/stats-sensor-space/76_new_cluster_test_api.py", line 431, in convert_wide_to_long
        data_2d = row["data"]
      File "/home/circleci/python_env/lib/python3.10/site-packages/pandas/core/series.py", line 1121, in __getitem__
        return self._get_value(key)
      File "/home/circleci/python_env/lib/python3.10/site-packages/pandas/core/series.py", line 1237, in _get_value
        loc = self.index.get_loc(label)
      File "/home/circleci/python_env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
        raise KeyError(key) from err
    KeyError: 'data'

It would be good to keep CircleCI green at least so we can see renderings of the tutorial with each push. If you make sure python -i tutorials/stats-sensor-space/76_new_cluster_test_api.py runs cleanly locally each time before you push then it should work on CircleCI now as well!

@drammock drammock force-pushed the new_cluster_stats_api_GSOC24 branch from 81ce0d0 to 6322499 Compare August 22, 2024 20:21
@drammock
Copy link
Member

@CarinaFo FYI I needed to do a rebase and force-push in order to get the CIs to run again, as there was a merge conflict. You'll want to recreate your local branch. Ping me if you have questions / challenges (e.g. if you had local work you don't want to lose) and we can work through it together.

@CarinaFo CarinaFo changed the title added cluster test api, first commit added cluster test api Sep 17, 2024
@CarinaFo CarinaFo changed the title added cluster test api new cluster test API Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants