Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement helper function to create design matrix? #134

Open
JohannesWiesner opened this issue Feb 20, 2023 · 1 comment
Open

Implement helper function to create design matrix? #134

JohannesWiesner opened this issue Feb 20, 2023 · 1 comment

Comments

@JohannesWiesner
Copy link

Hi! As an add-on to #127 , it might not only be nice just to obtain hierarchically resampled indices (so that users are independent from using one of the pre-defined tests provided by hierarch) but also to have helper function that allows you to create a design matrix that is needed as input to create the indices in the first place. For example, right now I am using

[...] FSL-Palm to derive exchangability blocks using the hcp2blocks.m function

This function takes over the tedious work of manually defining the EB-matrix yourself. For example consider the following dataframe - Some subjects in the HCP sample are completely unrelated to anyone (Family 2 & 5 only appear one time). Some subjects are related (having the same family id) but are regular sibilings. Some other subjects are also related, and on top they can be monozygotic or dizygotic twins. Here it get's a philosophical because MZ-twins can be considered as clones whereas for DZ-twins the hcp2blocks.m function allows the user to decide if DZ-twins should be treated as regular siblings or clones. In my case however, I also have repeated measures for each subject (which can not be interpreted by hcp2blocks.m). I fitted a mixed model (value ~ 1 + (1 | Subject)) and would like to obtain hierarchically resampled indices that respect the hierarchical structure of the HCP-dataset. Do you think it would make sense to implement such a function?

image

Here's the code to regenerate the dataframe:

df = pd.DataFrame( {'Subject': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3, 9: 4, 10: 4, 11: 4, 12: 5, 13: 5, 14: 5, 15: 6, 16: 6, 17: 6, 18: 7, 19: 7, 20: 7, 21: 8, 22: 8, 23: 8}, 'Family_ID': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3, 9: 4, 10: 4, 11: 4, 12: 1, 13: 1, 14: 1, 15: 4, 16: 4, 17: 4, 18: 5, 19: 5, 20: 5, 21: 3, 22: 3, 23: 3}, 'ZygosityGT': {0: 'MZ', 1: 'MZ', 2: 'MZ', 3: 'NoTwin', 4: 'NoTwin', 5: 'NoTwin', 6: 'NoTwin', 7: 'NoTwin', 8: 'NoTwin', 9: 'DZ', 10: 'DZ', 11: 'DZ', 12: 'MZ', 13: 'MZ', 14: 'MZ', 15: 'DZ', 16: 'DZ', 17: 'DZ', 18: 'NoTwin', 19: 'NoTwin', 20: 'NoTwin', 21: 'NoTwin', 22: 'NoTwin', 23: 'NoTwin'}, 'condition': {0: 1, 1: 2, 2: 3, 3: 1, 4: 2, 5: 3, 6: 1, 7: 2, 8: 3, 9: 1, 10: 2, 11: 3, 12: 1, 13: 2, 14: 3, 15: 1, 16: 2, 17: 3, 18: 1, 19: 2, 20: 3, 21: 1, 22: 2, 23: 3}, 'value': {0: 0.7739560485559633, 1: 0.4388784397520523, 2: 0.8585979199113825, 3: 0.6973680290593639, 4: 0.09417734788764953, 5: 0.9756223516367559, 6: 0.761139701990353, 7: 0.7860643052769538, 8: 0.12811363267554587, 9: 0.45038593789556713, 10: 0.37079802423258124, 11: 0.9267649888486018, 12: 0.6438651200806645, 13: 0.82276161327083, 14: 0.44341419882733113, 15: 0.2272387217847769, 16: 0.5545847870158348, 17: 0.06381725610417532, 18: 0.8276311719925821, 19: 0.6316643991220648, 20: 0.7580877400853738, 21: 0.35452596812986836, 22: 0.9706980243949033, 23: 0.8931211213221977}} )
@rishi-kulkarni
Copy link
Owner

rishi-kulkarni commented Feb 20, 2023

Indeed, I've looked into formulaic for doing this. I think there are two steps to implementing something like this - one difficult and one easy.

The easier step is dropping the redundant column that tracks measurement number when you have multiple obs per treated entity. I actually do have the code written for this in the development branch, which I can port over.

The harder step is making hierarch fully compatible with Wilkinson formulas. I've considered writing some sort of plugin to make something like this possible:
value ~ 1 + (1 | family) \ (1 | subject) \ treatment, where \ indicates nestedness. However, this throws away the ability to analyze crossed hierarchies.

The other option is going with traditional Wilkinson formulas and inferring the hierarchy, which is possible but also a challenge I'd have to give some thought to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants