There is a short GWAS (i.e. OLS) simulation that can easily be scaled up or down in https://nbviewer.jupyter.org/gist/eric-czech/bdff3402d27b5cccd7d8aacab0957a93. Covariates are typically clinical/behavioral parameters (age, sex, etc.) plus principal components intended to reflect population structure. Labels are typically a phenotype of some sort, e.g. disease status. It would be helpful to understand how performance changes as sample and variant counts increase, starting from around 100 or 1000 of each and going as high as you can on a single machine. I would also avoid VCF files for this, since the VCF IO time will be hard to separate from the rest of the running time. I think you are safe to simply generate the data directly in Xarray, as that notebook does.
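For anyone who wants a starting point without opening the notebook, here is a rough sketch of that kind of benchmark. It is not the notebook's code: it uses plain NumPy arrays rather than Xarray to stay self-contained, and the function names (`simulate`, `gwas_ols`, `benchmark`), the allele frequency of 0.3, and the effect size of 0.5 are all made up for illustration. The per-variant OLS is done by projecting the covariates out of both the phenotype and the genotypes, then running a simple regression on the residuals.

```python
import time

import numpy as np


def simulate(n_samples, n_variants, n_covariates=3, seed=0):
    """Generate synthetic genotypes, covariates, and a phenotype in memory."""
    rng = np.random.default_rng(seed)
    # Genotypes as alternate-allele counts (0, 1, or 2); frequency 0.3 is arbitrary.
    G = rng.binomial(2, 0.3, size=(n_samples, n_variants)).astype(np.float64)
    # Intercept plus random columns standing in for age, sex, PCs, etc.
    X = np.column_stack(
        [np.ones(n_samples), rng.normal(size=(n_samples, n_covariates))]
    )
    # Continuous phenotype: covariate effects, one causal variant (index 0), noise.
    y = X @ rng.normal(size=X.shape[1]) + 0.5 * G[:, 0] + rng.normal(size=n_samples)
    return G, X, y


def gwas_ols(G, X, y):
    """Per-variant OLS effect sizes and t-statistics, adjusting for covariates X."""
    n, p = X.shape
    # Orthonormal basis for the covariate column space.
    Q, _ = np.linalg.qr(X)
    # Residualize the phenotype and every genotype column against the covariates.
    y_res = y - Q @ (Q.T @ y)
    G_res = G - Q @ (Q.T @ G)
    # Simple regression of y_res on each residualized genotype column.
    # Note: a monomorphic variant would give gg == 0 and is not handled here.
    gg = np.einsum("ij,ij->j", G_res, G_res)
    beta = (G_res.T @ y_res) / gg
    # Residual sum of squares per variant, with one extra parameter for the variant.
    rss = y_res @ y_res - beta**2 * gg
    dof = n - p - 1
    se = np.sqrt(rss / dof / gg)
    return beta, beta / se


def benchmark(sizes):
    """Time only the regression step for square problems of the given sizes."""
    results = []
    for n in sizes:
        G, X, y = simulate(n, n)
        start = time.perf_counter()
        gwas_ols(G, X, y)
        results.append((n, n, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    for n_samples, n_variants, seconds in benchmark([100, 1000]):
        print(f"samples={n_samples} variants={n_variants} seconds={seconds:.4f}")
```

Data generation is kept outside the timed region so that only the regression is measured. To push toward the single-machine limit, extend the list passed to `benchmark`; memory for the dense genotype matrix grows as samples times variants, so that will likely be the first constraint you hit.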