Design principles for a new GWAS Toolkit #672
Replies: 6 comments
-
Some questions to answer
-
Some optimizations that would be nice
-
(Post by @jeromekelleher)
I would say "all of them". Anything that we specialise for haploids, diploids, haplodiploids, etc. will end up being limiting and short-sighted. This should be a general-purpose genomics library (that works well in the common diploid case).
I would say "no" as it seems like mission creep, but then I don't do any single-cell work, so I don't know.
-
(Post by @alimanfoo)
FWIW I previously tried to avoid anything sub-byte in scikit-allel, mainly because numpy is byte-atomic. Bit-shifting is also not something I find easy to code. I know it's bread and butter to lots of folks coming from a more computational background, but it can be less natural to hackers coming from a more biological background (like me :-)). I also found that whole-byte representations with compression were as compact as sub-byte representations, or more so, so using sub-byte representations just to save space doesn't always make sense. These are small points, though. I don't want to shut any ideas down here, these are just some thoughts.
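To make the trade-off above concrete, here is a minimal sketch that packs genotype calls four to a byte with bit-shifting and then compares raw and zlib-compressed sizes against the plain one-byte-per-call layout. It uses only the standard library rather than numpy or scikit-allel. The helper names `pack_2bit` and `unpack_2bit`, the simulated call frequencies, and the choice of zlib are all assumptions made for illustration, and the sizes it prints will depend on the data, so it is a way to check the claim on your own data rather than a benchmark.

```python
import random
import zlib


def pack_2bit(calls: bytes) -> bytes:
    """Pack values in {0, 1, 2, 3} four to a byte, first value in the low bits."""
    # Pad with zeros so the length is a multiple of four.
    padded = calls + bytes(-len(calls) % 4)
    out = bytearray(len(padded) // 4)
    for i in range(0, len(padded), 4):
        a, b, c, d = padded[i:i + 4]
        out[i // 4] = a | (b << 2) | (c << 4) | (d << 6)
    return bytes(out)


def unpack_2bit(packed: bytes, n: int) -> bytes:
    """Invert pack_2bit, returning the first n values."""
    out = bytearray()
    for byte in packed:
        out.extend((byte >> shift) & 0b11 for shift in (0, 2, 4, 6))
    return bytes(out[:n])


# Hypothetical data: alternate-allele counts (0, 1 or 2), skewed towards 0.
rng = random.Random(0)
calls = bytes(rng.choices([0, 1, 2], weights=[90, 8, 2], k=10_001))

packed = pack_2bit(calls)
assert unpack_2bit(packed, len(calls)) == calls  # round trip is lossless

sizes = {
    "whole-byte raw": len(calls),
    "2-bit raw": len(packed),
    "whole-byte + zlib": len(zlib.compress(calls)),
    "2-bit + zlib": len(zlib.compress(packed)),
}
for name, size in sizes.items():
    print(f"{name:>18}: {size} bytes")
```

Note that the packed form also needs the original length stored alongside it, since the padding is otherwise indistinguishable from real calls.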
-
A discussion on Dask configuration happening at Related Sciences right now between @eczech and @ravwojdyla has reminded me of another design principle:
As noted in https://discourse.pystatgen.org/t/a-comparative-evaluation-of-systems-for-scalable-linear-algebra-based-analytics-2018/15, bad default configuration parameters and lots of workload-specific tuning are big drawbacks of Spark. In particular, we should have default configuration parameters that work well on a single node, as @ravwojdyla has started to collect at related-sciences/gwas-analysis#27.
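One way to read "defaults that work well on a single node" is that the library derives its settings from the machine it is running on and only asks the user for overrides. The sketch below illustrates that pattern with the standard library only. `ComputeConfig`, its field names, `single_node_defaults`, `configure`, and the placeholder chunk size are all hypothetical; none of them come from Dask or from the parameters collected in related-sciences/gwas-analysis#27.

```python
import os
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ComputeConfig:
    # Hypothetical settings, named for illustration only.
    n_workers: int
    threads_per_worker: int
    chunk_rows: int


def single_node_defaults(cpu_count: Optional[int] = None) -> ComputeConfig:
    """Derive defaults from the local machine instead of requiring tuning."""
    cpus = cpu_count or os.cpu_count() or 1
    # One worker using every core; chunk_rows is an arbitrary placeholder.
    return ComputeConfig(n_workers=1, threads_per_worker=cpus, chunk_rows=10_000)


def configure(**overrides: int) -> ComputeConfig:
    """Start from the single-node defaults and apply explicit overrides."""
    return replace(single_node_defaults(), **overrides)


print(configure())                 # works with no tuning at all
print(configure(chunk_rows=2_000))  # a user changes only what they need
```

The point of the design is that calling `configure()` with no arguments should already be a reasonable setup, so tuning becomes optional rather than a prerequisite.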
-
(Post by @ravwojdyla)
If I may add some thoughts here from working on some other processing frameworks, these are things that I find useful:
-
I figured I'd start a topic to collect the design principles that have been implicit in our discussions of what we'd like to build.
Some initial thoughts: