SeqDatasets is a Python package that aims to provide a single interface for downloading datasets commonly used for machine learning in genomics. We currently support a small but growing list of datasets, including:
- The RNAcompete 2013 dataset used to train DeepBind and ResidualBind
- DeepSTARR's genome-wide STARR-seq dataset, which has since been used in many other publications
Wouldn't it be nice to install one package and have access to all the datasets you've seen published? That's what we're trying to build, and we could use some help! If you've preprocessed a dataset for your own work, share it with the community. Making it Xarray-compatible isn't much work, and we're happy to help. Stay tuned for more details on how to contribute.
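To give a rough sense of what "Xarray-compatible" can look like, here is a minimal sketch that packs one-hot encoded sequences and scalar targets into an `xarray.Dataset`. It assumes equal-length ACGT sequences with one target each; the variable names (`ohe_seq`, `target`), the dimension names, and the toy data are hypothetical and illustrative only, not a SeqDatasets convention.

```python
import numpy as np
import xarray as xr

ALPHABET = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """One-hot encode an uppercase ACGT string as a (length, 4) uint8 array."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    arr = np.zeros((len(seq), len(ALPHABET)), dtype=np.uint8)
    for pos, base in enumerate(seq):
        arr[pos, index[base]] = 1
    return arr


# Made-up toy data: three equal-length sequences, one scalar target each.
seqs = ["ACGTACGTAC", "TTTTGGGGCC", "GATTACAGAT"]
targets = np.array([0.5, 1.2, -0.3], dtype=np.float32)

# Stack per-sequence encodings into shape (sequence, length, alphabet).
ohe = np.stack([one_hot(s) for s in seqs])

# Hypothetical layout: names of variables and dimensions are illustrative only.
ds = xr.Dataset(
    data_vars={
        "ohe_seq": (("sequence", "length", "alphabet"), ohe),
        "target": ("sequence", targets),
    },
    coords={
        "sequence": [f"seq{i}" for i in range(len(seqs))],
        "alphabet": list(ALPHABET),
    },
)
print(ds)
```

From here, the dataset could be written to disk with `ds.to_zarr(...)` or `ds.to_netcdf(...)`; each requires its optional storage backend to be installed.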
Planned additions:
- Add citations for the models mentioned above
- Add a cleaned Basset dataset
- Add an example Basenji dataset
- Add an example BPNet dataset