Datasets for benchmarking, testing and developing in EUGENe

ML4GLand/SeqDatasets

SeqDatasets

SeqDatasets is a Python package that provides an interface for downloading datasets commonly used for machine learning in genomics. We currently support a small but growing list of datasets, including:

  1. The RNAComplete2013 dataset used to train DeepBind and ResidualBind
  2. DeepSTARR's genome-wide STARR-seq dataset that has been used in many other publications

Help us add datasets!

Wouldn't it be nice if you could install one package and have access to all the datasets you've seen published? That's what we're trying to do here, and we could use some help! If you have a dataset that you've preprocessed for your own work, share it with the community! It isn't much work to make it XArray-compatible, and we're happy to help. Stay tuned for more details on how to contribute.
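
To give a rough idea of what "XArray compatible" could look like, here is a minimal sketch that wraps a toy set of sequences and one measurement per sequence in an `xarray.Dataset`. The data, the variable names (`seq`, `activity`), and the dimension names (`sequence`, `position`) are all made up for illustration; they are not the layout SeqDatasets requires, which will be described in the contribution guide.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: three equal-length DNA sequences with one
# measured activity value each (values are invented for illustration).
seqs = ["ACGT", "TTGA", "CCGA"]
activity = [0.8, -1.2, 0.1]

# Split each sequence into single characters so the array is 2-D:
# one row per sequence, one column per position.
seq_array = np.array([list(s) for s in seqs], dtype="U1")

# Assumed layout: dimension and variable names are placeholders,
# not a format mandated by SeqDatasets.
ds = xr.Dataset(
    data_vars={
        "seq": (("sequence", "position"), seq_array),
        "activity": (("sequence",), np.array(activity)),
    },
    coords={"sequence": np.arange(len(seqs))},
)

# Recover the first sequence as a plain string.
first_seq = "".join(ds["seq"].sel(sequence=0).values)
```

Once the data is in this form, standard XArray indexing such as `ds.sel(sequence=[0, 2])` selects sequences and their measurements together.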

TODO

  • Add citations to models mentioned above
  • Add cleaned Basset dataset
  • Add an example Basenji dataset
  • Add an example BPNet dataset
