Data Sets for NuPIC
Michael edited this page Feb 10, 2014 · 8 revisions
- http://www.quandl.com
- http://infochimps.org/datasets
- http://junar.com/ (also a dataset search engine)
- http://archive.ics.uci.edu/ml/
- http://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1
- http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets
- https://datamarket.azure.com/
- http://www.data.gov/
- http://www.data.gc.ca
- http://www.factual.com/
- http://www.google.com/publicdata/directory
- http://www.guardian.co.uk/news/datablog
- http://www.kaggle.com/
- http://www.nationalarchives.gov.uk/records/catalogues-and-online-records.htm
- http://robjhyndman.com/TSDL/ (Time series datasets)
- http://ckan.net/
- http://www.delicious.com/pskomoroch/redistributable+dataset
- http://www.reddit.com/r/datasets/
- http://www.reddit.com/r/opendata
- http://www.trustlet.org/wiki/Repositories_of_datasets
- http://www.diggingintodata.org/Repositories/tabid/167/Default.aspx
- http://oad.simmons.edu/oadwiki/Data_repositories
- http://data-ac-uk.ecs.soton.ac.uk/
- http://crawdad.org/ (Wireless data and traces)
- http://data.cityofchicago.org/ (Chicago government data)
- http://data.govloop.com/ (National government data)
- http://data.gov.uk/data (UK government data)
- http://data.medicare.gov/
- http://data.seattle.gov/
- http://data.sunlightlabs.com/ (White House and Congressional data)
- http://gettingpastgo.socrata.com/ (Education and educational accountability)
- http://snap.stanford.edu/data/index.html (Social Network Datasets)
- http://timetric.com/public-data/ (Economic datasets)
- http://www.bls.gov/ (Labor statistics)
- http://www.dartmouthatlas.org/tools/downloads.aspx (Health care data)
- http://www.datakc.org/ (King County, Washington data)
- http://research.stlouisfed.org/fred2/ (Economic data)
- http://www.who.int/research/en/ (World Health Data)
- http://www.datasf.org/ (San Francisco Data)
- http://www.nyc.gov/html/datamine/html/home/home.shtml (New York City Data)
- http://data.vancouver.ca (Vancouver Data)
- http://stats.oecd.org/index.aspx (Organization for Economic Cooperation and Development)
- http://data.un.org/Explorer.aspx (UN Data)
- http://www.ngdc.noaa.gov/ngdc.html (Geophysical Data)
- http://data.worldbank.org (World Bank)
- https://wist.echo.nasa.gov/~wist/api/imswelcome/ (Earth Science Data)
- http://www.stats.com/ (Sports Data)
- http://www.physionet.org/cgi-bin/atm/ATM (annotated, digitized physiologic signals and time series)
- http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1 (The TIMIT dataset as mentioned by Andrew Ng in Unsupervised Feature Learning and Deep Learning. Note: available for purchase only; no free download. :( )
- http://www.ncbi.nlm.nih.gov/genome/ (Genomic Data)
- http://noticeboard.americascup.com/race-data/data-api-intro/ (America's Cup)
- http://www.frixo.com/ (traffic data)
- http://en.wikipedia.org/wiki/Wikipedia:Database_download
- http://factfinder.census.gov/servlet/DatasetMainPageServlet (The US Census)
- http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13 (Full Google N-Gram dataset for English, $150)
- Viewer for above: http://ngrams.googlelabs.com/
- http://www2.jpl.nasa.gov/srtm (Elevation dataset)
- http://build.kiva.org/ (Stats from Kiva.org - Available through API)
- http://www.imdb.com/interfaces
- http://www.retrosheet.org/game.htm (Baseball)
- http://www.basketball-reference.com/ (Basketball)
- http://db.humanconnectome.org/ (Neuroimaging)
- Pachube - http://www.pachube.com/ - Store, share & discover realtime sensor, energy and environment data from objects, devices & buildings around the world.
- Gnip - http://www.gnip.com/ - Social media API
- Factual - http://www.factual.com/ - Factual has constantly evolving data on thousands of topics
- https://groups.google.com/forum/#!forum/get-theinfo (HN: '[T]he best way to find data sets. They are a bunch of data hoarders who can help you')
- http://opendata.stackexchange.com/ - StackExchange site to find data sets
How Infochimps uses Hadoop and the cloud
Amazon Machine Image for data processing (could be very useful)
- http://ebiquity.umbc.edu/blogger/2009/02/14/infochimps-amazon-machine-image-for-data-analysis-and-viz/ Kate's Data Presentation.ppt