== Simple ==
* ... fix error in wikipedia_link_*
** extr: regenerate all @intrawiki in wikipedia/
** re-execute @check for them
* http://law.dsi.unimi.it/datasets.php
** Datasets are only available in a custom binary format, and there is only a Java library to access it…
* The Wikipedia interwiki networks (one bipartite network for each combination of two languages); currently broken in extr/wikipedia/
* http://googleresearch.blogspot.de/2012/05/from-words-to-concepts-and-back.html
* The Erdős collaboration graph (does not seem to be available)
* http://networkdata.ics.uci.edu/index.php
* http://www.nas.ewi.tudelft.nl/index.php/research/127
* http://amici.dsi.unifi.it/lasagne/?page_id=19
** The "ydata" and "lasagne" datasets are from there; the codes may have to be merged
* http://wiki.gephi.org/index.php/Datasets
* http://opr.princeton.edu/archive/P90/
* http://w3.usf.edu/FreeAssociation/
* http://crawdad.cs.dartmouth.edu/meta.php?name=mit/reality
* http://wiki.gephi.org/index.php/Datasets (write a parsing script for Gephi files)
* http://www.cs.toronto.edu/~tsap/experiments/download/download.html
* http://masonporter.blogspot.de/2011/02/facebook100-data-set.html
* http://netsg.cs.sfu.ca/youtubedata/
* http://downloads.cloudmade.com/
* http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1001109#s2
* http://babelnet.org/
* http://www.cs.umd.edu/projects/linqs/projects/lbc/index.html : Cora citation network
* http://kevinchai.net/datasets
* http://www.kde.cs.uni-kassel.de/ws/dc13/offline/
* http://arcane-coast-3553.herokuapp.com/overview
* http://ebiquity.umbc.edu/resource/html/id/82/
* http://www.infochimps.com/datasets/marvel-universe-social-graph
* https://sites.google.com/site/cxnets/data
* http://netwiki.amath.unc.edu/SharedData/SharedData
* http://www.zubiaga.org/resources/socialbm0311 (3e8 Delicious tag assignments, via Arkaitz Zubiaga)
* http://www.sogou.com/labs/dl/q.html
* http://imat-relpred.yandex.ru/en/datasets
* country adjacencies (also: historical country adjacencies)
* regenerate the datasets with the tag #regenerate
* http://kindred.stanford.edu/?utm_source=buffer&utm_campaign=Buffer&utm_content=buffer2a907&utm_medium=twitter#
* http://www.caida.org/data/overview/
* http://www.routeviews.org/
* http://www.yeastnet.org/
* http://math.nist.gov/~RPozo/complex_datasets.html
* http://vlado.fmf.uni-lj.si/pub/networks/data/bio/Yeast/Yeast.htm
* http://proj.ise.bgu.ac.il/sns/datasets.html
* https://github.com/sidooms/MovieTweetings
* http://www.whosampled.com/
* https://developers.google.com/freebase/data
* All Billion Triple Challenge (BTC) datasets
* https://networkdata.ics.uci.edu/
* http://www.lemurproject.org/clueweb09.php/
* http://www.lemurproject.org/clueweb12.php/
* The Linked Open Data cloud as a network
* http://www.stats.ox.ac.uk/~snijders/siena/
* http://www.boardsandgender.com/data.php
* http://2015.recsyschallenge.com/challenge.html
* Find a football results network
** only if it is easy to extract
** if possible, all FIFA international matches (Länderspiele), with dates and scores
* http://www.cbl.umces.edu/~ulan/ntwk/network.html
* https://www.nceas.ucsb.edu/interactionweb/resources.html
* https://knb.ecoinformatics.org/knb/metacat?action=read&qformat=nceas&sessionid=&docid=bowdish.272
* http://datadryad.org/resource/doi:10.5061/dryad.c213h
* http://www.casos.cs.cmu.edu/computational_tools/datasets/sets/foodweb/
* http://yahoolabs.tumblr.com/post/137281912191/yahoo-releases-the-largest-ever-machine-learning
* Index Translationum
* Which language's Wiktionary contains how many words of each other language
* a connectome dataset: http://cmtk.org/viewer/datasets/
* http://cnets.indiana.edu/groups/nan/webtraffic/websci14-data/
* http://vlado.fmf.uni-lj.si/pub/networks/data/
* more from http://www.bgu.ac.il/~bargera/tntp/
* http://www.correlatesofwar.org/data-sets/direct-contiguity/direct-contiguity-v3-1
* Which country borders which country (same for subnational entities)
* http://icwsm.cs.mcgill.ca/
* https://ls11-www.cs.uni-dortmund.de/staff/morris/graphkerneldatasets
* Co-authorship network of 402.39K authors on Google Scholar, contributed by Yang (email to Jérôme): https://github.com/chenyang03/co-authorship-network
** Paper: "Building and Analyzing a Global Co-Authorship Network Using Google Scholar Data", BigScholar'17 (http://user.informatik.uni-goettingen.de/%7Eychen/papers/GoogleScholar_BigScholar17.pdf)
* https://aminer.org/data (accessed by Jun)
* https://journals.aps.org/datasets (needs access)
* London tube paths (cf. Ingo Scholtes)
* http://connectivity.brain-map.org
* http://www.cp.jku.at/datasets/LFM-1b/
* http://www-levich.engr.ccny.cuny.edu/webpage/hmakse/software-and-data/
* For every network extracted from Wikipedia, extract it for *all* languages. This will increase the number of networks in KONECT accordingly.
* Add missing datasets from SNAP, even when several are versions of the same underlying dataset.
** http://snap.stanford.edu/data/wiki-RfA.html : has negative edges
* https://github.com/gephi/gephi/wiki/Datasets
* http://networkrepository.com/
* http://snap.stanford.edu/temporal-motifs/data.html
* http://www.bgu.ac.il/~bargera/tntp/Austin/Austin_net.txt
* DIMACS challenge: all others (apart from number 9)
* SNAP: com-* datasets
* CiteSeerX dataset, or http://www.hpi.uni-potsdam.de/naumann/projekte/repeatability/datasets/cora_dataset.html / http://citeseerx.ist.psu.edu/about/metadata
* Boeing 767 structural network
* a genealogical network
* https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream
* https://sites.google.com/site/ucinetsoftware/datasets/covert-networks
* http://vladowiki.fmf.uni-lj.si/doku.php?id=pajek:data:pajek:vlado
* http://vladowiki.fmf.uni-lj.si/doku.php?id=pajek:data:urls:index
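The interwiki item in the list above asks for one bipartite network per pair of Wikipedia languages, so n languages yield n(n−1)/2 networks. The following is a minimal Python sketch of that split, not the extr/wikipedia/ code: the input format, a list of (language, article, language, article) interwiki link tuples, and the function names `language_pairs` and `split_interwiki_links` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def language_pairs(languages):
    """Return every unordered pair of distinct languages, in sorted order.

    n languages give n * (n - 1) / 2 pairs, i.e. that many bipartite networks.
    """
    return list(combinations(sorted(languages), 2))


def split_interwiki_links(links):
    """Group interwiki links into one bipartite edge set per language pair.

    links: iterable of (lang_a, article_a, lang_b, article_b) tuples
    (a hypothetical input format). Each edge is stored with the
    alphabetically smaller language on the left side, so a link and
    its reverse collapse into the same edge.
    """
    networks = defaultdict(set)
    for lang_a, art_a, lang_b, art_b in links:
        if lang_a == lang_b:
            continue  # same-language links are not interwiki edges
        if lang_a > lang_b:
            # Normalize the orientation of the pair.
            lang_a, art_a, lang_b, art_b = lang_b, art_b, lang_a, art_a
        networks[(lang_a, lang_b)].add((art_a, art_b))
    return networks


if __name__ == "__main__":
    print(language_pairs(["de", "en", "fr"]))
    example_links = [
        ("en", "Berlin", "de", "Berlin"),
        ("de", "Berlin", "en", "Berlin"),
        ("fr", "Paris", "en", "Paris"),
    ]
    for pair, edges in sorted(split_interwiki_links(example_links).items()):
        print(pair, sorted(edges))
```

Normalizing each pair to sorted order means the reverse link (de→en vs. en→de) lands in the same network instead of creating a second, mirrored one.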
== Needs complex extraction or research ==
* http://openplanb.tumblr.com/
* DBpedia
** The whole file layout and URLs have changed; update the code.
** influences and influenced by
** more DBpedia: (associatedMusicalArtist+associatedBand), currentMember, artist, formerTeam (add it to current teams), language.
* new Petster datasets
* http://webdatacommons.org/hyperlinkgraph/ – This is the largest graph I've ever seen
** only makes sense once we have the cloud
** there are many smaller datasets in there
** the largest dataset would take about 1 TiB of RAM (in SG1 format)
* slashdot-west: estrella:/data/kunegis/konect/dat/slashdot-west/slashdot_signed-graph_html_after-phase-2.tar.gz
** This is a newer Slashdot signed social network given to us by Robert West from Stanford. Julia may be using it for her [[Julia Preusse/Diss|thesis]].
* Category tree from Wikipedia
* WordNet antonyms
* http://www.wikipathways.org/index.php/Download_Pathways
* political interactions from Brandes et al. http://www.inf.uni-konstanz.de/algo/research/conflict/#downloads
* http://www.zubiaga.org/resources/socialbm0311/ (easy, but very large; we already have almost the same dataset)
* Wikipedia article×term bipartite graph for all languages
* http://build.kiva.org/
* hudine.neu.edu (does not respond) -- disease information
* The Wikipedia hyperlink graph and category tree are available in an easy-to-parse format on download.wikimedia.org
* DBLP: build the author-conference bipartite network with timestamps
* Multilingual DBpedia
* Which Wikipedia user speaks which language (or Commons user, etc.)
* http://webscope.sandbox.yahoo.com/
* http://anonymity-in-bitcoin.blogspot.com/2011/09/code-datasets-and-spsn11.html
* https://www.kaggle.com/c/kddcup2012-track1#description
* http://www.dfki.uni-kl.de/~obradovic/download/blogspider.zip : needs database
* http://data.stackexchange.com/ : only via API
* IMDB full semantic network: movie, actor, director, etc.
* GeoWordNet
* YAGO
* corpora from WWW paper [336]
* MovieLens tags (from the million dataset)
* Orkut user-community network [Collaborative Filtering for Orkut Communities: Discovery of User Latent Behavior, Chen et al.]
* [486] The Dynamics of Web-based Social Networks: Membership, Relationships, and Change, Jennifer Golbeck; lots of adjacency lists.
* [490] Mining Graph Evolution Rules, Michele Berlingerio, Francesco Bonchi, Björn Bringmann, and Aristides Gionis.
* BibSonomy: who follows whom
* Pavel Braslavski: MoiKrug social network: Anton Volnuhin
* [620] cyworld
* http://storage.googleapis.com/books/ngrams/books/datasetsv2.html
* Money transfer dataset (skew-symmetric network)
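For the money transfer item, one way to read "skew-symmetric" is a net-flow matrix A with A[u][v] = −A[v][u]: a transfer of x from u to v adds x to A[u][v] and subtracts x from A[v][u]. The following is a minimal Python sketch under that assumption. The (sender, receiver, amount) record format and the names `net_flow` and `is_skew_symmetric` are hypothetical, since no concrete dataset is specified here.

```python
from collections import defaultdict


def net_flow(transfers):
    """Build a sparse skew-symmetric matrix from (sender, receiver, amount) records.

    The result maps (u, v) to the net amount that flowed from u to v,
    so result[(v, u)] == -result[(u, v)] for every stored entry.
    """
    a = defaultdict(float)
    for sender, receiver, amount in transfers:
        if sender == receiver:
            continue  # a self-transfer would cancel out on the diagonal anyway
        a[(sender, receiver)] += amount
        a[(receiver, sender)] -= amount
    return dict(a)


def is_skew_symmetric(a, tol=1e-9):
    """Check that every entry is matched by its negated transpose entry."""
    return all(abs(value + a.get((v, u), 0.0)) <= tol for (u, v), value in a.items())
```

Since the negative half of the matrix is implied by the positive half, an edge list for such a network could store only the entries with positive net flow.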