This repository computes simple statistics for a set of conferences.
Using the DBLP data set, we extract the top conferences and then aggregate their publications on a per-author basis. For different subgroups (e.g., security, embedded systems, or OS) we then calculate per-author statistics and present them in an overview.
Processing happens in two stages: parsing, then analysis of the resulting pickle files.
- `parse_dblp.py` extracts all publications and dumps them into pickle files grouped by area (this is slow, as DBLP is a 3GB XML file). To process such a large XML file, we use a stream processor that simply dumps interesting publications into `Pub` objects (see `pubs.py`).
- `top_authors.py` leverages the pickle files to compute per-area statistics and aggregate statistics.
- `author_cliques` leverages the pickle files to calculate per-area author cliques.
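
The streaming idea can be sketched as follows. This is a minimal illustration, not the code in `parse_dblp.py`: the `Pub` fields, the `AREAS` mapping, and the helper names `stream_pubs` and `author_counts` are assumptions made for the example, and the results stay in memory instead of being written to pickle files. It uses the standard library's `xml.etree.ElementTree.iterparse` on a tiny inline sample. The real DBLP dump relies on character entities from its DTD, which this sketch does not handle.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from dataclasses import dataclass


# Hypothetical stand-in for the Pub class in pubs.py; the real fields may differ.
@dataclass
class Pub:
    authors: list
    title: str
    venue: str
    year: int


# Hypothetical area-to-venue mapping, for illustration only.
AREAS = {"security": {"CCS", "USENIX Security"}, "os": {"OSDI", "SOSP"}}


def stream_pubs(source, areas):
    """Yield (area, Pub) pairs while streaming through DBLP-style XML."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "inproceedings":
            continue  # child elements and other record types are skipped
        venue = elem.findtext("booktitle", default="")
        for area, venues in areas.items():
            if venue in venues:
                yield area, Pub(
                    authors=[a.text for a in elem.findall("author")],
                    title=elem.findtext("title", default=""),
                    venue=venue,
                    year=int(elem.findtext("year", default="0")),
                )
        elem.clear()  # release the children of the record just processed


def author_counts(pubs):
    """Count publications per author over a list of Pub objects."""
    return Counter(author for pub in pubs for author in pub.authors)


SAMPLE = b"""<dblp>
<inproceedings key="conf/x/1"><author>Alice A</author><author>Bob B</author>
<title>Paper One</title><booktitle>CCS</booktitle><year>2020</year></inproceedings>
<inproceedings key="conf/x/2"><author>Alice A</author>
<title>Paper Two</title><booktitle>OSDI</booktitle><year>2021</year></inproceedings>
<article key="journals/x/3"><author>Carol C</author>
<title>Not a conference paper</title><journal>J</journal><year>2021</year></article>
</dblp>"""

# Stage 1: group the interesting publications by area.
by_area = {}
for area, pub in stream_pubs(io.BytesIO(SAMPLE), AREAS):
    by_area.setdefault(area, []).append(pub)

# Stage 2: per-area author statistics.
for area, pubs in sorted(by_area.items()):
    print(area, author_counts(pubs).most_common())
```

Because `iterparse` hands over each record as soon as its closing tag is read, only one publication needs to be inspected at a time, which is what makes a multi-gigabyte file tractable.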
- Easy mode: check out the homepage
- `make all` to download DBLP data, pickle, and create the HTML data
- `make fresh` to update DBLP data and pickle it
- `make topauthors` to create the top author pages
- `make cliques` to create the cliques
Ideas, comments, or improvements are welcome! Please reach out to Mathias Payer to discuss. You can also reach out to @gannimo on Twitter.
- 2023-08-21 random bugfixes and conference updates
- 2023-02-06 adjusted SE/DB conferences based on feedback
- 2021-02-09 fixed VLDB conference and added ICDE and PODS for the database community; added ASE and ISSTA for the software engineering community
- 2021-01-11 added HPCA for architecture and adjusted paper length calculation for DAC
- 2021-01-09 removed tutorials and short papers (by parsing pages data)
- 2021-01-05 figures for overview page
- 2021-01-04 new overview table across areas
- 2021-01-02 added author cliques
- 2020-12-30 first version with author statistics
This code and page were developed by Mathias Payer, initially over the 2020 holiday break. The site includes feedback and suggestions from too many people to list. Thank you all!
We use information from DBLP and CSRankings for anti-aliasing of authors. The idea for the statistics was inspired by Davide's Software Security Circus.
All data in this repository is licensed under CC BY-NC-ND 4.0.