This repo aims to compile several datasets of chess players’ ratings and rankings over time. The data are extracted from three sources:
- From 1851 to September 2001 (annual, biannual, quarterly, monthly and weekly snapshots): scraped from the old Chessmetrics website created by Jeff Sonas. Ratings are calculated with the Chessmetrics method. Output is stored in .csv format in the csv file.
- From September 2001 to December 2004 (monthly snapshots): scraped from the new Chessmetrics website created by Jeff Sonas. Ratings are calculated with the Chessmetrics method. Output is stored in .csv format in the csv file.
- From January 2001 to December 2019 (quarterly and monthly snapshots): forked from FIDE Data Pull, created by Anuj Dahiya in 2022 and based on International Chess Federation (FIDE) ratings. Ratings are calculated with the Elo rating system. The output .csv files of players’ standard ratings are compiled into .parquet format by `compilationcsv.R` (the .parquet output is larger than 500 MB and is not stored on git).
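The repository performs this compilation in R with `compilationcsv.R`. As a rough illustration of the idea only, here is a Python sketch built on the standard library. It assumes every FIDE rating file shares one header. The `compile_rating_csvs` name and the added `source_file` column are hypothetical and not part of the repo.

```python
import csv
from pathlib import Path


def compile_rating_csvs(paths):
    """Concatenate rating-list CSV files that share the same header.

    Each returned row is a dict keyed by the CSV header, plus a hypothetical
    "source_file" column holding the file name stem, so the originating list
    can still be identified after the merge.
    """
    rows = []
    header = None
    # Sort the paths so the merged order does not depend on the caller's order.
    for path in sorted(Path(p) for p in paths):
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            if header is None:
                header = reader.fieldnames
            elif reader.fieldnames != header:
                # Refuse to silently mix files with different layouts.
                raise ValueError(f"{path.name}: header {reader.fieldnames} != {header}")
            for row in reader:
                row["source_file"] = path.stem
                rows.append(row)
    return rows
```

Writing the merged rows to .parquet requires a third-party library. With pandas installed, `pandas.DataFrame(rows).to_parquet("ratings.parquet")` would do it, because pandas delegates to pyarrow or fastparquet. The output file name here is a placeholder.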
The scraper selects the second table of each list page, then adds the date of the list and a ranking 1, 2, 3, …, n ordered by rating within each date. For example, the list of December 31, 1851 is scraped with the CSS selector:
```
body > font:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(2) > div:nth-child(5) > center:nth-child(1) > table:nth-child(4) > tbody:nth-child(1)
```
The resulting tibble looks like this:
```
## # A tibble: 241,118 × 5
## Player Rating Age dateranking ranking
## <chr> <int> <dbl> <chr> <int>
## 1 Kasparov, Garry K 2884 37.0 April 10, 2000 1
## 2 Anand, Viswanathan 2796 30.3 April 10, 2000 2
## 3 Kramnik, Vladimir 2793 24.8 April 10, 2000 3
## 4 Shirov, Alexei 2778 27.8 April 10, 2000 4
## 5 Leko, Peter 2765 20.6 April 10, 2000 5
## 6 Topalov, Veselin 2746 25.1 April 10, 2000 6
## 7 Ivanchuk, Vassily 2738 31.1 April 10, 2000 7
## 8 Adams, Michael 2736 28.4 April 10, 2000 8
## 9 Gelfand, Boris 2731 31.8 April 10, 2000 9
## 10 Kamsky, Gata 2716 25.9 April 10, 2000 10
## # … with 241,108 more rows
```
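The date-and-ranking step described above can be sketched in Python as follows. This is an illustration, not the repository's R code. It assumes the selected table has already been parsed into `(player, rating, age)` tuples, and `RankedRow` and `add_date_and_ranking` are hypothetical names. The sample rows are taken from the output above.

```python
from dataclasses import dataclass


@dataclass
class RankedRow:
    player: str
    rating: int
    age: float
    dateranking: str
    ranking: int


def add_date_and_ranking(rows, list_date):
    """Attach the list date and a 1..n ranking ordered by descending rating.

    `rows` holds (player, rating, age) tuples already extracted from the
    selected table. Python's sort is stable (also with reverse=True), so
    players with equal ratings keep their page order and every player
    still gets a distinct rank.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [
        RankedRow(player, rating, age, list_date, rank)
        for rank, (player, rating, age) in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    page_rows = [
        ("Anand, Viswanathan", 2796, 30.3),
        ("Kasparov, Garry K", 2884, 37.0),
        ("Kramnik, Vladimir", 2793, 24.8),
    ]
    for r in add_date_and_ranking(page_rows, "April 10, 2000"):
        print(r.ranking, r.player, r.rating, r.dateranking)
    # 1 Kasparov, Garry K 2884 April 10, 2000
    # 2 Anand, Viswanathan 2796 April 10, 2000
    # 3 Kramnik, Vladimir 2793 April 10, 2000
```

Because the ranks come from `enumerate`, the result is always the plain sequence 1, 2, 3, …, n, matching the description, with no shared ranks for tied ratings.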
The same approach is applied with a different selector: the rating table of each list page is selected, then the date of the list and a ranking 1, 2, 3, …, n ordered by rating within each date are added. For example, the list of January 2001 is scraped with the CSS selector:
```
body > form:nth-child(1) > table:nth-child(4)
```
The resulting tibble looks like this:
```
## # A tibble: 4,800 × 5
## Player Rating Age dateranking ranking
## <chr> <int> <chr> <int> <int>
## 1 Garry Kasparov 2850 37y9m 200101 1
## 2 Viswanathan Anand 2820 31y1m 200101 2
## 3 Vladimir Kramnik 2815 25y7m 200101 3
## 4 Peter Leko 2768 21y4m 200101 4
## 5 Alexander Morozevich 2757 23y6m 200101 5
## 6 Alexei Shirov 2750 28y6m 200101 6
## 7 Vassily Ivanchuk 2749 31y10m 200101 7
## 8 Michael Adams 2743 29y2m 200101 8
## 9 Evgeny Bareev 2739 34y2m 200101 9
## 10 Boris Gelfand 2738 32y7m 200101 10
## # … with 4,790 more rows
```
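Both selection steps pull one table out of a page with a CSS selector. The repository does this in R. The Python sketch below is only a standard-library stand-in, and it assumes the rating table can be identified by its position among the page's `<table>` elements. That is a simplification: `nth-child` in the selectors counts every sibling element, not only tables, so the right index has to be checked against the real page. `TableExtractor` and `extract_table` are hypothetical names.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> in a page, in document order."""

    def __init__(self):
        super().__init__()
        self.tables = []  # one entry per <table>: a list of rows (lists of str)
        self._open = []   # parsing state of each currently open (possibly nested) table

    def _flush_cell(self, state):
        # Close a pending cell even when the page omits its </td> or </th>,
        # collapsing internal whitespace runs to single spaces.
        if state["cell"] is not None and state["row"] is not None:
            state["row"].append(" ".join("".join(state["cell"]).split()))
        state["cell"] = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            rows = []
            self.tables.append(rows)
            self._open.append({"rows": rows, "row": None, "cell": None})
            return
        if not self._open:
            return
        state = self._open[-1]
        if tag == "tr":
            self._flush_cell(state)
            state["row"] = []
            state["rows"].append(state["row"])
        elif tag in ("td", "th") and state["row"] is not None:
            self._flush_cell(state)
            state["cell"] = []

    def handle_endtag(self, tag):
        if not self._open:
            return
        state = self._open[-1]
        if tag in ("td", "th"):
            self._flush_cell(state)
        elif tag == "tr":
            self._flush_cell(state)
            state["row"] = None
        elif tag == "table":
            self._flush_cell(state)
            self._open.pop()

    def handle_data(self, data):
        # Text only counts when it sits inside a cell of the innermost open table.
        if self._open and self._open[-1]["cell"] is not None:
            self._open[-1]["cell"].append(data)


def extract_table(html, index):
    """Return the rows of the index-th <table> in the page (0-based)."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables[index]
```

For example, `extract_table(page_html, 3)` would return the fourth table of a hypothetical `page_html` string as a list of rows. The list date and ranking columns can then be added to those rows. Tolerating missing `</td>` tags is deliberate, since older hand-written HTML pages often omit them.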