artolearn is a collection of scripts for (1) parsing Artosis VODs to extract csvs of game data (e.g., opponent rank, mmr, map, and game outcome), and (2) building and tuning machine learning models for online prediction of game outcomes (gamba gamba MODS OPEN CASINO).
Game data is extracted via OpenCV, where still frames of games are matched against reference game scenes to determine the current stream VOD state (e.g., match-found (where MMRs are displayed), in-game, post-game (where outcomes are displayed), and so on. Once frames are matched, additional OCR is performed if necessary to extract information such as the current map, the MMRs of players, and so on. The "heavy lifting" is done by simple nearest-neighbor matching to reference images for scenes and a few other scenarios (e.g., determining the race selected by each player in the match screen). Additional heavy lifting is done by pytesseract for OCR, with some intermediate steps in OpenCV for thresholding and other operations to make the text more OCR-friendly.
The data extraction CLI is provided by video2csv.py
.
Extracted data csvs are in data/
.
Note that for copyright reasons, the reference scene images are omitted from
this repository.
As the extracted data contains a mixture of numerical (player MMRs) and categorical features (map, races chosen by players, ...) the current modeling uses xgboost with optuna for hyperparameter tuning.
Prediction experiments are still at a very early and highly experimental stage, with many changes pending due to some tricky gotchas associated with a time-dependent classification task. For example, it may be useful to include time-dependent features such as the current stream uptime and win/loss streaks within a stream, but this introduces potential leaks of data labels. Similarly, it may be useful to limit the time horizon of included datapoints to account for MMR inflation or metagame shifts (e.g., by pruning very old game results from the dataset), but treating the time horizon as a hyperparameter may increase overfitting.
The tuning CLI is provided by csv2model.py
.
The live inference CLI is provided by infer.py
.
Auto-updated 2022-03-07 21:27:44 (639 games)
rank | vs. p | vs. r | vs. t | vs. z | overall |
---|---|---|---|---|---|
unranked | 51.9% | 33.3% | 55.6% | 40.9% | 48.6% |
a | 48.1% | 66.7% | 48.6% | 57.8% | 52.1% |
b | 86.3% | 100.0% | 79.4% | 84.6% | 85.5% |
c | 100.0% | 100.0% | 100.0% | 80.0% | 92.3% |
d | 100.0% | nan% | nan% | nan% | 100.0% |
s | 33.3% | 0.0% | 26.1% | 17.4% | 26.5% |
overall | 55.3% | 79.3% | 53.3% | 57.1% | 56.5% |
baseline accuracy from rank alone: 62.9%
baseline accuracy from race alone: 56.5%
race+rank baseline accuracy: 64.5%
baseline accuracy from always picking higher mmr player:66.1%
map | vs. p | vs. r | vs. t | vs. z |
---|---|---|---|---|
eclipse | 47.2% | 75.0% | 55.8% | 64.0% |
largo | 67.3% | 100.0% | 46.4% | 63.8% |
polypoid | 58.8% | 66.7% | 61.5% | 57.1% |
goodnight | 50.0% | 100.0% | 42.3% | 33.3% |
revolver | nan% | nan% | 0.0% | 100.0% |