Skip to content
/ artolearn Public

win/loss data and artosalt mining from artosis vods via vision and learning

Notifications You must be signed in to change notification settings

eqy/artolearn

Repository files navigation

artolearn

artolearn is a collection of scripts for (1) parsing Artosis VODs to extract csvs of game data (e.g., opponent rank, mmr, map, and game outcome), and (2) building and tuning machine learning models for online prediction of game outcomes (gamba gamba MODS OPEN CASINO).

Data Extraction

Game data is extracted via OpenCV, where still frames of games are matched against reference game scenes to determine the current stream VOD state (e.g., match-found (where MMRs are displayed), in-game, post-game (where outcomes are displayed), and so on. Once frames are matched, additional OCR is performed if necessary to extract information such as the current map, the MMRs of players, and so on. The "heavy lifting" is done by simple nearest-neighbor matching to reference images for scenes and a few other scenarios (e.g., determining the race selected by each player in the match screen). Additional heavy lifting is done by pytesseract for OCR, with some intermediate steps in OpenCV for thresholding and other operations to make the text more OCR-friendly.

The data extraction CLI is provided by video2csv.py. Extracted data csvs are in data/. Note that for copyright reasons, the reference scene images are omitted from this repository.

Modeling and Prediction

As the extracted data contains a mixture of numerical (player MMRs) and categorical features (map, races chosen by players, ...) the current modeling uses xgboost with optuna for hyperparameter tuning.

Prediction experiments are still at a very early and highly experimental stage, with many changes pending due to some tricky gotchas associated with a time-dependent classification task. For example, it may be useful to include time-dependent features such as the current stream uptime and win/loss streaks within a stream, but this introduces potential leaks of data labels. Similarly, it may be useful to limit the time horizon of included datapoints to account for MMR inflation or metagame shifts (e.g., by pruning very old game results from the dataset), but treating the time horizon as a hyperparameter may increase overfitting.

The tuning CLI is provided by csv2model.py. The live inference CLI is provided by infer.py.

Additional Fun Summary Statistics From Collected Data

Auto-updated 2022-03-07 21:27:44 (639 games)

per-race+rank winrates

rank vs. p vs. r vs. t vs. z overall
unranked 51.9% 33.3% 55.6% 40.9% 48.6%
a 48.1% 66.7% 48.6% 57.8% 52.1%
b 86.3% 100.0% 79.4% 84.6% 85.5%
c 100.0% 100.0% 100.0% 80.0% 92.3%
d 100.0% nan% nan% nan% 100.0%
s 33.3% 0.0% 26.1% 17.4% 26.5%
overall 55.3% 79.3% 53.3% 57.1% 56.5%

baseline accuracy from rank alone: 62.9%

baseline accuracy from race alone: 56.5%

race+rank baseline accuracy: 64.5%

baseline accuracy from always picking higher mmr player:66.1%

map/matchup winrates

map vs. p vs. r vs. t vs. z
eclipse 47.2% 75.0% 55.8% 64.0%
largo 67.3% 100.0% 46.4% 63.8%
polypoid 58.8% 66.7% 61.5% 57.1%
goodnight 50.0% 100.0% 42.3% 33.3%
revolver nan% nan% 0.0% 100.0%

About

win/loss data and artosalt mining from artosis vods via vision and learning

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published