Data processing

Repository for ETL workflow that processes BAR data.

It periodically produces public dumps of the matches data combining information from teiserver and replays database. Check out Gallery section to see how community uses this data.

Data access

Data dumps are available as Parquet under:

https://data-marts.beyondallreason.dev/matches.parquet
https://data-marts.beyondallreason.dev/match_players.parquet
https://data-marts.beyondallreason.dev/players.parquet

and Compressed CSV file under:

https://data-marts.beyondallreason.dev/matches.csv.gz
https://data-marts.beyondallreason.dev/match_players.csv.gz
https://data-marts.beyondallreason.dev/players.csv.gz

More documentation is available at https://beyond-all-reason.github.io/data-processing/.

Usage examples

It's easy to load data into Jupyter Notebook or Google Colab, for example: plot the number of matches over time using Polars.

Given that datasets are available under URL, you can even use one of the Web UIs built on DuckDB-Wasm to run query entirely in the browser, for example: compute number of games per type per month

Gallery

Below we want to link some cool examples of how people in the community are using the data dumps. If you've created something please share with us on Discord or here in issues!

@Atlasfailed shared reports from his personal_skill_analysis project:
@Dazazzell created a live dashboard https://dazazzell.github.io/barmaps/ (source) that shows map popularity stats for the last 60 days.

Development

This project is using dbt for managing the SQL pipeline that transforms data and DuckDB as the query engine.

Initial

Setup:

python3 -m venv .venv
source .venv/bin/activate  # but I also recommend https://direnv.net/ that will load .envrc automatically
pip install -r requirements.txt

It's also recommented to install pre commit hooks that will check style of SQL code before making a commit

pre-commit install

Usage

data_source/dev contains a small sample of the full data sources used to genrate full dumps in prod, basic development and testing should be possible purely on this sample.

To build the data marts from this sample data:

dbt run

To run tests on the generated data (e.g. validate that fields are not null, or custom queries return expected results):

dbt test

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Data processing

Data access

Usage examples

Gallery

Development

Initial

Usage

Files

README.md

Latest commit

History

README.md

File metadata and controls

Data processing

Data access

Usage examples

Gallery

Development

Initial

Usage