Update project docs
p2004a committed Sep 11, 2024
1 parent 7207632 commit 0ead089
Showing 2 changed files with 20 additions and 7 deletions.
12 changes: 5 additions & 7 deletions README.md
@@ -1,17 +1,15 @@
# Data processing

**Work in progress.**

Repository for data transformation workflow that processes BAR data from
different sources and creates data dumps.
Repository for an [ETL](https://en.wikipedia.org/wiki/Extract%2C_transform%2C_load)
workflow that processes BAR data from different sources and creates data dumps.

At the moment the only functionality is a public dump of the past matches data
combining information from teiserver and replays databases.

## Access dumps
## Data access

Automatically generated documentation about accessing dumps generates from this
repository is available at https://beyond-all-reason.github.io/data-processing/.
Documentation about accessing dumps generated by the pipeline in this repository
is available at https://beyond-all-reason.github.io/data-processing/.

## Development

15 changes: 15 additions & 0 deletions models/homepage.md
@@ -17,6 +17,21 @@ For example, to download [`matches`](#!/model/model.bar_data_processing.matches)
model (data table) in Parquet format, URL is:
https://data-marts.beyondallreason.dev/matches.parquet
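
As a minimal sketch of how the URL above might be used, the snippet below builds a model's download URL and loads the Parquet file into a pandas DataFrame. The helper names `model_url` and `load_model` are hypothetical. The `<base>/<model>.parquet` pattern is inferred from the single `matches` example and is an assumption for other models. Reading Parquet with pandas also assumes `pyarrow` or `fastparquet` is installed.

```python
import io
import urllib.request

# Base URL taken from the `matches` example above.
BASE_URL = "https://data-marts.beyondallreason.dev"


def model_url(model: str) -> str:
    """Build the Parquet download URL for a model (hypothetical helper).

    Assumes every exported model follows the "<base>/<model>.parquet" pattern.
    """
    return f"{BASE_URL}/{model}.parquet"


def load_model(model: str):
    """Download a model's Parquet dump and return it as a pandas DataFrame."""
    # Imported lazily so that building URLs does not require pandas.
    import pandas as pd

    # Fetch the whole file into memory, then let pandas parse the bytes.
    with urllib.request.urlopen(model_url(model)) as response:
        data = response.read()
    return pd.read_parquet(io.BytesIO(data))
```

Calling `load_model("matches")` should then return the matches table as a DataFrame. The download happens in memory, so for large dumps it may be preferable to save the file to disk first.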

## Data quality

The processing pipeline automatically removes some obviously bad data (e.g.
matches where it is not clear which team a player belongs to). Still, due to
various bugs in teiserver, a small number of entries in the dataset (tens to
low hundreds) have issues, for example:

- Teiserver data reports different player-to-team assignments than intended
- Some players' skill is reduced as if they lost even though they won

and likely plenty more.

If something should be filtered out as bad data, feel free to open a new issue in
the repository.

## Schemas

To browse schema of all exported models (data tables) navigate on the left side