Refactoring #74

Merged
merged 23 commits into from
Jul 16, 2024
Changes from 22 commits
a82ac05
Add mapping sources file
dimkarakostas Jul 11, 2024
900cde3
Create get active source info helper functions
dimkarakostas Jul 11, 2024
b0d70a8
Move get_output_row, write_csv_output to helper
dimkarakostas Jul 11, 2024
364d454
Remove clustering flag and computed it based on config sources
dimkarakostas Jul 11, 2024
6eb27f7
Description comment in get_circulation_from_entries
dimkarakostas Jul 11, 2024
d2b2c5a
Add helper function to compute entity clusters when using multiple so…
dimkarakostas Jul 11, 2024
8ad0812
Update README
dimkarakostas Jul 11, 2024
f9ed264
Update documentation
dimkarakostas Jul 11, 2024
a7648b8
Disable exclude_below_usd_cent config flag by default
dimkarakostas Jul 11, 2024
92bf22e
Disable plot config flag by default
dimkarakostas Jul 11, 2024
b798d0b
Refactor mapping process
dimkarakostas Jul 11, 2024
781ebc3
Refactor analyzing process
dimkarakostas Jul 11, 2024
61bed0d
Refactor db_helper
dimkarakostas Jul 11, 2024
111e5de
Remove schema.py
dimkarakostas Jul 11, 2024
bf9302c
Remove old helper functions to get force map balances and analyze flag
dimkarakostas Jul 11, 2024
1eecdb2
Refactor tests
dimkarakostas Jul 11, 2024
318f71c
README typo
dimkarakostas Jul 15, 2024
084fbe4
Change tau computation to return only index
dimkarakostas Jul 16, 2024
ea347e7
Add helper function to get tau from param string
dimkarakostas Jul 16, 2024
bb86550
Exclude contract addrs from entries if flag is set
dimkarakostas Jul 16, 2024
860e7f1
Change entries object to list of ints instead of tuples
dimkarakostas Jul 16, 2024
7c6b092
Add test for excluding contracts flag
dimkarakostas Jul 16, 2024
e4033b5
Add small testcase
dimkarakostas Jul 16, 2024
60 changes: 31 additions & 29 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ Currently, the supported blockchains are:
- Ethereum
- Litecoin
- Tezos

We intend to add more ledgers to this list in the future.

## Installation
Expand All @@ -28,44 +29,45 @@ project:

python -m pip install -r requirements.txt

### System requirements

Running the tool requires loading the raw input data on memory. To avoid running
out of memory, we recommend RAM at least double the largest raw data file.

### Mapping information

The mapping information for Cardano is too large for GitHub.
To retrieve it, do the following:
- Download the file from
[here](https://uoe-my.sharepoint.com/:u:/g/personal/dkarakos_ed_ac_uk/EXseoT-v1xBHn1TWG1IvqHIB2L3Pm35-UtKIcUKmk1IQZw?e=YgTfjR&download=1).
- Move the file to the folder `mapping_information/addresses/`. Note that the file should be named `cardano.jsonl`.
- Move the file to the folder `mapping_information/addresses/`. The file _should be named_ `cardano.jsonl`.

## Run the tool

Place all raw data (which could be collected from [BigQuery](https://cloud.google.com/bigquery/) for example) in the `input` directory.
Each file named as `<project_name>_<snapshot_date>_raw_data.json` (e.g. `bitcoin_{2023-01-01}_raw_data.json`). By default, there
is a (very small) sample input file for some supported projects. To use the
samples, remove the prefix `sample_`. For more extended raw data and instructions on how to retrieve it, see
[here](https://blockchain-technology-lab.github.io/tokenomics-decentralization/data/).

Run `python run.py --ledgers <ledger_1> ... <ledger_n> --snapshots <date_1> <date_2>` to produce and analyze the database files.
For each ledger and for each snapshot one SQLite file is created, which contains the address mapping and the balance information.
Note that both arguments are optional, so it's possible to omit one or both of them (in which case the default values
will be used). Specifically:

- The `ledgers` argument accepts any number of supported ledgers (case-insensitive).
For example, `--ledgers bitcoin` runs the analysis for Bitcoin, `--ledgers Bitcoin Ethereum Cardano` runs the analysis
for Bitcoin, Ethereum and Cardano, etc. Ledgers with more words should be defined with an underscore; for example
Bitcoin Cash should be set as `bitcoin_cash`.
- The `snapshots` argument should be of the form `YYYY-MM-DD`.
For example, `--snapshots 2022-02-01` runs it for 1 February 2022.

`run.py` prints on stdout the output of each implemented metric for the specified ledgers and snapshot.

To mass produce and analyze data, omit one or both arguments. If some arguments
are omitted, the default values from `config.yaml` will be used. If only the
`ledgers` is given, all snapshots for which a raw data and/or database file exists will be
analyzed. If only the timeframe is specified, all ledgers will be analyzed for
the given timeframe (if the raw data and/or database files exist).

A single file `output.csv` is also created in the `output` directory, containing the output data from the
last execution of `run.py`.
Place all raw data (which could be collected from
[BigQuery](https://cloud.google.com/bigquery/) for example) in the `input`
directory. Each file should be named `<project_name>_<snapshot_date>_raw_data.json`
(e.g. `bitcoin_2023-01-01_raw_data.json`). By default, there is a (very
small) sample input file for some supported projects. To use the samples, remove
the prefix `sample_` from their file names. For more extensive raw data and
instructions on how to retrieve it, see [here](https://blockchain-technology-lab.github.io/tokenomics-decentralization/data/).
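
Note (illustrative, not part of the PR): the naming convention above can be sketched as a pair of helpers that build and parse raw data file names. The names `raw_data_filename` and `parse_raw_data_filename` are hypothetical and do not appear in the repository. The sketch assumes that the snapshot date is always the `YYYY-MM-DD` component directly before `_raw_data.json`, so multi-word ledgers such as `bitcoin_cash` parse correctly.

```python
import re
from datetime import date

# Hypothetical helpers for illustration; they are not part of the tool.
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
_RAW_DATA_RE = re.compile(
    r"^(?P<ledger>.+)_(?P<snapshot>\d{4}-\d{2}-\d{2})_raw_data\.json$"
)


def raw_data_filename(ledger: str, snapshot: str) -> str:
    """Build '<project_name>_<snapshot_date>_raw_data.json'."""
    if not _DATE_RE.fullmatch(snapshot):
        raise ValueError(f"Snapshot must have the form YYYY-MM-DD: {snapshot!r}")
    date.fromisoformat(snapshot)  # raises ValueError for impossible dates
    return f"{ledger}_{snapshot}_raw_data.json"


def parse_raw_data_filename(filename: str) -> tuple[str, str]:
    """Split a raw data file name into (ledger, snapshot)."""
    match = _RAW_DATA_RE.match(filename)
    if match is None:
        raise ValueError(f"Not a raw data file name: {filename!r}")
    return match["ledger"], match["snapshot"]


print(raw_data_filename("bitcoin", "2023-01-01"))
print(parse_raw_data_filename("bitcoin_cash_2023-01-01_raw_data.json"))
```

A sample file that still carries the `sample_` prefix would parse with a ledger name of `sample_<project_name>`, which is one reason the prefix has to be removed before running the tool.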

Edit the configuration file `config.yaml` to choose which ledgers to analyze,
for which snapshots, with which metrics, etc (see
[here](https://blockchain-technology-lab.github.io/tokenomics-decentralization/setup/)
for more information on each parameter).

Run `python run.py` to perform the analysis and print on stdout the output of
each implemented metric for the specified ledgers and snapshot.

For each ledger and for the chosen combination of mapping sources, a SQLite file
is created in `mapping_information/addresses`, which contains the address
mapping information.

A single file `output_{params}.csv` is also created in the `output` directory,
containing the output data from the last execution of `run.py`. Here, "params"
corresponds to analysis parameters like whether to apply clustering,
thresholding, etc.

## Contributing

11 changes: 6 additions & 5 deletions config.yaml
Expand Up @@ -23,17 +23,18 @@ ledgers:
 # Execution flags
 execution_flags:
   force_map_addresses: false
-  force_map_balances: false
-  force_analyze: false

 # Analyze flags
 analyze_flags:
-  clustering: true
+  clustering_sources:
+    - "Explorers"
+    - "Staking Keys"
+    - "Multi-input transactions"
   top_limit_type: "absolute" # one of two types: "absolute" or "percentage"; if absolute then value should be integer; if percentage then value should be float in [0, 1]
   top_limit_value: 0
   exclude_contract_addresses: false
   exclude_below_fees: false
-  exclude_below_usd_cent: true
+  exclude_below_usd_cent: false

 # The snapshots for which an analysis should be performed.
 # Each snapshot is a string of the form YYYY-MM-DD.
Expand All @@ -57,7 +58,7 @@ output_directories:

 # Plot flags
 plot_parameters:
-  plot: true
+  plot: false
   ledgers:
     - bitcoin
     - bitcoin_cash
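
Note (illustrative, not part of the PR): the inline comment on `top_limit_type` states a constraint between the limit type and its value. A minimal sketch of a check for that constraint follows. The function name `validate_top_limit` is hypothetical, and nothing in this diff shows that the tool performs such a check. Two further assumptions are made: negative absolute limits are rejected, and the integers 0 and 1 are accepted as endpoints of the percentage range.

```python
def validate_top_limit(limit_type, limit_value):
    """Raise ValueError if the pair violates the constraint in the config comment."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit_value, bool):
        raise ValueError("top_limit_value must be a number, not a boolean")
    if limit_type == "absolute":
        # Assumption: negative limits are rejected; 0 is the default in config.yaml.
        if not isinstance(limit_value, int) or limit_value < 0:
            raise ValueError("absolute limits must be non-negative integers")
    elif limit_type == "percentage":
        # Assumption: the integers 0 and 1 are accepted as endpoints of [0, 1].
        if not isinstance(limit_value, (int, float)) or not 0 <= limit_value <= 1:
            raise ValueError("percentage limits must lie in [0, 1]")
    else:
        raise ValueError('top_limit_type must be "absolute" or "percentage"')


# The default pair from config.yaml passes silently.
validate_top_limit("absolute", 0)
validate_top_limit("percentage", 0.25)
```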
9 changes: 9 additions & 0 deletions docs/contribute.md
Expand Up @@ -38,6 +38,15 @@ To contribute mapping information you can either update an existing file, by
changing and/or adding some entries, or create a new file for a newly-supported
ledger.

Note: If you add an entry in `mapping_addresses` with a source that does not
already exist, you should also add this source in the file
`mapping_information/sources.json`. Specifically, if it comes from a
publicly-available website you should add it under "Explorers", otherwise either
use an existing appropriate keyword or create a new one. If you create a new
one, make sure to also include it in the configuration file `config.yaml` and in
the description of the [Setup
page](https://blockchain-technology-lab.github.io/tokenomics-decentralization/setup/).
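
Note (illustrative, not part of the PR): the rule above can be checked mechanically before submitting new mapping entries. The sketch below is one possible check, under the assumption that each mapping entry is a JSON object with a `source` field. The function name `find_unregistered_sources`, the other entry fields and the sample values are hypothetical.

```python
def find_unregistered_sources(mapping_entries, sources):
    """Return the sources used by mapping entries but absent from sources.json."""
    # Flatten the {keyword: [source, ...]} structure into one set of known sources.
    known = {source for group in sources.values() for source in group}
    used = {entry["source"] for entry in mapping_entries}
    return sorted(used - known)


# Hypothetical data: a subset of sources.json and two mapping entries.
sources = {
    "Explorers": ["https://etherscan.io"],
    "Staking Keys": ["staking key", "payment key"],
}
entries = [
    {"address": "addr1", "name": "Entity A", "source": "https://etherscan.io"},
    {"address": "addr2", "name": "Entity B", "source": "https://example.org"},
]
print(find_unregistered_sources(entries, sources))  # ['https://example.org']
```

Any source reported by such a check would need to be added to `mapping_information/sources.json` under a suitable keyword.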

### Price information

The directory `price_data/` contains information about the supported ledgers'
26 changes: 10 additions & 16 deletions docs/setup.md
Expand Up @@ -35,23 +35,19 @@ page](https://blockchain-technology-lab.github.io/tokenomics-decentralization/co
ledgers are included here (to add support for a new ledger see the [contributions
page](https://blockchain-technology-lab.github.io/tokenomics-decentralization/contribute/)).

`execution_flags` defines various flags that control the data handling (all set to false by default):
`execution_flags` defines flags that control the data handling (all set to false by default):

* `force_map_addresses`: if set to true, the address helper data from the directory
* `force_map_addresses`: if set to true, the address mapping data from the directory
`mapping_information` is re-computed; you should set this flag to true if the
data has been updated since the last execution for the given ledger
* `force_map_balances`: is set to true, the balance data of the ledger's addresses is
recomputed; you should set this flag to true if the data has been updated
since the last execution for the given ledger
* `force_analyze`: if set to true, the computation of a metric is recomputed; you should set
this flag to true if any type of data has been updated since the last
execution for the given ledger
mapping data has been updated since the last execution for the given ledger

`analyze_flags` defines various analysis-related flags:

* `clustering`: a boolean that determines whether addresses will be clustered into entities
(as defined in the mapping information). If set to false, no clustering takes
place and the addresses are treated as distinct entities.
* `clustering_sources`: a list of sources that should be used to compute the
address mapping information. If empty, no clustering takes place and the
addresses are treated as distinct entities. The list should contain any
combination of the following options (_case sensitive_): "Explorers", "Staking
Keys", "Multi-input transactions".
* `top_limit_type`: a string that can take one of two values (`absolute` or `percentage`) that
enables applying a threshold on the addresses that will be considered
* `top_limit_value`: the value of the top limit that should be applied; if 0,
Expand Down Expand Up @@ -84,9 +80,7 @@ define the source of data. `input_directories` defines the directories that
contain raw address balance information, as obtained from BigQuery or a full
node (for more information about this see the [data collection
page](https://blockchain-technology-lab.github.io/tokenomics-decentralization/data/)).
`output_directories` defines the directories to store the databases which
contain the mapping information and analyzed data. The first entry in the output
directories is also used to store the output files of the analysis and the
plots.
`output_directories` defines the directory to store the output files of the
analysis and the plots.

Finally, `plot_parameters` contains various parameters that control whether plots will be produced for the results and for which configurations.
16 changes: 16 additions & 0 deletions mapping_information/sources.json
@@ -0,0 +1,16 @@
+{
+    "Explorers": [
+        "https://bitinfocharts.com",
+        "https://dogecoinwhalealert.com",
+        "https://www.walletexplorer.com",
+        "https://api.tzkt.io/",
+        "https://etherscan.io"
+    ],
+    "Staking Keys": [
+        "staking key",
+        "payment key"
+    ],
+    "Multi-input transactions": [
+        "multi-input"
+    ]
+}
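
Note (illustrative, not part of the PR): the keys of this file match the options accepted by `clustering_sources` in `config.yaml`. A minimal sketch of how a configured list could be resolved into the individual sources follows. The function name `get_active_sources` is hypothetical and is not the helper added in this PR. The lookup is case sensitive, and an empty list yields no sources, which corresponds to no clustering.

```python
# Contents of mapping_information/sources.json as added in this diff.
SOURCES = {
    "Explorers": [
        "https://bitinfocharts.com",
        "https://dogecoinwhalealert.com",
        "https://www.walletexplorer.com",
        "https://api.tzkt.io/",
        "https://etherscan.io",
    ],
    "Staking Keys": ["staking key", "payment key"],
    "Multi-input transactions": ["multi-input"],
}


def get_active_sources(clustering_sources, sources=SOURCES):
    """Flatten the configured source keywords into individual sources."""
    # Case-sensitive membership test, so "explorers" is rejected.
    unknown = [name for name in clustering_sources if name not in sources]
    if unknown:
        raise ValueError(f"Unknown clustering sources: {unknown}")
    return [source for name in clustering_sources for source in sources[name]]


active = get_active_sources(["Staking Keys", "Multi-input transactions"])
print(active)        # ['staking key', 'payment key', 'multi-input']
print(bool(active))  # True: clustering applies when the list is non-empty
```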
3 changes: 1 addition & 2 deletions run.py
Expand Up @@ -9,8 +9,7 @@

 def main(ledgers, snapshot_dates):
     for ledger in ledgers:
-        for snapshot in snapshot_dates:
-            apply_mapping(ledger, snapshot)
+        apply_mapping(ledger)

     analyze(ledgers, snapshot_dates)
