Synchronize README.md with README_PYPI.md, update examples
chernishev authored and polyntsov committed Dec 10, 2023
1 parent 220d9ef commit 9eef4e0
Showing 2 changed files with 21 additions and 49 deletions.
36 changes: 11 additions & 25 deletions README.md
@@ -11,7 +11,7 @@ Desbordante is a high-performance data profiler that is capable of discovering a
 * Conditional functional dependencies (discovery)
 * Metric functional dependencies (validation)
 * Fuzzy algebraic constraints (discovery)
-* Unique column combinations (validation)
+* Unique column combinations (discovery and validation)
 * Association rules (discovery)

 The discovered patterns can have many uses:
Expand Down Expand Up @@ -84,12 +84,10 @@ Simple usage examples:
```python
import desbordante

TABLE = '../examples/datasets/university_fd.csv'
TABLE = 'examples/datasets/university_fd.csv'

algo = desbordante.HyFD()
algo.set_option('table', (TABLE, ',', True))
algo.set_option('is_null_equal_null')
algo.load_data()
algo.load_data(TABLE, ',', True)
algo.execute()
result = algo.get_fds()
print('FDs:')
@@ -111,22 +109,16 @@ FDs:
 ```python
 import desbordante

-TABLE = '../examples/datasets/inventory_afd.csv'
+TABLE = 'examples/datasets/inventory_afd.csv'
 ERROR = 0.1

 algo = desbordante.Pyro()
-algo.set_option('table', (TABLE, ',', True))
-algo.set_option('is_null_equal_null')
-algo.load_data()
-algo.set_option('error', ERROR)
-algo.set_option('threads')
-algo.set_option('max_lhs')
-algo.set_option('seed')
-algo.execute()
+algo.load_data(TABLE, ',', True)
+algo.execute(error=ERROR)
 result = algo.get_fds()
 print('AFDs:')
 for fd in result:
-print(fd)
+    print(fd)
 ```
 ```text
 AFDs:
@@ -140,22 +132,16 @@ AFDs:
 ```python
 import desbordante

-TABLE = '../examples/datasets/theatres_mfd.csv'
+TABLE = 'examples/datasets/theatres_mfd.csv'
 METRIC = 'euclidean'
 LHS_INDICES = [0]
 RHS_INDICES = [2]
 PARAMETER = 5

 algo = desbordante.MetricVerifier()
-algo.set_option('table', (TABLE, ',', True))
-algo.set_option('is_null_equal_null')
-algo.load_data()
-algo.set_option('lhs_indices', LHS_INDICES)
-algo.set_option('metric', METRIC)
-algo.set_option('parameter', PARAMETER)
-algo.set_option('dist_from_null_is_infinity')
-algo.set_option('rhs_indices', RHS_INDICES)
-algo.execute()
+algo.load_data(TABLE, ',', True)
+algo.execute(lhs_indices=LHS_INDICES, metric=METRIC,
+             parameter=PARAMETER, rhs_indices=RHS_INDICES)
 if algo.mfd_holds():
     print('MFD holds')
 else:
34 changes: 10 additions & 24 deletions README_PYPI.md
@@ -28,7 +28,7 @@ Try the web version at https://desbordante.unidata-platform.ru/
 1. Functional dependencies, both exact and approximate (discovery and validation)
 2. Metric functional dependencies (validation)
 3. Fuzzy algebraic constraints (discovery)
-4. Unique column combinations (validation)
+4. Unique column combinations (discovery and validation)
 5. Association rules (discovery)

 This package uses the library of the Desbordante platform, which is written in C++. This means that depending on the
@@ -42,12 +42,10 @@ algorithm and dataset, the runtimes may be cut by 2-10 times compared to the alt
 ```python
 import desbordante

-TABLE = '../examples/datasets/university_fd.csv'
+TABLE = 'examples/datasets/university_fd.csv'

 algo = desbordante.HyFD()
-algo.set_option('table', (TABLE, ',', True))
-algo.set_option('is_null_equal_null')
-algo.load_data()
+algo.load_data(TABLE, ',', True)
 algo.execute()
 result = algo.get_fds()
 print('FDs:')
@@ -72,18 +70,12 @@ FDs:
 ```python
 import desbordante

-TABLE = '../examples/datasets/inventory_afd.csv'
+TABLE = 'examples/datasets/inventory_afd.csv'
 ERROR = 0.1

 algo = desbordante.Pyro()
-algo.set_option('table', (TABLE, ',', True))
-algo.set_option('is_null_equal_null')
-algo.load_data()
-algo.set_option('error', ERROR)
-algo.set_option('threads')
-algo.set_option('max_lhs')
-algo.set_option('seed')
-algo.execute()
+algo.load_data(TABLE, ',', True)
+algo.execute(error=ERROR)
 result = algo.get_fds()
 print('AFDs:')
 for fd in result:
@@ -104,22 +96,16 @@ AFDs:
 ```python
 import desbordante

-TABLE = '../examples/datasets/theatres_mfd.csv'
+TABLE = 'examples/datasets/theatres_mfd.csv'
 METRIC = 'euclidean'
 LHS_INDICES = [0]
 RHS_INDICES = [2]
 PARAMETER = 5

 algo = desbordante.MetricVerifier()
-algo.set_option('table', (TABLE, ',', True))
-algo.set_option('is_null_equal_null')
-algo.load_data()
-algo.set_option('lhs_indices', LHS_INDICES)
-algo.set_option('metric', METRIC)
-algo.set_option('parameter', PARAMETER)
-algo.set_option('dist_from_null_is_infinity')
-algo.set_option('rhs_indices', RHS_INDICES)
-algo.execute()
+algo.load_data(TABLE, ',', True)
+algo.execute(lhs_indices=LHS_INDICES, metric=METRIC,
+             parameter=PARAMETER, rhs_indices=RHS_INDICES)
 if algo.mfd_holds():
     print('MFD holds')
 else:
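
The updated examples in both READMEs share one calling pattern: `load_data(TABLE, ',', True)`, then `execute(...)` with the algorithm options passed as keyword arguments, then a result getter. A minimal sketch of that pattern as a reusable helper follows. The name `run_discovery` and its parameter names are hypothetical and not part of Desbordante; the helper assumes only the three calls that appear in the updated discovery examples (`load_data`, `execute`, `get_fds`).

```python
def run_discovery(algo, table, separator=',', has_header=True, **options):
    """Run a discovery algorithm using the call sequence from the updated READMEs.

    `algo` is any object exposing load_data(), execute() and get_fds(),
    for example an instance created with desbordante.HyFD() or desbordante.Pyro().
    This helper is a hypothetical convenience wrapper, not a library function.
    """
    # Table parameters are now passed directly to load_data(),
    # replacing the former set_option('table', ...) call.
    algo.load_data(table, separator, has_header)
    # Algorithm options (e.g. error=0.1) are passed as keyword arguments,
    # replacing the former chain of set_option() calls.
    algo.execute(**options)
    return algo.get_fds()
```

With the package installed, `run_discovery(desbordante.Pyro(), 'examples/datasets/inventory_afd.csv', error=0.1)` would correspond to the updated AFD example.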