Merge pull request #1 from kishorek/develop
updated version, documentation & some fixes
kishorek authored Jun 22, 2020
2 parents 657fed8 + 8d11695 commit 7247bb1
Showing 5 changed files with 138 additions and 15 deletions.
133 changes: 133 additions & 0 deletions README.md
@@ -46,6 +46,139 @@ optional arguments:
-s, --save-output Save output to file
```

## Examples

### Scenario 1: Simple direct comparison

|id |first |last |age|
|---|--------|--------|---|
|432|Roy |Aguilar |46 |
|914|Janie |Bowman |24 |
|021|Grace |Copeland|53 |
|708|Louise |Franklin|25 |
|850|Gertrude|Carr |60 |

vs

|id |first |last |age|
|---|--------|--------|---|
|432|Roy |Aguilar |46 |
|914|Janie |Bowman |24 |
|021|Grace |Copeland|53 |
|708|Louise |Franklin|25 |
|850|Gertrude|Carr |60 |

```console
comparesv file1 file2
```

Will provide:

|S.No|id |first |last|age |
|----|--------|--------|----|----|
|1 |True |True |True|True|
|2 |True |True |True|True|
|3 |True |True |True|True|
|4 |True |True |True|True|
|5 |True |True |True|True|

and

|S.No|id |first |last|age |
|----|--------|--------|----|----|
|1 |[432]:[432]|[Roy]:[Roy]|[Aguilar]:[Aguilar]|[46]:[46]|
|2 |[914]:[914]|[Janie]:[Janie]|[Bowman]:[Bowman]|[24]:[24]|
|3 |[021]:[021]|[Grace]:[Grace]|[Copeland]:[Copeland]|[53]:[53]|
|4 |[708]:[708]|[Louise]:[Louise]|[Franklin]:[Franklin]|[25]:[25]|
|5 |[850]:[850]|[Gertrude]:[Gertrude]|[Carr]:[Carr]|[60]:[60]|
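
The relationship between the two result tables can be sketched in a few lines of Python. This is only an illustration, not comparesv's actual implementation: `compare_direct` and its return shape are hypothetical names chosen for this example, and the rows are a subset of the tables above.

```python
def compare_direct(headers, rows1, rows2):
    """Compare two equally shaped tables cell by cell (hypothetical helper)."""
    flags, values = [], []
    for r1, r2 in zip(rows1, rows2):
        # One boolean per cell: True when both files hold the same value.
        flags.append({h: a == b for h, a, b in zip(headers, r1, r2)})
        # Companion view showing both values as "[left]:[right]".
        values.append({h: f"[{a}]:[{b}]" for h, a, b in zip(headers, r1, r2)})
    return flags, values


headers = ["id", "first", "last", "age"]
file1 = [["432", "Roy", "Aguilar", "46"], ["914", "Janie", "Bowman", "24"]]
file2 = [["432", "Roy", "Aguilar", "46"], ["914", "Janie", "Bowman", "24"]]

flags, values = compare_direct(headers, file1, file2)
print(flags[0])         # {'id': True, 'first': True, 'last': True, 'age': True}
print(values[0]["id"])  # [432]:[432]
```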

---
### Scenario 2: Fuzzy column names

|id |first |last |age of student|
|---|--------|--------|--------------|
|432|Roy |Aguilar |46 |
|914|Janie |Bowman |24 |

and

|id |first |last |age|
|---|--------|--------|---|
|432|Roy |Aguilar |46 |
|914|Janie |Bowman |24 |

```console
comparesv file1.csv file2.csv --column-match 'fuzzy'
```

Will provide:

|S.No|id |first |last|age |
|----|--------|--------|----|----|
|1 |True |True |True|True|
|2 |True |True |True|True|
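
One way to picture fuzzy column matching is shown below. It is a sketch under assumptions, not how comparesv does it: it uses the standard library's `difflib` as a stand-in scorer, and `similarity`, `match_columns`, and the `threshold` value are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score two header names in [0, 1] by their best word-to-word match."""
    words_a, words_b = a.lower().split(), b.lower().split()
    # Comparing word by word lets "age of student" still match "age".
    return max(
        (SequenceMatcher(None, x, y).ratio() for x in words_a for y in words_b),
        default=0.0,
    )


def match_columns(headers1, headers2, threshold=0.8):
    """Map each header of file 1 to its closest unused header of file 2."""
    mapping, unused = {}, list(headers2)
    for h1 in headers1:
        if not unused:
            break
        best = max(unused, key=lambda h2: similarity(h1, h2))
        if similarity(h1, best) >= threshold:
            mapping[h1] = best
            unused.remove(best)
    return mapping


mapping = match_columns(
    ["id", "first", "last", "age of student"],
    ["id", "first", "last", "age"],
)
print(mapping["age of student"])  # age
```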
---
### Scenario 3: Fuzzy row order - Differently ordered textual data

|id |first |last |age|
|---|--------|--------|---|
|432|Roy |Aguilar |46 |
|914|Janie |Bowman |24 |
|021|Grace |Copeland|53 |

and

|id |first |last |age of student|
|---|--------|--------|--------------|
|021|Grace |Copeland|53 |
|432|Roy |Aguilar |46 |
|914|Janie |Bowman |24 |

```console
comparesv file1.csv file2.csv --column-match 'fuzzy' --row-match 'fuzzy'
```

Will provide:

|S.No|id |first |last|age |
|----|--------|--------|----|----|
|1 |True |True |True|True|
|2 |True |True |True|True|
|3 |True |True |True|True|
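
Conceptually, fuzzy row matching pairs each row of the first file with its most similar row in the second file before comparing. The sketch below illustrates that idea with the standard library's `difflib`. It is an assumption about the general approach, not comparesv's code, and `row_similarity` and `align_rows` are hypothetical names.

```python
from difflib import SequenceMatcher


def row_similarity(row1, row2):
    """Similarity of two rows in [0, 1], based on their joined text."""
    return SequenceMatcher(None, " ".join(row1), " ".join(row2)).ratio()


def align_rows(rows1, rows2):
    """Reorder rows2 so each row lines up with its closest match in rows1."""
    remaining = list(rows2)
    aligned = []
    for row in rows1:
        if not remaining:
            aligned.append(None)  # no counterpart left in the second file
            continue
        best = max(remaining, key=lambda candidate: row_similarity(row, candidate))
        remaining.remove(best)
        aligned.append(best)
    return aligned


file1 = [
    ["432", "Roy", "Aguilar", "46"],
    ["914", "Janie", "Bowman", "24"],
    ["021", "Grace", "Copeland", "53"],
]
file2 = [
    ["021", "Grace", "Copeland", "53"],
    ["432", "Roy", "Aguilar", "46"],
    ["914", "Janie", "Bowman", "24"],
]

aligned = align_rows(file1, file2)
print(aligned == file1)  # True
```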
---
### Scenario 4: Deep row order - Differently ordered numerical data

|year1|year2 |year3 |year|
|-----|--------|--------|----|
|751 |609 |590 |930 |
|417 |501 |441 |763 |
|691 |621 |941 |563 |
|179 |781 |335 |225 |
|961 |530 |433 |571 |

and

|year1|year2 |year3 |year|
|-----|--------|--------|----|
|961 |530 |433 |571 |
|751 |609 |590 |930 |
|691 |621 |941 |563 |
|179 |781 |335 |225 |
|417 |501 |441 |763 |

```console
comparesv file1.csv file2.csv --row-match 'deep'
```

Will provide:

|S.No|year1 |year2 |year3|year|
|----|--------|--------|-----|----|
|1 |True |True |True |True|
|2 |True |True |True |True|
|3 |True |True |True |True|
|4 |True |True |True |True|
|5 |True |True |True |True|
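
The README does not spell out what a deep match does internally. One plausible reading, sketched here as an assumption, is that every row of the first file is checked against every remaining row of the second file and paired with the one sharing the most equal cells, which suits numeric data better than text similarity. `equal_cells` and `align_rows_deep` are hypothetical names.

```python
def equal_cells(row1, row2):
    """Count the positions at which both rows hold the same value."""
    return sum(a == b for a, b in zip(row1, row2))


def align_rows_deep(rows1, rows2):
    """Pair each row of rows1 with the unused row of rows2 sharing most cells."""
    remaining = list(rows2)
    aligned = []
    for row in rows1:
        # default=None covers the case where rows2 has run out of rows.
        best = max(remaining, key=lambda c: equal_cells(row, c), default=None)
        if best is not None:
            remaining.remove(best)
        aligned.append(best)
    return aligned


file1 = [
    [751, 609, 590, 930],
    [417, 501, 441, 763],
    [691, 621, 941, 563],
    [179, 781, 335, 225],
    [961, 530, 433, 571],
]
file2 = [
    [961, 530, 433, 571],
    [751, 609, 590, 930],
    [691, 621, 941, 563],
    [179, 781, 335, 225],
    [417, 501, 441, 763],
]

aligned = align_rows_deep(file1, file2)
print(aligned == file1)  # True
```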

---
### Scenario n: Unlimited options. Please explore the options below
---
## Description

The first file is treated as the source file and is compared against the second file. Refer to the options below to fine-tune how the comparison works.
4 changes: 1 addition & 3 deletions cli.py
@@ -21,7 +21,6 @@ def main():
     sys.stderr.write('Starting up...\n')
     try:
         file1, file2, args = arguments()
-        print(args)
         data1, headers1 = read(*file1)
         data2, headers2 = read(*file2)
         results = comparesv.run(data1, headers1, data2, headers2, ticker=ticker, **args)
@@ -113,5 +112,4 @@ def format(results, keys):


 if __name__ == '__main__':
-    main()
-
+    main()
11 changes: 1 addition & 10 deletions comparesv.py
@@ -292,13 +292,4 @@ def predict_column_type(data):
     elif int in data_types:
         return "int"
     else:
-        return "str"
-
-h1 = ["id", "age"]
-h2 = ["id", "age","gender"]
-d1 = [["A1", 23], ["A2", 24], ["A3", 34]]
-d2 = [["A1", 23,"M"], ["A2", 24,"F"], ["A3", 34,"O"]]
-
-output = run(d1, h1, d2, h2, include_addnl_columns='fuzzy')
-from pprint import pprint
-pprint(output)
+        return "str"
3 changes: 2 additions & 1 deletion setup.py
@@ -28,7 +28,8 @@ def open_file(fname):
         'tqdm==4.18.0',
         'unidecode==1.1.1',
         'doublemetaphone==0.1',
-        'fuzzywuzzy==0.18.0'
+        'fuzzywuzzy==0.18.0',
+        'python-Levenshtein==0.12.0'
     ],
     entry_points={
         'console_scripts': [
2 changes: 1 addition & 1 deletion version.py
@@ -1 +1 @@
-__version__ = 0.11
+__version__ = 0.12
