Simpler duplicates #171
Codecov Report
@@            Coverage Diff            @@
##           master    #171     +/-   ##
========================================
- Coverage   81.21%     81%   -0.22%
========================================
  Files          24      24
  Lines        1624    1606     -18
  Branches      279     279
========================================
- Hits         1319    1301     -18
+ Misses        252     251      -1
- Partials       53      54      +1
Looks good. It will be useful.
@@ -126,8 +101,7 @@ def generate_quality_estimation(
         else:
             quality_estimation = (
                 adherence_to_schema_percent * 40 / 100
-                + duplicated_items_percent * 10 / 100
-                + duplicated_skus_percent * 5 / 100
+                + duplicated_items_percent * 15 / 100
Where do these percentages come from? Why 15 percent for duplicated_items_percent?
They were set 1.5 years ago, based on the QA team's experience at that time.
I set 15 because it is the sum of the two previous duplicate weights (10 + 5), which keeps compatibility for now (see #154).
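The arithmetic behind that answer can be sketched as follows. This is only an illustration of the two weighted terms visible in the diff; the helper names `old_duplicates_score` and `new_duplicates_score` are hypothetical, and the remaining terms of `quality_estimation` are not shown in this thread, so they are left out.

```python
def old_duplicates_score(duplicated_items_percent, duplicated_skus_percent):
    """Duplicate terms before this PR: weights 10 and 5 (hypothetical helper)."""
    return duplicated_items_percent * 10 / 100 + duplicated_skus_percent * 5 / 100


def new_duplicates_score(duplicated_items_percent):
    """Duplicate term after this PR: a single weight of 15 (hypothetical helper)."""
    return duplicated_items_percent * 15 / 100


# When both old percentages are equal, the merged weight gives the same score.
same_before = old_duplicates_score(80, 80)
same_after = new_duplicates_score(80)

# When they differ, the old and new scores can diverge.
mixed_before = old_duplicates_score(80, 100)
```

The combined weight preserves the maximum contribution of the duplicate checks (15 points), but the score is only identical to the old one when the two former percentages coincide.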
Closes #131, #117
- `uniques` arg to the main class
- Rendered notebook
- `find_by_` into two methods: `report_all`

`find_by` now accepts a list of columns and any combinations of columns. A combination means equality is checked across all the values in the given combination together, e.g. `arche.rules.duplicates.find_by(df, [["url", "name"], "upc"]).show()` will check that `upc` is unique and that all rows have a unique `url` and `name` combination.

Main `find_by` message changed.

`uniques` is added to `report_all`. If the schema contains any tag used in `find_by_tag`, it overwrites `uniques`.

`a.report_all(uniques=[["url", "name"], "upc"])`
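To make the combination semantics of `uniques` concrete, here is a minimal plain-Python sketch. It is not arche's implementation: `find_duplicates` is a hypothetical helper, and it works on a list of dicts rather than a DataFrame. It only assumes the rule described in this PR, that a string entry is checked on its own while a list entry is checked as one combined key.

```python
from collections import defaultdict


def find_duplicates(rows, uniques):
    """Group row indexes that share the same value(s) for each entry in `uniques`.

    Hypothetical helper for illustration; not part of arche.
    """
    report = {}
    for columns in uniques:
        # A bare string is a single column; a list is one combined key.
        key_columns = (columns,) if isinstance(columns, str) else tuple(columns)
        groups = defaultdict(list)
        for index, row in enumerate(rows):
            groups[tuple(row[column] for column in key_columns)].append(index)
        # Keep only the groups that contain more than one row.
        report[key_columns] = [idxs for idxs in groups.values() if len(idxs) > 1]
    return report


rows = [
    {"url": "a.com/1", "name": "Bag", "upc": "111"},
    {"url": "a.com/1", "name": "Bag", "upc": "222"},
    {"url": "a.com/2", "name": "Bag", "upc": "222"},
]
report = find_duplicates(rows, [["url", "name"], "upc"])
```

In this sample, rows 0 and 1 collide on the `url` and `name` combination, while rows 1 and 2 collide on `upc`. Row 2 shares its `name` with the others but is not a combination duplicate, because its `url` differs.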
@peonone