-
Notifications
You must be signed in to change notification settings - Fork 72
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add DC verifier python bindings and basic example
- Loading branch information
Showing
13 changed files
with
122 additions
and
4 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
import desbordante as db | ||
import pandas as pd | ||
|
||
# TODO: Add console interaction with user | ||
|
||
# The given denial constraint tells us that if two persons live in the | ||
# same state, the one earning a lower salary has a lower tax rate | ||
DC = "!(s.State == t.State and s.Salary < t.Salary and s.FedTaxRate > t.FedTaxRate)" | ||
|
||
TABLE_1 = "examples/datasets/taxes.csv" | ||
TABLE_2 = "examples/datasets/taxes_2.csv" | ||
|
||
print("""This is an example of how to use Denial Constraint (DC) verification for checking hypotheses on data. | ||
DC verification is perfomed by the Rapidash algorithm: https://arxiv.org/abs/2309.12436 | ||
DC φ is a conjunction of predicates of the following form: | ||
∀s, t ∈ R, s ≠ t: ¬(p_1 ∧ . . . ∧ p_m) | ||
DCs involve comparisons between pairs of rows within a dataset. | ||
A typical DC example, derived from a Functional Dependency such as A -> B, | ||
is expressed as: "∀s, t ∈ R, s ≠ t, ¬(t.A == s.A and t.B ≠ s.B). | ||
This denotes that for any pair of rows in the relation, it should not be the case | ||
that while the values in column A are equal, the values in column B are unequal. | ||
""") | ||
|
||
print("Consider the following dataset:\n") | ||
data = pd.read_csv(TABLE_1, header=[0]) | ||
print(data, end="\n\n") | ||
print("""And the following Denial Constraint: !(s.State == t.State and s.Salary < t.Salary and s.FedTaxRate > t.FedTaxRate). | ||
It tells us that for all people in the same state the person with a higher salary has a higher tax rate. | ||
Then we run the algorithm in order to see if the constraint holds. | ||
""") | ||
|
||
# Creating a verifier and loading data | ||
verifier = db.dc_verification.algorithms.Default() | ||
verifier.load_data(table=(TABLE_1, ',', True)) | ||
|
||
# Algorithm execution | ||
verifier.execute(denial_constraint=DC) | ||
|
||
# Obtaining the result | ||
result: bool = verifier.dc_holds() | ||
|
||
print(f"DC {DC} holds: {result}\n") | ||
|
||
print("""Now we will modify the initial dataset to make things a little bit more interesting. | ||
Consider the previous table but with an additional record for Texas: | ||
""") | ||
|
||
data = pd.read_csv(TABLE_2, header=[0]) | ||
print(data, end="\n\n") | ||
|
||
verifier = db.dc_verification.algorithms.Default() | ||
verifier.load_data(table=(TABLE_2, ',', True)) | ||
verifier.execute(denial_constraint=DC) | ||
|
||
result: bool = verifier.dc_holds() | ||
|
||
print(f"DC {DC} holds: {result}\n") | ||
|
||
print("""Now we can see that the same DC we examined on the previous dataset doesn't hold on the new one. | ||
The thing is that for the last record (Texas, 5000, 0.05) there are people in Texas with a lower salary | ||
but higher tax rate. Pairs of records like this that contradict a DC are called violations. | ||
In this case the following pairs are violations: | ||
(6, 9), (7, 9), (8, 9), where each number is an index of a record. | ||
""") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
State,Salary,FedTaxRate | ||
NewYork,3000,0.2 | ||
NewYork,4000,0.25 | ||
NewYork,5000,0.3 | ||
Wisconsin,5000,0.15 | ||
Wisconsin,6000,0.2 | ||
Wisconsin,4000,0.1 | ||
Texas,1000,0.15 | ||
Texas,2000,0.25 | ||
Texas,3000,0.3 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
State,Salary,FedTaxRate | ||
NewYork,3000,0.2 | ||
NewYork,4000,0.25 | ||
NewYork,5000,0.3 | ||
Wisconsin,5000,0.15 | ||
Wisconsin,6000,0.2 | ||
Wisconsin,4000,0.1 | ||
Texas,1000,0.15 | ||
Texas,2000,0.25 | ||
Texas,3000,0.3 | ||
Texas,5000,0.05 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
#include "dc/bind_dc_verification.h" | ||
|
||
#include "algorithms/dc/verifier/dc_verifier.h" | ||
#include "py_util/bind_primitive.h" | ||
|
||
namespace python_bindings { | ||
|
||
namespace py = pybind11; | ||
|
||
void BindDCVerification(py::module_& main_module) { | ||
auto dc_verification_module = main_module.def_submodule("dc_verification"); | ||
|
||
BindPrimitiveNoBase<algos::DCVerifier>(dc_verification_module, "DCVerification") | ||
.def("dc_holds", &algos::DCVerifier::DCHolds); | ||
} | ||
|
||
} // namespace python_bindings |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
#pragma once | ||
|
||
#include <pybind11/pybind11.h> | ||
|
||
namespace python_bindings { | ||
void BindDCVerification(pybind11::module_& main_module); | ||
} // namespace python_bindings |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,4 +8,4 @@ Wisconsin,4000,0.1 | |
Texas,1000,0.15 | ||
Texas,2000,0.25 | ||
Texas,3000,0.3 | ||
Texas,3000,0.31 | ||
Texas,3000,0.31 |