
FACTS method bug: code in valid_ifthens function has hard-coded feature names #531

Open
phantom-duck opened this issue May 20, 2024 · 0 comments · Fixed by #533

phantom-duck (Contributor) commented May 20, 2024

In the internal valid_ifthens function, there are two places where feature names are hard-coded, and the surrounding code fails when the input data does not match those assumptions. Specifically:

  1. Here, the code references an age column of X, which is furthermore assumed to be of pd.Interval dtype. If this column does not exist, or exists but does not have Interval dtype, this part of the code raises an error.

Example to reproduce:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

from aif360.sklearn.datasets.openml_datasets import fetch_german
from aif360.sklearn.detectors.facts import FACTS

X, y = fetch_german()
assert (X.index == y.index).all()
X.reset_index(drop=True, inplace=True)
y = y.reset_index(drop=True).map({"bad": 0, "good": 1})

# split into train-test data
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, stratify=y)

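# one-hot encode the categorical features; numeric columns pass through unchanged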
categorical_features = X.select_dtypes(include=["object", "category"]).columns.to_list()
categorical_features_onehot_transformer = ColumnTransformer(
    transformers=[
        ("one-hot-encoder", OneHotEncoder(), categorical_features)
    ],
    remainder="passthrough"
)
model = Pipeline([
    ("one-hot-encoder", categorical_features_onehot_transformer),
    ("clf", LogisticRegression(max_iter=1500))
])

# train the model
model = model.fit(X_train, y_train)

detector = FACTS(
    clf=model,
    prot_attr="sex",
    feature_weights={f: 1 for f in X.columns},
    feats_not_allowed_to_change=[]
)

detector = detector.fit(X_test)

The last command fails with AttributeError: 'numpy.float64' object has no attribute 'left'
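
For illustration, here is a minimal hypothetical sketch (not the actual FACTS source) of the failing pattern: reading the .left endpoint of each value in the age column, which only exists on pd.Interval values.

import pandas as pd

# age binned into intervals, as valid_ifthens assumes
X_binned = pd.DataFrame({"age": pd.cut([23, 47, 61], bins=[0, 30, 50, 100])})
print([iv.left for iv in X_binned["age"]])  # works: [0, 30, 50]

# raw numeric age, as in the pipeline above
X_raw = pd.DataFrame({"age": [23.0, 47.0, 61.0]})
print([iv.left for iv in X_raw["age"]])
# AttributeError: 'numpy.float64' object has no attribute 'left'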

  2. At this point, the recIsValid function is used, which in turn (here) also references hard-coded feature names. In this case there are checks for whether the features exist, so the code does not fail when they are absent. However, when one of these features does exist, it is assumed to have a certain dtype or to carry certain semantics.

I do not currently have a reproducible example for this one, because whether it manifests depends on the exact test data. I believe, however, that it is clearly a bug as well: if we want to enforce constraints such as the ones this part of the code is trying to enforce, it should be done in some other, more robust way, for example by letting the caller supply them (see the sketch below).
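
As a purely hypothetical sketch of what a more robust approach could look like (the rec_is_valid signature and the constraints mapping below are illustrative, not the existing FACTS API), the domain constraints could be supplied by the caller instead of being hard-coded:

from typing import Any, Callable, Dict

# A constraint is a predicate over (value_before, value_after) for one feature;
# features without an entry are unconstrained.
Constraint = Callable[[Any, Any], bool]

def rec_is_valid(
    ifclause: Dict[str, Any],
    thenclause: Dict[str, Any],
    constraints: Dict[str, Constraint],
) -> bool:
    """Check that every feature change satisfies its caller-supplied constraint."""
    for feat, old in ifclause.items():
        new = thenclause.get(feat, old)
        check = constraints.get(feat)
        if check is not None and not check(old, new):
            return False
    return True

# Example: age may only increase; nothing is assumed about other features.
constraints = {"age": lambda old, new: new >= old}

This way the library never needs to know feature names or their semantics in advance.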
