BRIOxAlkemy is the outcome of the partnership between Alkemy and BRIO. The Alkemy team responsible for this project is the Deep Learning and Big Data Department (DL&BD), an internal innovation team of the Alkemy group.
The tool is designed for inspecting machine learning systems whose predictions could be affected by bias.
A typical scenario where the tool can help is the following: consider a database containing details of individuals, such as their age, gender, and level of education, and an algorithm that tries to predict whether each of them is likely to default on credit. The user wishes to check whether age is a sensitive factor in this prediction. The user feeds the tool the dataset and the output of the predictive algorithm, and marks the age feature as sensitive. Currently the tool allows the user to compare either how the behaviour of the algorithm with respect to age differs from an ``optimal'' behaviour (in this case, the user might consider optimal the case where every age group succeeds equally often), or how different age groups perform with respect to one another.
These two analyses are called FreqVsRef and FreqVsFreq and are described in the sections below.
The tool can be used through a web browser interacting with the provided frontend. It comes with a Makefile that makes it easy to build and run the Docker image that encapsulates the code.
Provided that you have Docker up and running, build the application (needed only the first time the tool is used) by running:
make mongodb
make build
To run the application:
make frontend
Using your preferred web browser, navigate to localhost to access the tool frontend.
To stop the application, run:
make stop
To stop the database, run:
make mongodb_stop
The main functionalities of the tool are also available as a Python library, named brio. You can install it via pip: pip install brio. The bias detection analyses can be performed directly through the interfaces of the FreqVsRefBiasDetector and FreqVsFreqBiasDetector classes.
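As a minimal sketch of how the library could be wired together (the import path and constructor calls below are assumptions, not the documented API), the two detectors can be instantiated on a pandas dataframe roughly as follows:

```python
# Hypothetical sketch: the import path and constructor arguments are
# assumptions and may differ from the released brio API.
import pandas as pd
from brio.bias import FreqVsFreqBiasDetector, FreqVsRefBiasDetector

# Toy data: "age_group" is the sensitive feature and "predicted_default"
# is the output of the predictive algorithm we want to audit.
df = pd.DataFrame({
    "age_group":         ["young", "adult", "senior"] * 20,
    "education":         ["high", "low"] * 30,
    "predicted_default": ([1, 0, 0] * 10) + ([0, 0, 1] * 10),
})

freq_vs_freq = FreqVsFreqBiasDetector()  # compares sensitive classes with one another
freq_vs_ref = FreqVsRefBiasDetector()    # compares them with a reference distribution
```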
The FreqVsRef analysis compares the behaviour of the AI system with a target desirable behaviour, expressed as a probability distribution. It is implemented in the FreqVsRefBiasDetector class, in the bias sub-module.
The method compare_root_variable_groups computes the distance, in terms of normalized KL divergence, of the observed distribution of target_variable conditioned on root_variable (the sensitive feature) from a reference distribution (the "Ref" in the analysis name) that is passed as a parameter. The distance is compared with a threshold that represents a "safety" limit: distances larger than the threshold are considered signs of potential bias. The output is a tuple of three elements:
- a list with the distances of each category frequency of the sensitive variable from the expected reference
- a list of booleans with the results of comparing the previous point's distances against the provided threshold (distance < threshold)
- the threshold (provided as input or computed by the tool).
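A hedged usage sketch follows; only the class and method names and the three-element output come from the description above, while the keyword arguments and the format of the reference distribution are assumptions:

```python
# Hypothetical call: argument names and the reference format are assumptions.
import pandas as pd
from brio.bias import FreqVsRefBiasDetector

df = pd.DataFrame({
    "age_group":         ["young", "adult", "senior"] * 20,
    "predicted_default": ([1, 0, 0] * 10) + ([0, 0, 1] * 10),
})

detector = FreqVsRefBiasDetector()

# "Optimal" behaviour: every age group succeeds equally often, e.g. a 50/50
# split of the target variable within each group.
reference = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

distances, checks, threshold = detector.compare_root_variable_groups(
    dataframe=df,
    target_variable="predicted_default",
    root_variable="age_group",
    reference_distribution=reference,
)
print(distances)  # normalized KL divergence of each age group from the reference
print(checks)     # True where distance < threshold
```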
The method compare_root_variable_conditioned_groups performs the same calculation described above, but for each sub-group implied by the Cartesian product of the categories of conditioning_variables, a list of features present in the dataframe and selected by the user. The computation is performed only if a sub-group contains at least min_obs_per_group observations (default 30). The output is, for each sub-group, a tuple of four elements:
- the number of observations in the group
- a list with the distances of each category frequency of the sensitive variable from the expected reference
- a list of booleans with the results of comparing the previous point's distances against the provided threshold (distance < threshold)
- the threshold (provided as input or computed by the tool).
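Continuing the sketch above, a hypothetical conditioned call could look as follows (the argument names and the exact shape of the returned collection are assumptions):

```python
# Hypothetical call, reusing the detector and dataframe from the previous
# sketch with an extra "education" column; argument names and the shape of
# the returned collection are assumptions.
df["education"] = ["high", "low"] * 30

results = detector.compare_root_variable_conditioned_groups(
    dataframe=df,
    target_variable="predicted_default",
    root_variable="age_group",
    conditioning_variables=["education"],
    min_obs_per_group=30,
)
for n_obs, distances, checks, threshold in results:
    print(n_obs, distances, checks, threshold)
```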
The FreqVsFreq analysis compares the behaviour of the AI system with respect to one sensitive class against its behaviour with respect to another class of the same sensitive feature. In the case of multi-class sensitive features, the results of the pairwise comparisons are aggregated using an aggregation function selected by the user. The analysis is implemented in the FreqVsFreqBiasDetector class, in the bias sub-module.
The method compute_distance_between_frequencies computes the JS divergence or the TV distance, as selected, for the observed_distribution, an array with the distribution of target_variable conditioned on root_variable (the sensitive feature). The final value is obtained using the selected aggregation function, which is relevant in the case of multi-class root variables.
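The two distance measures themselves can be illustrated independently of the library. A minimal sketch with standard numpy/scipy definitions (this shows the textbook formulas, not necessarily the exact normalization brio applies internally):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Distribution of the target variable within two sensitive classes,
# e.g. P(default | young) vs P(default | senior).
p = np.array([0.7, 0.3])
q = np.array([0.5, 0.5])

# scipy returns the Jensen-Shannon *distance* (square root of the divergence),
# so we square it to get the divergence, here with base-2 logarithms.
js_divergence = jensenshannon(p, q, base=2) ** 2

# Total variation distance: half of the L1 distance between the distributions.
tv_distance = 0.5 * np.abs(p - q).sum()

print(js_divergence, tv_distance)
```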
The method compare_root_variable_groups computes the mutual distance, in terms of JS divergence or TV distance, between the categories of the observed distribution of target_variable conditioned on root_variable.
The method compare_root_variable_conditioned_groups performs the same calculation described above, but for each sub-group implied by the Cartesian product of the categories of conditioning_variables, with the same constraints as for the FreqVsRefBiasDetector class.
For both methods the output is a tuple similar to the ones described for the FreqVsRef analysis, but with an additional element given by the standard deviation of the distances, provided only when the root variable is a multi-class feature.
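A hedged sketch of the FreqVsFreq flow (the constructor options and argument names are assumptions; only the class and method names and the four-element output come from the description above):

```python
# Hypothetical call: constructor options and argument names are assumptions;
# only the class/method names and the output shape come from the text above.
import pandas as pd
from brio.bias import FreqVsFreqBiasDetector

df = pd.DataFrame({
    "age_group":         ["young", "adult", "senior"] * 20,
    "predicted_default": ([1, 0, 0] * 10) + ([0, 0, 1] * 10),
})

detector = FreqVsFreqBiasDetector(distance="TVD", aggregating_function=max)

distances, checks, threshold, distances_std = detector.compare_root_variable_groups(
    dataframe=df,
    target_variable="predicted_default",
    root_variable="age_group",
)
# distances_std is only provided because age_group has more than two classes.
```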
With the latest version (1.2) we introduce a Risk measurement based on the results of the Bias Detector. Currently it is only accessible via the Python API, but we plan to add a frontend option for it. It is implemented in the brio/risk sub-module.
The methodology behind the risk computation will soon be published in a scientific paper. If you want to experiment with it already, you can use the method compute_hazard_from_freqvsfreq_or_freqvsref of the HazardFromBiasDetectionCalculator class on each result from FreqVsFreq and FreqVsRef. The computed hazards then need to be passed to compute_risk from RiskCalculator: this function provides an overall measure of risk.
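A hedged sketch of that pipeline, assuming the hazard calculator is applied to the raw tuples returned by the detectors (only the class and method names come from the text above; instantiation details, argument lists and return types are assumptions):

```python
# Hypothetical sketch: only the class and method names come from the text;
# instantiation details, argument lists and return types are assumptions.
from brio.risk import HazardFromBiasDetectionCalculator, RiskCalculator


def overall_risk(detector_results):
    """Aggregate a list of FreqVsFreq/FreqVsRef results into one risk figure."""
    hazard_calc = HazardFromBiasDetectionCalculator()
    hazards = [
        hazard_calc.compute_hazard_from_freqvsfreq_or_freqvsref(result)
        for result in detector_results
    ]
    return RiskCalculator().compute_risk(hazards)
```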
Currently (June 2024) we plan to implement functionalities for the Opacity section, which is now empty, and a more refined and accessible Risk section.
We hope to raise interest in the data science community and we ask for support! Anyone interested in expanding and improving our tool is more than welcome! You can do that by opening a pull request for a functionality you wish to include. Bug reports are also very important and welcome.
Another way to cooperate with us is to get in touch with our team: just send an e-mail to [email protected] with your ideas, proposals or suggestions.