Fairness

Package that provides interfaces and built-in implementations for evaluating the fairness of models and datasets.

Metrics

Evaluating a Model

Statistical Parity

class guardian_ai.fairness.metrics.model.ModelStatisticalParityScorer(protected_attributes, distance_measure='diff', reduction='mean')

Measure the statistical parity [1] of a model’s output between subgroups and the rest of the population.

Statistical parity (also known as Base Rate or Disparate Impact) states that a predictor is unbiased if the prediction is independent of the protected attribute.

Statistical Parity is calculated as PP / N, where PP and N are the number of positive predictions and the total number of predictions made, respectively.
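The PP / N calculation can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: the helper `positive_rate`, the sample predictions, and the choice of 1 as the positive label are assumptions made for illustration.

```python
def positive_rate(y_pred, positive_label=1):
    """Return PP / N: the share of predictions equal to the positive label."""
    if not y_pred:
        raise ValueError("y_pred must contain at least one prediction")
    positives = sum(1 for p in y_pred if p == positive_label)
    return positives / len(y_pred)


# Hypothetical predictions and one protected attribute value per instance.
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]

# Split predictions into the subgroup "F" and the rest of the population.
subgroup_preds = [p for p, s in zip(y_pred, sex) if s == "F"]
rest_preds = [p for p, s in zip(y_pred, sex) if s != "F"]

subgroup_rate = positive_rate(subgroup_preds)  # 3 positives out of 4
rest_rate = positive_rate(rest_preds)          # 1 positive out of 4
```

Here the two rates differ, so this toy model would not satisfy statistical parity for the subgroup "F".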

Perfect score

A perfect score for this metric means that the model predicts the positive class for every subgroup at the same rate as it does for the rest of the population. For example, if the protected attributes are race and sex, then a perfect statistical parity would mean that all combinations of values for race and sex have identical ratios of positive predictions. Perfect values are:

1 if using 'ratio' as distance_measure.
0 if using 'diff' as distance_measure.
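To make "all combinations of values for race and sex" concrete, the following sketch groups predictions by each observed (race, sex) pair and checks whether the positive-prediction rates coincide. The helper `positive_rate_by_subgroup` and the data are hypothetical, and the sketch compares subgroups with each other rather than reproducing the library's subgroup-versus-rest comparison; when all subgroup rates are identical, both views agree.

```python
from collections import defaultdict


def positive_rate_by_subgroup(y_pred, *attributes):
    """Map each observed combination of attribute values to its PP / N."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [positives, total]
    for pred, subgroup in zip(y_pred, zip(*attributes)):
        counts[subgroup][0] += int(pred == 1)
        counts[subgroup][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


# Hypothetical protected attributes and predictions (1 = positive).
race = ["A", "A", "B", "B", "A", "A", "B", "B"]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

rates = positive_rate_by_subgroup(y_pred, race, sex)

# Parity holds in this toy data if every subgroup has the same rate.
all_equal = len(set(rates.values())) == 1
```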

Parameters:

protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.
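The distance_measure and reduction options described above can be sketched in plain Python as follows. The function names `distance` and `reduce_scores` are hypothetical, and the sketch does not handle rates equal to zero under 'ratio'; how the library treats that case is not stated here.

```python
def distance(subgroup_val, rest_of_pop_val, distance_measure="diff"):
    """Compare a subgroup's metric value against the rest of the population."""
    if distance_measure == "diff":
        return abs(subgroup_val - rest_of_pop_val)
    if distance_measure == "ratio":
        # Assumes both values are non-zero; zero rates are not handled here.
        ratio = subgroup_val / rest_of_pop_val
        # Invert when needed so the result is always >= 1.
        return 1 / ratio if ratio < 1 else ratio
    raise ValueError(f"Unknown distance_measure: {distance_measure!r}")


def reduce_scores(scores, reduction="mean"):
    """Collapse a {subgroup: subgroup_metric} dict according to `reduction`."""
    if reduction is None:
        return dict(scores)
    values = list(scores.values())
    if reduction == "max":
        return max(values)
    if reduction == "mean":
        return sum(values) / len(values)
    raise ValueError(f"Unknown reduction: {reduction!r}")


# Hypothetical per-subgroup distances.
scores = {("F",): 0.25, ("M",): 0.75}
mean_score = reduce_scores(scores, "mean")
max_score = reduce_scores(scores, "max")
```

With 'diff' the perfect value is 0 and with 'ratio' it is 1, since identical values give a difference of 0 and a quotient of 1.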

References

[1] Cynthia Dwork et al. “Fairness Through Awareness”. Innovations in Theoretical Computer Science. 2012.

Examples

from guardian_ai.fairness.metrics import ModelStatisticalParityScorer

scorer = ModelStatisticalParityScorer(['race', 'sex'])
scorer(model, X, y_true)

This metric does not require y_true, so it can also be called as follows:

scorer(model, X)

__call__(model, X, y_true=None, supplementary_features=None)

Compute the metric using a model’s predictions on a given array of instances X.

Parameters:

model (Any) – Object that implements a predict(X) function to collect categorical predictions.
X (pandas.DataFrame) – Array of instances to compute the metric on.
y_true (pandas.Series, numpy.ndarray, list, or None, default=None) – Array of groundtruth labels.
supplementary_features (pandas.DataFrame or None, default=None) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model).

Returns:

The computed metric value, with format according to self.reduction.

Return type:

float, dict

Raises:

GuardianAIValueError – If a feature is present in both X and supplementary_features.
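The overlap rule between X and supplementary_features can be pictured with a small sketch. It uses plain dicts of columns in place of pandas DataFrames and the built-in ValueError in place of GuardianAIValueError; `merge_features` is a hypothetical helper, not part of the package.

```python
def merge_features(X, supplementary_features=None):
    """Combine two column mappings, rejecting columns present in both."""
    if supplementary_features is None:
        return dict(X)
    overlap = set(X) & set(supplementary_features)
    if overlap:
        # Stand-in for the GuardianAIValueError described above.
        raise ValueError(f"Features present in both inputs: {sorted(overlap)}")
    return {**X, **supplementary_features}


# The protected attribute "sex" is supplied separately from the model inputs.
features = merge_features({"age": [30, 41]}, {"sex": ["F", "M"]})

# Supplying "sex" in both places is rejected.
try:
    merge_features({"sex": ["F"]}, {"sex": ["M"]})
    overlap_rejected = False
except ValueError:
    overlap_rejected = True
```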

guardian_ai.fairness.metrics.model.model_statistical_parity(y_true=None, y_pred=None, subgroups=None, distance_measure='diff', reduction='mean')

Measure the statistical parity of a model’s output between subgroups and the rest of the population.

For more details, refer to ModelStatisticalParityScorer.

Parameters:

y_true (pandas.Series, numpy.ndarray, list or None, default=None) – Array of groundtruth labels.
y_pred (pandas.Series, numpy.ndarray, list or None, default=None) – Array of model predictions.
subgroups (pandas.DataFrame or None, default=None) – Dataframe containing protected attributes for each instance.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

Returns:

The computed metric value, with format according to reduction.

Return type:

float, dict

Raises:

GuardianAIValueError – If None is received for either y_pred or subgroups.

Examples

from guardian_ai.fairness.metrics import model_statistical_parity
subgroups = X[['race', 'sex']]
model_statistical_parity(y_true, y_pred, subgroups)

This metric does not require y_true, so it can also be called as follows:

model_statistical_parity(None, y_pred, subgroups)
model_statistical_parity(y_pred=y_pred, subgroups=subgroups)

True Positive Rate Disparity

class guardian_ai.fairness.metrics.model.TruePositiveRateScorer(protected_attributes, distance_measure='diff', reduction='mean')

Measures the disparity of a model’s true positive rate between subgroups and the rest of the population (also known as equal opportunity).

For each subgroup, the disparity is measured by comparing the true positive rate on instances of a subgroup against the rest of the population.

True Positive Rate [1] (also known as TPR, recall, or sensitivity) is calculated as TP / (TP + FN), where TP and FN are the number of true positives and false negatives, respectively.
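As a rough illustration of TP / (TP + FN) and of a 'diff' disparity, consider the sketch below. The helper `tpr`, the toy labels, and the use of 1 as the positive label are assumptions for this example and do not reflect the library's internals.

```python
def tpr(y_true, y_pred, positive_label=1):
    """Return TP / (TP + FN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    if tp + fn == 0:
        raise ValueError("TPR is undefined without positive ground-truth labels")
    return tp / (tp + fn)


# Hypothetical (y_true, y_pred) for one subgroup and for everyone else.
subgroup_tpr = tpr([1, 1, 0, 0], [1, 0, 0, 1])  # TP=1, FN=1
rest_tpr = tpr([1, 1, 1, 1], [1, 1, 1, 0])      # TP=3, FN=1

# 'diff' distance between the subgroup and the rest of the population.
tpr_disparity = abs(subgroup_tpr - rest_tpr)
```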

Perfect score

A perfect score for this metric means that the model does not correctly predict the positive class for any of the subgroups more often than it does for the rest of the population. For example, if the protected attributes are race and sex, then a perfect true positive rate disparity would mean that all combinations of values for race and sex have identical true positive rates. Perfect values are:

1 if using 'ratio' as distance_measure.
0 if using 'diff' as distance_measure.

Parameters:

protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

References

[1] Moritz Hardt et al. “Equality of Opportunity in Supervised Learning”. Advances in Neural Information Processing Systems. 2016.

Examples

from guardian_ai.fairness.metrics import TruePositiveRateScorer
scorer = TruePositiveRateScorer(['race', 'sex'])
scorer(model, X, y_true)

__call__(model, X, y_true, supplementary_features=None)

Compute the metric using a model’s predictions on a given array of instances X.

Parameters:

model (Any) – Object that implements a predict(X) function to collect categorical predictions.
X (pandas.DataFrame) – Array of instances to compute the metric on.
y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
supplementary_features (pandas.DataFrame or None, default=None) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model).

Returns:

The computed metric value, with format according to self.reduction.

Return type:

float, dict

Raises:

GuardianAIValueError – If a feature is present in both X and supplementary_features.

guardian_ai.fairness.metrics.model.true_positive_rate(y_true, y_pred, subgroups, distance_measure='diff', reduction='mean')

Measures the disparity of a model’s true positive rate between subgroups and the rest of the population.

For more details, refer to TruePositiveRateScorer.

Parameters:

y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

Returns:

The computed metric value, with format according to reduction.

Return type:

float, dict

Examples

from guardian_ai.fairness.metrics import true_positive_rate
subgroups = X[['race', 'sex']]
true_positive_rate(y_true, y_pred, subgroups)

False Positive Rate Disparity

class guardian_ai.fairness.metrics.model.FalsePositiveRateScorer(protected_attributes, distance_measure='diff', reduction='mean')

Measures the disparity of a model’s false positive rate between subgroups and the rest of the population.

For each subgroup, the disparity is measured by comparing the false positive rate on instances of a subgroup against the rest of the population.

False Positive Rate [1] (also known as FPR or fall-out) is calculated as FP / (FP + TN), where FP and TN are the number of false positives and true negatives, respectively.
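The FP / (FP + TN) formula, combined with a 'ratio' distance, might look like the following sketch. The helper `fpr` and the toy data are hypothetical, and 1 is assumed to be the positive label.

```python
def fpr(y_true, y_pred, positive_label=1):
    """Return FP / (FP + TN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    tn = sum(1 for t, p in pairs if t != positive_label and p != positive_label)
    if fp + tn == 0:
        raise ValueError("FPR is undefined without negative ground-truth labels")
    return fp / (fp + tn)


# Hypothetical (y_true, y_pred) for one subgroup and for everyone else.
subgroup_fpr = fpr([0, 0, 0, 0, 1], [1, 0, 0, 0, 1])  # FP=1, TN=3
rest_fpr = fpr([0, 0, 1, 1], [1, 0, 1, 0])            # FP=1, TN=1

# 'ratio' distance, inverted when below 1 so the result is always >= 1.
ratio = subgroup_fpr / rest_fpr
fpr_disparity = 1 / ratio if ratio < 1 else ratio
```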

Perfect score

A perfect score for this metric means that the model does not incorrectly predict the positive class for any of the subgroups more often than it does for the rest of the population. For example, if the protected attributes are race and sex, then a perfect false positive rate disparity would mean that all combinations of values for race and sex have identical false positive rates. Perfect values are:

1 if using 'ratio' as distance_measure.
0 if using 'diff' as distance_measure.

Parameters:

protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

References

[1] Alexandra Chouldechova. “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments”. Big Data. 2016.

Examples

from guardian_ai.fairness.metrics import FalsePositiveRateScorer
scorer = FalsePositiveRateScorer(['race', 'sex'])
scorer(model, X, y_true)

__call__(model, X, y_true, supplementary_features=None)

Compute the metric using a model’s predictions on a given array of instances X.

Parameters:

model (Any) – Object that implements a predict(X) function to collect categorical predictions.
X (pandas.DataFrame) – Array of instances to compute the metric on.
y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
supplementary_features (pandas.DataFrame or None, default=None) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model).

Returns:

The computed metric value, with format according to self.reduction.

Return type:

float, dict

Raises:

GuardianAIValueError – If a feature is present in both X and supplementary_features.

guardian_ai.fairness.metrics.model.false_positive_rate(y_true, y_pred, subgroups, distance_measure='diff', reduction='mean')

Measures the disparity of a model’s false positive rate between subgroups and the rest of the population.

For more details, refer to FalsePositiveRateScorer.

Parameters:

y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

Returns:

The computed metric value, with format according to reduction.

Return type:

float, dict

Examples

from guardian_ai.fairness.metrics import false_positive_rate
subgroups = X[['race', 'sex']]
false_positive_rate(y_true, y_pred, subgroups)

False Negative Rate Disparity

class guardian_ai.fairness.metrics.model.FalseNegativeRateScorer(protected_attributes, distance_measure='diff', reduction='mean')

Measures the disparity of a model’s false negative rate between subgroups and the rest of the population.

For each subgroup, the disparity is measured by comparing the false negative rate on instances of a subgroup against the rest of the population.

False Negative Rate [1] (also known as FNR or miss rate) is calculated as FN / (FN + TP), where FN and TP are the number of false negatives and true positives, respectively.
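A small sketch of FN / (FN + TP) follows; the helper `fnr` and the data are hypothetical, with 1 assumed to be the positive label. Because both rates share the denominator FN + TP, the false negative rate is the complement of the true positive rate.

```python
def fnr(y_true, y_pred, positive_label=1):
    """Return FN / (FN + TP) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    if fn + tp == 0:
        raise ValueError("FNR is undefined without positive ground-truth labels")
    return fn / (fn + tp)


# Hypothetical (y_true, y_pred) for one subgroup and for everyone else.
subgroup_fnr = fnr([1, 1, 1, 1], [1, 0, 0, 0])  # FN=3, TP=1
rest_fnr = fnr([1, 1, 0, 0], [1, 0, 0, 0])      # FN=1, TP=1

# 'diff' distance between the subgroup and the rest of the population.
fnr_disparity = abs(subgroup_fnr - rest_fnr)

# The subgroup's TPR is TP / (TP + FN) = 1 / 4, so FNR = 1 - TPR.
subgroup_tpr = 1 - subgroup_fnr
```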

Perfect score

A perfect score for this metric means that the model does not incorrectly predict the negative class for any of the subgroups more often than it does for the rest of the population. For example, if the protected attributes are race and sex, then a perfect false negative rate disparity would mean that all combinations of values for race and sex have identical false negative rates. Perfect values are:

1 if using 'ratio' as distance_measure.
0 if using 'diff' as distance_measure.

Parameters:

protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

References

[1] Alexandra Chouldechova. “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments”. Big Data. 2016.

Examples

from guardian_ai.fairness.metrics import FalseNegativeRateScorer
scorer = FalseNegativeRateScorer(['race', 'sex'])
scorer(model, X, y_true)

__call__(model, X, y_true, supplementary_features=None)

Compute the metric using a model’s predictions on a given array of instances X.

Parameters:

model (Any) – Object that implements a predict(X) function to collect categorical predictions.
X (pandas.DataFrame) – Array of instances to compute the metric on.
y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
supplementary_features (pandas.DataFrame or None, default=None) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model).

Returns:

The computed metric value, with format according to self.reduction.

Return type:

float, dict

Raises:

GuardianAIValueError – If a feature is present in both X and supplementary_features.

guardian_ai.fairness.metrics.model.false_negative_rate(y_true, y_pred, subgroups, distance_measure='diff', reduction='mean')

Measures the disparity of a model’s false negative rate between subgroups and the rest of the population.

For more details, refer to FalseNegativeRateScorer.

Parameters:

y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

Returns:

The computed metric value, with format according to reduction.

Return type:

float, dict

Examples

from guardian_ai.fairness.metrics import false_negative_rate
subgroups = X[['race', 'sex']]
false_negative_rate(y_true, y_pred, subgroups)

False Omission Rate Disparity

class guardian_ai.fairness.metrics.model.FalseOmissionRateScorer(protected_attributes, distance_measure='diff', reduction='mean')

Measures the disparity of a model’s false omission rate between subgroups and the rest of the population.

For each subgroup, the disparity is measured by comparing the false omission rate on instances of a subgroup against the rest of the population.

False Omission Rate (also known as FOR) is calculated as FN / (FN + TN), where FN and TN are the number of false negatives and true negatives, respectively.
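The formula FN / (FN + TN) is conditioned on negative predictions: it is the share of predicted negatives that are actually positive. A minimal sketch, with the hypothetical helper `false_omission` and toy data (1 is assumed to be the positive label):

```python
def false_omission(y_true, y_pred, positive_label=1):
    """Return FN / (FN + TN): the error share among negative predictions."""
    pairs = list(zip(y_true, y_pred))
    fn = sum(1 for t, p in pairs if p != positive_label and t == positive_label)
    tn = sum(1 for t, p in pairs if p != positive_label and t != positive_label)
    if fn + tn == 0:
        raise ValueError("FOR is undefined without negative predictions")
    return fn / (fn + tn)


# Hypothetical (y_true, y_pred) for one subgroup and for everyone else.
subgroup_for = false_omission([1, 0, 0, 0, 1], [0, 0, 0, 0, 1])  # FN=1, TN=3
rest_for = false_omission([1, 1, 0, 0], [0, 0, 0, 0])            # FN=2, TN=2

# 'diff' distance between the subgroup and the rest of the population.
for_disparity = abs(subgroup_for - rest_for)
```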

Perfect score

A perfect score for this metric means that the model does not make mistakes among its negative predictions for any of the subgroups more often than it does for the rest of the population. For example, if the protected attributes are race and sex, then a perfect false omission rate disparity would mean that all combinations of values for race and sex have identical false omission rates. Perfect values are:

1 if using 'ratio' as distance_measure.
0 if using 'diff' as distance_measure.

Parameters:

protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

Examples

from guardian_ai.fairness.metrics import FalseOmissionRateScorer
scorer = FalseOmissionRateScorer(['race', 'sex'])
scorer(model, X, y_true)

__call__(model, X, y_true, supplementary_features=None)

Compute the metric using a model’s predictions on a given array of instances X.

Parameters:

model (Any) – Object that implements a predict(X) function to collect categorical predictions.
X (pandas.DataFrame) – Array of instances to compute the metric on.
y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
supplementary_features (pandas.DataFrame or None, default=None) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model).

Returns:

The computed metric value, with format according to self.reduction.

Return type:

float, dict

Raises:

GuardianAIValueError – If a feature is present in both X and supplementary_features.

guardian_ai.fairness.metrics.model.false_omission_rate(y_true, y_pred, subgroups, distance_measure='diff', reduction='mean')

Measures the disparity of a model’s false omission rate between subgroups and the rest of the population.

For more details, refer to FalseOmissionRateScorer.

Parameters:

y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

Returns:

The computed metric value, with format according to reduction.

Return type:

float, dict

Examples

from guardian_ai.fairness.metrics import false_omission_rate
subgroups = X[['race', 'sex']]
false_omission_rate(y_true, y_pred, subgroups)

False Discovery Rate Disparity

class guardian_ai.fairness.metrics.model.FalseDiscoveryRateScorer(protected_attributes, distance_measure='diff', reduction='mean')

Measures the disparity of a model’s false discovery rate between subgroups and the rest of the population.

For each subgroup, the disparity is measured by comparing the false discovery rate on instances of a subgroup against the rest of the population.

False Discovery Rate (also known as FDR) is calculated as FP / (FP + TP), where FP and TP are the number of false positives and true positives, respectively.
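The formula FP / (FP + TP) is conditioned on positive predictions: it is the share of predicted positives that are actually negative. A minimal sketch, with the hypothetical helper `false_discovery` and toy data (1 is assumed to be the positive label):

```python
def false_discovery(y_true, y_pred, positive_label=1):
    """Return FP / (FP + TP): the error share among positive predictions."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if p == positive_label and t != positive_label)
    tp = sum(1 for t, p in pairs if p == positive_label and t == positive_label)
    if fp + tp == 0:
        raise ValueError("FDR is undefined without positive predictions")
    return fp / (fp + tp)


# Hypothetical (y_true, y_pred) for one subgroup and for everyone else.
subgroup_fdr = false_discovery([1, 1, 1, 0], [1, 1, 1, 1])  # FP=1, TP=3
rest_fdr = false_discovery([1, 0, 0], [1, 1, 0])            # FP=1, TP=1

# 'diff' distance between the subgroup and the rest of the population.
fdr_disparity = abs(subgroup_fdr - rest_fdr)
```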

Perfect score

A perfect score for this metric means that the model does not make mistakes among its positive predictions for any of the subgroups more often than it does for the rest of the population. For example, if the protected attributes are race and sex, then a perfect false discovery rate disparity would mean that all combinations of values for race and sex have identical false discovery rates. Perfect values are:

1 if using 'ratio' as distance_measure.
0 if using 'diff' as distance_measure.

Parameters:

protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

Examples

from guardian_ai.fairness.metrics import FalseDiscoveryRateScorer
scorer = FalseDiscoveryRateScorer(['race', 'sex'])
scorer(model, X, y_true)

__call__(model, X, y_true, supplementary_features=None)

Compute the metric using a model’s predictions on a given array of instances X.

Parameters:

model (Any) – Object that implements a predict(X) function to collect categorical predictions.
X (pandas.DataFrame) – Array of instances to compute the metric on.
y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
supplementary_features (pandas.DataFrame or None, default=None) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model).

Returns:

The computed metric value, with format according to self.reduction.

Return type:

float, dict

Raises:

GuardianAIValueError – If a feature is present in both X and supplementary_features.

guardian_ai.fairness.metrics.model.false_discovery_rate(y_true, y_pred, subgroups, distance_measure='diff', reduction='mean')

Measures the disparity of a model’s false discovery rate between subgroups and the rest of the population.

For more details, refer to FalseDiscoveryRateScorer.

Parameters:

y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

Returns:

The computed metric value, with format according to reduction.

Return type:

float, dict

Examples

from guardian_ai.fairness.metrics import false_discovery_rate
subgroups = X[['race', 'sex']]
false_discovery_rate(y_true, y_pred, subgroups)

Error Rate Disparity

class guardian_ai.fairness.metrics.model.ErrorRateScorer(protected_attributes, distance_measure='diff', reduction='mean')

Measures the disparity of a model’s error rate between subgroups and the rest of the population.

For each subgroup, the disparity is measured by comparing the error rate on instances of a subgroup against the rest of the population.

Error Rate (also known as inaccuracy) is calculated as (FP + FN) / N, where FP and FN are the number of false positives and false negatives, respectively, and N is the total number of instances.
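Since every misclassified instance is either a false positive or a false negative, (FP + FN) / N is simply the share of predictions that differ from the ground truth. A minimal sketch with a hypothetical helper `misclassification_rate` and toy binary data:

```python
def misclassification_rate(y_true, y_pred):
    """Return (FP + FN) / N, i.e. the share of incorrect predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


# Hypothetical (y_true, y_pred) for one subgroup and for everyone else.
subgroup_err = misclassification_rate([1, 0, 1, 0], [1, 1, 1, 0])  # 1 error
rest_err = misclassification_rate([1, 0, 1, 0], [0, 1, 1, 0])      # 2 errors

# 'diff' distance between the subgroup and the rest of the population.
err_disparity = abs(subgroup_err - rest_err)
```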

Perfect score

A perfect score for this metric means that the model does not make mistakes for any of the subgroups more often than it does for the rest of the population. For example, if the protected attributes are race and sex, then a perfect error rate disparity would mean that all combinations of values for race and sex have identical error rates. Perfect values are:

1 if using 'ratio' as distance_measure.
0 if using 'diff' as distance_measure.

Parameters:

protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

Examples

from guardian_ai.fairness.metrics import ErrorRateScorer
scorer = ErrorRateScorer(['race', 'sex'])
scorer(model, X, y_true)

__call__(model, X, y_true, supplementary_features=None)

Compute the metric using a model’s predictions on a given array of instances X.

Parameters:

model (Any) – Object that implements a predict(X) function to collect categorical predictions.
X (pandas.DataFrame) – Array of instances to compute the metric on.
y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
supplementary_features (pandas.DataFrame or None, default=None) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model).

Returns:

The computed metric value, with format according to self.reduction.

Return type:

float, dict

Raises:

GuardianAIValueError – If a feature is present in both X and supplementary_features.

guardian_ai.fairness.metrics.model.error_rate(y_true, y_pred, subgroups, distance_measure='diff', reduction='mean')

Measures the disparity of a model’s error rate between subgroups and the rest of the population.

For more details, refer to ErrorRateScorer.

Parameters:

y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

Returns:

The computed metric value, with format according to reduction.

Return type:

float, dict

Examples

from guardian_ai.fairness.metrics import error_rate
subgroups = X[['race', 'sex']]
error_rate(y_true, y_pred, subgroups)

Equalized Odds

class guardian_ai.fairness.metrics.model.EqualizedOddsScorer(protected_attributes, distance_measure='diff', reduction='mean')

Measures the disparity of a model’s true positive and false positive rates between subgroups and the rest of the population.

The disparity is measured by comparing the true positive and false positive rates on instances of a subgroup against the rest of the population.

True Positive Rate (also known as TPR, recall, or sensitivity) is calculated as TP / (TP + FN), where TP and FN are the number of true positives and false negatives, respectively.

False Positive Rate (also known as FPR or fall-out) is calculated as FP / (FP + TN), where FP and TN are the number of false positives and true negatives, respectively.

Equalized Odds [1] is computed by measuring, for a subgroup against the rest of the population, the distance in TPR and the distance in FPR, and taking the larger of the two.
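
As a rough illustration of this computation with the 'diff' distance, the sketch below derives TPR and FPR for one subgroup and for everyone else, then keeps the larger gap. The function names and toy data are hypothetical, labels are assumed to be encoded as 0/1, and this is not the library's own code.

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    # Assumes each side contains at least one positive and one negative label.
    return tp / (tp + fn), fp / (fp + tn)


def equalized_odds_diff(y_true, y_pred, groups, group):
    """Larger of the TPR gap and the FPR gap between `group` and everyone else."""
    sub = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
    rest = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g != group]
    sub_tpr, sub_fpr = rates(*zip(*sub))
    rest_tpr, rest_fpr = rates(*zip(*rest))
    return max(abs(sub_tpr - rest_tpr), abs(sub_fpr - rest_fpr))


# Group "a": TPR = 1.0, FPR = 0.0. Group "b": TPR = 0.5, FPR = 0.0.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(equalized_odds_diff(y_true, y_pred, groups, "b"))  # 0.5
```

Here the FPR gap is 0 and the TPR gap is 0.5, so the TPR gap determines the score.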

Perfect score

A perfect score for this metric means that the model has the same TPR and FPR when comparing a subgroup to the rest of the population. For example, if the protected attributes are race and sex, then a perfect Equalized Odds disparity would mean that all combinations of values for race and sex have identical TPR and FPR. Perfect values are:

1 if using 'ratio' as distance_measure.
0 if using 'diff' as distance_measure.

Parameters:

protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

References

[1] Moritz Hardt et al. “Equality of Opportunity in Supervised Learning”. Advances in Neural Information Processing Systems. 2016.

Examples

from guardian_ai.fairness.metrics import EqualizedOddsScorer
scorer = EqualizedOddsScorer(['race', 'sex'])
scorer(model, X, y_true)

__call__(model, X, y_true, supplementary_features=None)¶

Compute the metric using a model’s predictions on a given array of instances X.

Parameters:

model (Any) – Object that implements a predict(X) function to collect categorical predictions.
X (pandas.DataFrame) – Array of instances to compute the metric on.
y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
supplementary_features (pandas.DataFrame or None, default=None) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model).

Returns:

The computed metric value, with format according to self.reduction.

Return type:

float, dict

Raises:

GuardianAIValueError – If a feature is present in both X and supplementary_features.

guardian_ai.fairness.metrics.model.equalized_odds(y_true, y_pred, subgroups, distance_measure='diff', reduction='mean')[source]¶

Measures the disparity of a model’s true positive and false positive rates between subgroups and the rest of the population.

For more details, refer to EqualizedOddsScorer.

Parameters:

y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

Returns:

The computed metric value, with format according to reduction.

Return type:

float, dict

Examples

from guardian_ai.fairness.metrics import equalized_odds
subgroups = X[['race', 'sex']]
equalized_odds(y_true, y_pred, subgroups)

Theil Index¶

class guardian_ai.fairness.metrics.model.TheilIndexScorer(protected_attributes, distance_measure=None, reduction='mean')[source]¶

Measures the disparity of a model’s predictions according to groundtruth labels, as proposed by Speicher et al. [1].

Intuitively, the Theil Index can be thought of as a measure of the divergence between a subgroup’s different error distributions (i.e. false positives and false negatives) against the rest of the population.

Perfect score: The perfect score for this metric is 0, meaning that the model does not have a different error distribution for any subgroup when compared to the rest of the population. For example, if the protected attributes are race and sex, then a perfect Theil Index disparity would mean that all combinations of values for race and sex have identical error distributions.
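
For intuition, the sketch below computes a Theil index over one set of predictions using the formulation in Speicher et al.: the generalized entropy index with alpha = 1 applied to per-instance benefits b_i = y_pred_i - y_true_i + 1. Whether guardian_ai uses exactly this benefit definition, and how it compares subgroups, is an assumption here; the function name is hypothetical.

```python
from math import log


def theil_index_of(y_true, y_pred):
    """Theil index (generalized entropy, alpha=1) of per-instance benefits."""
    # Benefit: 1 for a correct prediction, 2 for a false positive,
    # 0 for a false negative (labels assumed to be 0/1).
    benefits = [p - t + 1 for t, p in zip(y_true, y_pred)]
    mu = sum(benefits) / len(benefits)  # assumes at least one non-zero benefit
    # By convention 0 * log(0) is treated as 0, so zero benefits are skipped.
    return sum((b / mu) * log(b / mu) for b in benefits if b > 0) / len(benefits)


perfect = theil_index_of([1, 0, 1, 0], [1, 0, 1, 0])
with_errors = theil_index_of([1, 0, 1, 0], [1, 0, 0, 1])
print(perfect)  # 0.0
print(with_errors > perfect)  # True
```

When every prediction is correct, all benefits equal the mean and the index is 0; a mix of false positives and false negatives spreads the benefits and raises it.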

Parameters:

protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
distance_measure (str or None, default=None) –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

References

[1] Speicher, Till, et al. “A unified approach to quantifying algorithmic unfairness: Measuring individual & group unfairness via inequality indices.” Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 2018. https://arxiv.org/abs/1807.00787

Examples

from guardian_ai.fairness.metrics import TheilIndexScorer
scorer = TheilIndexScorer(['race', 'sex'])
scorer(model, X, y_true)

__call__(model, X, y_true, supplementary_features=None)¶

Compute the metric using a model’s predictions on a given array of instances X.

Parameters:

model (Any) – Object that implements a predict(X) function to collect categorical predictions.
X (pandas.DataFrame) – Array of instances to compute the metric on.
y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
supplementary_features (pandas.DataFrame or None, default=None) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model).

Returns:

The computed metric value, with format according to self.reduction.

Return type:

float, dict

Raises:

GuardianAIValueError – If a feature is present in both X and supplementary_features.

guardian_ai.fairness.metrics.model.theil_index(y_true, y_pred, subgroups, distance_measure=None, reduction='mean')[source]¶

Measures the disparity of a model’s predictions according to groundtruth labels, as proposed by Speicher et al. [1].

For more details, refer to TheilIndexScorer.

Parameters:

y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
distance_measure (str or None, default=None) –
Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
- 'ratio': Uses (subgroup_val / rest_of_pop_val).
Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup_val - rest_of_pop_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup: subgroup_metric, ...} dict.

Returns:

The computed metric value, with format according to reduction.

Return type:

float, dict

Raises:

GuardianAIValueError – If distance_measure values are given to Theil Index.

References

[1] Speicher, Till, et al. “A unified approach to quantifying algorithmic unfairness: Measuring individual & group unfairness via inequality indices.” Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 2018. https://arxiv.org/abs/1807.00787

Examples

from guardian_ai.fairness.metrics import theil_index
subgroups = X[['race', 'sex']]
theil_index(y_true, y_pred, subgroups)

Evaluating a Dataset¶

Statistical Parity¶

class guardian_ai.fairness.metrics.dataset.DatasetStatisticalParityScorer(protected_attributes, distance_measure='diff', reduction='mean')[source]¶

Measures the statistical parity [1] of a dataset. Statistical parity (also known as Base Rate or Disparate Impact) for a dataset states that a dataset is unbiased if the label is independent of the protected attribute.

For each subgroup, statistical parity is computed as the ratio of positive labels in a subgroup.

Within a subgroup, this ratio is calculated as PL / N, where PL and N are the number of Positive Labels and the total number of instances, respectively.
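
A small sketch of this calculation, using only the standard library: it computes PL / N per subgroup and then compares subgroups pairwise, which is how the distance_measure and reduction parameters of this class describe the comparison. The function names and toy data are hypothetical, labels are assumed to be 0/1, and this is not the library's implementation.

```python
from itertools import combinations


def positive_label_rate(labels):
    """PL / N: share of positive (1) labels."""
    return sum(labels) / len(labels)


def dataset_parity_by_pair(y_true, groups, distance_measure="diff"):
    """Return {(subgroup1, subgroup2): distance} for every pair of subgroups."""
    by_group = {}
    for label, group in zip(y_true, groups):
        by_group.setdefault(group, []).append(label)
    rates = {g: positive_label_rate(v) for g, v in by_group.items()}
    scores = {}
    for g1, g2 in combinations(sorted(rates), 2):
        if distance_measure == "diff":
            scores[(g1, g2)] = abs(rates[g1] - rates[g2])
        else:
            # 'ratio', inverted to be >= 1; assumes non-zero rates.
            ratio = rates[g1] / rates[g2]
            scores[(g1, g2)] = max(ratio, 1 / ratio)
    return scores


# Positive-label rates: a = 0.75, b = 0.25, c = 0.5.
y_true = [1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0]
groups = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

pair_scores = dataset_parity_by_pair(y_true, groups)
print(pair_scores)  # {('a', 'b'): 0.5, ('a', 'c'): 0.25, ('b', 'c'): 0.25}
print(max(pair_scores.values()))  # 0.5, i.e. what a 'max' reduction would report
```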

Perfect score

A perfect score for this metric means that the dataset does not have a different ratio of positive labels for a subgroup than it does for the rest of the subgroups. For example, if the protected attributes are race and sex, then a perfect statistical parity would mean that all combinations of values for race and sex have identical ratios of positive labels. Perfect values are:

1 if using 'ratio' as distance_measure.
0 if using 'diff' as distance_measure.

Parameters:

protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the subgroups. Possible values are:
- 'ratio': Uses (subgroup1_val / subgroup2_val). Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup1_val - subgroup2_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup_pair: subgroup_pair_metric, ...} dict.

References

[1] Cynthia Dwork et al. “Fairness Through Awareness”. Innovations in Theoretical Computer Science. 2012.

Examples

from guardian_ai.fairness.metrics import DatasetStatisticalParityScorer
scorer = DatasetStatisticalParityScorer(['race', 'sex'])
scorer(X=X, y_true=y_true)
scorer(None, X, y_true)

__call__(model=None, X=None, y_true=None, supplementary_features=None)¶

Compute the metric on a given array of instances X.

Parameters:

model (object or None, default=None) – Object that implements a predict(X) function to collect categorical predictions.
X (pandas.DataFrame or None, default=None) – Array of instances to compute the metric on.
y_true (pandas.Series, numpy.ndarray, list or None, default=None) – Array of groundtruth labels.
supplementary_features (pandas.DataFrame or None, default=None) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model).

Returns:

The computed metric value, with format according to self.reduction.

Return type:

float, dict

Raises:

GuardianAIValueError – If a feature is present in both X and supplementary_features.

guardian_ai.fairness.metrics.dataset.dataset_statistical_parity(y_true, subgroups, distance_measure='diff', reduction='mean')[source]¶

Measures the statistical parity of a dataset.

For more details, refer to DatasetStatisticalParityScorer.

Parameters:

y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
distance_measure (str, default='diff') –
Determines the distance used to compare a subgroup’s metric against the rest of the subgroups. Possible values are:
- 'ratio': Uses (subgroup1_val / subgroup2_val). Inverted to always be >= 1 if needed.
- 'diff': Uses | subgroup1_val - subgroup2_val |.
reduction (str or None, default='mean') –
Determines how to reduce scores on all subgroups to a single output. Possible values are:
- 'max': Returns the maximal value among all subgroup metrics.
- 'mean': Returns the mean over all subgroup metrics.
- None: Returns a {subgroup_pair: subgroup_pair_metric, ...} dict.

Examples

from guardian_ai.fairness.metrics import dataset_statistical_parity
subgroups = X[['race', 'sex']]
dataset_statistical_parity(y_true, subgroups)

Consistency¶

class guardian_ai.fairness.metrics.dataset.ConsistencyScorer(protected_attributes)[source]¶

Measures the consistency of a dataset.

Consistency is measured as the ratio of instances whose label differs from that of their k=5 nearest neighbors.

Perfect score: A perfect score for this metric is 0, meaning that the dataset does not have different labels for instances that are similar to one another.
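
The sketch below shows one way to read this definition: for every instance, find its k nearest neighbors and measure the share of them that carry a different label, then average over the dataset. The function name, the use of Euclidean distance on numeric feature tuples, and the exclusion of the instance itself from its neighbors are all assumptions made for illustration; the library may differ on each point.

```python
from math import dist


def knn_inconsistency(features, labels, k=5):
    """Average share of each instance's k nearest neighbors with a different label."""
    total = 0.0
    for i, (x_i, y_i) in enumerate(zip(features, labels)):
        # Rank all other instances by Euclidean distance to instance i.
        others = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: dist(x_i, features[j]),
        )
        neighbors = others[:k]
        total += sum(labels[j] != y_i for j in neighbors) / len(neighbors)
    return total / len(features)


# Two well-separated clusters of three points each.
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

consistent = knn_inconsistency(features, [0, 0, 0, 1, 1, 1], k=2)
one_flipped = knn_inconsistency(features, [0, 0, 1, 1, 1, 1], k=2)
print(consistent)  # 0.0
print(one_flipped > consistent)  # True
```

When labels agree within each cluster the score is 0; flipping a single label makes that instance, and the neighbors that see it, inconsistent.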

Parameters:

protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.

Examples

from guardian_ai.fairness.metrics import ConsistencyScorer
scorer = ConsistencyScorer(['race', 'sex'])
scorer(X=X, y_true=y_true)
scorer(None, X, y_true)

__call__(model=None, X=None, y_true=None, supplementary_features=None)¶

Call self as a function.

Parameters:

model (object | None)
X (DataFrame | None)
y_true (Series | ndarray | List | None)
supplementary_features (DataFrame | None)

guardian_ai.fairness.metrics.dataset.consistency(y_true, subgroups)[source]¶

Measures the consistency of a dataset.

For more details, refer to ConsistencyScorer.

Parameters:

y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.

Examples

from guardian_ai.fairness.metrics import consistency
subgroups = X[['race', 'sex']]
consistency(y_true, subgroups)

Smoothed EDF¶

class guardian_ai.fairness.metrics.dataset.SmoothedEDFScorer(protected_attributes)[source]¶

Measures the smoothed Empirical Differential Fairness (EDF) of a dataset, as proposed by Foulds et al. [1].

Smoothed EDF returns the minimal exponential deviation of positive target ratios comparing a subgroup to the rest of the subgroups.

This metric is related to DatasetStatisticalParityScorer with reduction='max' and distance_measure='ratio'; the only difference is that SmoothedEDFScorer returns a logarithmic value instead.

Perfect score: A perfect score for this metric is 0, meaning that the dataset does not have a different ratio of positive labels for a subgroup than it does for the rest of the subgroups. For example, if the protected attributes are race and sex, then a perfect smoothed EDF would mean that all combinations of values for race and sex have identical ratios of positive labels.
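
To illustrate the relationship to a max-reduced ratio of positive label rates, the sketch below computes the smallest epsilon such that every pairwise ratio of smoothed positive-label rates lies within [e^-epsilon, e^epsilon]. The function name, the additive smoothing constant alpha, and the restriction to positive labels are assumptions for illustration; they are not taken from the guardian_ai source.

```python
from itertools import combinations
from math import log


def smoothed_edf_sketch(y_true, groups, alpha=0.5):
    """Max absolute log-ratio of smoothed positive-label rates over subgroup pairs."""
    counts = {}
    for label, group in zip(y_true, groups):
        pos, n = counts.get(group, (0, 0))
        counts[group] = (pos + label, n + 1)
    # Additive smoothing keeps every rate strictly between 0 and 1,
    # so the logarithm below is always defined (alpha is hypothetical).
    rates = {g: (pos + alpha) / (n + 2 * alpha) for g, (pos, n) in counts.items()}
    # Requires at least two subgroups.
    return max(abs(log(rates[a]) - log(rates[b])) for a, b in combinations(rates, 2))


balanced = smoothed_edf_sketch([1, 0, 1, 0], ["a", "a", "b", "b"])
skewed = smoothed_edf_sketch([1, 1, 1, 0, 0, 0], ["a", "a", "a", "b", "b", "b"])
print(balanced)  # 0.0
print(skewed > balanced)  # True
```

In the skewed case the raw positive rates are 1 and 0, whose ratio is undefined; smoothing turns them into 0.875 and 0.125, giving a finite score of log(7).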

Parameters:

protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.

References

[1] Foulds, James R., et al. “An intersectional definition of fairness.” 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020.

Examples

from guardian_ai.fairness.metrics import SmoothedEDFScorer
scorer = SmoothedEDFScorer(['race', 'sex'])
scorer(X=X, y_true=y_true)
scorer(None, X, y_true)

__call__(model=None, X=None, y_true=None, supplementary_features=None)¶

Call self as a function.

Parameters:

model (object | None)
X (DataFrame | None)
y_true (Series | ndarray | List | None)
supplementary_features (DataFrame | None)

guardian_ai.fairness.metrics.dataset.smoothed_edf(y_true, subgroups)[source]¶

Measures the smoothed Empirical Differential Fairness (EDF) of a dataset, as proposed by Foulds et al. [1].

For more details, refer to SmoothedEDFScorer.

Parameters:

y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.

References

[1] Foulds, James R., et al. “An intersectional definition of fairness.” 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020.

Examples

from guardian_ai.fairness.metrics import smoothed_edf
subgroups = X[['race', 'sex']]
smoothed_edf(y_true, subgroups)

Bias Mitigation¶

Bias Mitigator¶