Privacy Estimation#

Dataset#

class guardian_ai.privacy_estimation.dataset.Dataset(name=None, df_x=None, df_y=None)[source]#

Wrapper for the dataset that also maintains various data splits that are required for carrying out the attacks. Also implements utility methods for generating attack sets.

Create the dataset wrapper.

Parameters:
  • name (str) – Name for this dataset.

  • df_x ({array-like, sparse matrix} of shape (n_samples, n_features)) – Input features, where n_samples is the number of samples and n_features is the number of features.

  • df_y (ndarray of shape (n_samples,)) – Output labels.

create_attack_set_from_splits(attack_in_set_name, attack_out_set_name)[source]#

Given the splits that correspond to attack in and out sets, generate the full attack set.

Parameters:
  • attack_in_set_name – Dataset that was included as part of the training set of the target model.

  • attack_out_set_name – Dataset that was not included as part of the training set of the target model.

Returns:

Input features and output labels of the attack data points, along with their membership label (0-1 label that says whether or not they were included during training the target model).

Return type:

{array-like, sparse matrix} of shape (n_samples, n_features), ndarray of shape (n_samples,), ndarray of shape (n_samples,)
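The idea behind the attack set can be sketched in a few lines. This is a minimal illustration using plain Python lists, not the library's implementation; the helper name build_attack_set and the toy data are assumptions made for the example.

```python
def build_attack_set(in_split, out_split):
    """Concatenate member and non-member splits and attach membership labels.

    Hypothetical helper: each split is a (features, labels) pair.
    """
    x_in, y_in = in_split
    x_out, y_out = out_split
    x_attack = list(x_in) + list(x_out)
    y_attack = list(y_in) + list(y_out)
    # 1 = point was in the target model's training set, 0 = it was held out
    y_membership = [1] * len(x_in) + [0] * len(x_out)
    return x_attack, y_attack, y_membership


x_attack, y_attack, y_membership = build_attack_set(
    ([[0.1, 0.2], [0.3, 0.4]], [0, 1]),  # toy "in" split
    ([[0.5, 0.6]], [1]),                 # toy "out" split
)
```

The original labels (y_attack) and the membership labels (y_membership) are kept separate because the attack uses the former as input and the latter as its prediction target.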

get_merged_sets(split_names)[source]#

Merge multiple splits of data.

Parameters:

split_names (List[str]) – Names of splits to be merged.

Returns:

Merged datasets.

Return type:

{array-like, sparse matrix} of shape (n_samples, n_features), ndarray of shape (n_samples,)

abstract load_data(source_file, header=None, target_ix=None, ignore_ix=None)[source]#

Method that specifies how the data should be loaded. Mainly applicable for tabular data.

Parameters:
  • source_file (os.path) – Filename of the source file.

  • header (bool) – Whether the source file contains a header.

  • target_ix (int) – Index of the target variable.

  • ignore_ix (List[int]) – Indices to be ignored.

Returns:

Input features and output labels.

Return type:

pandas DataFrame of shape (n_samples, n_features), pandas DataFrame of shape (n_samples,)

split_dataset(seed, split_array, split_names=None)[source]#

Splits dataset according to the specified fractions.

Parameters:
  • seed (int) – Random seed for creating the splits.

  • split_array (List[float]) – Array of fractions to split the data in. Must sum to 1.

  • split_names (List[str]) – Names assigned to the splits.

Returns:

Dictionary of splits, with the split names as keys and tuples of (df_x, df_y) for each split as values.

Return type:

dict
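A seeded, fraction-based split can be sketched as follows. This is an illustrative stand-in that returns index lists rather than (df_x, df_y) tuples; the function name split_indices, the default split names, and the rounding rule are assumptions, not the library's behavior.

```python
import random


def split_indices(n_samples, seed, split_array, split_names=None):
    """Shuffle row indices with a fixed seed and cut them by the given fractions."""
    if abs(sum(split_array) - 1.0) > 1e-9:
        raise ValueError("split_array must sum to 1")
    if split_names is None:
        # Hypothetical default naming scheme for this sketch
        split_names = [f"split_{i}" for i in range(len(split_array))]
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    splits, start = {}, 0
    for i, (name, fraction) in enumerate(zip(split_names, split_array)):
        # The last split absorbs any rounding leftovers
        is_last = i == len(split_array) - 1
        end = n_samples if is_last else start + int(round(fraction * n_samples))
        splits[name] = indices[start:end]
        start = end
    return splits


splits = split_indices(
    10,
    seed=42,
    split_array=[0.5, 0.3, 0.2],
    split_names=["target_train", "attack_train_out", "attack_test_out"],
)
```

Fixing the seed makes the splits reproducible, which matters because the membership labels of the attack sets depend on which rows ended up in the target model's training split.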

Model#

class guardian_ai.privacy_estimation.model.TargetModel[source]#

Wrapper for the target model that is being attacked. For now, we’re only supporting sklearn classifiers that implement .predict_proba.

Create the target model that is being attacked, and check that it’s a classifier.

get_f1(x_test, y_test)[source]#

Gets f1 score.

Parameters:
  • x_test ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features.

  • y_test (ndarray of shape (n_samples,)) – Output labels of the test set.

abstract get_model()[source]#

Create the target model that is being attacked.

Return type:

Model that is not yet trained.

get_model_name()[source]#

Get default model name.

get_prediction_probs(X)[source]#

Gets the model’s prediction probabilities.

Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features.

get_predictions(X)[source]#

Gets model prediction.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – where n_samples is the number of samples and n_features is the number of features.

load_model(filename)[source]#

Load model.

Parameters:

filename (FileDescriptorOrPath) –

save_model(filename)[source]#

Save model.

Parameters:

filename (FileDescriptorOrPath) –

test_model(x_test, y_test)[source]#

Test the model that is being attacked.

Parameters:
  • x_test ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input variables of the test set for the target model.

  • y_test (ndarray of shape (n_samples,)) – Output labels of the test set for the target model.

Return type:

None

train_model(x_train, y_train)[source]#

Train the model that is being attacked.

Parameters:
  • x_train ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input variables of the training set for the target model.

  • y_train (ndarray of shape (n_samples,)) – Output labels of the training set for the target model.

Return type:

Trained model

Attack#

class guardian_ai.privacy_estimation.attack.BlackBoxAttack(attack_model, name='generic_black_box_attack')[source]#

This is the base class for all black box attacks. It has a base estimator, which could be a threshold-based or learning-based classifier, typically a binary classifier that decides whether an attack data point was part of the original training data for the target model or not. It’s black box because this type of attack can only access the prediction API of the target model and does not have access to the model parameters.

Initialize the attack.

Parameters:
  • attack_model (sklearn.base.BaseEstimator) –

  • name (str) – Name of this attack for reporting purposes.

evaluate_attack(target_model, X_attack_test, y_attack_test, y_membership_test, metric_functions, print_roc_curve=False, cache_input=False, use_cache=False)[source]#

Runs the attack against the target model, evaluates its accuracy and provides the metrics of interest on the success of the attack.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model being attacked.

  • X_attack_test ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input variables for the dataset on which to run the attack model. These are the original features (not attack/membership features).

  • y_attack_test (ndarray of shape (n_samples,)) – Output labels for the dataset on which to run the attack model. These are the original labels (not membership labels).

  • y_membership_test (ndarray of shape (n_samples,)) – Membership labels for the dataset on which we want to run the attack model. These are binary and indicate whether the data point was included in the training dataset of the target model, and help us evaluate the attack model’s accuracy.

  • metric_functions (List[str]) – List of metric functions that we care about for evaluating the success of these attacks. Supports all sklearn.metrics that are relevant to binary classification, since the attack model is almost always a binary classifier.

  • print_roc_curve (bool, Defaults to False.) – Print out the values of the tpr and fpr. Only works for trained attack classifiers for now.

  • cache_input (bool, Defaults to False.) – Should we cache the input values - useful for expensive feature calculations like the merlin ratio.

  • use_cache (bool, Defaults to False.) – Should we use the feature values from the cache - useful for Morgan attack, which uses merlin ratio and loss values.

Returns:

Success metrics for the attack.

Return type:

List[float]
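Because the attack model is a binary classifier over membership labels, its success metrics are the usual binary classification metrics. The sketch below computes precision, recall, and F1 by hand on toy membership labels; the helper name membership_metrics is hypothetical, and the library itself relies on sklearn.metrics rather than code like this.

```python
def membership_metrics(y_membership, y_pred):
    """Precision, recall and F1 of membership predictions (1 = member)."""
    pairs = list(zip(y_membership, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example: true membership labels vs. the attack's predictions
metrics = membership_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

A precision well above the fraction of members in the attack set suggests that the target model leaks membership information.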

perform_attack(target_model, X_attack, y_attack)[source]#

Perform the actual attack. For now, this method would only be used in settings where the attacks themselves are being audited. Usually, we only call the evaluate_attack method.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model being attacked.

  • X_attack ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input variables for the attack points. These are the original features (not attack/membership features).

  • y_attack (ndarray of shape (n_samples,)) – Output labels for the attack points. These are the original labels (not membership labels).

Returns:

y_pred – Vector containing the binary predictions on whether the attack points were part of the dataset used to train the target model.

Return type:

ndarray of shape (n_samples,)

train_attack_model(target_model, X_attack_train, y_attack_train, y_membership_train, threshold_grid=None, cache_input=False, use_cache=False)[source]#

Takes the attack data points, transforms them into attack features and then trains the attack model using membership labels for those points. If a threshold grid is provided, it will simply tune the threshold using that grid, otherwise, it will train the model.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model that is being attacked.

  • X_attack_train ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input variables for the dataset on which we want to train the attack model. These are the original features (not attack/membership features).

  • y_attack_train (ndarray of shape (n_samples,)) – Output labels for the dataset on which we want to train the attack model. These are the original labels (not membership labels).

  • y_membership_train (ndarray of shape (n_samples,)) – Membership labels for the dataset on which we want to train the attack model. These are binary and indicate whether the data point was included in the training dataset of the target model.

  • threshold_grid (List[float]) – Threshold grid to use for tuning this model.

  • cache_input (bool) – Should we cache the input values - useful for expensive feature calculations like the merlin ratio.

  • use_cache (bool) – Should we use the feature values from the cache - useful for Morgan and Combined attacks.

Return type:

Trained attack model, usually a binary classifier.

abstract transform_attack_data(target_model, X_attack, y_attack, split_type=None, use_cache=False)[source]#

This is the central method in designing the attack, and captures the attacker’s hypothesis about the membership of a data point in the training dataset of the target model. Its job is to derive signals from the original data that might be relevant to determining membership. Takes a dataset in the original format and converts it to the input variable for the attack. Think of it as feature engineering for building the attack model, which is essentially a binary classifier.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model being attacked.

  • X_attack ({array-like, sparse matrix} of shape (n_samples, n_features)) – Input features of the attack datapoints, where n_samples is the number of samples and n_features is the number of features.

  • y_attack (ndarray of shape (n_samples,)) – Vector containing the output labels of the attack data points (not membership label).

  • split_type (str) – Whether this is the “train” set or the “test” set; used by the Morgan attack.

  • use_cache (bool) – Whether to use the cache or not.

Returns:

X_membership – Input features for the attack model, where n_samples is the number of samples and n_features is the number of features.

Return type:

{array-like, sparse matrix} of shape (n_samples, n_features)
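As an illustration of this kind of feature engineering, the sketch below derives a single attack feature, the per-instance cross-entropy loss, from the target model's predicted probabilities. It is a simplified stand-in for what a loss-based attack might compute; the function name per_instance_loss and the clipping constant eps are assumptions.

```python
import math


def per_instance_loss(pred_probs, y_true, eps=1e-12):
    """Cross-entropy loss of each point under the predicted class probabilities.

    Returns one row per sample with a single feature column, i.e. a
    list-based matrix of shape (n_samples, 1).
    """
    features = []
    for probs, label in zip(pred_probs, y_true):
        # Clip to avoid log(0) when the model assigns zero probability
        features.append([-math.log(max(probs[label], eps))])
    return features


# Toy probabilities, as a target model's predict_proba might return them
X_membership = per_instance_loss([[0.9, 0.1], [0.2, 0.8]], [0, 0])
```

The hypothesis encoded here is that points seen during training tend to have lower loss than unseen points.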

Merlin Attack#

class guardian_ai.privacy_estimation.merlin_attack.MerlinAttack(attack_model, noise_type='gaussian', noise_coverage='full', noise_magnitude=0.01, max_t=50)[source]#

Implements the Merlin Attack as described in the paper: Revisiting Membership Inference Under Realistic Assumptions by Jayaraman et al. The main idea is to perturb a data point and calculate the loss on all the data points in this neighborhood. If the loss of a large fraction of these points is above that of the target point, it might imply that the target point is in a local minimum, and therefore the model might have fitted around it, implying it might have seen it at training time.

These default values are mostly taken from the original implementation of this attack.

Parameters:
  • attack_model (sklearn.base.BaseEstimator) – The type of attack model to be used. Typically, it’s ThresholdClassifier.

  • noise_type (str) – Choose the type of noise to add based on the data. Supports uniform and gaussian.

  • noise_coverage (str) – Add noise to all attributes (“full”) or only a subset.

  • noise_magnitude (float) – Size of the noise.

  • max_t (int) – The number of noisy points to generate to calculate the Merlin Ratio.

evaluate_attack(target_model, X_attack_test, y_attack_test, y_membership_test, metric_functions, print_roc_curve=False, cache_input=False, use_cache=False)#

Runs the attack against the target model, evaluates its accuracy and provides the metrics of interest on the success of the attack.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model being attacked.

  • X_attack_test ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input variables for the dataset on which to run the attack model. These are the original features (not attack/membership features).

  • y_attack_test (ndarray of shape (n_samples,)) – Output labels for the dataset on which to run the attack model. These are the original labels (not membership labels).

  • y_membership_test (ndarray of shape (n_samples,)) – Membership labels for the dataset on which we want to run the attack model. These are binary and indicate whether the data point was included in the training dataset of the target model, and help us evaluate the attack model’s accuracy.

  • metric_functions (List[str]) – List of metric functions that we care about for evaluating the success of these attacks. Supports all sklearn.metrics that are relevant to binary classification, since the attack model is almost always a binary classifier.

  • print_roc_curve (bool, Defaults to False.) – Print out the values of the tpr and fpr. Only works for trained attack classifiers for now.

  • cache_input (bool, Defaults to False.) – Should we cache the input values - useful for expensive feature calculations like the merlin ratio.

  • use_cache (bool, Defaults to False.) – Should we use the feature values from the cache - useful for Morgan attack, which uses merlin ratio and loss values.

Returns:

Success metrics for the attack.

Return type:

List[float]

generate_noise(shape, dtype)[source]#

Generate noise to be added to the target data point.

Parameters:
  • shape (np.shape) – Shape of the target data point.

  • dtype (np.dtype) – Datatype of the target data point.

Returns:

Noise generated according to the parameters to match the shape of the target.

Return type:

{array-like}

get_merlin_ratio(target_model, X_attack, y_attack)[source]#

Returns the merlin-ratio for the Merlin attack.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Model that is being targeted by the attack.

  • X_attack ({array-like, sparse matrix} of shape (n_samples, n_features)) – Input features of the attack datapoints, where n_samples is the number of samples and n_features is the number of features.

  • y_attack (ndarray of shape (n_samples,)) – Vector containing the output labels of the attack data points (not membership label).

Returns:

Merlin Ratio. Value between 0 and 1.

Return type:

float
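The ratio can be sketched for a single data point as follows: perturb the point max_t times and count how often the loss increases. This is a simplified illustration with Gaussian noise only; the loss function toy_loss is a hypothetical stand-in for the target model's loss, and the function signature is an assumption rather than the library's API.

```python
import random


def merlin_ratio(loss_fn, x, y, noise_magnitude=0.01, max_t=50, seed=0):
    """Fraction of noisy neighbours of (x, y) whose loss exceeds the loss at x."""
    rng = random.Random(seed)
    base_loss = loss_fn(x, y)
    higher = 0
    for _ in range(max_t):
        # Add zero-mean Gaussian noise to every attribute of the point
        noisy_x = [v + rng.gauss(0.0, noise_magnitude) for v in x]
        if loss_fn(noisy_x, y) > base_loss:
            higher += 1
    return higher / max_t


def toy_loss(x, y):
    # Hypothetical loss surface with its minimum at the origin
    return sum(v * v for v in x)


ratio_at_minimum = merlin_ratio(toy_loss, [0.0, 0.0], y=1)
ratio_elsewhere = merlin_ratio(toy_loss, [1.0, 1.0], y=1)
```

A point sitting at a local minimum of the loss yields a ratio near 1, which the attack treats as evidence of membership.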

perform_attack(target_model, X_attack, y_attack)#

Perform the actual attack. For now, this method would only be used in settings where the attacks themselves are being audited. Usually, we only call the evaluate_attack method.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model being attacked.

  • X_attack ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input variables for the attack points. These are the original features (not attack/membership features).

  • y_attack (ndarray of shape (n_samples,)) – Output labels for the attack points. These are the original labels (not membership labels).

Returns:

y_pred – Vector containing the binary predictions on whether the attack points were part of the dataset used to train the target model.

Return type:

ndarray of shape (n_samples,)

train_attack_model(target_model, X_attack_train, y_attack_train, y_membership_train, threshold_grid=None, cache_input=False, use_cache=False)#

Takes the attack data points, transforms them into attack features and then trains the attack model using membership labels for those points. If a threshold grid is provided, it will simply tune the threshold using that grid, otherwise, it will train the model.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model that is being attacked.

  • X_attack_train ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input variables for the dataset on which we want to train the attack model. These are the original features (not attack/membership features).

  • y_attack_train (ndarray of shape (n_samples,)) – Output labels for the dataset on which we want to train the attack model. These are the original labels (not membership labels).

  • y_membership_train (ndarray of shape (n_samples,)) – Membership labels for the dataset on which we want to train the attack model. These are binary and indicate whether the data point was included in the training dataset of the target model.

  • threshold_grid (List[float]) – Threshold grid to use for tuning this model.

  • cache_input (bool) – Should we cache the input values - useful for expensive feature calculations like the merlin ratio.

  • use_cache (bool) – Should we use the feature values from the cache - useful for Morgan and Combined attacks.

Return type:

Trained attack model, usually a binary classifier.

transform_attack_data(target_model, X_attack, y_attack, split_type=None, use_cache=False)[source]#

Overriding the method transform_attack_data from the base class. Calculates the merlin ratio.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model being attacked.

  • X_attack ({array-like, sparse matrix} of shape (n_samples, n_features)) – Input features of the attack datapoints, where n_samples is the number of samples and n_features is the number of features.

  • y_attack (ndarray of shape (n_samples,)) – Vector containing the output labels of the attack data points (not membership label).

  • split_type (str) – Use information cached from running the loss based and merlin attacks.

  • use_cache (bool) – Whether to use the cache or not.

Returns:

X_membership – Input feature for the attack model (in this case, the Merlin ratio), where n_samples is the number of samples and n_features is the number of features.

Return type:

{array-like, sparse matrix} of shape (n_samples, n_features)

Morgan Attack#

class guardian_ai.privacy_estimation.morgan_attack.MorganAttack(attack_model, loss_attack, merlin_attack)[source]#

Implements the Morgan Attack as described in the paper: Revisiting Membership Inference Under Realistic Assumptions by Jayaraman et al. The main idea is to combine the merlin ratio and per instance loss using multiple thresholds.

Initialize MorganAttack.

Parameters:
  • attack_model (sklearn.base.BaseEstimator) – Base attack model. Usually the Morgan Classifier.

  • loss_attack (guardian_ai.privacy_estimation.attack.LossBasedBlackBoxAttack) – Loss attack object.

  • merlin_attack (guardian_ai.privacy_estimation.merlin_attack.MerlinAttack) – Merlin attack object.
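The multi-threshold decision can be sketched as below. The rule shown (loss within a lower and an upper bound, and Merlin ratio at or above a threshold) follows the description of Morgan in the cited paper as best understood here; the function name, argument names, and threshold values are illustrative assumptions, not the library's classifier.

```python
def morgan_predict(loss, merlin_ratio, loss_lower, loss_upper, merlin_threshold):
    """Return 1 (member) if the loss lies in [loss_lower, loss_upper]
    and the Merlin ratio is at least merlin_threshold, else 0."""
    in_loss_band = loss_lower <= loss <= loss_upper
    return int(in_loss_band and merlin_ratio >= merlin_threshold)


# Hypothetical thresholds chosen only for the example
thresholds = dict(loss_lower=0.001, loss_upper=0.5, merlin_threshold=0.8)

member = morgan_predict(loss=0.05, merlin_ratio=0.9, **thresholds)
high_loss = morgan_predict(loss=1.2, merlin_ratio=0.9, **thresholds)
low_ratio = morgan_predict(loss=0.05, merlin_ratio=0.4, **thresholds)
```

Requiring both signals to agree trades recall for precision compared with using either the loss or the Merlin ratio alone.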

evaluate_attack(target_model, X_attack_test, y_attack_test, y_membership_test, metric_functions, print_roc_curve=False, cache_input=False, use_cache=False)#

Runs the attack against the target model, evaluates its accuracy and provides the metrics of interest on the success of the attack.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model being attacked.

  • X_attack_test ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input variables for the dataset on which to run the attack model. These are the original features (not attack/membership features).

  • y_attack_test (ndarray of shape (n_samples,)) – Output labels for the dataset on which to run the attack model. These are the original labels (not membership labels).

  • y_membership_test (ndarray of shape (n_samples,)) – Membership labels for the dataset on which we want to run the attack model. These are binary and indicate whether the data point was included in the training dataset of the target model, and help us evaluate the attack model’s accuracy.

  • metric_functions (List[str]) – List of metric functions that we care about for evaluating the success of these attacks. Supports all sklearn.metrics that are relevant to binary classification, since the attack model is almost always a binary classifier.

  • print_roc_curve (bool, Defaults to False.) – Print out the values of the tpr and fpr. Only works for trained attack classifiers for now.

  • cache_input (bool, Defaults to False.) – Should we cache the input values - useful for expensive feature calculations like the merlin ratio.

  • use_cache (bool, Defaults to False.) – Should we use the feature values from the cache - useful for Morgan attack, which uses merlin ratio and loss values.

Returns:

Success metrics for the attack.

Return type:

List[float]

perform_attack(target_model, X_attack, y_attack)#

Perform the actual attack. For now, this method would only be used in settings where the attacks themselves are being audited. Usually, we only call the evaluate_attack method.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model being attacked.

  • X_attack ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input variables for the attack points. These are the original features (not attack/membership features).

  • y_attack (ndarray of shape (n_samples,)) – Output labels for the attack points. These are the original labels (not membership labels).

Returns:

y_pred – Vector containing the binary predictions on whether the attack points were part of the dataset used to train the target model.

Return type:

ndarray of shape (n_samples,)

train_attack_model(target_model, X_attack_train, y_attack_train, y_membership_train, threshold_grid=None, cache_input=False, use_cache=False)#

Takes the attack data points, transforms them into attack features and then trains the attack model using membership labels for those points. If a threshold grid is provided, it will simply tune the threshold using that grid, otherwise, it will train the model.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model that is being attacked.

  • X_attack_train ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input variables for the dataset on which we want to train the attack model. These are the original features (not attack/membership features).

  • y_attack_train (ndarray of shape (n_samples,)) – Output labels for the dataset on which we want to train the attack model. These are the original labels (not membership labels).

  • y_membership_train (ndarray of shape (n_samples,)) – Membership labels for the dataset on which we want to train the attack model. These are binary and indicate whether the data point was included in the training dataset of the target model.

  • threshold_grid (List[float]) – Threshold grid to use for tuning this model.

  • cache_input (bool) – Should we cache the input values - useful for expensive feature calculations like the merlin ratio.

  • use_cache (bool) – Should we use the feature values from the cache - useful for Morgan and Combined attacks.

Return type:

Trained attack model, usually a binary classifier.

transform_attack_data(target_model, X_attack, y_attack, split_type=None, use_cache=False)[source]#

Overriding the method transform_attack_data from the base class. Calculates the Merlin ratio, and combines it with per instance loss.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model being attacked.

  • X_attack ({array-like, sparse matrix} of shape (n_samples, n_features)) – Input features of the attack datapoints, where n_samples is the number of samples and n_features is the number of features.

  • y_attack (ndarray of shape (n_samples,)) – Vector containing the output labels of the attack data points (not membership label).

  • split_type (str) – Use information cached from running the loss based and merlin attacks.

  • use_cache (bool) – Whether to use the cache or not.

Returns:

X_membership – Input features for the attack model (in this case, the Merlin ratio and per-instance loss), where n_samples is the number of samples and n_features is the number of features.

Return type:

{array-like, sparse matrix} of shape (n_samples, n_features)

Combined Attack#

class guardian_ai.privacy_estimation.combined_attacks.CombinedBlackBoxAttack(attack_model, loss_attack=None, confidence_attack=None)[source]#

Similar in spirit to the Morgan attack, which combines loss and the Merlin ratio. In this attack, we combine loss and confidence values, and instead of tuning the thresholds, we combine them using a trained classifier, like stacking.

Initialize CombinedBlackBoxAttack.

Parameters:
  • attack_model (sklearn.base.BaseEstimator) –

  • loss_attack (guardian_ai.privacy_estimation.attack.LossBasedBlackBoxAttack) –

  • confidence_attack (guardian_ai.privacy_estimation.attack.ConfidenceBasedBlackBoxAttack) –
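The stacking step amounts to placing the signals from the individual attacks side by side as feature columns for the trained attack classifier. A minimal sketch, assuming the loss and confidence values have already been computed per sample (the helper name stack_features and the toy values are hypothetical):

```python
def stack_features(loss_values, confidence_values):
    """Build an (n_samples, 2) feature matrix: one loss column, one confidence column."""
    if len(loss_values) != len(confidence_values):
        raise ValueError("both feature vectors must cover the same samples")
    return [[loss, conf] for loss, conf in zip(loss_values, confidence_values)]


# Toy per-sample signals from a loss-based and a confidence-based attack
X_membership = stack_features([0.05, 1.60, 0.30], [0.95, 0.55, 0.74])
```

Any sklearn-style binary classifier could then be fitted on X_membership against the membership labels, which is what replaces manual threshold tuning in this attack.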

evaluate_attack(target_model, X_attack_test, y_attack_test, y_membership_test, metric_functions, print_roc_curve=False, cache_input=False, use_cache=False)#

Runs the attack against the target model, evaluates its accuracy and provides the metrics of interest on the success of the attack.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model being attacked.

  • X_attack_test ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input variables for the dataset on which to run the attack model. These are the original features (not attack/membership features).

  • y_attack_test (ndarray of shape (n_samples,)) – Output labels for the dataset on which to run the attack model. These are the original labels (not membership labels).

  • y_membership_test (ndarray of shape (n_samples,)) – Membership labels for the dataset on which we want to run the attack model. These are binary and indicate whether the data point was included in the training dataset of the target model, and help us evaluate the attack model’s accuracy.

  • metric_functions (List[str]) – List of metric functions that we care about for evaluating the success of these attacks. Supports all sklearn.metrics that are relevant to binary classification, since the attack model is almost always a binary classifier.

  • print_roc_curve (bool, Defaults to False.) – Print out the values of the tpr and fpr. Only works for trained attack classifiers for now.

  • cache_input (bool, Defaults to False.) – Should we cache the input values - useful for expensive feature calculations like the merlin ratio.

  • use_cache (bool, Defaults to False.) – Should we use the feature values from the cache - useful for Morgan attack, which uses merlin ratio and loss values.

Returns:

Success metrics for the attack.

Return type:

List[float]

perform_attack(target_model, X_attack, y_attack)#

Perform the actual attack. For now, this method would only be used in settings where the attacks themselves are being audited. Usually, we only call the evaluate_attack method.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model being attacked.

  • X_attack ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input variables for the attack points. These are the original features (not attack/membership features).

  • y_attack (ndarray of shape (n_samples,)) – Output labels for the attack points. These are the original labels (not membership labels).

Returns:

y_pred – Vector containing the binary predictions on whether the attack points were part of the dataset used to train the target model.

Return type:

ndarray of shape (n_samples,)

train_attack_model(target_model, X_attack_train, y_attack_train, y_membership_train, threshold_grid=None, cache_input=False, use_cache=False)#

Takes the attack data points, transforms them into attack features and then trains the attack model using membership labels for those points. If a threshold grid is provided, it will simply tune the threshold using that grid, otherwise, it will train the model.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model that is being attacked.

  • X_attack_train ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input variables for the dataset on which we want to train the attack model. These are the original features (not attack/membership features).

  • y_attack_train (ndarray of shape (n_samples,)) – Output labels for the dataset on which we want to train the attack model. These are the original labels (not membership labels).

  • y_membership_train (ndarray of shape (n_samples,)) – Membership labels for the dataset on which we want to train the attack model. These are binary and indicate whether the data point was included in the training dataset of the target model.

  • threshold_grid (List[float]) – Threshold grid to use for tuning this model.

  • cache_input (bool) – Should we cache the input values - useful for expensive feature calculations like the merlin ratio.

  • use_cache (bool) – Should we use the feature values from the cache - useful for Morgan and Combined attacks.

Return type:

Trained attack model, usually a binary classifier.

transform_attack_data(target_model, X_attack, y_attack, split_type=None, use_cache=False)[source]#

Overriding the method transform_attack_data from the base class. Calculates the per instance loss and confidence.

Parameters:
  • target_model (guardian_ai.privacy_estimation.model.TargetModel) – Target model being attacked.

  • X_attack ({array-like, sparse matrix} of shape (n_samples, n_features)) – Input features of the attack datapoints, where n_samples is the number of samples and n_features is the number of features.

  • y_attack (ndarray of shape (n_samples,)) – Vector containing the output labels of the attack data points (not membership label).

  • split_type (str) – Use information cached from running the loss based and merlin attacks.

  • use_cache (bool) – Whether to use the cache or not.

Returns:

X_membership – Input features for the attack model (in this case, per-instance loss and confidence values), where n_samples is the number of samples and n_features is the number of features.

Return type:

{array-like, sparse matrix} of shape (n_samples, n_features)

Attack Tuner#

class guardian_ai.privacy_estimation.attack_tuner.AttackTuner[source]#

print_dataframe(filtered_cv_results)[source]#

Pretty print for the filtered dataframe.

Parameters:

filtered_cv_results (dict) – Dictionary recording the filtered results.

Return type:

None

refit_strategy(cv_results)[source]#

Define the strategy to select the best estimator.

The strategy defined here is to filter out all results below a precision threshold of 0.98, rank the remaining by recall and keep all models within one standard deviation of the best by recall. Once these models are selected, we can select the fastest model to predict.

Parameters:

cv_results (dict of numpy (masked) ndarrays) – CV results as returned by the GridSearchCV.

Returns:

best_index – The index of the best estimator as it appears in cv_results.

Return type:

int
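The selection strategy described above can be sketched on a plain dictionary of lists. The key names follow the cv_results_ naming that GridSearchCV uses for scorers called "precision" and "recall"; that those scorer names are in use here is an assumption, as are the function name and the toy numbers.

```python
def select_best_index(cv_results, precision_threshold=0.98):
    """Filter by precision, shortlist by recall, then pick the fastest candidate."""
    precision = cv_results["mean_test_precision"]
    recall = cv_results["mean_test_recall"]
    recall_std = cv_results["std_test_recall"]
    score_time = cv_results["mean_score_time"]

    # 1. Drop every candidate at or below the precision threshold
    candidates = [i for i, p in enumerate(precision) if p > precision_threshold]
    if not candidates:
        raise ValueError("no candidate exceeds the precision threshold")
    # 2. Find the best recall and keep everything within one std of it
    best = max(candidates, key=lambda i: recall[i])
    cutoff = recall[best] - recall_std[best]
    shortlisted = [i for i in candidates if recall[i] >= cutoff]
    # 3. Among the shortlisted candidates, prefer the fastest to predict
    return min(shortlisted, key=lambda i: score_time[i])


toy_results = {
    "mean_test_precision": [0.99, 0.97, 0.985, 0.99],
    "mean_test_recall": [0.80, 0.95, 0.78, 0.60],
    "std_test_recall": [0.03, 0.01, 0.02, 0.05],
    "mean_score_time": [0.5, 0.1, 0.2, 0.05],
}
best_index = select_best_index(toy_results)
```

In the toy data, candidate 1 has the highest recall but fails the precision filter, and candidate 2 wins over candidate 0 because it is statistically tied on recall and faster to score.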

refit_strategy_f1(cv_results)[source]#

Define the strategy to select the best estimator.

The strategy defined here is to filter out all results below a precision threshold of 0.5, rank the remaining by f1, and get the model with the best f1.

Parameters:

cv_results (dict of numpy (masked) ndarrays) – CV results as returned by the GridSearchCV.

Returns:

best_index – The index of the best estimator as it appears in cv_results.

Return type:

int

tune_attack(classifier, X_train, y_train, threshold_grid)[source]#

Tune a threshold based attack over a given grid.

Parameters:
  • classifier (ThresholdClassifier) – Threshold based classifier.

  • X_train ({array-like, sparse matrix} of shape (n_samples, n_features),) – where n_samples is the number of samples and n_features is the number of features. Input features for the set on which the attack is trained.

  • y_train (ndarray of shape (n_samples,)) – Output labels for the set on which the attack is trained.

  • threshold_grid (List[float]) – Grid to search over.

Returns:

Best parameters (in this case, threshold).

Return type:

float
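Tuning a threshold-based attack is a one-dimensional grid search. The sketch below scores each candidate threshold by accuracy on the membership labels and assumes that feature values at or below the threshold indicate membership (as would be natural for a loss feature); both choices, and the function name, are assumptions for illustration rather than the library's behavior.

```python
def tune_threshold(feature_values, y_membership, threshold_grid):
    """Return the threshold from the grid with the highest membership accuracy."""

    def accuracy(threshold):
        # Assumed decision rule: low feature value (e.g. low loss) => member
        preds = [1 if v <= threshold else 0 for v in feature_values]
        correct = sum(p == t for p, t in zip(preds, y_membership))
        return correct / len(y_membership)

    return max(threshold_grid, key=accuracy)


# Toy loss values: the two members have clearly lower loss than the non-members
best_threshold = tune_threshold(
    feature_values=[0.05, 0.10, 0.90, 1.20],
    y_membership=[1, 1, 0, 0],
    threshold_grid=[0.01, 0.5, 2.0],
)
```

A finer grid gives a better threshold at the cost of more evaluations of the classifier.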

Attack Runner#

class guardian_ai.privacy_estimation.attack_runner.AttackRunner(dataset, target_models, attacks, threshold_grids)[source]#

Class that can run the specified attacks against specified target models using the given dataset.

Initialize AttackRunner.

Parameters:
  • dataset (ClassificationDataset) – Dataset that has been split and prepared for running the attacks.

  • target_models (List[TargetModel]) – Target models to run the attacks against.

  • attacks (Dict[str:List[float]]) – List of attacks to run. Use the pattern AttackType.LossBasedBlackBoxAttack.name

Return type:

AttackRunner

run_attack(target_model, attack_type, metric_functions, print_roc_curve=False, cache_input=False)[source]#

Instantiates the specified attack, trains and evaluates it, and prints out the result of the attack to an output result file, if provided.

Parameters:
  • target_model (TargetModel) – Target model being attacked.

  • attack_type (AttackType) – Type of the attack to run.

  • metric_functions (List[str]) – List of metric functions that we care about for evaluating the success of these attacks. Supports all sklearn.metrics that are relevant to binary classification, since the attack model is almost always a binary classifier.

  • print_roc_curve (bool) – Print out the values of the tpr and fpr. Only works for trained attack classifiers for now.

  • cache_input (bool) – Should we cache the input values - useful for expensive feature calculations like the merlin ratio.

Returns:

Result string

Return type:

str