# Module leapyear.model

LeapYear models.

Data objects generated from training or evaluating models used in machine learning.

## Regression-Based Models

class leapyear.model.GLM

A representation of a trained Generalized Linear Model (GLM).

Differentially private versions of GLMs are calibrated using various methods, e.g.

• The variants of these methods that optimize model hyperparameters.

Objects of this class store the parameters and structure of a regression model and can be used to generate predictions for regression and classification problems.

property coefficients

Model coefficients, excluding intercepts.

Return type

ndarray

property intercept

Model intercept, if the model has only one set of coefficients.

Return type

float

property intercepts

Model intercepts, if any.

Return type

ndarray

property model_type

Model type (e.g. linear, logistic).

Return type

GeneralizedLinearModelType

decision_function(xs)

Decision function of the generalized linear model.

Computes the height of the regression function (xbeta) at the provided points. This is a purely linear transformation of the input features.

For a logistic model, observations are ultimately classified based on the sign of this decision function.

Parameters

xs (ndarray) – a set of datapoints for which to predict

Returns

The predicted decision function

Return type

np.ndarray
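The linear transformation described above can be sketched as follows. This is a minimal illustration and not the library's implementation: plain Python lists stand in for ndarray inputs, and the coefficient and intercept values are hypothetical.

```python
def decision_function(xs, coefficients, intercept):
    """Return x . beta + intercept for every row of xs."""
    return [
        sum(x_j * b_j for x_j, b_j in zip(row, coefficients)) + intercept
        for row in xs
    ]

# Hypothetical fitted parameters, chosen only for illustration.
coefficients = [2.0, -1.0]
intercept = 0.5
xs = [[1.0, 1.0], [0.0, 2.0], [3.0, 0.0]]

scores = decision_function(xs, coefficients, intercept)
print(scores)  # [1.5, -1.5, 6.5]
```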

predict(xs)

Prediction function of the generalized linear model.

For linear problems, returns the height of the regression line (the decision function) at the data points provided.

For classification problems, returns a boolean classification for each data point, based on the sign of the decision function.

Parameters

xs (ndarray) – a set of datapoints for which to predict

Returns

the predictions for the points according to the model

Return type

np.ndarray
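A sketch of how predictions relate to decision-function values, under stated assumptions: the `is_classifier` flag is a hypothetical stand-in for the model type, and treating a score of exactly zero as False is an assumption, since the documentation only mentions the sign.

```python
def predict(scores, is_classifier):
    """Map decision-function values to predictions."""
    if is_classifier:
        # Classification: a positive score means a True outcome (assumed rule).
        return [s > 0 for s in scores]
    # Linear regression: the prediction is the decision-function value itself.
    return list(scores)

scores = [1.5, -1.5, 6.5]  # hypothetical decision-function output
labels = predict(scores, is_classifier=True)
values = predict(scores, is_classifier=False)
print(labels)  # [True, False, True]
print(values)  # [1.5, -1.5, 6.5]
```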

predict_proba(xs)

Probabilities given by generalized linear model.

For logistic classification problems, returns the probability that the model assigns to a positive response (a True outcome variable) for each of the data points provided.

Parameters

xs (ndarray) – array with input data

Returns

array of probability scores assigned by the model

Return type

np.ndarray
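For a logistic model, the probability of a positive response is conventionally the logistic (sigmoid) function of the decision function. The sketch below assumes that standard link; it is not taken from the library source, and the input scores are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

scores = [0.0, 2.0, -2.0]  # hypothetical decision-function values
probs = [sigmoid(s) for s in scores]
print(probs[0])  # 0.5
```

A score of zero maps to a probability of 0.5, which is consistent with classifying on the sign of the decision function.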

predict_log_proba(xs)

Logarithm of probabilities given by generalized linear model.

For logistic classification problems, returns the natural logarithm of the probability that the model assigns to a True outcome for each of the data points provided.

Parameters

xs (ndarray) – array with input data

Returns

array of log-probability scores assigned by the model

Return type

np.ndarray
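Log-probabilities are typically computed directly from the decision function, because taking the logarithm of an already computed probability can hit log(0) for very negative scores. The following is a sketch of that standard technique, assuming a logistic link; whether the library computes it this way is not stated here.

```python
import math

def log_sigmoid(z):
    """Natural log of the logistic function, stable for large |z|."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

scores = [0.0, 3.0, -1000.0]  # hypothetical decision-function values
log_probs = [log_sigmoid(s) for s in scores]
```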

to_dict()

Convert to a dictionary.

Return type

classmethod from_dict(d)

Convert from a dictionary.

Return type

GLM
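The to_dict() and from_dict() pair suggests a serialization round trip. The sketch below uses a hypothetical stand-in class, `TinyLinearModel`, because the dictionary layout used by GLM is not documented here; the key names and the JSON step are assumptions for illustration only.

```python
import json

class TinyLinearModel:
    """Hypothetical stand-in; not the leapyear.model.GLM class."""

    def __init__(self, coefficients, intercept):
        self.coefficients = list(coefficients)
        self.intercept = float(intercept)

    def to_dict(self):
        # Key names are illustrative assumptions.
        return {"coefficients": self.coefficients, "intercept": self.intercept}

    @classmethod
    def from_dict(cls, d):
        return cls(d["coefficients"], d["intercept"])

model = TinyLinearModel([2.0, -1.0], 0.5)
payload = json.dumps(model.to_dict())  # e.g. to store alongside other artifacts
restored = TinyLinearModel.from_dict(json.loads(payload))
```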

## Tree-Based Models

class leapyear.model.RandomForest

A representation of a trained Random Forest Model.

Provides methods for making predictions and reporting feature importance statistics.

predict(xs)

Prediction function of the random forest classification model.

For classification problems, returns the most likely class according to the model.

Parameters

xs (ndarray) – array with input data

Returns

array of most likely outcome labels assigned by the model

Return type

np.ndarray

predict_proba(xs)

Prediction probability function of the random forest model.

For each of the data points provided, returns the probability that the model assigns to each possible outcome.

Parameters

xs (ndarray) – array with input data

Returns

array of probability scores assigned by the model to input data points and possible outcomes

Return type

np.ndarray

predict_log_proba(xs)

Logarithm of probabilities given by random forest model.

For each of the data points provided, returns the natural logarithm of the probability that the model assigns to each possible outcome.

Parameters

xs (ndarray) – array with input data

Returns

array of log-probability scores assigned by the model to input data points and possible outcomes

Return type

np.ndarray
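The three prediction methods are related in a simple way, sketched below with a hypothetical probability table (one row per data point, one column per outcome). Whether predict() returns column indices or outcome labels, and how zero probabilities are handled in the logarithm, is not specified here, so treat those details as assumptions.

```python
import math

# Hypothetical predict_proba output for two data points and three outcomes.
proba = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]

# predict_log_proba: element-wise natural logarithm.
log_proba = [[math.log(p) for p in row] for row in proba]

# predict: index of the most likely outcome in each row.
predicted = [max(range(len(row)), key=row.__getitem__) for row in proba]
print(predicted)  # [0, 2]
```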

property feature_importance

Relative feature importance.

Feature importances are derived based on the information collected during model training with differentially private computations, specifically:

1. For each tree and for each split of the tree, look up the value (gain) of introducing the split, as calculated on training data during model calibration, and attribute it to the splitting feature. See leapyear.analytics.random_forest() for the specific calculation of split gain, which is based on a notion of Gini impurity.

2. To compute tree-specific feature importances, sum up split gains across all splits within each tree, weighted (multiplied) by parent node size, and re-scale these tree-specific feature importances to sum up to 1 for each tree.

3. Average feature importances across all trees in the random forest ensemble to get final feature importance.
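The three steps above can be sketched as follows. The tree representation (a list of `(feature, gain, parent_node_size)` tuples per tree) and all numeric values are hypothetical; the sketch does not reproduce the differentially private gain computation and only shows the aggregation.

```python
def tree_importance(splits, n_features):
    """Steps 1-2: size-weighted gain per feature, rescaled to sum to 1."""
    totals = [0.0] * n_features
    for feature, gain, parent_size in splits:
        totals[feature] += gain * parent_size
    norm = sum(totals)
    return [t / norm for t in totals] if norm > 0 else totals

def forest_importance(trees, n_features):
    """Step 3: average the per-tree importances across the ensemble."""
    per_tree = [tree_importance(splits, n_features) for splits in trees]
    return [sum(column) / len(per_tree) for column in zip(*per_tree)]

# Hypothetical forest with two trees and three features.
trees = [
    [(0, 0.5, 100), (1, 0.25, 40)],  # weighted gains: 50 and 10
    [(1, 0.2, 100), (2, 0.2, 100)],  # weighted gains: 20 and 20
]
importance = forest_importance(trees, n_features=3)
```

Because each tree's importances are rescaled before averaging, every tree contributes equally to the final result regardless of its total gain.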

References:
• Hastie, Tibshirani, Friedman. “The Elements of Statistical Learning, 2nd Edition.” 2009.

Return type

to_dict()

Convert to a dictionary.

Return type

classmethod from_dict(d)

Convert from a dictionary.

Return type

RandomForest

class leapyear.model.GradientBoostedTreeClassifier

A representation of a trained gradient boosted tree classifier model.

This includes two named fields:

• max_depth - the maximum depth of the individual decision trees.

• model - a model object of class WeightedDecisionForest, including information about individual decision trees and their weights.

to_dict()

Convert to a dictionary.

Return type

classmethod from_dict(d)

Convert from a dictionary.

Return type

GradientBoostedTreeClassifier

## Clustering Models

class leapyear.model.ClusterModel

A representation of the trained K-means clustering model.

This model is generated by running the K-means clustering algorithm, leapyear.analytics.kmeans(), and contains the cluster centroids (centers).

property centroids

Model centroids.

Return type

ndarray

predict(xs)

Prediction function of the clustering model.

Returns the labels for each point in xs.

Parameters

xs (ndarray) – A 2-dimensional array of data points.

Returns

The associated cluster labels predicted by the model.

Return type

np.ndarray
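For K-means, labeling a point conventionally means assigning it to the nearest centroid. The sketch below assumes squared Euclidean distance and uses the centroid index as the label; both are assumptions, and the centroids shown are hypothetical.

```python
def predict(xs, centroids):
    """Return the index of the nearest centroid for each point in xs."""
    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    return [
        min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
        for p in xs
    ]

centroids = [[0.0, 0.0], [10.0, 10.0]]  # hypothetical model centroids
xs = [[1.0, -1.0], [9.0, 8.0], [4.0, 4.0]]
labels = predict(xs, centroids)
print(labels)  # [0, 1, 0]
```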

to_dict()

Convert to a dictionary.

Return type

classmethod from_dict(d)

Convert from a dictionary.

Return type

ClusterModel

## Model Evaluation Objects

class leapyear.model.ConfusionCurve

The confusion curve object.

This object is generated by running leapyear.analytics.roc() and contains the true positive, false positive, true negative and false negative rates for a sequence of thresholds. Other common metrics are provided as properties of this object.

property thresholds

Thresholds.

Outputs the list of thresholds used for generating confusion curve.

Return type

ndarray

property tpr

Compute true positive rates.

Outputs a list of true positive rate (sensitivity, recall) values, associated with chosen thresholds.

Return type

ndarray

property sensitivity

Compute true positive rates.

Outputs a list of true positive rate (sensitivity, recall) values, associated with chosen thresholds.

Return type

ndarray

property recall

Compute true positive rates.

Outputs a list of true positive rate (sensitivity, recall) values, associated with chosen thresholds.

Return type

ndarray

property fpr

Compute false positive rates.

Outputs a list of false positive rate (fallout) values, associated with chosen thresholds.

Return type

ndarray

property fallout

Compute false positive rates.

Outputs a list of false positive rate (fallout) values, associated with chosen thresholds.

Return type

ndarray

property tnr

Compute true negative rates.

Outputs a list of true negative rate (specificity) values, associated with chosen thresholds.

Return type

ndarray

property specificity

Compute true negative rates.

Outputs a list of true negative rate (specificity) values, associated with chosen thresholds.

Return type

ndarray

property fnr

Compute false negative rates.

Outputs a list of false negative rate values, associated with chosen thresholds.

Return type

ndarray

property missrate

Compute false negative rates.

Outputs a list of false negative rate values, associated with chosen thresholds.

Return type

ndarray
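The four rates above follow from the confusion counts at each threshold. A minimal sketch, assuming a point is predicted positive when its score is at least the threshold (the tie rule is an assumption) and that both classes are present; the scores and labels are hypothetical.

```python
def confusion_rates(scores, labels, threshold):
    """Return the TPR, FNR, FPR and TNR at a single threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold  # assumption: ties count as positive
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    positives, negatives = tp + fn, fp + tn
    return {
        "tpr": tp / positives,  # sensitivity, recall
        "fnr": fn / positives,  # miss rate
        "fpr": fp / negatives,  # fallout
        "tnr": tn / negatives,  # specificity
    }

scores = [0.9, 0.8, 0.4, 0.3]
labels = [True, False, True, False]
curve = {t: confusion_rates(scores, labels, t) for t in (0.2, 0.5, 0.85)}
```

At every threshold, tpr + fnr and fpr + tnr each equal 1, which is why the aliases come in complementary pairs.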

property precision

Precision or positive predictive value.

Return type

ndarray

property ppv

Precision or positive predictive value.

Return type

ndarray

property npv

Negative predictive value.

Return type

ndarray
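Precision and negative predictive value condition on the prediction rather than on the true label. A sketch using the standard definitions with hypothetical confusion counts for a single threshold:

```python
def predictive_values(tp, fp, tn, fn):
    """Return (precision, npv) from confusion counts at one threshold."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    npv = tn / (tn + fn)        # share of predicted negatives that are correct
    return precision, npv

# Hypothetical counts, for illustration only.
precision, npv = predictive_values(tp=40, fp=10, tn=45, fn=5)
```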

property accuracy

Accuracy.

Return type

ndarray

property f1score

F1-score.

Return type

ndarray

property mcc

Matthews correlation coefficient.

Return type

ndarray
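Accuracy, F1-score and the Matthews correlation coefficient can all be written in terms of the same four confusion counts. The sketch below uses the standard textbook formulas with hypothetical counts; how the library treats zero denominators is not documented here.

```python
import math

def summary_metrics(tp, fp, tn, fn):
    """Return (accuracy, f1, mcc) from confusion counts at one threshold."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return accuracy, f1, mcc

# Hypothetical counts, for illustration only.
accuracy, f1, mcc = summary_metrics(tp=40, fp=10, tn=45, fn=5)
```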

property auc_roc

Area under the ROC curve.

Calculates the area under the Receiver Operating Characteristic (ROC) curve.

Return type

ndarray
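One common way to compute the area under a curve given as discrete points is the trapezoidal rule. The sketch below applies it to hypothetical (fpr, tpr) pairs; the interpolation scheme actually used by the library is not stated here, so this is an assumption.

```python
def auc_trapezoid(xs, ys):
    """Area under the piecewise-linear curve through the (x, y) points."""
    points = sorted(zip(xs, ys))
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Hypothetical ROC points, ordered from the strictest threshold to the loosest.
fpr = [0.0, 0.0, 0.5, 0.5, 1.0]
tpr = [0.0, 0.5, 0.5, 1.0, 1.0]
auc = auc_trapezoid(fpr, tpr)
print(auc)  # 0.75
```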

property auc_pr

Area under the Precision-Recall curve.

Return type

ndarray

property gmeasure

G-measure: the geometric mean of the precision and recall.

Return type

ndarray
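As a short illustration, the geometric mean can be computed element-wise over per-threshold precision and recall values. The input values below are hypothetical.

```python
import math

def gmeasure(precision, recall):
    """Element-wise geometric mean of precision and recall."""
    return [math.sqrt(p * r) for p, r in zip(precision, recall)]

precision = [1.0, 0.8, 0.5]  # hypothetical values, one per threshold
recall = [0.25, 0.8, 1.0]
g = gmeasure(precision, recall)
```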

fscore(beta)

Fbeta-score.

The Fbeta-score is the weighted harmonic mean of precision and recall.

Parameters

beta (float) – Non-negative weight of recall relative to precision. Values above 1 favor recall; values below 1 favor precision.

Returns

The Fbeta score
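The standard definition of the Fbeta-score is (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. The sketch below implements that textbook formula for a single precision and recall pair; the input values are hypothetical, and the library's handling of a zero denominator is not documented here.

```python
def fscore(precision, recall, beta):
    """Weighted harmonic mean of precision and recall (textbook formula)."""
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

precision, recall = 0.5, 1.0  # hypothetical values at one threshold
f1 = fscore(precision, recall, beta=1.0)  # equal weight
f2 = fscore(precision, recall, beta=2.0)  # favors recall
f0 = fscore(precision, recall, beta=0.0)  # reduces to precision
```

With beta equal to 1 the result coincides with the F1-score.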