Module leapyear.model
LeapYear models.
Data objects generated from training or evaluating models used in machine learning.
Regression-Based Models

class leapyear.model.GLM(affinity: bool, l1reg: float, l2reg: float, model: GeneralizedLinearModel)
A representation of a trained Generalized Linear Model (GLM).
Differentially private versions of GLMs are calibrated using various methods, e.g. leapyear.analytics.linreg(), and the variants of these methods that optimize model hyperparameters.
Objects of this class store the parameters and structure of a regression model and can be used to generate predictions for regression and classification problems.

property affinity
Alias for field number 0

property l1reg
Alias for field number 1

property l2reg
Alias for field number 2

property model
Alias for field number 3

property coefficients
Model coefficients, excluding intercepts.
 Return type
ndarray

property intercepts
Model intercepts, if any.
 Return type
ndarray

property model_type
Model type (e.g. linear, logistic).
 Return type
GeneralizedLinearModelType

decision_function(xs)
Decision function of the generalized linear model.
Computes the height of the regression function (xbeta) at the provided points. This is a purely linear transformation of the input features.
For a logistic model, observations are ultimately classified based on the sign of this decision function.
 Parameters
xs (ndarray) – a set of datapoints for which to predict
 Returns
The predicted decision function
 Return type
np.ndarray

predict(xs)
Prediction function of the generalized linear model.
For linear problems, returns the height of the regression line (decision function) at the data points provided.
For classification problems, returns the boolean classification choice, which is based on the sign of this decision function.
 Parameters
xs (ndarray) – a set of datapoints for which to predict
 Returns
The predictions for the points according to the model
 Return type
np.ndarray
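The relationship between decision_function and predict described above can be illustrated with plain NumPy. This is a minimal sketch rather than LeapYear's implementation: the parameter values are made up, and the helper names are hypothetical stand-ins for the methods of a trained GLM.

```python
import numpy as np

# Hypothetical fitted parameters (illustrative values, not from a real model).
coefficients = np.array([0.5, -1.0])
intercept = 0.25

def decision_function(xs: np.ndarray) -> np.ndarray:
    """Height of the regression function: a linear map of the features."""
    return xs @ coefficients + intercept

def predict_logistic(xs: np.ndarray) -> np.ndarray:
    """Boolean class choice based on the sign of the decision function."""
    return decision_function(xs) > 0

xs = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
scores = decision_function(xs)   # what a linear model would return from predict
labels = predict_logistic(xs)    # what a logistic model would return from predict
```

For a linear problem the scores themselves are the predictions; for a logistic problem only their sign matters for the predicted label.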

predict_proba(xs)
Probabilities given by the generalized linear model.
For logistic classification problems, returns the probability that the model assigns to a positive response (True outcome variable) for each of the data points provided.
 Parameters
xs (ndarray) – array with input data
 Returns
array of probability scores assigned by the model
 Return type
np.ndarray

predict_log_proba(xs)
Logarithm of probabilities given by the generalized linear model.
For logistic classification problems, returns the natural logarithm of the probability that the model assigns to a True outcome for each of the data points provided.
 Parameters
xs (ndarray) – array with input data
 Returns
array of log-probability scores assigned by the model
 Return type
np.ndarray
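As a rough illustration of how the probability and log-probability outputs relate to the decision function, the sketch below assumes the standard logistic link. The function names are hypothetical and the scores are made-up values; LeapYear's internal computation may differ.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic function mapping decision scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def log_sigmoid(z: np.ndarray) -> np.ndarray:
    """Numerically stable log(sigmoid(z)), computed as -log(1 + exp(-z))."""
    return -np.logaddexp(0.0, -z)

# Hypothetical decision-function values for three data points.
scores = np.array([-2.0, 0.0, 3.0])
proba = sigmoid(scores)          # probability of a True outcome
log_proba = log_sigmoid(scores)  # natural logarithm of that probability
```

Under this assumption, a positive decision score corresponds to a probability above 0.5, which matches the sign-based classification rule.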

classmethod from_dict(d)
Convert from a dictionary.
 Return type

to_shap()
Convert the trained model to SHAP format.
The converted model can then be used to construct a LinearExplainer object that generates Shapley explanations for new records to which the model is applied.
Note that:
Model execution and generation of model score explanations are expected to be done in a production setting by an automated system with direct access to record-level information.
Feature explanations for categorical features are currently not supported. Consider one-hot encoding features to get the benefits of explainable model scores.
Examples
>>> import shap
>>> from leapyear import analytics as la
>>> ...
>>> glm_model = la.logreg(xs, y, ds).run()
>>> glm_explainer = shap.LinearExplainer(glm_model.to_shap(), X_reference)
>>> glm_shap_values = glm_explainer.shap_values(X_to_predict)
>>> ...
In this example:
The input X_reference used to initialize the explainer object is a pandas.DataFrame containing explanatory variables in the same order as was used to train the model. It is used to infer what model scores and feature distribution should be considered “typical”.
The input X_to_predict is a pandas.DataFrame containing the explanatory variables in the same order as was used to train the model.
See https://shap.readthedocs.io/en/latest/generated/shap.explainers.Linear.html
LeapYear has been tested with SHAP version 0.39.0. Older or newer versions are not guaranteed to work.
 Return type
_ShapGLM
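To give a feel for what a linear Shapley explanation contains, the sketch below computes the textbook closed form for a linear model whose features are treated as independent: each feature contributes its coefficient times its deviation from the reference mean. It uses only NumPy with made-up values; it does not call shap or LeapYear and is not a description of their internals.

```python
import numpy as np

# Hypothetical linear model parameters (illustrative values only).
coefficients = np.array([2.0, -1.0])
intercept = 0.5

# Stand-ins for X_reference and X_to_predict from the example above.
X_reference = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
X_to_predict = np.array([[3.0, 1.0]])

# The "typical" score is the model output at the reference feature means.
reference_mean = X_reference.mean(axis=0)
base_value = reference_mean @ coefficients + intercept

# Per-feature contribution: coefficient times deviation from the reference mean.
shap_values = coefficients * (X_to_predict - reference_mean)

# The contributions plus the base value add up to the model score.
score = X_to_predict @ coefficients + intercept
```

The additivity in the last step is the property that makes these values usable as per-record score explanations.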
Tree-Based Models

class leapyear.model.RandomForestClassifier(ntrees: int, height: int, model: DecisionForest)
A representation of a trained Random Forest classification model.
Provides methods for making predictions and reporting feature importance statistics.

property ntrees
Alias for field number 0

property height
Alias for field number 1

property model
Alias for field number 2

predict(xs)
Prediction function of the random forest classification model.
For classification problems, returns the most likely class according to the model.
 Parameters
xs (ndarray) – array with input data
 Returns
array of most likely outcome labels assigned by the model
 Return type
np.ndarray

predict_proba(xs)
Prediction probability function of the random forest model.
For each of the data points provided, returns the probability that the model assigns to any given outcome.
 Parameters
xs (ndarray) – array with input data
 Returns
array of probability scores assigned by the model to input data points and possible outcomes
 Return type
np.ndarray

predict_log_proba(xs)
Logarithm of probabilities given by the random forest model.
For each of the data points provided, returns the natural logarithm of the probability that the model assigns to any given outcome.
 Parameters
xs (ndarray) – array with input data
 Returns
array of log-probability scores assigned by the model to input data points and possible outcomes
 Return type
np.ndarray
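The three prediction methods above are related in a simple way, which the following sketch shows on a made-up probability array. It assumes the probabilities are laid out with one row per data point and one column per outcome; the array values and class labels are hypothetical, and no LeapYear call is made.

```python
import numpy as np

# Hypothetical predict_proba output: rows are data points, columns are outcomes.
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
classes = np.array(["a", "b", "c"])  # hypothetical outcome labels

# The log-probabilities are the elementwise natural logarithm.
log_proba = np.log(proba)

# The most likely class is the outcome with the highest probability in each row.
most_likely = classes[np.argmax(proba, axis=1)]
```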

property feature_importance
Relative feature importance.
Feature importances are derived from information collected during model training with differentially private computations, specifically:
For each tree and for each split of the tree, look up the value (gain) of introducing the split, as calculated on training data during model calibration, and attribute it to the splitting feature. See leapyear.analytics.random_forest() for the specific calculation of split gain based on a notion of Gini impurity.
To compute tree-specific feature importances, sum up split gains across all splits within each tree, weighted (multiplied) by parent node size, and rescale these tree-specific feature importances to sum up to 1 for each tree.
Average feature importances across all trees in the random forest ensemble to get the final feature importance.
 References:
Hastie, Tibshirani, Friedman. “The Elements of Statistical Learning, 2nd Edition.” 2009.
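The aggregation steps described above can be sketched as follows. The forest representation here (a list of splits per tree, each with a feature index, a gain, and a parent node size) is a hypothetical stand-in with made-up numbers; the actual split gains come from the differentially private training run and are not reproduced here.

```python
import numpy as np

n_features = 3

# Hypothetical forest: each tree is a list of (feature_index, gain, parent_node_size).
forest = [
    [(0, 0.20, 100), (1, 0.10, 60), (0, 0.05, 40)],
    [(2, 0.30, 100), (1, 0.25, 50)],
]

def tree_importance(splits, n_features):
    """Sum size-weighted gains per feature, then rescale to sum to 1."""
    totals = np.zeros(n_features)
    for feature, gain, parent_size in splits:
        totals[feature] += gain * parent_size
    return totals / totals.sum()

# One row of normalized importances per tree.
per_tree = np.array([tree_importance(tree, n_features) for tree in forest])

# Final importance: the average across all trees in the ensemble.
feature_importance = per_tree.mean(axis=0)
```

Because each tree's importances are rescaled before averaging, every tree contributes equally to the final figure regardless of how large its raw gains are.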

classmethod from_dict(d)
Convert from a dictionary.
 Return type

to_shap()
Convert the trained model to SHAP format.
The converted model can then be used to construct a TreeExplainer object that generates Shapley explanations for new records to which the model is applied.
Note that:
Model execution and generation of model score explanations are expected to be done in a production setting by an automated system with direct access to record-level information.
Feature explanations for categorical features are currently not supported. Consider one-hot encoding features to get the benefits of explainable model scores.
Examples
>>> import shap
>>> from leapyear import analytics as la
>>> ...
>>> rfc_model = la.random_forest(xs, y, ds).run()
>>> rfc_explainer = shap.TreeExplainer(rfc_model.to_shap(), X_reference)
>>> rfc_shap_values = rfc_explainer.shap_values(X_to_predict)
>>> ...
In this example:
The input X_reference used to initialize the explainer object is a pandas.DataFrame containing explanatory variables in the same order as was used to train the model. It is used to infer what model scores and feature distribution should be considered “typical”.
The input X_to_predict is a pandas.DataFrame containing the explanatory variables in the same order as was used to train the model.
See https://shap.readthedocs.io/en/latest/generated/shap.explainers.Tree.html
LeapYear has been tested with SHAP version 0.39.0. Older or newer versions are not guaranteed to work.
 Return type

class leapyear.model.RandomForestRegressor(ntrees: int, height: int, model: DecisionForest)
A representation of a trained Random Forest regression model.

property ntrees
Alias for field number 0

property height
Alias for field number 1

property model
Alias for field number 2

predict(xs)
Prediction function of the random forest regression model.
For each of the data points provided, returns the prediction that the model assigns.
 Parameters
xs (ndarray) – array with input data
 Returns
array of predictions assigned by the model to input data points
 Return type
np.ndarray

classmethod from_dict(d)
Convert from a dictionary.
 Return type

to_shap()
Convert the trained model to SHAP format.
The converted model can then be used to construct a TreeExplainer object that generates Shapley explanations for new records to which the model is applied.
Note that:
Model execution and generation of model score explanations are expected to be done in a production setting by an automated system with direct access to record-level information.
Feature explanations for categorical features are currently not supported. Consider one-hot encoding features to get the benefits of explainable model scores.
Examples
>>> import shap
>>> from leapyear import analytics as la
>>> ...
>>> rf_model = la.regression_trees(xs, y, ds).run()
>>> rf_explainer = shap.TreeExplainer(rf_model.to_shap(), X_reference)
>>> rf_shap_values = rf_explainer.shap_values(X_to_predict)
>>> ...
In this example:
The input X_reference used to initialize the explainer object is a pandas.DataFrame containing explanatory variables in the same order as was used to train the model. It is used to infer what model scores and feature distribution should be considered “typical”.
The input X_to_predict is a pandas.DataFrame containing the explanatory variables in the same order as was used to train the model.
See https://shap.readthedocs.io/en/latest/generated/shap.explainers.Tree.html
LeapYear has been tested with SHAP version 0.39.0. Older or newer versions are not guaranteed to work.
 Return type

class leapyear.model.GradientBoostedTreeClassifier(max_depth: int, model: WeightedDecisionForest)
A representation of a trained gradient boosted tree classifier model.
This includes two named fields:
max_depth: the maximum depth of the individual decision trees.
model: a model object of class WeightedDecisionForest, including information about individual decision trees and their weights.

property max_depth
Alias for field number 0

property model
Alias for field number 1

predict(xs)
Prediction function of the gradient boosted tree classification model.
For classification problems, returns the most likely class according to the model.
 Parameters
xs (ndarray) – array with input data
 Returns
array of most likely outcome labels assigned by the model
 Return type
np.ndarray

predict_proba(xs)
Prediction probability function of the GBT model.
For each of the data points provided, returns the probability that the model assigns to any given outcome.
 Parameters
xs (ndarray) – array with input data
 Returns
array of probability scores assigned by the model to input data points and possible outcomes
 Return type
np.ndarray

predict_log_proba(xs)
Logarithm of probabilities given by the GBT model.
For each of the data points provided, returns the natural logarithm of the probability that the model assigns to any given outcome.
 Parameters
xs (ndarray) – array with input data
 Returns
array of log-probability scores assigned by the model to input data points and possible outcomes
 Return type
np.ndarray

classmethod from_dict(d)
Convert from a dictionary.
 Return type

to_shap()
Convert the trained model to SHAP format.
The converted model can then be used to construct a TreeExplainer object that generates Shapley explanations for new records to which the model is applied.
Note that:
Model execution and generation of model score explanations are expected to be done in a production setting by an automated system with direct access to record-level information.
Feature explanations for categorical features are currently not supported. Consider one-hot encoding features to get the benefits of explainable model scores.
Examples
>>> import shap
>>> from leapyear import analytics as la
>>> ...
>>> gbt_model = la.gradient_boosted_tree_classifier(xs, y, ds).run()
>>> gbt_explainer = shap.TreeExplainer(gbt_model.to_shap(), X_reference)
>>> gbt_shap_values = gbt_explainer.shap_values(X_to_predict)
>>> ...
In this example:
The input X_reference used to initialize the explainer object is a pandas.DataFrame containing explanatory variables in the same order as was used to train the model. It is used to infer what model scores and feature distribution should be considered “typical”.
The input X_to_predict is a pandas.DataFrame containing the explanatory variables in the same order as was used to train the model.
See https://shap.readthedocs.io/en/latest/generated/shap.explainers.Tree.html
LeapYear has been tested with SHAP version 0.39.0. Older or newer versions are not guaranteed to work.
 Return type
Clustering Models

class leapyear.model.ClusterModel(niters: int, nclusters: int, model: _CM)
A representation of the trained K-means clustering model.
This model is generated by running the K-means clustering algorithm leapyear.analytics.kmeans() and contains cluster centroids (centers).

property niters
Alias for field number 0

property nclusters
Alias for field number 1

property model
Alias for field number 2

predict(xs)
Prediction function of the clustering model.
Returns the labels for each point in xs.
 Parameters
xs (ndarray) – A 2-dimensional array of data points.
 Returns
The associated cluster labels predicted by the model.
 Return type
np.ndarray
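Label prediction for a K-means model is conventionally a nearest-centroid lookup. The sketch below illustrates that idea with NumPy, assuming Euclidean distance and made-up centroids; assign_clusters is a hypothetical helper, not part of the LeapYear API.

```python
import numpy as np

# Hypothetical cluster centroids (one row per cluster).
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

def assign_clusters(xs: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Return, for each row of xs, the index of the nearest centroid."""
    # Broadcasting yields a (n_points, n_clusters) matrix of Euclidean distances.
    distances = np.linalg.norm(xs[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(distances, axis=1)

# A 2-dimensional array of data points, as predict expects.
xs = np.array([[1.0, 1.0], [9.0, 8.0], [-2.0, 0.5]])
labels = assign_clusters(xs, centroids)
```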

classmethod from_dict(d)
Convert from a dictionary.
 Return type

Model Evaluation Objects

class leapyear.model.ConfusionCurve(model: _CC)
The confusion curve object.
This model is generated by running leapyear.analytics.roc() and contains the true positive, false positive, true negative and false negative rates for a sequence of thresholds. Other common metrics are provided as properties of this model.
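To make the four rates concrete, here is a small non-private sketch that computes them for a sequence of thresholds. It assumes a record is classified positive when its score is at or above the threshold; confusion_rates is a hypothetical helper with made-up inputs, and the differentially private computation behind leapyear.analytics.roc() is not reproduced.

```python
import numpy as np

def confusion_rates(y_true, scores, thresholds):
    """Return (tpr, fpr, tnr, fnr) arrays, one entry per threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    rows = []
    for t in thresholds:
        predicted = scores >= t  # assumed convention for a positive call
        tp = np.sum(predicted & y_true)
        fp = np.sum(predicted & ~y_true)
        tn = np.sum(~predicted & ~y_true)
        fn = np.sum(~predicted & y_true)
        rows.append((tp / (tp + fn), fp / (fp + tn),
                     tn / (fp + tn), fn / (tp + fn)))
    tpr, fpr, tnr, fnr = np.array(rows).T
    return tpr, fpr, tnr, fnr

y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
thresholds = [0.25, 0.5, 0.75]
tpr, fpr, tnr, fnr = confusion_rates(y_true, scores, thresholds)
```

By construction, tpr and fnr sum to 1 at every threshold, as do fpr and tnr.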

property model
Alias for field number 0

property df
Return a dataframe containing most of the analytics.
 Return type
DataFrame

property thresholds
Thresholds.
Outputs the list of thresholds used to generate the confusion curve.
 Return type
ndarray

property tpr
Compute true positive rates.
Outputs a list of true positive rate (sensitivity, recall) values associated with the chosen thresholds.
Aliases: tpr (true positive rate), sensitivity, recall
See also
 Sensitivity and specificity
 Return type
ndarray

property sensitivity
Compute true positive rates.
Outputs a list of true positive rate (sensitivity, recall) values associated with the chosen thresholds.
Aliases: tpr (true positive rate), sensitivity, recall
See also
 Sensitivity and specificity
 Return type
ndarray

property recall
Compute true positive rates.
Outputs a list of true positive rate (sensitivity, recall) values associated with the chosen thresholds.
Aliases: tpr (true positive rate), sensitivity, recall
See also
 Sensitivity and specificity
 Return type
ndarray

property fpr
Compute false positive rates.
Outputs a list of false positive rate (fall-out) values associated with the chosen thresholds.
Aliases: fpr (false positive rate), fallout
See also
 False positive rate
 Return type
ndarray

property fallout
Compute false positive rates.
Outputs a list of false positive rate (fall-out) values associated with the chosen thresholds.
Aliases: fpr (false positive rate), fallout
See also
 False positive rate
 Return type
ndarray

property tnr
Compute true negative rates.
Outputs a list of true negative rate (specificity) values associated with the chosen thresholds.
Aliases: tnr (true negative rate), specificity
See also
 Sensitivity and specificity
 Return type
ndarray

property specificity
Compute true negative rates.
Outputs a list of true negative rate (specificity) values associated with the chosen thresholds.
Aliases: tnr (true negative rate), specificity
See also
 Sensitivity and specificity
 Return type
ndarray

property fnr
Compute false negative rates.
Outputs a list of false negative rate (miss rate) values associated with the chosen thresholds.
Aliases: fnr (false negative rate), missrate
See also
 False negative rates
 Return type
ndarray

property missrate
Compute false negative rates.
Outputs a list of false negative rate (miss rate) values associated with the chosen thresholds.
Aliases: fnr (false negative rate), missrate
See also
 False negative rates
 Return type
ndarray

property precision
Compute precision.
Aliases: precision, ppv (positive predictive value)
See also
 Precision and recall
 Return type
ndarray

property ppv
Compute precision.
Aliases: precision, ppv (positive predictive value)
See also
 Precision and recall
 Return type
ndarray

property npv
Negative predictive value.
 Return type
ndarray

property accuracy
Accuracy.
 Return type
ndarray

property f1score
F1 score.
 Return type
ndarray

property mcc
Matthews correlation coefficient.
 Return type
ndarray

property auc_roc
Area under the ROC curve.
Calculates the area under the Receiver Operating Characteristic (ROC) curve.
 Return type
ndarray
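One common way to obtain an area under a curve from a finite set of points is the trapezoidal rule. The sketch below applies it to made-up (fpr, tpr) pairs; area_under_curve is a hypothetical helper, and whether LeapYear uses this exact rule is an assumption.

```python
import numpy as np

def area_under_curve(x, y) -> float:
    """Trapezoidal area under the points (x, y), ordered by increasing x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A stable sort keeps the input order of points that share the same x.
    order = np.argsort(x, kind="stable")
    x, y = x[order], y[order]
    # Each segment contributes its width times the average of its end heights.
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Hypothetical ROC points, from the strictest threshold to the most lenient.
fpr = [0.0, 0.0, 0.5, 1.0]
tpr = [0.0, 0.5, 1.0, 1.0]
auc = area_under_curve(fpr, tpr)
```

A classifier no better than chance traces the diagonal, for which the same helper gives an area of 0.5.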

property auc_pr
Area under the Precision-Recall curve.
 Return type
ndarray

property gmeasure
G-measure: the geometric mean of precision and recall.
 Return type
ndarray
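The summary metrics listed above (precision, negative predictive value, accuracy, F1 score, Matthews correlation coefficient, G-measure) all follow from the four confusion counts at a given threshold. The sketch below uses the standard textbook definitions with made-up counts; derived_metrics is a hypothetical helper, not a LeapYear function.

```python
import math

def derived_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard summary metrics computed from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1score = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    gmeasure = math.sqrt(precision * recall)  # geometric mean of the two
    return {
        "precision": precision,
        "recall": recall,
        "npv": npv,
        "accuracy": accuracy,
        "f1score": f1score,
        "mcc": mcc,
        "gmeasure": gmeasure,
    }

# Hypothetical confusion counts at a single threshold.
metrics = derived_metrics(tp=40, fp=10, tn=45, fn=5)
```

The sketch does not guard against zero denominators, which occur at extreme thresholds where no record is classified positive or negative.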

property