Metrics are quantitative measures commonly used for estimating, comparing, and tracking performance or production. A group of metrics is typically used to build a dashboard that management or analysts review on a regular basis to maintain performance assessments, opinions, and business strategies.
- watex.metrics.ROC_curve(roc_kws=None, **tradeoff_kws)[source]#
The Receiver Operating Characteristic (ROC) curve is another common tool used with binary classifiers.
It is very similar to the precision/recall curve, but instead of plotting precision versus recall, the ROC curve plots the true positive rate (TPR, another name for recall) against the false positive rate (FPR). The FPR is the ratio of negative instances that are incorrectly classified as positive. It is equal to one minus the true negative rate (TNR), which is the ratio of negative instances that are correctly classified as negative. The TNR is also called specificity. Hence the ROC curve plots sensitivity (recall) versus 1 - specificity.
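As a rough illustration of the definitions above, the following sketch computes one (FPR, TPR) point per threshold in plain Python. The helper name `roc_point` and the toy labels and scores are assumptions made for this example; they are not part of watex.

```python
def roc_point(y_true, scores, threshold):
    """Return (FPR, TPR) when instances scoring >= threshold are called positive."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    tpr = tp / (tp + fn)  # sensitivity (recall)
    fpr = fp / (fp + tn)  # equals 1 - specificity
    return fpr, tpr


# Toy data (hypothetical): two negatives and two positives.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Sweeping the threshold from high to low traces the curve from (0, 0) to (1, 1).
points = [roc_point(y_true, scores, t) for t in (0.9, 0.8, 0.4, 0.35, 0.1)]
print(points)
```

Lowering the threshold can only add predicted positives, so both rates are non-decreasing along the sweep.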
- Parameters:
clf (callable, always as a function, classifier estimator) –
A supervised (or semi-supervised) predictor with a finite set of discrete possible output values. A classifier supports modeling some of binary, multiclass, multilabel, or multiclass multioutput targets. Within scikit-learn, all classifiers support multi-class classification, defaulting to using a one-vs-rest strategy over the binary classification problem. Classifiers must store a classes_ attribute after fitting, and usually inherit from base.ClassifierMixin, which sets their _estimator_type attribute. A classifier can be distinguished from other estimators with is_classifier. It must implement:
* fit
* predict
* score
It may also be appropriate to implement decision_function, predict_proba and predict_log_proba.
X (Ndarray of shape (M x N), \(M=m-samples\) & \(N=n-features\)) – training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
y (array-like of shape (M, ), \(M=m-samples\)) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
cv (int, cross-validation generator or iterable) –
A cross-validation splitting strategy. It is used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another, so as not to overfit the training supervision. Possible inputs for cv are usually:
* An integer, specifying the number of folds in K-fold cross validation. K-fold will be stratified over classes if the estimator is a classifier (determined by base.is_classifier) and the targets may represent a binary or multiclass (but not multioutput) classification problem (determined by utils.multiclass.type_of_target).
* A cross-validation splitter instance. Refer to the User Guide for splitters available within scikit-learn.
* An iterable yielding train/test splits.
With some exceptions (especially where not using cross validation at all is an option), the default is 4-fold.
label (float, int) – Specific class to evaluate the tradeoff of precision and recall. If y is already binary (0 and 1), label does not need to be specified.
method (str) – Method used to get a score for each instance in the training set. Can be decision_function or predict_proba. A scikit-learn classifier generally has at least one of these methods. Default is decision_function.
tradeoff (float) – Tradeoff value at which to check the precision and recall scores. For example, to reach a precision of 90%, set a tradeoff value and read the resulting precision score and recall score.
roc_kws (dict) – Additional keyword arguments passed to roc_curve.
See also
watex.view.mlplot.MLPlot.precisionRecallTradeoff: plot consistency precision recall curve.
- Returns:
obj – The metric object holds the following attributes in addition to those returned by precision_recall_tradeoff:
* roc_auc_score for area under the curve
* fpr for false positive rate
* tpr for true positive rate
* thresholds from roc_curve
* y classified
These attributes can be retrieved for plotting purposes.
- Return type:
object, an instantiated metric object
Examples
>>> from watex.exlib import SGDClassifier
>>> from watex.metrics import ROC_curve
>>> from watex.datasets import fetch_data
>>> X, y = fetch_data('Bagoue prepared')
>>> sgd_clf = SGDClassifier()
>>> rocObj = ROC_curve(clf=sgd_clf, X=X, y=y, label=1, cv=3)
>>> rocObj.__dict__.keys()
>>> rocObj.roc_auc_score
>>> rocObj.fpr
- watex.metrics.confusion_matrix(clf, X, y, *, cv=7, plot_conf_max=False, crossvalp_kws={}, **conf_mx_kws)[source]#
Evaluate the performance of the model or classifier by counting the number of times instances of class A are classified as class B.
To compute a confusion matrix, you first need a set of predictions so that they can be compared to the actual targets. You could make predictions on the test set, but it is better to keep it untouched until you are ready to make your final predictions. Remember that the test set is used only at the very end of the project, once you have a classifier that you are ready to launch. Instead, cross-validated predictions on the training set are used. The confusion matrix gives a lot of information, but sometimes a more concise metric such as precision or recall is preferable.
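The idea of building a confusion matrix from cross-validated predictions, without touching the test set, can be sketched with scikit-learn directly. This is only an illustration under stated assumptions: the synthetic dataset, the SGDClassifier settings and the row normalization are choices made for the example and do not describe how watex computes its own attributes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# Synthetic binary training set (assumption for the example).
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
clf = SGDClassifier(random_state=42)

# Each prediction comes from a model that did not see that sample during fitting.
y_pred = cross_val_predict(clf, X, y, cv=3)

# Rows are actual classes, columns are predicted classes.
conf_mx = confusion_matrix(y, y_pred)

# Row-normalized matrix: each row shows how one actual class was distributed.
norm_conf_mx = conf_mx / conf_mx.sum(axis=1, keepdims=True)
print(conf_mx)
print(norm_conf_mx)
```

Normalizing by row makes the error rates comparable between classes of different sizes.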
- Parameters:
clf (callable, always as a function, classifier estimator) –
A supervised (or semi-supervised) predictor with a finite set of discrete possible output values. A classifier supports modeling some of binary, multiclass, multilabel, or multiclass multioutput targets. Within scikit-learn, all classifiers support multi-class classification, defaulting to using a one-vs-rest strategy over the binary classification problem. Classifiers must store a classes_ attribute after fitting, and usually inherit from base.ClassifierMixin, which sets their _estimator_type attribute. A classifier can be distinguished from other estimators with is_classifier. It must implement:
* fit
* predict
* score
It may also be appropriate to implement decision_function, predict_proba and predict_log_proba.
X (Ndarray of shape (M x N), \(M=m-samples\) & \(N=n-features\)) – training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
y (array-like of shape (M, ), \(M=m-samples\)) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
cv (int, cross-validation generator or iterable) –
A cross-validation splitting strategy. It is used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another, so as not to overfit the training supervision. Possible inputs for cv are usually:
* An integer, specifying the number of folds in K-fold cross validation. K-fold will be stratified over classes if the estimator is a classifier (determined by base.is_classifier) and the targets may represent a binary or multiclass (but not multioutput) classification problem (determined by utils.multiclass.type_of_target).
* A cross-validation splitter instance. Refer to the User Guide for splitters available within scikit-learn.
* An iterable yielding train/test splits.
With some exceptions (especially where not using cross validation at all is an option), the default is 4-fold.
label (float, int) – Specific class to evaluate the tradeoff of precision and recall. If y is already binary (0 and 1), label does not need to be specified.
method (str) – Method used to get a score for each instance in the training set. Can be decision_function or predict_proba. A scikit-learn classifier generally has at least one of these methods. Default is decision_function.
tradeoff (float) – Tradeoff value at which to check the precision and recall scores. For example, to reach a precision of 90%, set a tradeoff value and read the resulting precision score and recall score.
plot_conf_max (bool, str) – Can be 'map' or 'error' to visualize the matshow of the predictions or of the errors.
crossvalp_kws (dict) – Additional keyword arguments passed to cross_val_predict.
conf_mx_kws (dict) – Additional keyword arguments passed to the confusion matrix.
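The three kinds of cv input listed above (an integer, a splitter instance, an iterable of splits) can be tried out with scikit-learn's cross_val_predict. This is a minimal sketch on synthetic data; the estimator and dataset are assumptions chosen for the example, and 4 folds are used only to mirror the default quoted in the text.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = make_classification(n_samples=120, n_features=6, random_state=0)
clf = LogisticRegression(max_iter=1000)

# 1. An integer: number of folds (stratified here, since clf is a classifier
#    and y is binary).
pred_int = cross_val_predict(clf, X, y, cv=4)

# 2. A cross-validation splitter instance.
splitter = StratifiedKFold(n_splits=4)
pred_splitter = cross_val_predict(clf, X, y, cv=splitter)

# 3. An iterable yielding (train_indices, test_indices) pairs.
splits = list(splitter.split(X, y))
pred_iter = cross_val_predict(clf, X, y, cv=splits)

print(len(pred_int), len(pred_splitter), len(pred_iter))
```

Because the integer form resolves to an unshuffled stratified K-fold for classifiers, the three calls above are expected to use the same folds.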
Examples
>>> from sklearn.svm import SVC
>>> from watex.metrics import confusion_matrix
>>> from watex.datasets import fetch_data
>>> X, y = fetch_data('Bagoue dataset prepared')
>>> svc_clf = SVC(C=100, gamma=1e-2, kernel='rbf',
...               random_state=42)
>>> confObj = confusion_matrix(svc_clf, X=X, y=y,
...                            plot_conf_max='error')
>>> confObj.norm_conf_mx
>>> confObj.conf_mx
>>> confObj.__dict__.keys()
- watex.metrics.get_eval_scores(model, Xt, yt, *, multi_class='raise', average='binary', normalize=True, sample_weight=None, verbose=False, **scorer_kws)[source]#
Compute the accuracy, precision, recall and AUC scores.
- Parameters:
model (callable, always as a function,) –
A model estimator. An object which manages the estimation and decoding of a model. The model is estimated as a deterministic function of:
- parameters provided in object construction or with set_params;
- the global numpy.random random state if the estimator’s random_state parameter is set to None; and
- any data or sample properties passed to the most recent call to fit, fit_transform or fit_predict, or data similarly passed in a sequence of calls to partial_fit.
The estimated model is stored in public and private attributes on the estimator instance, facilitating decoding through prediction and transformation methods. Estimators must provide a fit method, and should provide set_params and get_params, although these are usually provided by inheritance from base.BaseEstimator. The core functionality of some estimators may also be available as a function.
Xt (Ndarray, M x N matrix where M=m-samples and N=n-features) – Shorthand for “test set”; data that is observed at testing and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix.
yt (array-like of shape (M, ), M=m-samples) – test target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
average ({'micro', 'macro', 'samples', 'weighted', 'binary'} or None, default='binary') –
This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
'binary': Only report results for the class specified by pos_label. This is applicable only if targets (y_{true,pred}) are binary.
'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.
'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
'weighted': Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall. Weighted recall is equal to accuracy.
'samples': Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score()).
Will be ignored when y_true is binary. Note: multiclass ROC AUC currently only handles the ‘macro’ and ‘weighted’ averages.
multi_class ({'raise', 'ovr', 'ovo'}, default='raise') –
Only used for multiclass targets. Determines the type of configuration to use. The default value raises an error, so either 'ovr' or 'ovo' must be passed explicitly.
'ovr': Stands for One-vs-rest. Computes the AUC of each class against the rest [1] [2]. This treats the multiclass case in the same way as the multilabel case. Sensitive to class imbalance even when average == 'macro', because class imbalance affects the composition of each of the ‘rest’ groupings.
'ovo': Stands for One-vs-one. Computes the average AUC of all possible pairwise combinations of classes [3]. Insensitive to class imbalance when average == 'macro'.
normalize (bool, default=True) – If False, return the number of correctly classified samples. Otherwise, return the fraction of correctly classified samples.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
verbose (int, default is 0) – Control the level of verbosity. Higher values lead to more messages.
scorer_kws (dict) – Additional keyword arguments passed to the scorer metrics: accuracy_score(), precision_score(), recall_score(), roc_auc_score()
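The effect of the average options described above can be checked on a small multiclass example using scikit-learn's scorers. The labels below are made up for illustration; the sketch only shows how the averages relate to each other, including the statement that weighted recall equals accuracy.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical three-class labels and predictions.
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 1, 2, 0, 2, 2]

# average=None returns one precision value per class.
per_class = precision_score(y_true, y_pred, average=None)

# 'macro' is the unweighted mean of the per-class values.
macro = precision_score(y_true, y_pred, average="macro")

# 'micro' counts true and false positives globally over all classes.
micro = precision_score(y_true, y_pred, average="micro")

# 'weighted' recall weights each class by its support.
weighted_recall = recall_score(y_true, y_pred, average="weighted")
accuracy = accuracy_score(y_true, y_pred)

print(per_class, macro, micro, weighted_recall, accuracy)
```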
- Returns:
scores – A dictionary holding all the scores from the metrics evaluation:
- accuracy
- recall
- precision
- ROC AUC (Receiver Operating Characteristic Area Under the Curve)
- Return type:
dict
Notes
Note that if yt is given, y_score is computed as an array-like of shape (n_samples,) or (n_samples, n_classes) holding the target scores, according to the scheme below:
- In the binary case, it corresponds to an array of shape (n_samples,). Both probability estimates and non-thresholded decision values can be provided. The probability estimates correspond to the probability of the class with the greater label, i.e. estimator.classes_[1] and thus estimator.predict_proba(X, y)[:, 1]. The decision values correspond to the output of estimator.decision_function(X, y). See more information in the User guide;
- In the multiclass case, it corresponds to an array of shape (n_samples, n_classes) of probability estimates provided by the predict_proba method. The probability estimates must sum to 1 across the possible classes. In addition, the order of the class scores must correspond to the order of labels, if provided, or else to the numerical or lexicographical order of the labels in y_true. See more information in the User guide;
- In the multilabel case, it corresponds to an array of shape (n_samples, n_classes). Probability estimates are provided by the predict_proba method and the non-thresholded decision values by the decision_function method. The probability estimates correspond to the probability of the class with the greater label for each output of the classifier. See more information in the User guide.
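For the binary case, the kind of score dictionary described in this section can be approximated with scikit-learn alone. The sketch below is not the implementation of get_eval_scores; the train/test split, the estimator and the dictionary keys are assumptions made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, Xt, y_train, yt = train_test_split(X, y, random_state=0, stratify=y)
model = LogisticRegression(solver="liblinear", random_state=0).fit(X_train, y_train)

# Hard predictions feed accuracy, precision and recall.
y_pred = model.predict(Xt)

# Binary case: y_score has shape (n_samples,), here the probability of the
# class with the greater label (model.classes_[1]).
y_score = model.predict_proba(Xt)[:, 1]

scores = {
    "accuracy": accuracy_score(yt, y_pred),
    "precision": precision_score(yt, y_pred),
    "recall": recall_score(yt, y_pred),
    "roc_auc": roc_auc_score(yt, y_score),
}
print(scores)
```

Note that ROC AUC is computed from the continuous scores, while the other three metrics use the thresholded predictions.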
References
[1] Provost, F., Domingos, P. (2000). Well-trained PETs: Improving probability estimation trees (Section 6.2), CeDER Working Paper #IS-00-04, Stern School of Business, New York University.
See also
average_precision_score: Area under the precision-recall curve.
roc_curve: Compute Receiver operating characteristic (ROC) curve.
RocCurveDisplay.from_estimator: Plot Receiver Operating Characteristic (ROC) curve given an estimator and some data.
RocCurveDisplay.from_predictions: Plot Receiver Operating Characteristic (ROC) curve given the true and predicted values.
Examples
Binary case:
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.metrics import roc_auc_score
>>> X, y = load_breast_cancer(return_X_y=True)
>>> clf = LogisticRegression(solver="liblinear", random_state=0).fit(X, y)
>>> roc_auc_score(y, clf.predict_proba(X)[:, 1])
0.99...
>>> roc_auc_score(y, clf.decision_function(X))
0.99...
Multiclass case:
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> clf = LogisticRegression(solver="liblinear").fit(X, y)
>>> roc_auc_score(y, clf.predict_proba(X), multi_class='ovr')
0.99...
Multilabel case:
>>> import numpy as np
>>> from sklearn.datasets import make_multilabel_classification
>>> from sklearn.multioutput import MultiOutputClassifier
>>> X, y = make_multilabel_classification(random_state=0)
>>> clf = MultiOutputClassifier(clf).fit(X, y)
>>> # get a list of n_output containing probability arrays of shape
>>> # (n_samples, n_classes)
>>> y_pred = clf.predict_proba(X)
>>> # extract the positive columns for each output
>>> y_pred = np.transpose([pred[:, 1] for pred in y_pred])
>>> roc_auc_score(y, y_pred, average=None)
array([0.82..., 0.86..., 0.94..., 0.85... , 0.94...])
>>> from sklearn.linear_model import RidgeClassifierCV
>>> clf = RidgeClassifierCV().fit(X, y)
>>> roc_auc_score(y, clf.decision_function(X), average=None)
array([0.81..., 0.84... , 0.93..., 0.87..., 0.94...])
- watex.metrics.get_metrics()[source]#
Get the list of available metrics.
Metrics are quantitative measures commonly used for assessing, comparing, and tracking performance or production. A group of metrics is typically used to build a dashboard that management or analysts review on a regular basis to maintain performance assessments, opinions, and business strategies.
- watex.metrics.precision_recall_tradeoff(clf, X, y, *, cv=7, label=None, method=None, cvp_kws=None, tradeoff=None, **prt_kws)[source]#
The precision-recall tradeoff computes a score for each instance based on the decision function.
It assigns the instance to the positive class if that score is greater than the threshold; otherwise it assigns it to the negative class.
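The thresholding rule described above can be sketched in plain Python. The helper name `precision_recall_at`, the toy scores and the convention of returning a precision of 1.0 when nothing is predicted positive are assumptions for this illustration, not watex behavior.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores strictly above `threshold` are positive."""
    y_pred = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumed convention: precision is 1.0 when no instance is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn)
    return precision, recall


# Hypothetical decision scores, sorted for readability.
y_true = [0, 0, 1, 0, 1, 1]
scores = [-2.0, -0.5, -0.1, 0.3, 0.8, 2.1]

low = precision_recall_at(y_true, scores, threshold=-1.0)
high = precision_recall_at(y_true, scores, threshold=0.5)
print(low, high)
```

Raising the threshold from -1.0 to 0.5 removes both false positives but also misses one positive instance, so precision goes up while recall goes down.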
- Parameters:
clf (callable, always as a function, classifier estimator) –
A supervised (or semi-supervised) predictor with a finite set of discrete possible output values. A classifier supports modeling some of binary, multiclass, multilabel, or multiclass multioutput targets. Within scikit-learn, all classifiers support multi-class classification, defaulting to using a one-vs-rest strategy over the binary classification problem. Classifiers must store a classes_ attribute after fitting, and usually inherit from base.ClassifierMixin, which sets their _estimator_type attribute. A classifier can be distinguished from other estimators with is_classifier. It must implement:
* fit
* predict
* score
It may also be appropriate to implement decision_function, predict_proba and predict_log_proba.
X (Ndarray of shape (M x N), \(M=m-samples\) & \(N=n-features\)) – training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
y (array-like of shape (M, ), \(M=m-samples\)) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
cv (int, cross-validation generator or iterable) –
A cross-validation splitting strategy. It is used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another, so as not to overfit the training supervision. Possible inputs for cv are usually:
* An integer, specifying the number of folds in K-fold cross validation. K-fold will be stratified over classes if the estimator is a classifier (determined by base.is_classifier) and the targets may represent a binary or multiclass (but not multioutput) classification problem (determined by utils.multiclass.type_of_target).
* A cross-validation splitter instance. Refer to the User Guide for splitters available within scikit-learn.
* An iterable yielding train/test splits.
With some exceptions (especially where not using cross validation at all is an option), the default is 4-fold.
label (float, int) – Specific class to evaluate the tradeoff of precision and recall. If y is already binary, label does not need to be specified.
method (str) – Method used to get a score for each instance in the training set. Can be decision_function or predict_proba. A scikit-learn classifier generally has at least one of these methods. Default is decision_function.
tradeoff (float, optional) – Tradeoff value at which to check the precision and recall scores. For example, to reach a precision of 90%, set a tradeoff value and read the resulting precision score and recall score.
Notes
In contrast to the confusion matrix, the precision-recall tradeoff is a useful metric based on the accuracy of the positive predictions, called the precision of the classifier. Its equation is:
\[precision = TP/(TP+FP)\]
where TP is the number of true positives and FP is the number of false positives. A trivial way to have perfect precision is to make one single positive prediction and ensure it is correct (precision = 1/1 = 100%). This would not be very useful, since the classifier would ignore all but one positive instance. So precision is typically used along with another metric named recall, also called sensitivity or true positive rate (TPR): this is the ratio of positive instances that are correctly detected by the classifier. The equation of recall is given as:
\[recall = TP/(TP+FN)\]
where FN is of course the number of false negatives. It is often convenient to combine the precision and recall metrics into a single metric called the F1 score, in particular if you need a simple way to compare two classifiers. The F1 score is the harmonic mean of precision and recall. Whereas the regular mean treats all values equally, the harmonic mean gives much more weight to low values. As a result, the classifier will only get a high F1 score if both recall and precision are high. The equation is given below:
\[\begin{split}F1 &= 2/((1/precision)+(1/recall)) = 2 \cdot precision \cdot recall/(precision+recall) \\ &= TP/(TP+(FN+FP)/2)\end{split}\]
Increasing precision reduces recall and vice versa; this is called the precision-recall tradeoff.
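The three forms of the F1 equation can be checked numerically. The counts below are made up for the illustration, and the helper name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 / (1 / precision + 1 / recall)


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
tp, fp, fn = 8, 2, 4

precision = tp / (tp + fp)
recall = tp / (tp + fn)

f1_harmonic = f1_from(precision, recall)
f1_product = 2 * precision * recall / (precision + recall)
f1_counts = tp / (tp + (fn + fp) / 2)
arithmetic_mean = (precision + recall) / 2

# A very low recall drags the harmonic mean down far more than the regular mean.
f1_unbalanced = f1_from(1.0, 0.1)
mean_unbalanced = (1.0 + 0.1) / 2

print(f1_harmonic, f1_product, f1_counts, arithmetic_mean)
print(f1_unbalanced, mean_unbalanced)
```

With perfect precision but a recall of 0.1, the F1 score stays below 0.2 while the regular mean is 0.55, which is why F1 rewards classifiers only when both metrics are high.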
- Returns:
obj – The metric object is composed of the following attributes:
confusion_matrix
f1_score
precision_score
recall_score
precisions from precision_recall_curve
recalls from precision_recall_curve
thresholds from precision_recall_curve
y classified
and can be retrieved for plot purpose.
- Return type:
object, an instantiated metric object
Examples
>>> from watex.exlib import SGDClassifier
>>> from watex.metrics import precision_recall_tradeoff
>>> from watex.datasets import fetch_data
>>> X, y = fetch_data('Bagoue prepared')
>>> sgd_clf = SGDClassifier()
>>> mObj = precision_recall_tradeoff(clf=sgd_clf, X=X, y=y,
...                                  label=1, cv=3, tradeoff=0.90)
>>> mObj.confusion_matrix