Metrics are quantitative measures commonly used for estimating, comparing, and tracking performance or production. A group of metrics is typically used to build a dashboard that management or analysts review on a regular basis to maintain performance assessments, opinions, and business strategies.

watex.metrics.ROC_curve(roc_kws=None, **tradeoff_kws)

The Receiver Operating Characteristic (ROC) curve is another common tool used with binary classifiers.

It is very similar to the precision/recall curve, but instead of plotting precision versus recall, the ROC curve plots the true positive rate (TPR, another name for recall) against the false positive rate (FPR). The FPR is the ratio of negative instances that are incorrectly classified as positive. It is equal to one minus the true negative rate (TNR), which is the ratio of negative instances that are correctly classified as negative. The TNR is also called specificity. Hence the ROC curve plots sensitivity (recall) versus 1 - specificity.
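To make the rate definitions concrete, here is a minimal sketch in plain Python that counts the four outcomes for a toy set of binary labels and derives TPR, FPR, and TNR. The helper name `binary_rates` and the sample data are illustrative assumptions and are not part of watex.

```python
def binary_rates(y_true, y_pred):
    """Return (TPR, FPR, TNR) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # negatives flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives rejected
    tpr = tp / (tp + fn)  # recall, also called sensitivity
    fpr = fp / (fp + tn)  # negatives incorrectly classified as positive
    tnr = tn / (tn + fp)  # specificity
    return tpr, fpr, tnr


# Hypothetical toy data: 4 positives and 4 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

tpr, fpr, tnr = binary_rates(y_true, y_pred)
print("TPR:", tpr, "FPR:", fpr, "TNR:", tnr)
print("FPR equals 1 - TNR:", abs(fpr - (1 - tnr)) < 1e-12)
```

A single pair (FPR, TPR) like this one is one point on the ROC curve; the full curve is obtained by repeating the computation for a range of decision thresholds.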

Parameters
  • clf (callable, always as a function, classifier estimator) –

    A supervised (or semi-supervised) predictor with a finite set of discrete possible output values. A classifier supports modeling some of binary, multiclass, multilabel, or multiclass multioutput targets. Within scikit-learn, all classifiers support multi-class classification, defaulting to using a one-vs-rest strategy over the binary classification problem. Classifiers must store a classes_ attribute after fitting, and usually inherit from base.ClassifierMixin, which sets their _estimator_type attribute. A classifier can be distinguished from other estimators with is_classifier. It must implement:

    * fit
    * predict
    * score
    

    It may also be appropriate to implement decision_function, predict_proba and predict_log_proba.

  • X (Ndarray of shape ( M x N), \(M=m-samples\) & \(N=n-features\)) – training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like of shape (M,), \(M=m-samples\)) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • cv (int, cross-validation splitter or iterable) –

    A cross-validation splitting strategy. It is used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another, so as not to overfit the training supervision. Possible inputs for cv are usually:

    * An integer, specifying the number of folds in K-fold cross validation.
        K-fold will be stratified over classes if the estimator is a classifier
        (determined by base.is_classifier) and the targets may represent a
        binary or multiclass (but not multioutput) classification problem
        (determined by utils.multiclass.type_of_target).
    * A cross-validation splitter instance. Refer to the User Guide for
        splitters available within `Scikit-learn`_
    * An iterable yielding train/test splits.
    
    With some exceptions (especially where not using cross validation at all is an option), the default is 4-fold.

  • label (float, int) – Specific class for which to evaluate the precision-recall tradeoff. If y is already binary (0 and 1), label does not need to be specified.

  • method (str) – Method used to get a score for each instance in the training set. Could be decision_function or predict_proba; a scikit-learn classifier generally has one of these methods. Default is decision_function.

  • tradeoff (float) – Value at which to check the precision and recall scores. For example, to target a precision of 90%, specify a tradeoff value and read off the resulting precision and recall scores.

  • roc_kws (dict) – Additional keyword arguments passed to roc_curve.
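One of the accepted forms of cv listed above is an iterable yielding train/test splits. The following sketch shows what such an iterable could look like in plain Python. The generator name `kfold_indices` is a hypothetical helper, not a watex or scikit-learn function, and it produces consecutive, unshuffled, non-stratified folds only.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) index lists for consecutive folds."""
    # The first (n_samples % n_splits) folds receive one extra sample.
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1

    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        # Everything outside the current test fold is used for training.
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


splits = list(kfold_indices(n_samples=7, n_splits=3))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each sample appears in exactly one test fold, which is what allows cross-validated predictions to be collected for the whole training set.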

See also

watex.view.mlplot.MLPlot.precisionRecallTradeoff

Plot a consistent precision-recall curve.

Returns

obj – The metric object holds the following attributes, in addition to the attributes returned by precision_recall_tradeoff():

* `roc_auc_score` for area under the curve
* `fpr` for false positive rate
* `tpr` for true positive rate
* `thresholds` from `roc_curve`
* `y` classified

These attributes can be retrieved for plotting purposes.
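As a rough illustration of how fpr, tpr, thresholds, and the area under the curve relate to each other, the sketch below sweeps a threshold over a handful of scores and integrates the resulting curve with the trapezoid rule. The helpers `roc_points` and `auc_trapezoid` are hypothetical names written for this example; they are not the watex or scikit-learn implementations, and instances scoring at or above the threshold are treated as positive here.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr, thresholds), sweeping thresholds from high to low."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    thresholds = sorted(set(scores), reverse=True)
    # Start at the origin: with a threshold above every score,
    # nothing is predicted positive. fpr/tpr therefore have one
    # more entry than thresholds.
    fpr, tpr = [0.0], [0.0]
    for thr in thresholds:
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr, thresholds


def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear curve through the (fpr, tpr) points."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


# Hypothetical toy data: two negatives and two positives.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_points(y_true, scores)
area = auc_trapezoid(fpr, tpr)
print("fpr:", fpr)
print("tpr:", tpr)
print("AUC:", area)
```

A perfect classifier would reach an area of 1, while a purely random one would stay close to 0.5.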

Return type

object, an instantiated metric object

Examples

>>> from watex.exlib import SGDClassifier
>>> from watex.metrics import ROC_curve
>>> from watex.datasets import fetch_data
>>> X, y = fetch_data('Bagoue prepared')
>>> sgd_clf = SGDClassifier()
>>> rocObj = ROC_curve(clf=sgd_clf, X=X,
...                    y=y, label=1, cv=3)
>>> rocObj.__dict__.keys()
>>> rocObj.roc_auc_score
>>> rocObj.fpr

watex.metrics.confusion_matrix(clf, X, y, *, cv=7, plot_conf_max=False, crossvalp_kws={}, **conf_mx_kws)

Evaluate the performance of the model or classifier by counting the number of times instances of class A are classified as class B.

To compute a confusion matrix, you first need a set of predictions so that they can be compared to the actual targets. You could make predictions on the test set, but it is better to keep it untouched until you are ready to make your final predictions. Remember that the test set is used only at the very end of the project, once you have a classifier that you are ready to launch. The confusion matrix gives a lot of information, but sometimes a more concise metric such as precision or recall is preferable.
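The counting described above can be sketched in a few lines of plain Python. The helper `confusion_counts` and the labels are illustrative assumptions and do not reflect how watex computes its matrix; rows stand for the actual class and columns for the predicted class.

```python
def confusion_counts(y_true, y_pred, labels):
    """Count how often each actual class (row) is predicted as each class (column)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# Hypothetical toy data with three classes.
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "B", "A", "B", "C", "C"]

conf = confusion_counts(y_true, y_pred, labels=["A", "B", "C"])
for label, row in zip(["A", "B", "C"], conf):
    print(label, row)
```

Correct predictions lie on the main diagonal; every off-diagonal entry is a specific kind of error, for example one instance of class A predicted as class B in the first row.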

Parameters
  • clf (callable, always as a function, classifier estimator) –

    A supervised (or semi-supervised) predictor with a finite set of discrete possible output values. A classifier supports modeling some of binary, multiclass, multilabel, or multiclass multioutput targets. Within scikit-learn, all classifiers support multi-class classification, defaulting to using a one-vs-rest strategy over the binary classification problem. Classifiers must store a classes_ attribute after fitting, and usually inherit from base.ClassifierMixin, which sets their _estimator_type attribute. A classifier can be distinguished from other estimators with is_classifier. It must implement:

    * fit
    * predict
    * score
    

    It may also be appropriate to implement decision_function, predict_proba and predict_log_proba.

  • X (Ndarray of shape ( M x N), \(M=m-samples\) & \(N=n-features\)) – training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like of shape (M,), \(M=m-samples\)) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • cv (int, cross-validation splitter or iterable) –

    A cross-validation splitting strategy. It is used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another, so as not to overfit the training supervision. Possible inputs for cv are usually:

    * An integer, specifying the number of folds in K-fold cross validation.
        K-fold will be stratified over classes if the estimator is a classifier
        (determined by base.is_classifier) and the targets may represent a
        binary or multiclass (but not multioutput) classification problem
        (determined by utils.multiclass.type_of_target).
    * A cross-validation splitter instance. Refer to the User Guide for
        splitters available within `Scikit-learn`_
    * An iterable yielding train/test splits.
    
    With some exceptions (especially where not using cross validation at all is an option), the default is 4-fold. In this function, cv defaults to 7, as shown in the signature.

  • label (float, int) – Specific class for which to evaluate the precision-recall tradeoff. If y is already binary (0 and 1), label does not need to be specified.

  • method (str) – Method used to get a score for each instance in the training set. Could be decision_function or predict_proba; a scikit-learn classifier generally has one of these methods. Default is decision_function.

  • tradeoff (float) – Value at which to check the precision and recall scores. For example, to target a precision of 90%, specify a tradeoff value and read off the resulting precision and recall scores.

  • plot_conf_max (bool, str) – Set to 'map' or 'error' to visualize the matshow of the predictions and errors. Default is False.

  • crossvalp_kws (dict) – Additional keyword arguments passed to cross_val_predict.

  • conf_mx_kws (dict) – Additional keyword arguments for the confusion matrix.

Examples

>>> from sklearn.svm import SVC
>>> from watex.metrics import confusion_matrix
>>> from watex.datasets import fetch_data
>>> X, y = fetch_data('Bagoue dataset prepared')
>>> svc_clf = SVC(C=100, gamma=1e-2, kernel='rbf',
...               random_state=42)
>>> confObj = confusion_matrix(svc_clf, X=X, y=y,
...                            plot_conf_max='error')
>>> confObj.norm_conf_mx
>>> confObj.conf_mx
>>> confObj.__dict__.keys()

watex.metrics.get_eval_scores(model, Xt, yt, *, multi_class='raise', average='binary', normalize=True, sample_weight=None, verbose=False, **scorer_kws)

Compute the accuracy, precision, recall and AUC scores.

Parameters
  • model (callable, always as a function) –

    A model estimator. An object which manages the estimation and decoding of a model. The model is estimated as a deterministic function of:

    • parameters provided in object construction or with set_params;

    • the global numpy.random random state if the estimator’s random_state parameter is set to None; and

    • any data or sample properties passed to the most recent call to fit, fit_transform or fit_predict, or data similarly passed in a sequence of calls to partial_fit.

    The estimated model is stored in public and private attributes on the estimator instance, facilitating decoding through prediction and transformation methods. Estimators must provide a fit method, and should provide set_params and get_params, although these are usually provided by inheritance from base.BaseEstimator. The core functionality of some estimators may also be available as a function.

  • Xt (Ndarray of shape (M x N), where M=m-samples and N=n-features) – Shorthand for “test set”; data that is observed at testing and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix.

  • yt (array-like of shape (M,), where M=m-samples) – test target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • average ({'micro', 'macro', 'samples', 'weighted', 'binary'} or None, default='binary') –

    This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:

    'binary':

    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true,pred}) are binary.

    'micro':

    Calculate metrics globally by counting the total true positives, false negatives and false positives.

    'macro':

    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

    'weighted':

    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall. Weighted recall is equal to accuracy.

    'samples':

    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score()). Will be ignored when y_true is binary. Note: multiclass ROC AUC currently only handles the ‘macro’ and ‘weighted’ averages.

  • multi_class ({'raise', 'ovr', 'ovo'}, default='raise') –

    Only used for multiclass targets. Determines the type of configuration to use. The default value raises an error, so either 'ovr' or 'ovo' must be passed explicitly.

    'ovr':

    Stands for One-vs-rest. Computes the AUC of each class against the rest [1] [2]. This treats the multiclass case in the same way as the multilabel case. Sensitive to class imbalance even when average == 'macro', because class imbalance affects the composition of each of the ‘rest’ groupings.

    'ovo':

    Stands for One-vs-one. Computes the average AUC of all possible pairwise combinations of classes [3]. Insensitive to class imbalance when average == 'macro'.

  • normalize (bool, default=True) – If False, return the number of correctly classified samples. Otherwise, return the fraction of correctly classified samples.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

  • verbose (bool or int, default=False) – Controls the level of verbosity. Higher values lead to more messages.

  • scorer_kws (dict) – Additional keyword arguments passed to the scorer metrics: accuracy_score(), precision_score(), recall_score(), roc_auc_score()
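The averaging modes described for the average parameter above can be compared on a small multiclass example. The sketch below computes precision per class and then the 'macro', 'micro', and 'weighted' averages in plain Python. The function `precision_by_average` and the toy labels are assumptions made for illustration; get_eval_scores itself relies on the scikit-learn scorers listed under scorer_kws.

```python
def precision_by_average(y_true, y_pred, labels):
    """Return per-class precision plus macro, micro, and weighted averages."""
    per_class, support = {}, {}
    total_tp = total_fp = 0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)
        per_class[label] = tp / (tp + fp) if (tp + fp) else 0.0
        support[label] = sum(1 for t in y_true if t == label)
        total_tp += tp
        total_fp += fp

    # 'macro': unweighted mean of the per-class scores.
    macro = sum(per_class.values()) / len(labels)
    # 'micro': pool the counts over all classes, then compute one score.
    micro = total_tp / (total_tp + total_fp)
    # 'weighted': mean of the per-class scores weighted by support.
    weighted = sum(per_class[l] * support[l] for l in labels) / len(y_true)
    return per_class, macro, micro, weighted


# Hypothetical imbalanced toy data: class 0 has twice the support of 1 and 2.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

per_class, macro, micro, weighted = precision_by_average(y_true, y_pred, [0, 1, 2])
print("per class:", per_class)
print("macro:", macro, "micro:", micro, "weighted:", weighted)
```

The three averages differ on this data because the classes are imbalanced; with perfectly balanced classes, 'macro' and 'weighted' would coincide.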

Returns

scores – A dictionary holding the scores from the metric evaluation:

* accuracy
* recall
* precision
* ROC AUC (Receiver Operating Characteristic Area Under the Curve)

Return type

dict

Notes

Note that if yt is given, it computes y_score, an array-like of shape (n_samples,) or (n_samples, n_classes) holding the target scores, following the scheme below:

  • In the binary case, it corresponds to an array of shape (n_samples,). Both probability estimates and non-thresholded decision values can be provided. The probability estimates correspond to the probability of the class with the greater label, i.e. estimator.classes_[1] and thus estimator.predict_proba(X, y)[:, 1]. The decision values correspond to the output of estimator.decision_function(X, y). See more information in the User guide;

  • In the multiclass case, it corresponds to an array of shape (n_samples, n_classes) of probability estimates provided by the predict_proba method. The probability estimates must sum to 1 across the possible classes. In addition, the order of the class scores must correspond to the order of labels, if provided, or else to the numerical or lexicographical order of the labels in y_true. See more information in the User guide;

  • In the multilabel case, it corresponds to an array of shape (n_samples, n_classes). Probability estimates are provided by the predict_proba method and the non-thresholded decision values by the decision_function method. The probability estimates correspond to the probability of the class with the greater label for each output of the classifier. See more information in the User guide.

References

[1] Provost, F., Domingos, P. (2000). Well-trained PETs: Improving probability estimation trees (Section 6.2), CeDER Working Paper #IS-00-04, Stern School of Business, New York University.

[2] Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861-874.

[3] Hand, D.J., Till, R.J. (2001). A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Machine Learning, 45(2), 171-186.

See also

average_precision_score

Area under the precision-recall curve.

roc_curve

Compute Receiver operating characteristic (ROC) curve.

RocCurveDisplay.from_estimator

Plot Receiver Operating Characteristic (ROC) curve given an estimator and some data.

RocCurveDisplay.from_predictions

Plot Receiver Operating Characteristic (ROC) curve given the true and predicted values.

Examples

Binary case:

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.metrics import roc_auc_score
>>> X, y = load_breast_cancer(return_X_y=True)
>>> clf = LogisticRegression(solver="liblinear", random_state=0).fit(X, y)
>>> roc_auc_score(y, clf.predict_proba(X)[:, 1])
0.99...
>>> roc_auc_score(y, clf.decision_function(X))
0.99...

Multiclass case:

>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> clf = LogisticRegression(solver="liblinear").fit(X, y)
>>> roc_auc_score(y, clf.predict_proba(X), multi_class='ovr')
0.99...

Multilabel case:

>>> import numpy as np
>>> from sklearn.datasets import make_multilabel_classification
>>> from sklearn.multioutput import MultiOutputClassifier
>>> X, y = make_multilabel_classification(random_state=0)
>>> clf = MultiOutputClassifier(clf).fit(X, y)
>>> # get a list of n_output containing probability arrays of shape
>>> # (n_samples, n_classes)
>>> y_pred = clf.predict_proba(X)
>>> # extract the positive columns for each output
>>> y_pred = np.transpose([pred[:, 1] for pred in y_pred])
>>> roc_auc_score(y, y_pred, average=None)
array([0.82..., 0.86..., 0.94..., 0.85... , 0.94...])
>>> from sklearn.linear_model import RidgeClassifierCV
>>> clf = RidgeClassifierCV().fit(X, y)
>>> roc_auc_score(y, clf.decision_function(X), average=None)
array([0.81..., 0.84... , 0.93..., 0.87..., 0.94...])

watex.metrics.get_metrics()

Get the list of available metrics.

Metrics are quantitative measures commonly used for assessing, comparing, and tracking performance or production. A group of metrics is typically used to build a dashboard that management or analysts review on a regular basis to maintain performance assessments, opinions, and business strategies.

watex.metrics.precision_recall_tradeoff(clf, X, y, *, cv=7, label=None, method=None, cvp_kws=None, tradeoff=None, **prt_kws)

The precision-recall tradeoff computes a score for each instance based on the decision function.

It assigns the instance to the positive class if that score is greater than the threshold; otherwise it assigns the instance to the negative class.
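The thresholding rule above can be sketched as follows. The function name `classify`, the scores, and the threshold values are hypothetical; in practice the scores would come from the classifier's decision_function or predict_proba.

```python
def classify(scores, threshold=0.0):
    """Assign 1 (positive) when the score exceeds the threshold, else 0 (negative)."""
    return [1 if s > threshold else 0 for s in scores]


# Hypothetical decision scores for five instances.
scores = [-2.1, -0.3, 0.4, 1.7, 3.2]

low = classify(scores, threshold=0.0)
high = classify(scores, threshold=1.0)
print("threshold 0.0:", low)
print("threshold 1.0:", high)
```

Raising the threshold turns some positive predictions into negative ones, which tends to increase precision and decrease recall.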

Parameters
  • clf (callable, always as a function, classifier estimator) –

    A supervised (or semi-supervised) predictor with a finite set of discrete possible output values. A classifier supports modeling some of binary, multiclass, multilabel, or multiclass multioutput targets. Within scikit-learn, all classifiers support multi-class classification, defaulting to using a one-vs-rest strategy over the binary classification problem. Classifiers must store a classes_ attribute after fitting, and usually inherit from base.ClassifierMixin, which sets their _estimator_type attribute. A classifier can be distinguished from other estimators with is_classifier. It must implement:

    * fit
    * predict
    * score
    

    It may also be appropriate to implement decision_function, predict_proba and predict_log_proba.

  • X (Ndarray of shape ( M x N), \(M=m-samples\) & \(N=n-features\)) – training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like of shape (M,), \(M=m-samples\)) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • cv (int, cross-validation splitter or iterable) –

    A cross-validation splitting strategy. It is used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another, so as not to overfit the training supervision. Possible inputs for cv are usually:

    * An integer, specifying the number of folds in K-fold cross validation.
        K-fold will be stratified over classes if the estimator is a classifier
        (determined by base.is_classifier) and the targets may represent a
        binary or multiclass (but not multioutput) classification problem
        (determined by utils.multiclass.type_of_target).
    * A cross-validation splitter instance. Refer to the User Guide for
        splitters available within `Scikit-learn`_
    * An iterable yielding train/test splits.
    
    With some exceptions (especially where not using cross validation at all is an option), the default is 4-fold. In this function, cv defaults to 7, as shown in the signature.

  • label (float, int) – Specific class for which to evaluate the precision-recall tradeoff. If y is already binary, label does not need to be specified.

  • method (str) – Method used to get a score for each instance in the training set. Could be decision_function or predict_proba; scikit-learn classifiers generally have one of these methods. Default is decision_function.

  • tradeoff (float, optional) – Value at which to check the precision and recall scores. For example, to target a precision of 90%, specify a tradeoff value and read off the resulting precision and recall scores.
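The idea behind the tradeoff parameter, targeting a given precision and reading off the matching recall, can be illustrated with a small search over candidate thresholds. This is only a sketch under stated assumptions: `threshold_for_precision` is a hypothetical helper, the data are invented, and instances scoring at or above the threshold are counted as positive. It does not show how watex interprets tradeoff internally.

```python
def threshold_for_precision(y_true, scores, target):
    """Return (threshold, precision, recall) for the lowest threshold
    whose precision reaches the target, or None if none does."""
    n_pos = sum(y_true)
    # Trying the lowest thresholds first keeps recall as high as possible.
    for thr in sorted(set(scores)):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        precision = tp / (tp + fp)  # tp + fp >= 1 because thr is one of the scores
        if precision >= target:
            return thr, precision, tp / n_pos
    return None


# Hypothetical labels and scores for ten instances.
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.65, 0.7, 0.8, 0.9]

result = threshold_for_precision(y_true, scores, target=0.90)
print("threshold, precision, recall:", result)
```

On this toy data the 90% precision target is only met at a threshold where half of the positives are missed, which is the cost the tradeoff refers to.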

Notes

In contrast to the confusion matrix, the precision-recall tradeoff relies on more concise metrics. The first is the accuracy of the positive predictions, called the precision of the classifier:

\[precision = TP/(TP+FP)\]

where TP is the number of true positives and FP is the number of false positives. A trivial way to have perfect precision is to make one single positive prediction and ensure it is correct (precision = 1/1 = 100%). This would not be very useful, since the classifier would ignore all but one positive instance. So precision is typically used along with another metric named recall, also called sensitivity or true positive rate (TPR): the ratio of positive instances that are correctly detected by the classifier. The equation of recall is:

\[recall = TP/(TP+FN)\]

where FN is the number of false negatives. It is often convenient to combine precision and recall into a single metric called the F1 score, in particular if you need a simple way to compare two classifiers. The F1 score is the harmonic mean of precision and recall. Whereas the regular mean treats all values equally, the harmonic mean gives much more weight to low values. As a result, the classifier only gets a high F1 score if both recall and precision are high. The equation is given below:

\[F1 = 2/((1/precision)+(1/recall)) = 2 \times precision \times recall/(precision+recall) = TP/(TP+(FN+FP)/2)\]

Increasing precision reduces recall, and vice versa. This is called the precision/recall tradeoff.
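As a quick numeric check of the formulas above, the sketch below computes precision, recall, and the F1 score from hypothetical counts and confirms that the three forms of the F1 equation agree. The function name `precision_recall_f1` and the counts are assumptions chosen for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
tp, fp, fn = 8, 2, 8
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

# The same F1 score written as a harmonic mean and directly from the counts.
f1_harmonic = 2 / (1 / precision + 1 / recall)
f1_counts = tp / (tp + (fn + fp) / 2)

# The regular (arithmetic) mean, for comparison.
arithmetic_mean = (precision + recall) / 2

print("precision:", precision, "recall:", recall)
print("F1:", f1, "arithmetic mean:", arithmetic_mean)
```

Because recall is low in this example, the F1 score falls below the arithmetic mean of precision and recall, which shows how the harmonic mean penalizes the weaker of the two values.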

Returns

obj – The metric object is composed of the following attributes:

  • confusion_matrix

  • f1_score

  • precision_score

  • recall_score

  • precisions from precision_recall_curve

  • recalls from precision_recall_curve

  • thresholds from precision_recall_curve

  • y classified

These attributes can be retrieved for plotting purposes.

Return type

object, an instantiated metric object

Examples

>>> from watex.exlib import SGDClassifier
>>> from watex.metrics import precision_recall_tradeoff
>>> from watex.datasets import fetch_data
>>> X, y = fetch_data('Bagoue prepared')
>>> sgd_clf = SGDClassifier()
>>> mObj = precision_recall_tradeoff(clf=sgd_clf, X=X, y=y,
...                                  label=1, cv=3, tradeoff=0.90)
>>> mObj.confusion_matrix