watex.metrics.precision_recall_tradeoff

watex.metrics.precision_recall_tradeoff(clf, X, y, *, cv=7, label=None, method=None, cvp_kws=None, tradeoff=None, **prt_kws)

The precision-recall tradeoff relies on a score computed for each instance by the classifier's decision function.

The instance is assigned to the positive class if its score is greater than the threshold; otherwise it is assigned to the negative class.
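The decision rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not watex code: the helper `classify_with_threshold` and the scores are hypothetical, and the default threshold of 0 is an assumption matching the cut-off that scikit-learn linear classifiers such as `SGDClassifier` use in `predict` for binary problems.

```python
import numpy as np

def classify_with_threshold(scores, threshold=0.0):
    """Hypothetical helper: 1 (positive) where score > threshold, else 0 (negative)."""
    scores = np.asarray(scores, dtype=float)
    return (scores > threshold).astype(int)

# Hypothetical decision-function scores for five instances.
scores = np.array([-2.3, -0.4, 0.1, 1.7, 3.2])

print(classify_with_threshold(scores))       # [0 0 1 1 1]  (threshold 0)
print(classify_with_threshold(scores, 1.0))  # [0 0 0 1 1]  (stricter threshold)
```

Raising the threshold turns the instance scored 0.1 into a negative prediction, which is the lever the precision-recall tradeoff works with.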

Parameters:
  • clf (classifier estimator instance) –

    A supervised (or semi-supervised) predictor with a finite set of discrete possible output values. A classifier supports modeling some of binary, multiclass, multilabel, or multiclass multioutput targets. Within scikit-learn, all classifiers support multi-class classification, defaulting to using a one-vs-rest strategy over the binary classification problem. Classifiers must store a classes_ attribute after fitting, and usually inherit from base.ClassifierMixin, which sets their _estimator_type attribute. A classifier can be distinguished from other estimators with is_classifier. It must implement:

    * fit
    * predict
    * score
    

    It may also be appropriate to implement decision_function, predict_proba and predict_log_proba.

  • X (ndarray of shape (M, N), where M is the number of samples and N the number of features) – training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like of shape (M,), where M is the number of samples) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • cv (int, cross-validation generator or iterable, default=7) –

    A cross-validation splitting strategy. It is used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another, so as not to overfit the training supervision. Possible inputs for cv are usually:

    * An integer, specifying the number of folds in K-fold cross validation.
        K-fold will be stratified over classes if the estimator is a classifier
        (determined by base.is_classifier) and the targets may represent a
        binary or multiclass (but not multioutput) classification problem
        (determined by utils.multiclass.type_of_target).
    * A cross-validation splitter instance. Refer to the User Guide for
        splitters available within scikit-learn.
    * An iterable yielding train/test splits.
    
    With some exceptions (especially where not using cross validation at all is an option), the default is 5-fold in scikit-learn; in this function, the default is cv=7.

  • label (float, int) – Specific class for which to evaluate the precision-recall tradeoff. If y is already binary, label does not need to be specified.

  • method (str) – Method used to get a score for each instance of the training set: decision_function or predict_proba. Scikit-learn classifiers generally implement at least one of these two methods. Default is decision_function.

  • tradeoff (float, optional) – Decision threshold at which to check the precision and recall scores. For instance, to reach a precision of 90%, set tradeoff to the threshold that yields that precision and read back the corresponding precision and recall scores.
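The parameters above can be combined roughly as follows. This is a minimal sketch built directly on scikit-learn, not watex's implementation. It assumes that label is handled by a one-vs-rest binarization of y, that the scores come from `sklearn.model_selection.cross_val_predict` with the chosen cv and method, and that tradeoff acts as a threshold on those scores. The synthetic dataset and the 90% precision target are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import precision_recall_curve, precision_score, recall_score
from sklearn.model_selection import cross_val_predict

# Synthetic 3-class data standing in for a real training set (assumption).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

label = 1                          # class whose tradeoff is inspected
y_bin = (y == label).astype(int)   # assumed one-vs-rest binarization

clf = SGDClassifier(random_state=42)
# Out-of-fold scores: each instance is scored by a model that did not train on it.
y_scores = cross_val_predict(clf, X, y_bin, cv=7, method="decision_function")

# precisions[i] / recalls[i] are computed for predictions with score >= thresholds[i].
precisions, recalls, thresholds = precision_recall_curve(y_bin, y_scores)

# Lowest threshold whose precision reaches the target, if any does.
target = 0.90
reach = np.flatnonzero(precisions[:-1] >= target)
if reach.size:
    tradeoff = thresholds[reach[0]]
    y_pred = (y_scores >= tradeoff).astype(int)
    print(f"threshold={tradeoff:.3f}",
          f"precision={precision_score(y_bin, y_pred):.3f}",
          f"recall={recall_score(y_bin, y_pred):.3f}")
else:
    print("no threshold reaches the target precision")
```

Using out-of-fold scores rather than scores from a model fitted on the whole set keeps the curve from being optimistically biased.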

Notes

In contrast to the full confusion matrix, precision is a more concise metric: it measures the accuracy of the classifier's positive predictions and is given by:

\[precision = TP/(TP+FP)\]

where TP is the number of true positives and FP is the number of false positives. A trivial way to get perfect precision is to make a single positive prediction and ensure it is correct (precision = 1/1 = 100%). This would not be very useful, since the classifier would ignore all but one positive instance. So precision is typically used along with another metric named recall, also called sensitivity or true positive rate (TPR): the ratio of positive instances that are correctly detected by the classifier. The equation of recall is given as:

\[recall = TP/(TP+FN)\]

where FN is, of course, the number of false negatives. It is often convenient to combine precision and recall into a single metric called the F1 score, in particular if you need a simple way to compare two classifiers. The F1 score is the harmonic mean of precision and recall. Whereas the regular mean treats all values equally, the harmonic mean gives much more weight to low values. As a result, the classifier will only get a high F1 score if both recall and precision are high. The equation is given below:

\[\begin{aligned} F1 &= \frac{2}{\frac{1}{precision}+\frac{1}{recall}} = 2\,\frac{precision \times recall}{precision+recall} \\ &= \frac{TP}{TP+\frac{FN+FP}{2}} \end{aligned}\]
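A quick numeric check of these formulas, using hypothetical confusion counts. The helper `prf1` is illustrative only: it shows that the three forms of F1 agree, and how the harmonic mean punishes the trivial single-positive classifier described earlier (its arithmetic mean of precision and recall would still be 0.505).

```python
def prf1(tp, fp, fn):
    """Illustrative helper: precision, recall and F1 computed three equivalent ways."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1_harmonic = 2 / (1 / precision + 1 / recall)              # harmonic-mean form
    f1_product = 2 * precision * recall / (precision + recall)  # product form
    f1_counts = tp / (tp + (fn + fp) / 2)                       # counts form
    return precision, recall, f1_harmonic, f1_product, f1_counts

# Hypothetical counts: 80 TP, 20 FP, 40 FN.
p, r, f1_h, f1_p, f1_c = prf1(tp=80, fp=20, fn=40)
print(f"precision={p:.3f} recall={r:.3f} F1={f1_h:.3f}")
# precision=0.800 recall=0.667 F1=0.727

# Trivial classifier: one correct positive prediction out of 100 positives.
trivial = prf1(tp=1, fp=0, fn=99)
print(f"precision={trivial[0]:.3f} recall={trivial[1]:.3f} F1={trivial[2]:.3f}")
# precision=1.000 recall=0.010 F1=0.020
```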

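Increasing precision reduces recall, and vice versa; this is called the precision-recall tradeoff. Raising the decision threshold generally increases precision and lowers recall, while lowering it does the opposite.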
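The tradeoff can be observed by sweeping the decision threshold over a small hypothetical set of labels and scores. The helper `precision_recall_at` is illustrative and not part of watex; it predicts positive when score >= threshold, the same convention scikit-learn's `precision_recall_curve` uses.

```python
import numpy as np

def precision_recall_at(y_true, scores, threshold):
    """Illustrative helper: precision and recall when predicting score >= threshold."""
    y_pred = scores >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    # Precision is undefined when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical labels and scores, sorted by score for readability.
y_true = np.array([0, 0, 1, 0, 1, 0, 1, 1, 1, 1])
scores = np.array([-4, -3, -2, -1, 0, 1, 2, 3, 4, 5], dtype=float)

for t in (-2.5, 0.5, 2.5):
    p, r = precision_recall_at(y_true, scores, t)
    print(f"threshold={t:+.1f}  precision={p:.2f}  recall={r:.2f}")
# threshold=-2.5  precision=0.75  recall=1.00
# threshold=+0.5  precision=0.80  recall=0.67
# threshold=+2.5  precision=1.00  recall=0.50
```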

Returns:

obj – The metric object is composed of the following attributes:

  • confusion_matrix

  • f1_score

  • precision_score

  • recall_score

  • precisions from precision_recall_curve

  • recalls from precision_recall_curve

  • thresholds from precision_recall_curve

  • y classified

These attributes can be retrieved for plotting purposes.

Return type:

object, an instantiated metric object

Examples

>>> from watex.exlib import SGDClassifier
>>> from watex.metrics import precision_recall_tradeoff
>>> from watex.datasets import fetch_data
>>> X, y = fetch_data('Bagoue prepared')
>>> sgd_clf = SGDClassifier()
>>> mObj = precision_recall_tradeoff(clf=sgd_clf, X=X, y=y,
...                                  label=1, cv=3, tradeoff=0.90)
>>> mObj.confusion_matrix