watex.metrics.confusion_matrix

watex.metrics.confusion_matrix(clf, X, y, *, cv=7, plot_conf_max=False, crossvalp_kws={}, **conf_mx_kws)

Evaluate the performance of a model or classifier by counting the number of times instances of class A are classified as class B.
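The counting described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the watex implementation; the helper name `count_confusions` and the toy labels are hypothetical.

```python
from collections import Counter


def count_confusions(y_true, y_pred):
    """Count how often each actual class is predicted as each class."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))
    # Row = actual class, column = predicted class.
    matrix = [[pairs[(actual, predicted)] for predicted in labels]
              for actual in labels]
    return labels, matrix


# Toy data (hypothetical): one "A" is mistaken for "B" and one "B" for "A".
y_true = ["A", "A", "A", "B", "B", "B"]
y_pred = ["A", "B", "A", "B", "B", "A"]
labels, matrix = count_confusions(y_true, y_pred)
```

Each off-diagonal cell counts one kind of mistake, so a perfect classifier would leave only the diagonal non-zero.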

To compute a confusion matrix, you first need a set of predictions so that they can be compared to the actual targets. You could make predictions on the test set, but it is better to leave it untouched until you are ready to make your final predictions. Remember that the test set is used only at the very end of the project, once you have a classifier that you are ready to launch. Instead, the predictions are obtained by cross-validation on the training set (see cv and crossvalp_kws). The confusion matrix gives a lot of information, but sometimes a more concise metric such as precision or recall is preferable.
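A minimal sketch of that workflow using scikit-learn directly, assuming the iris data as a stand-in for a training set. It shows the general technique (out-of-fold predictions, then a confusion matrix and the more concise precision and recall scores); it is not a reproduction of what this function does internally.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

# Stand-in training set (assumption: any labelled training data would do).
X, y = load_iris(return_X_y=True)
clf = SVC(C=100, gamma=1e-2, kernel="rbf", random_state=42)

# Out-of-fold predictions: each sample is predicted by a model
# that was fitted without it, so the test set stays untouched.
y_pred = cross_val_predict(clf, X, y, cv=7)

conf_mx = confusion_matrix(y, y_pred)           # rows: actual, columns: predicted
precision = precision_score(y, y_pred, average="macro")
recall = recall_score(y, y_pred, average="macro")
```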

Parameters
  • clf (callable, always as a function, classifier estimator) –

    A supervised (or semi-supervised) predictor with a finite set of discrete possible output values. A classifier supports modeling some of binary, multiclass, multilabel, or multiclass multioutput targets. Within scikit-learn, all classifiers support multi-class classification, defaulting to using a one-vs-rest strategy over the binary classification problem. Classifiers must store a classes_ attribute after fitting, and usually inherit from base.ClassifierMixin, which sets their _estimator_type attribute. A classifier can be distinguished from other estimators with is_classifier. It must implement:

    * fit
    * predict
    * score
    

    It may also be appropriate to implement decision_function, predict_proba and predict_log_proba.
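As a quick check of these requirements, the sketch below uses scikit-learn's `is_classifier` and inspects a fitted estimator. The choice of `SVC`, `LinearRegression`, and the iris data is an assumption made purely for illustration.

```python
from sklearn.base import is_classifier
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC().fit(X, y)

# A classifier is recognised as such; a regressor is not.
svc_is_classifier = is_classifier(clf)
regressor_is_classifier = is_classifier(LinearRegression())

# classes_ is only available after fitting.
classes = list(clf.classes_)

# The three required methods.
has_interface = all(hasattr(clf, name) for name in ("fit", "predict", "score"))
```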

  • X (ndarray of shape (M, N), where M is the number of samples and N the number of features) – Training set. Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like of shape (M,), where M is the number of samples) – Training target. Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • cv (int, cross-validation splitter or iterable) –

    A cross-validation splitting strategy. It is used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another, to not overfit the training supervision. Possible inputs for cv are usually:

    * An integer, specifying the number of folds in K-fold cross validation.
        K-fold will be stratified over classes if the estimator is a classifier
        (determined by base.is_classifier) and the targets may represent a
        binary or multiclass (but not multioutput) classification problem
        (determined by utils.multiclass.type_of_target).
    * A cross-validation splitter instance. Refer to the User Guide for
        splitters available within scikit-learn.
    * An iterable yielding train/test splits.
    
    With some exceptions (especially where not using cross validation at all is an option), the documented default is 4-fold; the signature of this function, however, sets cv=7.
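The three kinds of cv input listed above can be tried with scikit-learn's `cross_val_predict`. This is a sketch under the assumption that cv is handled the way scikit-learn handles it; the estimator and dataset are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# 1. An integer: the folds are stratified because clf is a classifier.
pred_int = cross_val_predict(clf, X, y, cv=7)

# 2. A cross-validation splitter instance.
splitter = StratifiedKFold(n_splits=7, shuffle=True, random_state=0)
pred_splitter = cross_val_predict(clf, X, y, cv=splitter)

# 3. An iterable yielding (train_indices, test_indices) pairs.
splits = list(splitter.split(X, y))
pred_iter = cross_val_predict(clf, X, y, cv=splits)

# Forms 2 and 3 use the same folds here, so their predictions should agree.
same_predictions = np.array_equal(pred_splitter, pred_iter)
```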

  • label (float, int) – Specific class for which to evaluate the tradeoff between precision and recall. If y is already binary (0 and 1), label does not need to be specified.

  • method (str) – Method used to get a score for each instance in the training set. Can be decision_function or predict_proba. A scikit-learn classifier generally provides at least one of these methods. Default is decision_function.

  • tradeoff (float) – Threshold at which to check the precision and recall scores. For example, to aim for a precision of 90%, set tradeoff to a candidate threshold value and read off the resulting precision and recall scores.
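How label, method, and tradeoff fit together can be sketched with scikit-learn alone. The code below is an illustration under stated assumptions, not the watex implementation: label 2 of the iris data is a hypothetical class of interest, and the threshold is picked from `precision_recall_curve` so that precision reaches 90%.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, precision_score, recall_score
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
label = 2                              # hypothetical class of interest
y_bin = (y == label).astype(int)       # one-vs-rest binary target

# Scores instead of class predictions, obtained out-of-fold.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_predict(clf, X, y_bin, cv=7, method="decision_function")

precisions, recalls, thresholds = precision_recall_curve(y_bin, scores)

# Lowest threshold whose precision reaches the target; the final
# precision entry has no matching threshold, hence the [:-1].
target_precision = 0.90
reached = precisions[:-1] >= target_precision
if not reached.any():
    raise ValueError("no threshold reaches the requested precision")
tradeoff = thresholds[np.argmax(reached)]

y_pred = (scores >= tradeoff).astype(int)
precision = precision_score(y_bin, y_pred)
recall = recall_score(y_bin, y_pred)
```

Raising the threshold generally trades recall for precision, which is why both scores are read off together.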

  • plot_conf_max (bool, str) – Set to map or error to display, with matshow, the confusion matrix of the predictions or the matrix of errors, respectively. Default is False.
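One common way to build an error view of a confusion matrix is to divide each row by the number of actual instances of that class and then zero the diagonal, so that only the misclassifications stand out. The sketch below follows that recipe with made-up counts; whether the error option computes exactly this is an assumption.

```python
import numpy as np

# Made-up confusion matrix: rows are actual classes, columns are predictions.
conf_mx = np.array([[50, 2, 3],
                    [4, 40, 6],
                    [1, 9, 30]])

# Divide each row by the number of actual instances of that class.
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = conf_mx / row_sums

# Zero the diagonal so that only the error rates remain.
error_mx = norm_conf_mx.copy()
np.fill_diagonal(error_mx, 0)
```

Either matrix can then be passed to matplotlib's `matshow` for display.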

  • crossvalp_kws (dict) – Additional keyword arguments for cross_val_predict.

  • conf_mx_kws (dict) – Additional keyword arguments for the confusion matrix.

Examples

>>> from sklearn.svm import SVC
>>> from watex.metrics import confusion_matrix
>>> from watex.datasets import fetch_data
>>> X, y = fetch_data('Bagoue dataset prepared')
>>> svc_clf = SVC(C=100, gamma=1e-2, kernel='rbf',
...               random_state=42)
>>> # Cross-validated confusion matrix; 'error' also displays the errors.
>>> confObj = confusion_matrix(svc_clf, X=X, y=y,
...                            plot_conf_max='error')
>>> confObj.norm_conf_mx  # normalized confusion matrix
>>> confObj.conf_mx       # confusion matrix
>>> confObj.__dict__.keys()