watex.view.EvalPlot

class watex.view.EvalPlot(tname=None, encode_labels=False, scale=None, cv=None, objective=None, prefix=None, label_values=None, litteral_classes=None, **kws)

Metrics, dimensionality and model evaluation plots.

Inherits from BasePlot. Provides dimensionality-reduction and metric plots. The class works only with numerical features.

Discouraged

Using continuous target values to plot classification metrics is discouraged. Users are encouraged to prepare their dataset before calling the EvalPlot methods, so that they keep full control over the expected results. Most of the metric plots implemented here are meant for supervised learning, especially for classification problems. The convenient approach is therefore to discretize/categorize the target into class labels before the fit. Otherwise, as in the demonstration examples under each method, the continuous labels must first be categorized. There are two options: either provide the individual class labels as a list of integers using the method EvalPlot._cat_codes_y(), or specify the number of clusters the target must hold. The latter option is mostly useful for testing or academic purposes. On a real dataset, this kind of target partition is discouraged, since it is far from reality and may lead to misinterpretation of the results.
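The second option above (choosing a number of classes) can be sketched in plain Python, assuming an equal-width partition of the target range. `categorize_target` is a hypothetical helper written for illustration only; it is not part of watex, and the partition rule actually applied by EvalPlot may differ.

```python
def categorize_target(y, n_classes=3):
    """Map continuous target values to integer labels 0..n_classes-1.

    Hypothetical helper (not a watex API): the range [min(y), max(y)] is
    split into `n_classes` bins of equal width.
    """
    if n_classes < 1:
        raise ValueError("n_classes must be a positive integer")
    y_min, y_max = min(y), max(y)
    width = (y_max - y_min) / n_classes
    if width == 0:
        # A constant target cannot be partitioned: every sample gets class 0.
        return [0] * len(y)
    labels = []
    for value in y:
        index = int((value - y_min) / width)
        # The maximum value would land one bin past the end; keep it in the last bin.
        labels.append(min(index, n_classes - 1))
    return labels


flow_rate = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(categorize_target(flow_rate, n_classes=3))  # [0, 0, 1, 1, 2, 2, 2]
```

The bin edges here depend only on the observed minimum and maximum, not on any domain threshold, which is why such an automatic partition can be far from reality on a real dataset.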

Parameters:
  • X (ndarray of shape (M, N), where M = m-samples and N = n-features) – training set; denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like of shape (M,), where M = m-samples) – train target; denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • tname (str, optional) – target name or label. In supervised learning, the target name is the reference name of y, the label variable.

  • objective (str, default=None) – purpose of the dataset, i.e. the problem to solve. The package was originally designed for flow rate prediction. If objective is set to 'flow', the plots behave according to the flow rate prediction purpose, and some conditions on the target values must be fulfilled. In that case, the label_values and litteral_classes parameters must also be supplied to encode the target correctly according to the hydraulic system requirements of the drinking water supply campaign. For any other purpose, keep objective set to None.

  • encode_labels (bool, default=False,) –

    Label encoding works with the label_values parameter. If y holds continuous numerical values, the regression problem can be turned into a classification problem by setting encode_labels to True. If encode_labels is True and label_values is not given, a unique identifier is created, which may not fit the exact needs of the user. It is therefore recommended to set this parameter in combination with label_values. For instance:

    encode_labels=True; label_values=3

    indicates that the target y values should be categorized into the integer identifiers [0, 1, 2], i.e. y is split into three subsets where:

    classes (c) = [c{0} <= y.min(), y.min() < c{1} < y.max(), c{2} >= y.max()]
    

    This automatic splitting may not match the exact classification of the target, so it is recommended to set label_values as a list of class labels, for instance label_values=[0, 1, 2].

  • scale (str, ['StandardScaler'|'MinMaxScaler'], default='StandardScaler') – kind of feature scaling to apply to the numerical features. When using PCA, it is recommended to enable scaling and to call fit_transform rather than only fit. The transform method also handles missing (NaN) values in the data; the default filling strategy is most_frequent.

  • cv (int, cross-validation generator or iterable, optional) –

    A cross-validation splitting strategy, used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another, so as not to overfit the training supervision. Possible inputs for cv are usually:

    * An integer, specifying the number of folds in K-fold cross validation.
        K-fold will be stratified over classes if the estimator is a classifier
        (determined by base.is_classifier) and the targets may represent a
        binary or multiclass (but not multioutput) classification problem
        (determined by utils.multiclass.type_of_target).
    * A cross-validation splitter instance. Refer to the scikit-learn User Guide
        for the available splitters.
    * An iterable yielding train/test splits.

    With some exceptions (especially where not using cross validation at all is an option), the default is 4-fold.

  • prefix (str, optional) – literal string used to prefix the integer class labels.

  • label_values (list of int, optional) – works with the encode_labels parameter and gives the different class labels. Refer to the explanation of encode_labels.

  • litteral_classes (list or str, optional) –

    Works when objective is 'flow'. Replaces the integer class labels with literal strings. For instance:

    label_values = [0, 1, 3, 6]
    litteral_classes = ['rate0', 'rate1', 'rate2', 'rate3']
    

  • yp_ls (str, default='-') – line style of the predicted label. Can be [ '-' | '.' | ':' ]

  • yp_lw (float, default=3) – line width of the predicted plot.

  • yp_lc (str or matplotlib.cm(), default='k') – line color of the prediction plot.

  • rs (str, default='--') – line style of the Recall metric.

  • ps (str, default='-') – line style of the Precision metric.

  • rc (str or tuple, default=(.6, .6, .6)) – color of the Recall metric.

  • pc (str or matplotlib.cm(), default='k') – color of the Precision metric, from Matplotlib colormaps.

  • yp_marker (str or matplotlib.markers(), default='o') – marker style of the prediction points.

  • yp_markerfacecolor (str or matplotlib.cm(), default='k') – facecolor of the predicted label marker.

  • yp_markeredgecolor (str or matplotlib.cm(), default='r') – edgecolor of the predicted label marker.

  • yp_markeredgewidth (int, default=2) – width of the predicted label marker.

  • savefig (str or path-like object, optional) – name of the file the figure is saved to. default is None

  • fig_dpi (float) – dots-per-inch resolution of the figure. default is 300

  • fig_num (int) – number (identifier) of the figure.

  • fig_size (tuple (int, int)) – size of the figure in inches (width, height). default is [5, 5]

  • fig_orientation (str) – figure orientation. default is landscape

  • fig_tile (str) – figure title. default is None

  • fs (float) – font size of the axis tick labels; axis labels use fs+2. default is 6

  • ls (str) – line style, it can be [ '-' | '.' | ':' ]. default is '-'

  • lc (str, Optional) – line color of the plot. default is k

  • lw (float, Optional) – line width of the plot. default is 1.5

  • alpha (float, 0 < alpha < 1) – transparency number. default is 0.5

  • font_weight (str, Optional) – weight of the font. default is bold.

  • font_style (str, Optional) – style of the font. default is italic

  • font_size (float, Optional) – size of the font. default is 3.

  • ms (float, Optional) – size of marker in points. default is 5

  • marker (str, Optional) – marker of the stations. default is o.

  • marker_style (str, Optional) – facecolor of the marker. default is yellow

  • marker_edgecolor (str, Optional) – edgecolor of the marker. default is yellow

  • marker_edgewidth (float, Optional) – width of the marker. default is 3.

  • xminorticks (float, Optional) – minor ticks according to the x-axis size. default is 1.

  • yminorticks (float, Optional) – minor ticks according to the y-axis size. default is 1.

  • bins (Optional) – histogram bins, i.e. the separation between two bars. default is 10.

  • xlim (tuple (int, int), Optional) – limit of the x-axis in the plot.

  • ylim (tuple (int, int), Optional) – limit of the y-axis in the plot.

  • xlabel (str, Optional) – label name of the x-axis in the plot.

  • ylabel (str, Optional) – label name of the y-axis in the plot.

  • rotate_xlabel (float, Optional) – angle to rotate xlabel in the plot.

  • rotate_ylabel (float, Optional) – angle to rotate ylabel in the plot.

  • leg_kws (dict, Optional) – keyword arguments of legend. default is empty dict

  • plt_kws (dict, Optional) – keyword arguments of plot. default is empty dict

  • glc (str, Optional) – line color of the grid plot, default is k

  • glw (float, Optional) – line width of the grid plot. default is 2

  • galpha (float, Optional) – transparency number of the grid. default is 0.5

  • gaxis (str ('x', 'y', 'both')) – axis that holds the grid. default is both

  • gwhich (str, Optional) – kind of grid in the plot. default is major

  • tp_axis (str) – axis the tick params are applied to. default is both

  • tp_labelsize (str, Optional) – labelsize of ticks params. default is italic

  • tp_bottom (bool,) – position at bottom of ticks params. default is True.

  • tp_labelbottom (bool,) – put label on the bottom of the ticks. default is False

  • tp_labeltop (bool,) – put label on the top of the ticks. default is True

  • cb_orientation (str , ('vertical', 'horizontal')) – orientation of the colorbar, default is vertical

  • cb_aspect (float, Optional) – aspect of the colorbar. default is 20.

  • cb_shrink (float, Optional) – shrink size of the colorbar. default is 1.0

  • cb_pad (float,) – pad of the colorbar of plot. default is .05

  • cb_anchor (tuple (float, float)) – anchor of the colorbar. default is (0.0, 0.5)

  • cb_panchor (tuple (float, float)) – proportionality anchor of the colorbar. default is (1.0, 0.5)

  • cb_label (str, Optional) – label of the colorbar.

  • cb_spacing (str, Optional) – spacing of the colorbar. default is uniform

  • cb_drawedges (bool,) – draw edges inside of the colorbar. default is False
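For the scale parameter, the two scalers and the most_frequent filling strategy can be written out explicitly. The sketch below is a plain-Python illustration under that assumption, not watex's implementation, and the function names are hypothetical. It follows scikit-learn's conventions: StandardScaler divides by the population standard deviation, and a constant feature is mapped to zeros.

```python
import math
import statistics


def impute_most_frequent(column):
    """Replace NaN entries with the most frequent observed value.

    Ties are resolved by first occurrence here; scikit-learn's SimpleImputer
    returns the smallest of the tied values instead.
    """
    observed = [v for v in column if not math.isnan(v)]
    fill = statistics.mode(observed)
    return [fill if math.isnan(v) else v for v in column]


def standard_scale(column):
    """z = (x - mean) / std, using the population std (ddof=0)."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0] * len(column)
    return [(v - mean) / std for v in column]


def minmax_scale(column):
    """x' = (x - min) / (max - min), mapping the feature onto [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


raw = [2.0, float("nan"), 2.0, 6.0]
filled = impute_most_frequent(raw)   # [2.0, 2.0, 2.0, 6.0]
print(minmax_scale(filled))          # [0.0, 0.0, 0.0, 1.0]
```

Filling the missing values before scaling matters: a single NaN would otherwise propagate through the mean, the standard deviation and the min/max.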
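Among the inputs accepted by the cv parameter, an iterable yielding train/test splits is the easiest to build by hand. The hypothetical generator below produces contiguous, unshuffled folds sized the same way as scikit-learn's KFold (the first n_samples % n_splits folds hold one extra sample), and it defaults to the 4 folds stated for cv. It is an illustrative sketch, not the splitter used inside watex.

```python
def kfold_indices(n_samples, n_splits=4):
    """Yield (train_indices, test_indices) pairs for contiguous K-fold splits."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds hold one more sample than the others.
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


for train, test in kfold_indices(10, n_splits=4):
    print(len(train), len(test))
# 7 3
# 7 3
# 8 2
# 8 2
```

scikit-learn's cross-validation routines accept an iterable of (train, test) index pairs as cv, so a list built from this generator could be passed wherever such an iterable is expected; building the list first avoids exhausting the generator after one pass.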
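The litteral_classes example amounts to a one-to-one mapping from label_values to literal names. The sketch below assumes exactly that; to_literal is a hypothetical helper, not a watex function.

```python
def to_literal(labels, label_values, literal_classes):
    """Replace integer class labels by their literal names (hypothetical helper)."""
    if len(label_values) != len(literal_classes):
        raise ValueError("label_values and literal_classes must have the same length")
    # Pair each integer label with the name at the same position.
    mapping = dict(zip(label_values, literal_classes))
    return [mapping[label] for label in labels]


label_values = [0, 1, 3, 6]
litteral_classes = ["rate0", "rate1", "rate2", "rate3"]
print(to_literal([0, 3, 6, 1], label_values, litteral_classes))
# ['rate0', 'rate2', 'rate3', 'rate1']
```

Checking the lengths up front makes a mismatch between the two lists fail loudly instead of silently leaving some labels without a name.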

Notes

This module works with numerical data only, i.e. the data must contain numerical features only. If categorical features are included in the dataset, they should be removed so that only the numerical part of the data is passed to the fit methods.
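Removing the categorical features before the fit can be sketched as follows, assuming the data is held as a mapping from column names to lists of values. numeric_columns is a hypothetical helper; with pandas, the same selection is usually written as DataFrame.select_dtypes(include='number').

```python
import numbers


def numeric_columns(data):
    """Keep only the columns whose values are all real numbers.

    Booleans are excluded on purpose, even though bool is a subclass of int.
    """
    return {
        name: values
        for name, values in data.items()
        if all(isinstance(v, numbers.Real) and not isinstance(v, bool) for v in values)
    }


dataset = {
    "power": [10.0, 32.5, 18.0],
    "geol": ["granite", "shale", "granite"],  # categorical: dropped
    "magnitude": [120, 95, 140],
}
print(list(numeric_columns(dataset)))  # ['power', 'magnitude']
```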