watex.view package#

View is the visualization sub-package. It is divided into the learning plot module (mlplot) and the data analysis and exploratory module (plot).

class watex.view.EvalPlot(tname=None, encode_labels=False, scale=None, cv=None, objective=None, prefix=None, label_values=None, litteral_classes=None, **kws)[source]#

Bases: BasePlot

Metrics, dimensionality and model evaluation plots.

Inherited from BasePlot. Dimensional reduction and metric plots. The class works only with numerical features.

Discouraged

Using continuous target values to plot classification metrics is discouraged. Users are encouraged to prepare their dataset before using the EvalPlot methods, so as to keep full control of the expected results. Indeed, most of the metric plots implemented here work with supervised methods and deal mainly with classification problems. The convenient way is therefore to discretize/categorize the target (class labels) before the fit. Otherwise, as in the demonstration examples under each method, the continuous labels first need to be categorized. There are two options: either provide the individual class labels as a list of integers using the method EvalPlot._cat_codes_y(), or specify the number of clusters that the target must hold. The latter option is commonly useful for testing or academic purposes. On a real dataset it is discouraged, since such a target partition is far from reality and may lead to misinterpretation.
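The two ways of categorizing a continuous target can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `equal_width_codes` and `threshold_codes` and the sample values are hypothetical, and the code does not reproduce how EvalPlot splits the target internally.

```python
from typing import List, Sequence


def equal_width_codes(y: Sequence[float], n_classes: int) -> List[int]:
    """Map continuous values to integer codes 0..n_classes-1 using equal-width bins."""
    lo, hi = min(y), max(y)
    if hi == lo:
        return [0 for _ in y]
    width = (hi - lo) / n_classes
    # The maximum value would land in bin n_classes, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_classes - 1) for v in y]


def threshold_codes(y: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Map values to codes using explicit class boundaries (hypothetical helper)."""
    # The code of a value is the number of thresholds it strictly exceeds.
    return [sum(v > t for t in thresholds) for v in y]


flow = [0.0, 0.5, 1.0, 2.5, 3.0, 6.0, 9.0]       # made-up continuous target
auto = equal_width_codes(flow, 3)                # "number of clusters" option
manual = threshold_codes(flow, [0.0, 1.0, 3.0])  # explicit class boundaries
print(auto)
print(manual)
```

The automatic split depends only on the value range, which is why explicit boundaries chosen from domain knowledge are usually the safer option on real data.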

Parameters:
  • X (Ndarray of shape ( M x N), \(M=m-samples\) & \(N=n-features\)) – training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like of shape (M, ), \(M=m-samples\)) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • tname (str,) – A target name or label. In supervised learning the target name is considered as the reference name of y or label variable.

  • objective (str, default=None,) – The purpose of the dataset; what problem do we intend to solve? Originally the package was designed for flow rate prediction. Thus, if the objective is set to flow, the plot behaves as for flow rate prediction, and in that case some conditions on the target values need to be fulfilled. Furthermore, if the objective is set to flow, the label_values as well as the litteral_classes parameters need to be supplied to correctly encode the target according to the hydraulic system requirement during the campaign for drinking water supply. For any other purpose, keep the objective to None. Default is None.

  • encode_labels (bool, default=False,) –

    label encoding works with the label_values parameter. If y holds continuous numerical values, the regression can be turned into a classification by setting encode_labels to True. If the value is set to True and the label values are not given, unique identifiers are created, which may not fit the exact needs of the user. So it is recommended to set this parameter in combination with label_values. For instance:

    encode_labels=True ; label_values =3
    

    indicates that the target y values should be categorized to hold the integer identifiers [0, 1, 2]. y is split into three subsets where:

    classes (c) = [ c{0} <= y.min(), y.min() < c{1} < y.max(),
                     c{2} >= y.max() ]
    

    This auto-splitting may not fit the exact classification of the target, so it is recommended to set label_values as a list of class labels, for instance label_values=[0, 1, 2].

  • scale (str, ['StandardScaler'|'MinMaxScaler'], default ='StandardScaler') – kind of feature scaling to apply on numerical features. Note that when using PCA, it is recommended to turn scale to True and to call fit_transform rather than only the fit method. Note that the transform method also handles missing (NaN) values in the data; the default filling strategy is most_frequent.

  • cv (int, cross-validation generator or an iterable,) –

    A cross-validation splitting strategy. It is used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another, so as not to overfit the training supervision. Possible inputs for cv are usually:

    * An integer, specifying the number of folds in K-fold cross validation.
        K-fold will be stratified over classes if the estimator is a classifier
        (determined by base.is_classifier) and the targets may represent a
        binary or multiclass (but not multioutput) classification problem
        (determined by utils.multiclass.type_of_target).
    * A cross-validation splitter instance. Refer to the User Guide for
        splitters available within Scikit-learn.
    * An iterable yielding train/test splits.
    
    With some exceptions (especially where not using cross validation at all is an option), the default is 4-fold.

  • prefix (str, optional) – literal string used to prefix the integer label identifiers.

  • label_values (list of int, optional) – works with the encode_labels parameter. It indicates the different class labels. Refer to the explanation of encode_labels.

  • litteral_classes (list or str, optional) –

    Works when objective is flow. Replaces the integer class names by their literal strings. For instance:

    label_values =[0, 1, 3, 6]
    litteral_classes = ['rate0', 'rate1', 'rate2', 'rate3']
    

  • yp_ls (str, default='-',) – Line style of Predicted label. Can be [ ‘-’ | ‘.’ | ‘:’ ]

  • yp_lw (float, default=3) – Line weight of the Predicted plot

  • yp_lc (str or matplotlib.cm(), default= ‘k’) – Line color of the Prediction plot. default is k

  • rs (str, default='--') – Line style of Recall metric

  • ps (str, default='-') – Line style of the Precision metric

  • rc (str, default=(.6,.6,.6)) – Recall metric colors

  • pc (str or matplotlib.cm(), default=’k’) – Precision colors from Matplotlib colormaps.

  • yp_marker (str or matplotlib.markers(), default =’o’) – Style of the marker of the Prediction points.

  • yp_markerfacecolor (str or matplotlib.cm(), default=’k’) – Facecolor of the Predicted label marker.

  • yp_markeredgecolor (str or matplotlib.cm(), default= ‘r’) – Edgecolor of the Predicted label marker.

  • yp_markeredgewidth (int, default=2) – Width of the Predicted label marker.

  • savefig (str, Path-like object,) – name of the figure file to save, default is None

  • fig_dpi (float,) – dots-per-inch resolution of the figure. default is 300

  • fig_num (int,) – number of the figure.

  • fig_size (Tuple (int, int) or inch) – size of figure in inches (width, height). default is [5, 5]

  • fig_orientation (str,) – figure orientation. default is landscape

  • fig_tile (str,) – figure title. default is None

  • fs (float,) – size of font of axis tick labels, axis labels are fs+2. default is 6

  • ls (str,) – line style, it can be [ ‘-’ | ‘.’ | ‘:’ ] . default is ‘-’

  • lc (str, Optional,) – line color of the plot, default is k

  • lw (float, Optional,) – line weight of the plot, default is 1.5

  • alpha (float, 0 < alpha < 1,) – transparency number, default is 0.5

  • font_weight (str, Optional) – weight of the font , default is bold.

  • font_style (str, Optional) – style of the font. default is italic

  • font_size (float, Optional) – size of the font. default is 3.

  • ms (float, Optional) – size of marker in points. default is 5

  • marker (str, Optional) – marker of stations. default is o.

  • marker_style (str, Optional) – facecolor of the marker. default is yellow

  • marker_edgecolor (str, Optional) – edgecolor of the marker. default is yellow

  • marker_edgewidth (float, Optional) – width of the marker. default is 3.

  • xminorticks (float, Optional) – minor ticks according to x-axis size. default is 1.

  • yminorticks (float, Optional) – minor ticks according to y-axis size. default is 1.

  • bins (int, Optional) – histogram element separation between two bars. default is 10.

  • xlim (tuple (int, int), Optional) – limit of x-axis in plot.

  • ylim (tuple (int, int), Optional) – limit of y-axis in plot.

  • xlabel (str, Optional,) – label name of x-axis in plot.

  • ylabel (str, Optional,) – label name of y-axis in plot.

  • rotate_xlabel (float, Optional) – angle to rotate xlabel in plot.

  • rotate_ylabel (float, Optional) – angle to rotate ylabel in plot.

  • leg_kws (dict, Optional) – keyword arguments of legend. default is empty dict

  • plt_kws (dict, Optional) – keyword arguments of plot. default is empty dict

  • glc (str, Optional) – line color of the grid plot, default is k

  • glw (float, Optional) – line weight of the grid plot, default is 2

  • galpha (float, Optional,) – transparency number of grid, default is 0.5

  • gaxis (str ('x', 'y', 'both')) – type of axis to hold the grid, default is both

  • gwhich (str, Optional) – kind of grid in the plot. default is major

  • tp_axis (str,) – axis to apply the ticks params. default is both

  • tp_labelsize (str, Optional) – labelsize of ticks params. default is italic

  • tp_bottom (bool,) – position at bottom of ticks params. default is True.

  • tp_labelbottom (bool,) – put label on the bottom of the ticks. default is False

  • tp_labeltop (bool,) – put label on the top of the ticks. default is True

  • cb_orientation (str , ('vertical', 'horizontal')) – orientation of the colorbar, default is vertical

  • cb_aspect (float, Optional) – aspect of the colorbar. default is 20.

  • cb_shrink (float, Optional) – shrink size of the colorbar. default is 1.0

  • cb_pad (float,) – pad of the colorbar of plot. default is .05

  • cb_anchor (tuple (float, float)) – anchor of the colorbar. default is (0.0, 0.5)

  • cb_panchor (tuple (float, float)) – proportionality anchor of the colorbar. default is (1.0, 0.5)

  • cb_label (str, Optional) – label of the colorbar.

  • cb_spacing (str, Optional) – spacing of the colorbar. default is uniform

  • cb_drawedges (bool,) – draw edges inside of the colorbar. default is False

Notes

This module works with numerical data only, i.e. the data must contain numerical features only. If categorical values are included in the dataset, they are removed and the data is reduced accordingly during the fit methods.

fit(X=None, y=None, **fit_params)[source]#

Fit data and populate the attributes for plotting purposes.

There is no conventional procedure for checking whether an instance is fitted. However, a class that is not fitted should raise watex.exceptions.NotFittedError when a method is called.

Parameters:
  • X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like, shape (M, ) M=m-samples,) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • data (Filepath or DataFrame of shape (M, N)) – pandas.DataFrame containing M samples and N features.

  • fit_params (dict) – Additional keyword arguments passed to watex.utils.coreutils._is_readable

Returns:

``self`` – returns self for easy method chaining.

Return type:

EvalPlot instance
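The fit-then-plot convention described for fit can be sketched with a small stand-in class. Everything here is hypothetical (`TinyPlot`, `summary`, and the local `NotFittedError`); it only mirrors the pattern of populating attributes in fit, returning self, and raising when a method is used before fitting.

```python
class NotFittedError(Exception):
    """Local stand-in for a 'not fitted' exception (not the watex class)."""


class TinyPlot:
    """Hypothetical plot object that follows the fit-then-plot convention."""

    def __init__(self):
        self.data = None  # populated by fit()

    def fit(self, data):
        self.data = list(data)
        return self  # returning self allows method chaining

    @property
    def inspect(self):
        # Guard evaluated by plotting methods before they touch self.data.
        if self.data is None:
            raise NotFittedError(
                f"{self.__class__.__name__} instance is not fitted yet. "
                "Call 'fit' with appropriate arguments before using this method."
            )
        return 1

    def summary(self):
        self.inspect  # raises if fit() has not been called
        return len(self.data)


try:
    TinyPlot().summary()
    raised = False
except NotFittedError:
    raised = True

n_rows = TinyPlot().fit([3, 1, 2]).summary()  # chained call works after fit
print(raised, n_rows)
```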

fit_transform(X, y=None, **fit_params)[source]#

Fit and transform at once.

Parameters:

X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

Returns:

X – The transformed array or dataframe with numerical features

Return type:

NDArray | DataFrame, shape (M x N)

property inspect#

Inspect data and trigger plot after checking the data entry. Raises NotFittedError if EvalPlot is not fitted yet.

plotConfusionMatrix(clf, *, kind=None, labels=None, matshow_kws=None, **conf_mx_kws)[source]#

Plot confusion matrix for error evaluation.

A representation of the confusion matrix for error visualization. If kind is set to map, the plot gives the number of confused instances/items. However, when kind is set to error, the number of confused items is expressed as a percentage.
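The difference between raw counts and error rates can be sketched as follows. This is a minimal illustration, assuming that the error view divides each row by the number of actual instances of that class and blanks the diagonal; the function names and the labels are made up, and the exact normalization used by plotConfusionMatrix is not confirmed here.

```python
from typing import List, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def error_rates(matrix: List[List[int]]) -> List[List[float]]:
    """Divide each row by its total and zero the diagonal to keep errors only."""
    rates = []
    for i, row in enumerate(matrix):
        total = sum(row) or 1  # avoid dividing by zero for an empty class
        rates.append([0.0 if i == j else count / total
                      for j, count in enumerate(row)])
    return rates


y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]  # made-up actual classes
y_pred = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0]  # made-up predictions
counts = confusion_counts(y_true, y_pred, 3)
errors = error_rates(counts)
print(counts)
print(errors)
```

Normalizing by row keeps large classes from looking worse than small ones simply because they contain more samples.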

Parameters:

  • clf (callable, always as a function, classifier estimator) – A supervised predictor with a finite set of discrete possible output values. A classifier must support modeling at least binary targets. It must store a classes_ attribute after fitting.

  • labels (int, or list of int, optional) – Specific class to evaluate the tradeoff of precision and recall. label needs to be specified and be a value within the target.

  • kind (str) – can be map or error to visualize the matshow of predictions and errors respectively.

  • matshow_kws (dict) – additional matplotlib keyword arguments.

  • conf_mx_kws (dict) – Additional confusion matrix keyword arguments.

  • ylabel (list) – list of label names to hold the name of each category.

Examples

>>> from watex.datasets import fetch_data
>>> from watex.utils.mlutils import cattarget
>>> from watex.exlib.sklearn import SVC
>>> from watex.view.mlplot import EvalPlot
>>> X, y = fetch_data('bagoue', return_X_y=True, as_frame=True)
>>> # partition the target into 4 clusters -> just for demo
>>> b = EvalPlot(scale=True, label_values=4)
>>> b.fit_transform(X, y)
>>> # prepare our estimator
>>> svc_clf = SVC(C=100, gamma=1e-2, kernel='rbf', random_state=42)
>>> matshow_kwargs = {
...     'aspect': 'auto',  # 'auto' or 'equal'
...     'interpolation': None,
...     'cmap': 'jet'}
>>> plot_kws = {'lw': 3,
...     'lc': (.9, 0, .8),
...     'font_size': 15.,
...     'cb_format': None,
...     'xlabel': 'Predicted classes',
...     'ylabel': 'Actual classes',
...     'font_weight': None,
...     'tp_labelbottom': False,
...     'tp_labeltop': True,
...     'tp_bottom': False
...     }
>>> b.plotConfusionMatrix(clf=svc_clf,
...     matshow_kws=matshow_kwargs,
...     **plot_kws)
>>> svc_clf = SVC(C=100, gamma=1e-2, kernel='rbf',
...               random_state=42)
>>> # replace the integer identifier with literal string
>>> b.litteral_classes = ['FR0', 'FR1', 'FR2', 'FR3']
>>> b.plotConfusionMatrix(svc_clf, matshow_kws=matshow_kwargs,
...     kind='error', **plot_kws)
plotPCA(n_components=None, *, n_axes=None, biplot=False, pc1_label='Axis 1', pc2_label='Axis 2', plot_dict=None, **pca_kws)[source]#

Plot PCA component analysis using decomposition.

PCA identifies the axis that accounts for the largest amount of variance in the train set X. It also finds a second axis orthogonal to the first one, that accounts for the largest amount of remaining variance.
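How a variance ratio translates into a number of components can be sketched in plain Python. This is an illustration only: the per-component variances are made up, and the helper names are hypothetical rather than part of watex or scikit-learn.

```python
from typing import List, Sequence


def explained_variance_ratio(variances: Sequence[float]) -> List[float]:
    """Share of the total variance carried by each component, largest first."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    return [v / total for v in ordered]


def components_to_keep(variances: Sequence[float], ratio: float = 0.95) -> int:
    """Smallest number of leading components whose cumulative share reaches `ratio`."""
    cumulative = 0.0
    for count, share in enumerate(explained_variance_ratio(variances), start=1):
        cumulative += share
        if cumulative >= ratio:
            return count
    return len(variances)


# Made-up per-component variances (eigenvalues of a covariance matrix).
variances = [6.0, 2.0, 1.2, 0.5, 0.3]
ratios = explained_variance_ratio(variances)
kept = components_to_keep(variances, ratio=0.95)
print(ratios, kept)
```

With these values the first two axes carry 80% of the variance, and four components are needed to reach the 95% level.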

Parameters:
  • n_components (int or float, optional) – Number of dimensions to preserve. If n_components is a float between 0. and 1., it indicates the variance ratio to preserve. If None (the default), 95% of the variance is preserved.

  • n_axes (int, optional) – Number of most important components from which to retrieve the variance ratio. Default is 2, i.e. the first two components with the highest variance ratio.

  • biplot (bool,) – biplot plots PCA features importance (pc1 and pc2) and visualize the level of variance and direction of components for different variables. Refer to Serafeim Loukas.

  • pc1_label (str, default ='Axis 1') – the first component, with the most variance, held in ‘Axis 1’. Can be set to any other axis, for instance ‘Axis 3’, to replace the component in ‘Axis 1’ with the one in Axis 3, and so on. This allows visualizing the position of each level of variance for each variable.

  • pc2_label (str, default ='Axis 2',) – the second component, with the most variance, held in ‘Axis 2’. Can be set to any other axis, for instance ‘Axis 6’, to replace the component in ‘Axis 2’ with the one in Axis 6, and so on.

  • plot_dict (dict,) – dictionary of font and marker properties for each sample corresponding to the label_values.

  • pca_kws (dict,) – additional keyword arguments passed to watex.analysis.dimensionality.nPCA

Returns:

``self`` – returns self for easy method chaining.

Return type:

EvalPlot instance

Notes

By default, the nPCA methods plot the first two principal components, named pc1_label for axis 1 and pc2_label for axis 2. To plot the first component pc1 against the third component, set pc2_label to Axis 3 and set n_components to 3, which is the maximum number of reduced columns to retrieve; otherwise a user warning is displayed. The algorithm should automatically detect the digit 3 in the literal label that includes Axis (e.g. ‘Axis 3’) and consider it as the third component pc3. The same process applies to the other axes.

Examples

>>> from watex.datasets import load_bagoue
>>> from watex.view.mlplot import EvalPlot
>>> X , y = load_bagoue(as_frame =True )
>>> b=EvalPlot(tname ='flow', encode_labels=True ,
...            scale = True )
>>> b.fit_transform (X, y)
>>> b.plotPCA (n_components= 2 )
...
>>> # pc1 and pc2 labels > n_components -> raises user warnings
>>> b.plotPCA (n_components= 2 , biplot=False, pc1_label='Axis 3',
...            pc2_label='axis 4')
... UserWarning: Number of components and axes might be consistent;
    '2'and '4 are given; default two components are used.
>>> # works fine since n_components is greater than the number of axes
>>> b.plotPCA (n_components= 8 , biplot=False, pc1_label='Axis3',
...            pc2_label='axis4')
... EvalPlot(tname= None, objective= None, scale= True, ... ,
             sns_height= 4.0, sns_aspect= 0.7, verbose= 0)
plotPR(clf, label, kind=None, method=None, cvp_kws=None, **prt_kws)[source]#

Precision/recall (PR) and tradeoff plots.

PR computes a score based on the decision function and plots the result as a score versus threshold.
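The score-versus-threshold idea can be sketched without any library. This is a simplified illustration under stated assumptions: the labels and scores are made up, `precision_recall_at` is a hypothetical helper, and in practice the scores would come from a classifier's decision function, typically through cross-validated predictions.

```python
from typing import Sequence, Tuple


def precision_recall_at(y_true: Sequence[int], scores: Sequence[float],
                        threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, t in zip(predicted, y_true) if p and t == 1)
    fp = sum(1 for p, t in zip(predicted, y_true) if p and t == 0)
    fn = sum(1 for p, t in zip(predicted, y_true) if not p and t == 1)
    # Convention: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [0, 0, 1, 0, 1, 1]                    # made-up binary labels
scores = [-2.0, -0.5, 0.1, 0.4, 1.2, 3.0]      # made-up decision scores

curve = [(t, *precision_recall_at(y_true, scores, t)) for t in (-1.0, 0.0, 1.0)]
for threshold, precision, recall in curve:
    print(f"threshold={threshold:+.1f} precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold increases precision and lowers recall in this example, which is the tradeoff the plot visualizes.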

Parameters:

  • clf (callable, always as a function, classifier estimator) – A supervised predictor with a finite set of discrete possible output values. A classifier must support modeling at least binary targets. It must store a classes_ attribute after fitting.

  • label (int,) – Specific class to evaluate the tradeoff of precision and recall. label needs to be specified and be a value within the target.

  • kind (str, ['threshold'|'recall'], default='threshold') – kind of PR plot. If kind is ‘recall’, the method plots the precision versus the recall scores, otherwise the PR tradeoff is plotted against the ‘threshold’.

  • method (str) – Method to get scores from each instance in the trainset. Could be decision_function or predict_proba. A scikit-learn classifier generally has one of these methods. Default is decision_function.

  • cvp_kws (dict, optional) – Additional keyword arguments of sklearn.model_selection.cross_val_predict().

  • prt_kws (dict,) – Additional keyword arguments passed to watex.exlib.sklearn.precision_recall_tradeoff

Examples

>>> from watex.exlib.sklearn import SGDClassifier
>>> from watex.datasets.dload import load_bagoue
>>> from watex.utils import cattarget
>>> from watex.view.mlplot import EvalPlot
>>> X , y = load_bagoue(as_frame =True )
>>> sgd_clf = SGDClassifier(random_state= 42) # our estimator
>>> b= EvalPlot(scale = True , encode_labels=True)
>>> b.fit_transform(X, y)
>>> # binarize the label b.y
>>> ybin = cattarget(b.y, labels= 2 ) # can also use labels =[0, 1]
>>> b.y = ybin
>>> # plot the Precision-recall tradeoff
>>> b.plotPR(sgd_clf , label =1) # class=1
... EvalPlot(tname= None, objective= None, scale= True, ... ,
sns_height= 4.0, sns_aspect= 0.7, verbose= 0)
plotROC(clfs, label, method=None, cvp_kws=None, **roc_kws)[source]#

Plot receiver operating characteristic (ROC) curves of classifiers.

Can plot multiple classifiers at once. If multiple classifiers are given, each classifier must be a tuple of (<name>, <classifier>, <method>). For instance, to plot both the sklearn.ensemble.RandomForestClassifier and sklearn.linear_model.SGDClassifier classifiers, they must be arranged as follows:

clfs =[
    ('sgd', SGDClassifier(), "decision_function" ),
    ('forest', RandomForestClassifier(), "predict_proba")
    ]

It is important to check that the method (e.g. ‘predict_proba’) is valid for the scikit-learn classifier whose ROC curve we want to plot.
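Checking the scoring method before use, and turning the scores into ROC points, can be sketched as follows. The two model classes are hypothetical stand-ins rather than scikit-learn estimators, and `positive_scores` and `roc_points` are illustrative helpers, not the plotROC implementation.

```python
from typing import List, Sequence, Tuple


class MarginModel:
    """Stand-in classifier exposing only decision_function (hypothetical)."""

    def decision_function(self, X):
        return [x - 0.5 for x in X]


class ProbaModel:
    """Stand-in classifier exposing only predict_proba (hypothetical)."""

    def predict_proba(self, X):
        return [[1.0 - x, x] for x in X]  # columns: P(class 0), P(class 1)


def positive_scores(clf, method: str, X: Sequence[float]) -> List[float]:
    """Return one score per sample, failing early if the method is missing."""
    if not hasattr(clf, method):
        raise AttributeError(f"{type(clf).__name__} has no method {method!r}")
    raw = getattr(clf, method)(X)
    # predict_proba yields one column per class; keep the positive-class column.
    return [row[1] for row in raw] if method == "predict_proba" else list(raw)


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) for each distinct score threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points


X = [0.1, 0.4, 0.35, 0.8]  # made-up single-feature samples
y = [0, 0, 1, 1]
clfs = [("margin", MarginModel(), "decision_function"),
        ("proba", ProbaModel(), "predict_proba")]

points = {name: roc_points(y, positive_scores(clf, method, X))
          for name, clf, method in clfs}
print(points)
```

Both stand-ins rank the samples identically, so they yield the same ROC points even though one returns margins and the other probabilities.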

Parameters:
  • clfs (callables, always as a function, classifier estimators) – A supervised predictor with a finite set of discrete possible output values. A classifier must support modeling at least binary targets. It must store a classes_ attribute after fitting.

  • label (int,) – Specific class to evaluate the tradeoff of precision and recall. label needs to be specified and be a value within the target.

  • kind (str, ['threshold'|'recall'], default='threshold') – kind of PR plot. If kind is ‘recall’, the method plots the precision versus the recall scores, otherwise the PR tradeoff is plotted against the ‘threshold’.

  • method (str) – Method to get scores from each instance in the trainset. Could be decision_function or predict_proba. A scikit-learn classifier generally has one of these methods. Default is decision_function.

  • cvp_kws (dict, optional) – Additional keyword arguments of sklearn.model_selection.cross_val_predict().

  • prt_kws (dict,) – Additional keyword arguments passed to watex.exlib.sklearn.precision_recall_tradeoff

  • roc_kws (dict) – roc_curve additional keywords arguments.

Returns:

``self`` – returns self for easy method chaining.

Return type:

EvalPlot instance

Examples

  1. Plot ROC for single classifier

>>> from watex.exlib.sklearn import ( SGDClassifier,
...                                   RandomForestClassifier
...                                   )
>>> from watex.datasets.dload import load_bagoue
>>> from watex.utils import cattarget
>>> from watex.view.mlplot import EvalPlot
>>> X , y = load_bagoue(as_frame =True )
>>> sgd_clf = SGDClassifier(random_state= 42) # our estimator
>>> b= EvalPlot(scale = True , encode_labels=True)
>>> b.fit_transform(X, y)
>>> # binarize the label b.y
>>> ybin = cattarget(b.y, labels= 2 ) # can also use labels =[0, 1]
>>> b.y = ybin
>>> # plot the ROC
>>> b.plotROC(sgd_clf , label =1) # class=1
... EvalPlot(tname= None, objective= None, scale= True, ... ,
             sns_height= 4.0, sns_aspect= 0.7, verbose= 0)

  2. Plot ROC for multiple classifiers

>>> b= EvalPlot(scale = True , encode_labels=True,
...             lw =3., lc=(.9, 0, .8), font_size=7 )
>>> sgd_clf = SGDClassifier(random_state= 42)
>>> forest_clf =RandomForestClassifier(random_state=42)
>>> b.fit_transform(X, y)
>>> # binarize the label b.y
>>> ybin = cattarget(b.y, labels= 2 ) # can also use labels =[0, 1]
>>> b.y = ybin
>>> clfs =[('sgd', sgd_clf, "decision_function" ),
...        ('forest', forest_clf, "predict_proba")]
>>> b.plotROC (clfs =clfs , label =1 )
... EvalPlot(tname= None, objective= None, scale= True, ... ,
             sns_height= 4.0, sns_aspect= 0.7, verbose= 0)
save(fig)[source]#

Save the figure if figure properties are given.

transform(X, **t_params)[source]#

Transform the data and impute the numerical features.

It is not convenient to use transform if the user wants to keep categorical values in the array.

Parameters:
  • X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • t_params (dict,) – Keyword arguments passed to sklearn.impute.SimpleImputer for imputing the missing data (the default strategy is ‘most_frequent’), or keyword arguments passed to watex.utils.funcutils.to_numeric_dtypes

Returns:

X – The transformed array or dataframe with numerical features

Return type:

NDArray | DataFrame, shape (M x N)

class watex.view.ExPlot(tname=None, inplace=False, **kws)[source]#

Bases: BasePlot

Exploratory plot for data analysis

ExPlot is a shadow class. Exploring the data is needed before creating a model, since it gives a feel for the data and is also a great excuse to meet and discuss issues with the business units that control the data. ExPlot methods return an instantiated object that inherits from the watex.property.Baseplots ABC (Abstract Base Class) for visualization.

Parameters:
  • savefig (str, Path-like object,) – name of the figure file to save, default is None

  • fig_dpi (float,) – dots-per-inch resolution of the figure. default is 300

  • fig_num (int,) – number of the figure.

  • fig_size (Tuple (int, int) or inch) – size of figure in inches (width, height). default is [5, 5]

  • fig_orientation (str,) – figure orientation. default is landscape

  • fig_tile (str,) – figure title. default is None

  • fs (float,) – size of font of axis tick labels, axis labels are fs+2. default is 6

  • ls (str,) – line style, it can be [ ‘-’ | ‘.’ | ‘:’ ] . default is ‘-’

  • lc (str, Optional,) – line color of the plot, default is k

  • lw (float, Optional,) – line weight of the plot, default is 1.5

  • alpha (float, 0 < alpha < 1,) – transparency number, default is 0.5

  • font_weight (str, Optional) – weight of the font , default is bold.

  • font_style (str, Optional) – style of the font. default is italic

  • font_size (float, Optional) – size of the font. default is 3.

  • ms (float, Optional) – size of marker in points. default is 5

  • marker (str, Optional) – marker of stations. default is o.

  • marker_style (str, Optional) – facecolor of the marker. default is yellow

  • marker_edgecolor (str, Optional) – edgecolor of the marker. default is yellow

  • marker_edgewidth (float, Optional) – width of the marker. default is 3.

  • xminorticks (float, Optional) – minor ticks according to x-axis size. default is 1.

  • yminorticks (float, Optional) – minor ticks according to y-axis size. default is 1.

  • bins (int, Optional) – histogram element separation between two bars. default is 10.

  • xlim (tuple (int, int), Optional) – limit of x-axis in plot.

  • ylim (tuple (int, int), Optional) – limit of y-axis in plot.

  • xlabel (str, Optional,) – label name of x-axis in plot.

  • ylabel (str, Optional,) – label name of y-axis in plot.

  • rotate_xlabel (float, Optional) – angle to rotate xlabel in plot.

  • rotate_ylabel (float, Optional) – angle to rotate ylabel in plot.

  • leg_kws (dict, Optional) – keyword arguments of legend. default is empty dict

  • plt_kws (dict, Optional) – keyword arguments of plot. default is empty dict

  • glc (str, Optional) – line color of the grid plot, default is k

  • glw (float, Optional) – line weight of the grid plot, default is 2

  • galpha (float, Optional,) – transparency number of grid, default is 0.5

  • gaxis (str ('x', 'y', 'both')) – type of axis to hold the grid, default is both

  • gwhich (str, Optional) – kind of grid in the plot. default is major

  • tp_axis (str,) – axis to apply the ticks params. default is both

  • tp_labelsize (str, Optional) – labelsize of ticks params. default is italic

  • tp_bottom (bool,) – position at bottom of ticks params. default is True.

  • tp_labelbottom (bool,) – put label on the bottom of the ticks. default is False

  • tp_labeltop (bool,) – put label on the top of the ticks. default is True

  • cb_orientation (str , ('vertical', 'horizontal')) – orientation of the colorbar, default is vertical

  • cb_aspect (float, Optional) – aspect of the colorbar. default is 20.

  • cb_shrink (float, Optional) – shrink size of the colorbar. default is 1.0

  • cb_pad (float,) – pad of the colorbar of plot. default is .05

  • cb_anchor (tuple (float, float)) – anchor of the colorbar. default is (0.0, 0.5)

  • cb_panchor (tuple (float, float)) – proportionality anchor of the colorbar. default is (1.0, 0.5)

  • cb_label (str, Optional) – label of the colorbar.

  • cb_spacing (str, Optional) – spacing of the colorbar. default is uniform

  • cb_drawedges (bool,) – draw edges inside of the colorbar. default is False

  • sns_orient ('v' | 'h', optional) – Orientation of the plot (vertical or horizontal). This is usually inferred based on the type of the input variables, but it can be used to resolve ambiguity when both x and y are numeric or when plotting wide-form data. default is v which refer to ‘vertical’

  • sns_style (dict, or one of {darkgrid, whitegrid, dark, white, ticks}) – A dictionary of parameters or the name of a preconfigured style.

  • sns_palette (seaborn color palette | matplotlib colormap | hls | husl) – Palette definition. Should be something color_palette() can process. The palette generates the points with different colors.

  • sns_height (float,) – Proportion of axes extent covered by each rug element. Can be negative. default is 4.

  • sns_aspect (scalar (float, int)) – Aspect ratio of each facet, so that aspect * height gives the width of each facet in inches. default is .7

Returns:

self – returns self for easy method chaining.

Return type:

Baseclass instance

Examples

>>> import pandas as pd
>>> from watex.view import ExPlot
>>> data = pd.read_csv ('data/geodata/main.bagciv.data.csv' )
>>> ExPlot(fig_size = (12, 4)).fit(data).missing(kind ='corr')
... <watex.view.plot.ExPlot at 0x21162a975e0>
fit(data, **fit_params)[source]#

Fit data and populate the arguments for plotting purposes.

There is no conventional procedure for checking whether an instance is fitted. However, a class that is not fitted should raise exceptions.NotFittedError when a method is called.

Parameters:
  • data (Filepath or DataFrame of shape (M, N)) – pandas.DataFrame containing M samples and N features.

  • fit_params (dict) – Additional keyword arguments for reading the data when it is given as a path-like object; passed to watex.utils.coreutils._is_readable

Returns:

``self`` – returns self for easy method chaining.

Return type:

Plot instance

property inspect#

Inspect data and trigger plot after checking the data entry. Raises NotFittedError if ExPlot is not fitted yet.

msg = "{expobj.__class__.__name__} instance is not fitted yet. Call 'fit' with appropriate arguments before using this method."#
plotbv(xname=None, yname=None, kind='box', **kwd)[source]#

Visualize distributions using the box, boxen or violin plots.

Parameters:
  • xname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider. Should be items in the dataframe columns. Raises an error if the elements do not exist.

  • yname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider. Should be items in the dataframe columns. Raises an error if the elements do not exist.

  • kind (str) – style of the plot. Can be [‘box’|’boxen’|’violin’]. default is box

  • kwd (dict,) – Other keyword arguments are passed down to seaborn.boxplot .

Returns:

``self`` – returns self for easy method chaining.

Return type:

ExPlot instance

Example

>>> from watex.datasets import fetch_data
>>> from watex.view import ExPlot
>>> data = fetch_data ('bagoue original').get('data=dfy1')
>>> p= ExPlot(tname='flow').fit(data)
>>> p.plotbv(xname='flow', yname='sfi', kind='violin')
plotcutcomparison(xname=None, yname=None, q=10, bins=3, cmap='viridis', duplicates='drop', **kws)[source]#

Compare the cut or q quantiles values of ordinal categories.

It bins ‘xname’ into q quantiles and ‘yname’ into bins. The plot is normalized so that it fills the whole vertical area, which makes it easy to see the proportions within each quantile, for instance in the 4*q % quantile.

Parameters:
  • xname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider and should be items in the dataframe columns. An error is raised if the elements do not exist.

  • yname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider and should be items in the dataframe columns. An error is raised if the elements do not exist.

  • q (int or list-like of float) – Number of quantiles. 10 for deciles, 4 for quartiles, etc. Alternatively, an array of quantiles, e.g. [0, .25, .5, .75, 1.] for quartiles.

  • bins (int, sequence of scalars, or IntervalIndex) –

    The criteria to bin by.

    • int: Defines the number of equal-width bins in the range of x. The range of x is extended by .1% on each side to include the minimum and maximum values of x.

    • sequence of scalars: Defines the bin edges allowing for non-uniform width. No extension of the range of x is done.

    • IntervalIndex: Defines the exact bins to be used. Note that IntervalIndex for bins must be non-overlapping.

  • labels (array or False, default None) – Used as labels for the resulting bins. Must be of the same length as the resulting bins. If False, return only integer indicators of the bins. If True, raises an error.

  • cmap (str, color or list of color, optional) – The matplotlib colormap of the bar faces.

  • duplicates ({'raise', 'drop'}, optional) – If bin edges are not unique, raise a ValueError or drop the non-unique edges. Default is ‘drop’.

  • kws (dict,) – Other keyword arguments are passed down to pandas.qcut.

Returns:

self – returns self for easy method chaining.

Return type:

ExPlot instance

Examples

>>> from watex.datasets import fetch_data
>>> from watex.view import ExPlot
>>> data = fetch_data ('bagoue original').get('data=dfy1')
>>> p= ExPlot(tname='flow').fit(data)
>>> p.plotcutcomparison(xname ='sfi', yname='ohmS')
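
The binning and normalization described for this method can be sketched with pandas alone. This is a minimal illustration under stated assumptions, not the watex implementation: the column names `sfi` and `ohmS` are borrowed from the example above, the values are randomly generated, and the normalization is assumed to be a row-wise share per quantile.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sfi": rng.uniform(0.0, 2.0, size=200),        # made-up values
    "ohmS": rng.uniform(100.0, 2000.0, size=200),  # made-up values
})

# Bin the x variable into q quantiles and the y variable into equal-width bins.
x_quantiles = pd.qcut(df["sfi"], q=10, labels=False, duplicates="drop")
y_bins = pd.cut(df["ohmS"], bins=3, labels=False)

# Count co-occurrences, then normalize each row so that it sums to 1.
table = pd.crosstab(x_quantiles, y_bins)
shares = table.div(table.sum(axis=1), axis=0)
```

A stacked bar chart of `shares` then fills the whole vertical extent, since every row sums to one.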
plothist(xname=None, *, kind='hist', **kws)[source]#

A histogram visualization of numerical data.

Parameters:
  • xname (str, xlabel) – Feature name in the dataframe, used as the label on the x-axis. An error is raised if it does not exist in the dataframe.

  • kind (str) – Mode of pandas series plotting. The default is hist.

  • kws (dict,) – Additional keyword arguments of pandas.DataFrame.plot.

Returns:

self – returns self for easy method chaining.

Return type:

ExPlot instance

plothistvstarget(xname, c=None, *, posilabel=None, neglabel=None, kind='binarize', **kws)[source]#

A histogram of a continuous feature plotted against a binarized target.

Parameters:
  • xname (str,) – The column name to consider on the x-axis. Should be an item in the dataframe columns. An error is raised if the element does not exist.

  • c (str or int) – The class value in y to consider. An error is raised if it is not in y. The value c can be considered as the binary positive class.

  • posilabel (str, Optional) – The label of c, considered as the positive class.

  • neglabel (str, Optional) – The label of the other classes (categories) except c, considered as the negative class.

  • kind (str, Optional, (default, 'binarize')) – The kind of plot of features against the target. binarize plots the positive class (‘c’) vs the negative class (‘not c’).

  • kws (dict,) – Additional keyword arguments of seaborn.displot.

Returns:

self – returns self for easy method chaining.

Return type:

ExPlot instance

Examples

>>> from watex.utils import read_data
>>> from watex.view import ExPlot
>>> data = read_data  ( 'data/geodata/main.bagciv.data.csv' )
>>> p = ExPlot(tname ='flow').fit(data)
>>> p.fig_size = (7, 5)
>>> p.savefig ='bbox.png'
>>> p.plothistvstarget (xname= 'sfi', c = 0, kind = 'binarize',  kde=True,
...                   posilabel='dried borehole (m3/h)',
...                   neglabel = 'accept. boreholes'
...                   )
... <'ExPlot':xname='sfi', yname=None , tname='flow'>
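
The ‘binarize’ idea (class c against every other class) can be sketched as follows. The data values are made up, and the split shown here is only an assumption about what the method does before plotting the two histograms.

```python
import pandas as pd

df = pd.DataFrame({
    "sfi": [0.4, 1.1, 0.9, 1.6, 0.7, 1.3],  # made-up feature values
    "flow": [0, 1, 0, 2, 3, 0],             # made-up class labels
})

c = 0  # class treated as the positive class
is_positive = df["flow"] == c

positive_sfi = df.loc[is_positive, "sfi"]   # samples of class c
negative_sfi = df.loc[~is_positive, "sfi"]  # samples of all other classes
```

The two series can then be drawn as overlaid histograms labelled with `posilabel` and `neglabel`.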
plotjoint(xname, yname=None, corr='pearson', kind='scatter', pkg='sns', yb_kws=None, **kws)[source]#

A fancier scatter plot that includes histograms on the edges as well as a regression line, called a joint plot.

Parameters:
  • xname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider and should be items in the dataframe columns. An error is raised if the elements do not exist.

  • yname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider and should be items in the dataframe columns. An error is raised if the elements do not exist.

  • pkg (str, Optional,) – Kind of library to use for visualization. Can be ‘sns’ or ‘yb’ for ‘seaborn’ or ‘yellowbrick’ respectively. Default is sns.

  • kind (str in {'scatter', 'hex'}, default: 'scatter') – The type of plot to render in the joint axes. Note that when kind=’hex’ the target cannot be plotted by color.

  • corr (str, default: 'pearson') – The algorithm used to compute the relationship between the variables in the joint plot, one of: ‘pearson’, ‘covariance’, ‘spearman’, ‘kendalltau’.

  • yb_kws (dict,) – Additional keyword arguments of yellowbrick.JointPlotVisualizer.

  • kws (dict,) – Other keyword arguments are passed down to seaborn.jointplot.

Returns:

self – returns self for easy method chaining.

Return type:

ExPlot instance

Notes

When using the yellowbrick library, the arrays, i.e. the (x, y) variables in the columns as well as the target array, must not contain inf or NaN values. A ValueError is raised if that is the case.
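
A quick way to check the constraint in the note before choosing the yellowbrick backend is sketched below. The helper `all_finite` is hypothetical (it is not part of watex or yellowbrick) and the values are made up.

```python
import numpy as np


def all_finite(*arrays):
    """Return True when no array contains NaN or +/-inf (hypothetical helper)."""
    return all(np.isfinite(np.asarray(a, dtype=float)).all() for a in arrays)


x = [0.4, 1.1, 0.9]
y = [120.0, np.nan, 300.0]
target = [0, 1, 0]

ok_before = all_finite(x, y, target)

# Drop the rows where any of the three arrays is not finite.
stacked = np.column_stack([x, y, target]).astype(float)
clean = stacked[np.isfinite(stacked).all(axis=1)]
ok_after = all_finite(clean)
```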

plotmissing(*, kind=None, sample=None, **kwd)[source]#

Visualize patterns in the missing data.

Parameters:
  • data (DataFrame of shape (M, N)) – pandas.DataFrame containing M samples and N features.

  • kind (str, Optional) –

    Kind of visualization. Can be dendrogram, mbar or bar for the dendrogram, msno bar and plt visualizations respectively, or corr or mpatterns:

    • bar counts the non-missing data using pandas.

    • mbar uses the msno package to count the non-missing data.

    • dendrogram shows the clustering of where the data is missing. Leaves at the same level predict one another's presence (empty or filled). The vertical arms indicate how different the clusters are; short arms mean that the branches are similar.

    • corr creates a heat map showing whether there are correlations in where the data is missing. In this case, it does look like the locations where data is missing are correlated.

    • mpatterns is the default visualization. It is useful for viewing contiguous areas of missing data, which would indicate that the missing data is not random. The matrix function includes a sparkline along the right side. Patterns here would also indicate non-random missing data. It is recommended to limit the number of samples to be able to see the patterns.

    Any other value raises an error.

  • sample (int, Optional) – Number of rows to visualize. This is useful when the data is composed of many rows. Shrinking the data to keep only some samples for visualization is recommended. None plots all the samples (or examples) in the data.

  • kws (dict) – Additional keyword arguments of the msno.matrix plot.

Returns:

self – returns self for easy method chaining.

Return type:

ExPlot instance

Example

>>> import pandas as pd
>>> from watex.view import ExPlot
>>> data = pd.read_csv ('data/geodata/main.bagciv.data.csv' )
>>> p = ExPlot().fit(data)
>>> p.fig_size = (12, 4)
>>> p.plotmissing(kind ='corr')
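
The quantities behind the bar and corr kinds can be computed directly with pandas. This is a rough sketch on made-up data; it assumes that corr is based on the correlation of the missingness indicators, which is how nullity heat maps are usually built.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, 1.0, 2.0],
    "c": [1.0, np.nan, 2.0, np.nan],
})

# Number of non-missing values per column (what a bar plot would show).
non_missing = df.notna().sum()

# Correlation between the missingness indicators of each pair of columns.
nullity_corr = df.isna().astype(int).corr()
```

Here columns `a` and `c` are always missing together, so their nullity correlation is 1, while `a` and `b` are unrelated.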
plotpairgrid(xname=None, yname=None, vars=None, **kwd)[source]#

Create a pair grid.

It is a matrix of columns and kernel density estimations. To color by a column from the dataframe, use the ‘hue’ parameter.

Parameters:
  • xname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider and should be items in the dataframe columns. An error is raised if the elements do not exist.

  • yname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider and should be items in the dataframe columns. An error is raised if the elements do not exist.

  • vars (list, str) – List of items in the dataframe columns. An error is raised if the items do not exist in the dataframe columns.

  • kws (dict,) – Other keyword arguments are passed down to seaborn.jointplot.

Returns:

self – returns self for easy method chaining.

Return type:

ExPlot instance

Example

>>> from watex.datasets import fetch_data
>>> from watex.view import ExPlot
>>> data = fetch_data ('bagoue original').get('data=dfy1')
>>> p= ExPlot(tname='flow').fit(data)
>>> p.plotpairgrid (vars = ['magnitude', 'power', 'ohmS'] )
... <'ExPlot':xname=(None,), yname=None , tname='flow'>
plotpairwisecomparison(corr='pearson', pkg='sns', **kws)[source]#

Create pairwise comparisons between features.

The plot shows a ‘pearson’, ‘spearman’ or ‘covariance’ correlation.

Parameters:
  • corr (str, ['pearson'|'spearman'|'covariance']) – Method of correlation to perform. Note that ‘pearson’ and ‘covariance’ do not support string values. If such data is given, set corr to spearman. Default is pearson.

  • pkg (str, Optional,) – Kind of library to use for visualization. Can be ‘sns’ or ‘yb’ for ‘seaborn’ or ‘yellowbrick’ respectively. Default is sns.

  • kws (dict,) – Additional keyword arguments are passed down to yellowbrick.Rank2D and seaborn.heatmap.

Returns:

self – returns self for easy method chaining.

Return type:

ExPlot instance

Example

>>> from watex.datasets import fetch_data
>>> from watex.view import ExPlot
>>> data = fetch_data ('bagoue original').get('data=dfy1')
>>> p= ExPlot(tname='flow').fit(data)
>>> p.plotpairwisecomparison(fmt='.2f', corr='spearman', pkg ='yb',
                             annot=True,
                             cmap='RdBu_r',
                             vmin=-1,
                             vmax=1 )
... <'ExPlot':xname='sfi', yname='ohmS' , tname='flow'>
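
The difference between the correlation methods can be seen on a small made-up table. The snippet uses only pandas and is independent of watex; it shows the matrices that a heat map would display.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [1.0, 8.0, 27.0, 64.0, 125.0],  # monotonic but non-linear in x
})

pearson = df.corr(method="pearson")    # linear association
spearman = df.corr(method="spearman")  # rank (monotonic) association
covariance = df.cov()                  # unnormalized covariance
```

Because `y` grows monotonically with `x`, the Spearman coefficient is exactly 1 while the Pearson coefficient stays below 1.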
plotparallelcoords(classes=None, pkg='pd', rxlabel=45, **kwd)[source]#

Use parallel coordinates to visualize clustering in multivariate data.

Parameters:
  • classes (list, default: None) –

    A list of class names for the legend. The class labels for each class in y, ordered by sorted class index. These names act as a label encoder for the legend, identifying integer classes or renaming string labels. If omitted, the class labels will be taken from the unique values in y.

    Note that the length of this list must match the number of unique values in y, otherwise an exception is raised.

  • pkg (str, Optional,) – Kind of library to use for visualization. Can be ‘yb’ or ‘pd’ for ‘yellowbrick’ or ‘pandas’ respectively. Default is pd.

  • rxlabel (int, default is 45) – Rotation angle of the xlabel when pkg is set to pd.

  • kws (dict,) – Additional keyword arguments are passed down to yellowbrick.ParallelCoordinates and pandas.plotting.parallel_coordinates().

Returns:

self – returns self for easy method chaining.

Return type:

ExPlot instance

Examples

>>> from watex.datasets import fetch_data
>>> from watex.view import ExPlot
>>> data =fetch_data('original data').get('data=dfy1')
>>> p = ExPlot (tname ='flow').fit(data)
>>> p.plotparallelcoords(pkg='yb')
... <'ExPlot':xname=None, yname=None , tname='flow'>
plotradviz(classes=None, pkg='pd', **kwd)[source]#

Plot each sample on a circle or square, with the features on the circumference, to visualize the separation between target classes.

Values are normalized and each feature has a spring that pulls samples to it based on the value.

Parameters:
  • classes (list of int | float, [categorized classes]) – Must be values in the target. The specified classes must match the number of unique values in the target, otherwise an error occurs. The default behaviour, i.e. None, detects all classes from the unique values in the target.

  • pkg (str, Optional,) – Kind of library to use for visualization. Can be ‘yb’ or ‘pd’ for ‘yellowbrick’ or ‘pandas’ respectively. Default is pd.

  • kws (dict,) – Additional keyword arguments are passed down to yellowbrick.RadViz and pandas.plotting.radviz().

Returns:

self – returns self for easy method chaining.

Return type:

ExPlot instance

Examples

(1) Using yellowbrick RadViz

>>> from watex.datasets import fetch_data
>>> from watex.view import ExPlot
>>> data0 = fetch_data('bagoue original').get('data=dfy1')
>>> p = ExPlot(tname ='flow').fit(data0)
>>> p.plotradviz(classes= [0, 1, 2, 3] ) # can set to None
(2) Using pandas radviz plot

>>> # use pandas with
>>> data2 = fetch_data('bagoue original').get('data=dfy2')
>>> p = ExPlot(tname ='flow').fit(data2)
>>> p.plotradviz(classes= None, pkg='pd' )
... <'ExPlot':xname=None, yname=None , tname='flow'>
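
The spring analogy can be made concrete with a small NumPy sketch of the standard RadViz projection. The function `radviz_points` is hypothetical and only assumes min-max normalization with evenly spaced anchors; the exact normalization used by the plotting backends may differ.

```python
import numpy as np


def radviz_points(X):
    """Project rows of X onto the plane using the RadViz spring analogy."""
    X = np.asarray(X, dtype=float)
    # Min-max normalize each feature to [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    W = (X - X.min(axis=0)) / np.where(span == 0, 1.0, span)
    # Place one anchor per feature, evenly spaced on the unit circle.
    angles = 2.0 * np.pi * np.arange(X.shape[1]) / X.shape[1]
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])
    # Each sample sits at the weighted mean of the anchors.
    totals = W.sum(axis=1, keepdims=True)
    return (W @ anchors) / np.where(totals == 0, 1.0, totals)


X = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
points = radviz_points(X)
```

A sample dominated by one feature lands on that feature's anchor, while a sample pulled equally by all features lands at the centre.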
plotscatter(xname=None, yname=None, c=None, s=None, **kwd)[source]#

Shows the relationship between two numeric columns.

Parameters:
  • xname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider and should be items in the dataframe columns. An error is raised if the elements do not exist.

  • yname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider and should be items in the dataframe columns. An error is raised if the elements do not exist.

  • c (str, int or array_like, Optional) –

    The color of each point. Possible values are:

    • A single color string referred to by name, RGB or RGBA code, for instance ‘red’ or ‘#a98d19’.

    • A sequence of color strings referred to by name, RGB or RGBA code, which will be used for each point’s color recursively. For instance, with [‘green’, ‘yellow’] all points will be filled in green or yellow, alternately.

    • A column name or position whose values will be used to color the marker points according to a colormap.

  • s (scalar or array_like, Optional,) –

    The size of each point. Possible values are:

    • A single scalar so all points have the same size.

    • A sequence of scalars, which will be used for each point’s size recursively. For instance, when passing [2, 14] the size of each point will be either 2 or 14, alternately.

  • kwd (dict,) – Other keyword arguments are passed down to seaborn.scatterplot.

Returns:

self – returns self for easy method chaining.

Return type:

ExPlot instance

Example

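>>> from watex.datasets import fetch_data
>>> from watex.view import ExPlot
>>> data = fetch_data ('bagoue original').get('data=dfy1')
>>> p = ExPlot(tname='flow').fit(data).plotscatter (
...     xname ='sfi', yname='ohmS')
>>> p
...  <'ExPlot':xname='sfi', yname='ohmS' , tname='flow'>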

References

Scatterplot: https://seaborn.pydata.org/generated/seaborn.scatterplot.html

Pd.scatter plot: https://www.w3resource.com/pandas/dataframe/dataframe-plot-scatter.php

save(fig)[source]#

Save the figure if figure properties are given.

class watex.view.QuickPlot(classes=None, tname=None, mapflow=False, **kws)[source]#

Bases: BasePlot

Special class dealing with analysis modules for quick diagrams, histograms and bar visualizations.

Originally, it was designed for flow rate prediction; however, it works with any other dataset provided the parameter details below are followed.

Parameters:
  • data (str, filepath_or_buffer or pandas.core.DataFrame) – Path-like object or DataFrame. If data is given as a path-like object, data is read, asserted and validated. Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be a file://localhost/path/to/table.csv. If you want to pass in a path object, pandas accepts any os.PathLike. By file-like object, we refer to objects with a read() method, such as a file handle e.g. via builtin open function or StringIO.

  • y (array-like of shape (M,), \(M=m-samples\)) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • tname (str,) – A target name or label. In supervised learning the target name is considered as the reference name of y or label variable.

  • classes (list of int | float, [categorized classes]) –

    List of the categorical values encoded to numerical values. For instance, for flow data analysis in the Bagoue dataset, the classes could be [0., 1., 3.] which means:

    * 0 m3/h  --> FR0
    * > 0 to 1 m3/h --> FR1
    * > 1 to 3 m3/h --> FR2
    * > 3 m3/h  --> FR3

  • mapflow (bool,) –

    Refers to the flow rate prediction using DC-resistivity features and works when tname is set to flow. If set to True, the values in the target column are mapped to categorical values. Flow rate values are commonly given as a series of numerical values. For classification purposes, the flow rate must be converted to categorical values, which mainly refer to the types of hydraulic system. The type of hydraulic system is, in turn, mostly tied to the size of the population living in a specific area. For instance, flow classes can be ranged as follows:

    • FR = 0 is for dry boreholes

    • 0 < FR ≤ 3 m3/h for village hydraulic (≤2000 inhabitants)

    • 3 < FR ≤ 6 m3/h for improved village hydraulic (>2000 to 20 000 inhabitants)

    • 6 < FR ≤ 10 m3/h for urban hydraulic (>200 000 inhabitants).

    Note that the flow range from mapflow is not exhaustive and can be modified according to the type of hydraulic required on the project.

  • savefig (str, Path-like object,) – name of the figure to save. default is None

  • fig_dpi (float,) – dots-per-inch resolution of the figure. default is 300

  • fig_num (int,) – number of the figure.

  • fig_size (Tuple (int, int) or inch) – size of figure in inches (width, height). default is [5, 5]

  • fig_orientation (str,) – figure orientation. default is landscape

  • fig_title (str,) – figure title. default is None

  • fs (float,) – size of font of axis tick labels, axis labels are fs+2. default is 6

  • ls (str,) – line style, it can be [ ‘-’ | ‘.’ | ‘:’ ] . default is ‘-’

  • lc (str, Optional,) – line color of the plot, default is k

  • lw (float, Optional,) – line weight of the plot, default is 1.5

  • alpha (float between 0 < alpha < 1,) – transparency number, default is 0.5,

  • font_weight (str, Optional) – weight of the font , default is bold.

  • font_style (str, Optional) – style of the font. default is italic

  • font_size (float, Optional) – size of the font. default is 3.

  • ms (float, Optional) – size of marker in points. default is 5

  • marker (str, Optional) – marker of stations default is o.

  • marker_style (str, Optional) – facecolor of the marker. default is yellow

  • marker_edgecolor (str, Optional) – edge color of the marker. default is yellow

  • marker_edgewidth (float, Optional) – width of the marker edge. default is 3.

  • xminorticks (float, Optional) – minor ticks according to x-axis size. default is 1.

  • yminorticks (float, Optional) – minor ticks according to y-axis size. default is 1.

  • bins – histogram element separation between two bars. default is 10.

  • xlim (tuple (int, int), Optional) – limit of x-axis in plot.

  • ylim (tuple (int, int), Optional) – limit of y-axis in plot.

  • xlabel (str, Optional,) – label name of x-axis in plot.

  • ylabel (str, Optional,) – label name of y-axis in plot.

  • rotate_xlabel (float, Optional) – angle to rotate xlabel in plot.

  • rotate_ylabel (float, Optional) – angle to rotate ylabel in plot.

  • leg_kws (dict, Optional) – keyword arguments of legend. default is empty dict

  • plt_kws (dict, Optional) – keyword arguments of plot. default is empty dict

  • glc (str, Optional) – line color of the grid plot, default is k

  • glw (float, Optional) – line weight of the grid plot, default is 2

  • galpha (float, Optional,) – transparency number of grid, default is 0.5

  • gaxis (str ('x', 'y', 'both')) – type of axis to hold the grid, default is both

  • gwhich (str, Optional) – kind of grid in the plot. default is major

  • tp_axis (str,) – axis to apply the ticks params. default is both

  • tp_labelsize (str, Optional) – labelsize of ticks params. default is italic

  • tp_bottom (bool,) – position at bottom of ticks params. default is True.

  • tp_labelbottom (bool,) – put label on the bottom of the ticks. default is False

  • tp_labeltop (bool,) – put label on the top of the ticks. default is True

  • cb_orientation (str , ('vertical', 'horizontal')) – orientation of the colorbar, default is vertical

  • cb_aspect (float, Optional) – aspect of the colorbar. default is 20.

  • cb_shrink (float, Optional) – shrink size of the colorbar. default is 1.0

  • cb_pad (float,) – pad of the colorbar of plot. default is .05

  • cb_anchor (tuple (float, float)) – anchor of the colorbar. default is (0.0, 0.5)

  • cb_panchor (tuple (float, float)) – proportionality anchor of the colorbar. default is (1.0, 0.5)

  • cb_label (str, Optional) – label of the colorbar.

  • cb_spacing (str, Optional) – spacing of the colorbar. default is uniform

  • cb_drawedges (bool,) – draw edges inside of the colorbar. default is False

  • sns_orient ('v' | 'h', optional) – Orientation of the plot (vertical or horizontal). This is usually inferred based on the type of the input variables, but it can be used to resolve ambiguity when both x and y are numeric or when plotting wide-form data. default is v which refers to ‘vertical’

  • sns_style (dict, or one of {darkgrid, whitegrid, dark, white, ticks}) – A dictionary of parameters or the name of a preconfigured style.

  • sns_palette (seaborn color palette | matplotlib colormap | hls | husl) – Palette definition. Should be something color_palette() can process. The palette generates the points with different colors.

  • sns_height (float,) – Proportion of axes extent covered by each rug element. Can be negative. default is 4.

  • sns_aspect (scalar (float, int)) – Aspect ratio of each facet, so that aspect * height gives the width of each facet in inches. default is .7

Returns:

self – returns self for easy method chaining.

Return type:

Baseclass instance

Examples

>>> from watex.view.plot import  QuickPlot
>>> data = 'data/geodata/main.bagciv.data.csv'
>>> qkObj = QuickPlot(  leg_kws= dict( loc='upper right'),
...          fig_title = '`sfi` vs`ohmS|`geol`',
...            )
>>> qkObj.tname='flow' # target the DC-flow rate prediction dataset
>>> qkObj.mapflow=True  # to hold category FR0, FR1 etc..
>>> qkObj.fit(data)
>>> sns_pkws= dict ( aspect = 2 ,
...          height= 2,
...                  )
>>> map_kws= dict( edgecolor="w")
>>> qkObj.discussingfeatures(features =['ohmS', 'sfi','geol', 'flow'],
...                           map_kws=map_kws,  **sns_pkws
...                         )
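
The mapping implied by the classes parameter (for instance [0., 1., 3.] giving FR0 to FR3) can be sketched with the standard library. The helper `flow_to_class` is hypothetical and is not the watex routine; it only reproduces the interval rule listed under classes.

```python
from bisect import bisect_left


def flow_to_class(value, classes=(0.0, 1.0, 3.0)):
    """Map a flow rate (m3/h) to a label 'FR0', 'FR1', ... (hypothetical helper)."""
    # bisect_left counts the thresholds strictly below `value`, so
    # 0 -> FR0, (0, 1] -> FR1, (1, 3] -> FR2 and > 3 -> FR3.
    return f"FR{bisect_left(sorted(classes), value)}"


labels = [flow_to_class(v) for v in (0.0, 0.5, 1.0, 2.0, 3.0, 7.5)]
```

Passing a different threshold list, such as (0.0, 3.0, 6.0, 10.0), yields the hydraulic-type ranges described under mapflow in the same way.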
barcatdist(basic_plot=True, groupby=None, **kws)[source]#

Bar plot distribution.

Plots a categorical distribution according to the occurrence of the target in the data.

Parameters:
  • basic_plot (bool,) – Plot only the occurrence of the targeted column using the matplotlib.pyplot.bar function.

  • groupby (list or dict, optional) –

    Group features for plotting. For instance, it plots other features located in the dataframe columns. The features can be given as a list, in which case the default plot properties are used. To customize the plot, provide the features as a dict with the convenient properties, like:

    * `groupby`= ['shape', 'type'] #{'type':{'color':'b',
                                 'width':0.25 , 'sep': 0.},
                         'shape':{'color':'g', 'width':0.25,
                                 'sep':0.25}}

  • kws (dict,) – Additional keyword arguments of seaborn.countplot.

  • data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, possibly, the classes for data inspection. Whether data is a str or a DataFrame, the name of the target must be provided.

Returns:

Returns self for easy method chaining.

Return type:

QuickPlot instance

Notes

The data argument must be passed to the fit method; the data parameter is not allowed in any other QuickPlot method. It is described here only to give a synopsis of the kind of data the plot expects. An error is raised if data is passed as a keyword argument.

Examples

>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue ().frame
>>> qplotObj= QuickPlot(xlabel = 'Anomaly type',
                        ylabel='Number of  occurence (%)',
                        lc='b', tname='flow')
>>> qplotObj.sns_style = 'darkgrid'
>>> qplotObj.fit(data)
>>> qplotObj. barcatdist(basic_plot =False,
...                      groupby=['shape' ])
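
The numbers behind such a bar plot can be sketched with pandas. The table below is made up (only the column names `flow` and `shape` come from the example above), and the snippet is not the watex implementation.

```python
import pandas as pd

df = pd.DataFrame({
    "flow": ["FR0", "FR1", "FR1", "FR2", "FR0", "FR1"],  # made-up targets
    "shape": ["V", "W", "V", "V", "W", "W"],             # made-up anomaly shapes
})

# Share of each target class, in percent.
target_pct = df["flow"].value_counts(normalize=True).mul(100).sort_index()

# Counts of the target classes within each group of the 'shape' feature.
grouped_counts = pd.crosstab(df["shape"], df["flow"])
```

Each row of `grouped_counts` corresponds to one group of bars when a feature is passed to groupby.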
corrmatrix(cortype='num', features=None, method='pearson', min_periods=1, **sns_kws)[source]#

Quickly plot the correlation of numerical or categorical features.

Set features by providing the names of features for visualization.

Parameters:
  • cortype (str,) – The type of features whose correlations are visualized. Can be num for numerical features or cat for categorical features. Default is num for quantitative values.

  • method (str,) – The correlation method. Can be ‘spearman’ or ‘pearson’. Default is pearson.

  • features (List, optional) – List of the names of features for correlation analysis. If given, the names must belong to the dataframe columns, otherwise an error will occur. If features are valid, the dataframe is shrunk to these features before the correlation plot.

  • min_periods – Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation. For more details refer to https://www.geeksforgeeks.org/python-pandas-dataframe-corr/

  • sns_kws (dict,) – Other seaborn heatmap arguments. Refer to https://seaborn.pydata.org/generated/seaborn.heatmap.html

  • data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, possibly, the classes for data inspection. Whether data is a str or a DataFrame, the name of the target must be provided.

Returns:

Returns self for easy method chaining.

Return type:

QuickPlot instance

Notes

The data argument must be passed to the fit method; the data parameter is not allowed in any other QuickPlot method. It is described here only to give a synopsis of the kind of data the plot expects. An error is raised if data is passed as a keyword argument.

Example

>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue ().frame
>>> qplotObj = QuickPlot().fit(data)
>>> sns_kwargs ={'annot': False,
...            'linewidth': .5,
...            'center':0 ,
...            # 'cmap':'jet_r',
...            'cbar':True}
>>> qplotObj.corrmatrix(cortype='cat', **sns_kwargs)
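
The effect of features and min_periods can be illustrated with pandas.DataFrame.corr on a small made-up table. The column names are borrowed from the Bagoue examples, the values are invented, and the snippet only shows the matrix that a heat map would display.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ohmS": [100.0, 250.0, 400.0, 900.0, 1200.0],
    "sfi": [0.4, 0.6, 0.9, 1.1, 1.5],
    "lwi": [20.0, np.nan, np.nan, np.nan, 35.0],  # only two observations
})

# Restrict the dataframe to the requested features before correlating.
features = ["ohmS", "sfi", "lwi"]
corr_all = df[features].corr(method="pearson", min_periods=1)
corr_strict = df[features].corr(method="pearson", min_periods=3)
```

With `min_periods=3`, every pair involving `lwi` has too few valid observations and is reported as NaN instead of a misleadingly precise coefficient.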
property data#
discussingfeatures(features, *, map_kws=None, map_func=None, **sns_kws)[source]#

Provide at least four (04) feature names and discuss their distribution.

This method maps a dataset onto multiple axes arrayed in a grid of rows and columns that correspond to levels of features in the dataset. The plots produced are often called “lattice”, “trellis”, or ‘small-multiple’ graphics.

Parameters:
  • features (list) –

    List of features to discuss. The recommended number of features for a better analysis is four (04), arranged as below:

    features_disposal = [‘x’, ‘y’, ‘col’, ‘target|hue’]

    where:

    • x is the feature held on the x-axis, default is ohmS

    • y is the feature located on the y-axis, default is sfi

    • col is the feature on the column subset, default is col

    • target or hue is for targeted examples, default is flow

    If three (03) features are given, the last one is considered the target.

  • map_kws (dict, optional) – Extra keyword arguments for the mapping plot.

  • map_func (callable, Optional) – Callable object used as the plot style function. Can be a ‘matplotlib-pyplot’ function like plt.scatter or a ‘seaborn-scatterplot’ function like sns.scatterplot. The default is sns.scatterplot.

  • sns_kws (dict, optional) – Keyword arguments to control what visual semantics are used to identify the different subsets. For more details, please consult http://seaborn.pydata.org/generated/seaborn.FacetGrid.html.

  • data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, possibly, the classes for data inspection. Whether data is a str or a DataFrame, the name of the target must be provided.

Returns:

Returns self for easy method chaining.

Return type:

QuickPlot instance

Notes

The data argument must be passed to the fit method; the data parameter is not allowed in any other QuickPlot method. It is described here only to give a synopsis of the kind of data the plot expects. An error is raised if data is passed as a keyword argument.

Examples

>>> from watex.view.plot import  QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue ().frame
>>> qkObj = QuickPlot(  leg_kws={'loc':'upper right'},
...          fig_title = '`sfi` vs`ohmS|`geol`',
...            )
>>> qkObj.tname='flow' # target the DC-flow rate prediction dataset
>>> qkObj.mapflow=True  # to hold category FR0, FR1 etc..
>>> qkObj.fit(data)
>>> sns_pkws={'aspect':2 ,
...          "height": 2,
...                  }
>>> map_kws={'edgecolor':"w"}
>>> qkObj.discussingfeatures(features =['ohmS', 'sfi','geol', 'flow'],
...                           map_kws=map_kws,  **sns_pkws
...                         )
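
How a feature list is split into plotting roles can be sketched as below. The helper `assign_roles` is hypothetical; in particular, leaving the column facet empty when only three features are given and rejecting other lengths are assumptions of this sketch, not documented behaviour.

```python
def assign_roles(features):
    """Split a feature list into plotting roles (hypothetical helper)."""
    if len(features) == 4:
        x, y, col, hue = features
    elif len(features) == 3:
        # Assumption: with three features there is no column facet and
        # the last feature is used as the target.
        x, y, hue = features
        col = None
    else:
        raise ValueError("expected three or four feature names")
    return {"x": x, "y": y, "col": col, "hue": hue}


roles4 = assign_roles(["ohmS", "sfi", "geol", "flow"])
roles3 = assign_roles(["ohmS", "sfi", "flow"])
```

These roles correspond to the arguments a faceted scatter plot needs: the two axes, the facet column and the color grouping.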
fit(data, y=None)[source]#

Fit data and populate the attributes for plotting purposes.

Parameters:
  • data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, possibly, the classes for data inspection. Whether data is a str or a DataFrame, the name of the target must be provided.

  • y (array-like, optional) – Array of the target. Must be the same length as the data. If y is provided and data is given as str or DataFrame, all the data should be considered as the X data for analysis.

Returns:

self – Returns self for easy method chaining.

Return type:

QuickPlot instance

Examples

>>> from watex.datasets import load_bagoue
>>> data = load_bagoue ().frame
>>> from watex.view.plot import QuickPlot
>>> qplotObj= QuickPlot(xlabel = 'Flow classes in m3/h',
...                     ylabel='Number of occurrences (%)')
>>> qplotObj.tname= None # with the name of the target set to None
>>> qplotObj.fit(data)
>>> qplotObj.data.iloc[1:2, :]
   num name      east      north  ...         ohmS        lwi      geol flow
1  2.0   b2  791227.0  1159566.0  ...  1135.551531  21.406531  GRANITES  0.0
>>> qplotObj.tname= 'flow'
>>> qplotObj.mapflow= True # map the flow from num. values to categ. values
>>> qplotObj.fit(data)
>>> qplotObj.data.iloc[1:2, :]
   num name      east      north  ...         ohmS        lwi      geol flow
1  2.0   b2  791227.0  1159566.0  ...  1135.551531  21.406531  GRANITES  FR0
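The mapflow option shown above turns numerical flow rates into class labels such as FR0. As a rough illustration only, the sketch below bins values against hypothetical class bounds (0, 1 and 3 m3/h). The helper name map_flow and the bounds are assumptions made for this example and are not taken from the watex source.

```python
from bisect import bisect_left


def map_flow(values, bounds=(0.0, 1.0, 3.0), prefix="FR"):
    """Label each flow rate with the index of the first bound it does not exceed.

    The bounds are hypothetical: FR0 for v <= 0, FR1 for 0 < v <= 1,
    FR2 for 1 < v <= 3 and FR3 for v > 3.
    """
    return [f"{prefix}{bisect_left(bounds, v)}" for v in values]


# One value per hypothetical class.
labels = map_flow([0.0, 0.5, 2.0, 5.0])
print(labels)
```

Values that fall exactly on a bound are assigned to the lower class in this sketch; a real dataset may call for a different convention.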
histcatdist(stacked=False, **kws)[source]#

Histogram plot distribution.

Plots the distribution of categorized classes according to their percentage of occurrence.

Parameters:
  • stacked (bool) – Pile the bins on top of one another as cumulative values. default is False.

  • bins (int, optional) – an integer, a sequence or a string defining the bins

  • range (list, optional) – is the lower and upper range of the bins

  • density (bool, optional) – a boolean flag; if True, a probability density is drawn instead of raw counts

  • weights (array-like, optional) – is an array of weights, of the same shape as data

  • bottom (float, optional) – is the location of the bottom baseline of each bin

  • histtype (str, optional) – is used to draw type of histogram. {‘bar’, ‘barstacked’, ‘step’, ‘stepfilled’}

  • align (str, optional) – controls how the histogram is plotted. {‘left’, ‘mid’, ‘right’}

  • rwidth (float, optional) – is a relative width of the bars as a fraction of the bin width

  • log (bool, optional) – is used to set histogram axis to a log scale

  • color (str, optional) – is a color spec or sequence of color specs, one per dataset

  • label (str , optional) – is a string, or sequence of strings to match multiple datasets

  • normed (bool, optional) – a boolean flag. Use the density keyword argument instead.

  • data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, if possible, the classes for data inspection. Whether given as a str or a DataFrame, the data needs the name of the target to be provided.

Returns:

Returns self for easy method chaining.

Return type:

QuickPlot instance

Notes

The data argument must be passed to the fit method; the data parameter is not accepted by any other QuickPlot method. Its description here only gives a synopsis of the kind of data the plot expects. An error is raised if data is forced in as a keyword argument.

Examples

>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue ().frame
>>> qplotObj= QuickPlot(xlabel = 'Flow classes',
...                     ylabel='Number of occurrences (%)',
...                     lc='b', tname='flow')
>>> qplotObj.sns_style = 'darkgrid'
>>> qplotObj.fit(data)
>>> qplotObj.histcatdist()
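The quantity drawn by this kind of plot is the share of each class in the target. A minimal sketch of that computation, independent of watex, is given below; the function name class_percentages is hypothetical and the labels are made up for illustration.

```python
from collections import Counter


def class_percentages(labels):
    """Return the percentage of occurrence of each class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Sort by label so the bars would appear in a stable order.
    return {label: 100.0 * n / total for label, n in sorted(counts.items())}


shares = class_percentages(["FR0", "FR0", "FR1", "FR2"])
print(shares)
```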
property inspect#

Inspect whether the object is fitted or not.

joint2features(features, *, join_kws=None, marginals_kws=None, **sns_kws)[source]#

The joint method visualizes the correlation between two features.

Draw a plot of two features with bivariate and univariate graphs.

Parameters:
  • features (list) – List of numerical features to plot for correlation analyses. An error is raised if a feature does not exist in the data.

  • join_kws (dict, optional) – Additional keyword arguments are passed to the function used to draw the plot on the joint Axes, superseding items in the joint_kws dictionary.

  • marginals_kws (dict, optional) – Additional keyword arguments are passed to the function used to draw the plot on the marginals Axes.

  • sns_kws (dict, optional) – Keyword arguments of the seaborn jointplot method. Refer to <http://seaborn.pydata.org/generated/seaborn.jointplot.html> for more details about useful kwargs to customize plots.

  • data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, if possible, the classes for data inspection. Whether given as a str or a DataFrame, the data needs the name of the target to be provided.

Returns:

Returns self for easy method chaining.

Return type:

QuickPlot instance

Notes

The data argument must be passed to the fit method; the data parameter is not accepted by any other QuickPlot method. Its description here only gives a synopsis of the kind of data the plot expects. An error is raised if data is forced in as a keyword argument.

Examples

>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue ().frame
>>> qkObj = QuickPlot( lc='b', sns_style ='darkgrid',
...             fig_title='Quantitative features correlation'
...             ).fit(data)
>>> sns_pkws={
...            'kind':'reg' , #'kde', 'hex'
...            # "hue": 'flow',
...               }
>>> joinpl_kws={"color": "r",
...             'zorder':0, 'levels':6}
>>> plmarg_kws={'color':"r", 'height':-.15, 'clip_on':False}
>>> qkObj.joint2features(features=['ohmS', 'lwi'],
...            join_kws=joinpl_kws, marginals_kws=plmarg_kws,
...            **sns_pkws,
...            )
multicatdist(*, x=None, col=None, hue=None, targets=None, x_features=None, y_features=None, kind='count', **kws)[source]#

Figure-level interface for drawing multiple categorical distributions plots onto a FacetGrid.

Multiple categorical plots from targeted pd.Series.

Parameters:
  • x, y, hue (list, optional) – Names of variables in data; inputs for plotting long-form data. See the examples for interpretation. Here they can correspond to x_features, y_features and targets from the dataframe, and each column item can serve as an element of x, y or hue. For instance, x_features refers to the x-axis features; it must contain at least one feature and be given as a list. y_features should match the column names for sns.catplot; if there is more than one feature, collecting them in a list is recommended. y should fit the hue argument of sns.catplot and, like the others, should be given as a list when there is more than one feature.

  • row, col (names of variables in data, optional) – Categorical variables that will determine the faceting of the grid.

  • col_wrap (int) – “Wrap” the column variable at this width, so that the column facets span multiple rows. Incompatible with a row facet.

  • estimator (string or callable that maps vector -> scalar, optional) – Statistical function to estimate within each categorical bin.

  • errorbar (string, (string, number) tuple, or callable) – Name of errorbar method (either “ci”, “pi”, “se”, or “sd”), or a tuple with a method name and a level parameter, or a function that maps from a vector to a (min, max) interval.

  • n_boot (int, optional) – Number of bootstrap samples used to compute confidence intervals.

  • units (name of variable in data or vector data, optional) – Identifier of sampling units, which will be used to perform a multilevel bootstrap and account for repeated measures design.

  • seed (int, numpy.random.Generator, or numpy.random.RandomState, optional) – Seed or random number generator for reproducible bootstrapping.

  • order (lists of strings, optional) – Order to plot the categorical levels in; otherwise the levels are inferred from the data objects.

  • hue_order (lists of strings, optional) – Order to plot the categorical levels in; otherwise the levels are inferred from the data objects.

  • row_order (lists of strings, optional) – Order to organize the rows and/or columns of the grid in, otherwise the orders are inferred from the data objects.

  • col_order (lists of strings, optional) – Order to organize the rows and/or columns of the grid in, otherwise the orders are inferred from the data objects.

  • height (scalar) – Height (in inches) of each facet. See also: aspect.

  • aspect (scalar) – Aspect ratio of each facet, so that aspect * height gives the width of each facet in inches.

  • kind (str, optional) – The kind of plot to draw, corresponds to the name of a categorical axes-level plotting function. Options are: “strip”, “swarm”, “box”, “violin”, “boxen”, “point”, “bar”, or “count”.

  • native_scale (bool, optional) – When True, numeric or datetime values on the categorical axis will maintain their original scaling rather than being converted to fixed indices.

  • formatter (callable, optional) – Function for converting categorical data into strings. Affects both grouping and tick labels.

  • orient ("v" | "h", optional) – Orientation of the plot (vertical or horizontal). This is usually inferred based on the type of the input variables, but it can be used to resolve ambiguity when both x and y are numeric or when plotting wide-form data.

  • color (matplotlib color, optional) – Single color for the elements in the plot.

  • palette (palette name, list, or dict) – Colors to use for the different levels of the hue variable. Should be something that can be interpreted by color_palette(), or a dictionary mapping hue levels to matplotlib colors.

  • hue_norm (tuple or matplotlib.colors.Normalize object) – Normalization in data units for colormap applied to the hue variable when it is numeric. Not relevant if hue is categorical.

  • legend (str or bool, optional) – Set to False to disable the legend. With strip or swarm plots, this also accepts a string, as described in the axes-level docstrings.

  • legend_out (bool) – If True, the figure size will be extended, and the legend will be drawn outside the plot on the center right.

  • sharex, sharey (bool, 'col', or 'row', optional) – If true, the facets will share y axes across columns and/or x axes across rows.

  • margin_titles (bool) – If True, the titles for the row variable are drawn to the right of the last column. This option is experimental and may not work in all cases.

  • facet_kws (dict, optional) – Dictionary of other keyword arguments to pass to FacetGrid.

  • kwargs (key, value pairings) – Other keyword arguments are passed through to the underlying plotting function.

  • data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, if possible, the classes for data inspection. Whether given as a str or a DataFrame, the data needs the name of the target to be provided.

Returns:

Returns self for easy method chaining.

Return type:

QuickPlot instance

Notes

The data argument must be passed to the fit method; the data parameter is not accepted by any other QuickPlot method. Its description here only gives a synopsis of the kind of data the plot expects. An error is raised if data is forced in as a keyword argument.

Examples

>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue ().frame
>>> qplotObj= QuickPlot(lc='b', tname='flow')
>>> qplotObj.sns_style = 'darkgrid'
>>> qplotObj.mapflow=True # to categorize the flow rate
>>> qplotObj.fit(data)
>>> fdict={
...            'x':['shape', 'type', 'type'],
...            'col':['type', 'geol', 'shape'],
...            'hue':['flow', 'flow', 'geol'],
...            }
>>> qplotObj.multicatdist(**fdict)
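In the example above, x, col and hue are parallel lists. One plausible reading, which is an assumption about the behavior and not confirmed by this page, is that they are consumed element-wise so that each position describes one categorical plot. The hypothetical helper facet_specs below sketches that pairing without calling seaborn.

```python
def facet_specs(x, col, hue):
    """Pair parallel lists into one keyword dictionary per plot."""
    if not (len(x) == len(col) == len(hue)):
        raise ValueError("x, col and hue must have the same length")
    return [dict(x=xi, col=ci, hue=hi) for xi, ci, hi in zip(x, col, hue)]


fdict = {
    "x": ["shape", "type", "type"],
    "col": ["type", "geol", "shape"],
    "hue": ["flow", "flow", "geol"],
}
specs = facet_specs(**fdict)
for spec in specs:
    print(spec)
```

Under this reading, each dictionary could be forwarded to a single categorical plot call.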
naiveviz(x=None, y=None, kind='scatter', s_col='lwi', leg_kws={}, **pd_kws)[source]#

Creates a plot to visualize the samples distributions according to the geographical coordinates x and y.

Parameters:
  • x (str) – Column name to hold the x-axis values.

  • y (str) – Column name to hold the y-axis values.

  • s_col (str) – Column for the scatter point sizes. Default is fs times the values of the column lwi.

  • pd_kws (dict, optional) – Pandas plot keyword arguments.

  • leg_kws (dict, optional) – Matplotlib legend keyword arguments.

  • data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, if possible, the classes for data inspection. Whether given as a str or a DataFrame, the data needs the name of the target to be provided.

Returns:

Returns self for easy method chaining.

Return type:

QuickPlot instance

Notes

The data argument must be passed to the fit method; the data parameter is not accepted by any other QuickPlot method. Its description here only gives a synopsis of the kind of data the plot expects. An error is raised if data is forced in as a keyword argument.

Examples

>>> import matplotlib.pyplot as plt
>>> from watex.transformers import StratifiedWithCategoryAdder
>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> df = load_bagoue ().frame
>>> stratifiedNumObj= StratifiedWithCategoryAdder('flow')
>>> strat_train_set , *_= \
...    stratifiedNumObj.fit_transform(X=df)
>>> pd_kws ={'alpha': 0.4,
...         'label': 'flow m3/h',
...         'c':'flow',
...         'cmap':plt.get_cmap('jet'),
...         'colorbar':True}
>>> qkObj=QuickPlot(fs=25.)
>>> qkObj.fit(strat_train_set)
>>> qkObj.naiveviz( x= 'east', y='north', **pd_kws)
numfeatures(features=None, coerce=False, map_lower_kws=None, **sns_kws)[source]#

Plots the distribution of quantitative features and their correlation. Be sure to provide numerical features as data arguments.

Parameters:
  • features (list) – List of numerical features to plot for correlation analyses. An error is raised if a feature does not exist in the data.

  • coerce (bool) – Constrain the data to read all features and keep only the numerical values. An error occurs if False and the data contains some non-numerical features. default is False.

  • map_lower_kws (dict, optional) – A way to customize the plot. It is a dictionary of sns.pairplot map_lower keyword arguments. If the diagram kind is kde, the plot is customized with the provided map_lower_kws arguments. If None, the diag_kind argument of sns_kws is checked for kde before triggering the plotting map.

  • sns_kws (dict) – Keyword arguments of seaborn pairplot. Refer to http://seaborn.pydata.org/generated/seaborn.pairplot.html for further details.

  • data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, if possible, the classes for data inspection. Whether given as a str or a DataFrame, the data needs the name of the target to be provided.

Returns:

Returns self for easy method chaining.

Return type:

QuickPlot instance

Notes

The data argument must be passed to the fit method; the data parameter is not accepted by any other QuickPlot method. Its description here only gives a synopsis of the kind of data the plot expects. An error is raised if data is forced in as a keyword argument.

Examples

>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue ().frame
>>> qkObj = QuickPlot(mapflow =False, tname='flow'
...                       ).fit(data)
>>> qkObj.sns_style ='darkgrid'
>>> qkObj.fig_title='Quantitative features correlation'
>>> sns_pkws={'aspect':2 ,
...          "height": 2,
...          # 'markers':['o', 'x', 'D', 'H', 's',
...          #            '^', '+', 'S'],
...          'diag_kind':'kde',
...          'corner':False,
...          }
>>> marklow = {'level':4,
...          'color':".2"}
>>> qkObj.numfeatures(coerce=True, map_lower_kws=marklow, **sns_pkws)
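The coerce flag described above either keeps only the numerical features or fails when non-numerical ones are present. The sketch below mimics that rule on a plain dictionary of columns. The helper select_numeric and the sample values are hypothetical and only illustrate the documented behavior, not the watex implementation.

```python
from numbers import Number


def select_numeric(table, coerce=False):
    """Keep numerical columns of a {name: values} table.

    With coerce=False, any non-numerical column raises an error;
    with coerce=True, such columns are silently dropped.
    """
    def is_numeric(values):
        return all(isinstance(v, Number) and not isinstance(v, bool)
                   for v in values)

    rejected = [name for name, values in table.items()
                if not is_numeric(values)]
    if rejected and not coerce:
        raise ValueError(f"non-numerical features found: {rejected}")
    return {name: values for name, values in table.items()
            if name not in rejected}


table = {
    "ohmS": [1135.5, 900.0],
    "lwi": [21.4, 30.0],
    "geol": ["GRANITES", "GNEISS"],  # categorical column
}
kept = select_numeric(table, coerce=True)
print(list(kept))
```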
scatteringfeatures(features, *, relplot_kws=None, **sns_kws)[source]#

Draw a scatter plot with possibility of several semantic features groupings.

Scattering-features analysis is the process of understanding how features in a dataset relate to each other and how those relationships depend on other features. Visualization can be a core component of this process because, when data are visualized properly, the human visual system can see trends and patterns that indicate a relationship.

Parameters:
  • features (list) – List of numerical features to plot for correlation analyses. An error is raised if a feature does not exist in the data.

  • relplot_kws (dict, optional) – Extra keyword arguments to show the relationship between two features with semantic mappings of subsets. Refer to <http://seaborn.pydata.org/generated/seaborn.relplot.html#seaborn.relplot> for more details.

  • sns_kws (dict, optional) – Keyword arguments to control what visual semantics are used to identify the different subsets. For more details, please consult <http://seaborn.pydata.org/generated/seaborn.scatterplot.html>.

  • data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, if possible, the classes for data inspection. Whether given as a str or a DataFrame, the data needs the name of the target to be provided.

Returns:

Returns self for easy method chaining.

Return type:

QuickPlot instance

Notes

The data argument must be passed to the fit method; the data parameter is not accepted by any other QuickPlot method. Its description here only gives a synopsis of the kind of data the plot expects. An error is raised if data is forced in as a keyword argument.

Examples

>>> from watex.view.plot import  QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue ().frame
>>> qkObj = QuickPlot(lc='b', sns_style ='darkgrid',
...             fig_title='geol vs level of water inflow',
...             xlabel='Level of water inflow (lwi)',
...             ylabel='Flow rate in m3/h'
...            )
>>>
>>> qkObj.tname='flow' # target the DC-flow rate prediction dataset
>>> qkObj.mapflow=True  # to hold category FR0, FR1 etc..
>>> qkObj.fit(data)
>>> marker_list= ['o','s','P', 'H']
>>> markers_dict = {key:mv for key, mv in zip( list (
...                       dict(qkObj.data ['geol'].value_counts(
...                           normalize=True)).keys()),
...                            marker_list)}
>>> sns_pkws={'markers':markers_dict,
...          'sizes':(20, 200),
...          "hue":'geol',
...          'style':'geol',
...         "palette":'deep',
...          'legend':'full',
...          # "hue_norm":(0,7)
...            }
>>> regpl_kws = {'col':'flow',
...             'hue':'lwi',
...             'style':'geol',
...             'kind':'scatter'
...            }
>>> qkObj.scatteringfeatures(features=['lwi', 'flow'],
...                         relplot_kws=regpl_kws,
...                         **sns_pkws,
...                    )
class watex.view.TPlot(survey_area=None, distance=50.0, prefix='S', how='py', window_size=5, component='xy', mode='same', method='slinear', out='srho', c=2, **kws)[source]#

Bases: BasePlot

Tensor plot from EM processing data.

TPlot is a tensor (impedance, resistivity and phase) plot class. It explores SEG (Society of Exploration Geophysicists) class data and plots recovery tensors. TPlot methods return an instantiated object that inherits from the watex.property.Baseplots ABC (Abstract Base Class) for visualization.

Parameters:
  • window_size (int) – the length of the window. Must be greater than 1 and preferably an odd integer number. Default is 5

  • component (str) – Field tensor direction. It can be xx, xy, yx or yy. If arr2d is provided, there is no need to give an argument. It becomes useful when a collection of EDI objects is provided. If not specified, the resistivity and phase values at component xy are fetched for correction by default. Change the component value to get the appropriate data for correction. Default is xy.

  • mode (str , ['valid', 'same'], default='same') – Mode of the border trimming. Should be ‘valid’ or ‘same’. ‘valid’ is used for regular trimming whereas ‘same’ is used for appending the first and last values of resistivity. Any argument other than ‘valid’ is treated as ‘same’. Default is same.

  • method (str, default slinear) – Interpolation technique to use. Can be nearest or pad. Refer to the documentation of ~.interpolate2d.

  • out (str) – Value to export. Can be sfactor, tensor for corrections factor and impedance tensor. Any other values will export the static corrected resistivity srho.

  • c (int,) – A window-width expansion factor that must be input to the filter adaptation process to control the roll-off characteristics of the applied Hanning window. It is recommended to select c between 1 and 4. Default is 2.

  • distance (float) – The step between two stations/sites. If given, it creates an array of position for plotting purpose. Default value is 50 meters.

  • prefix (str) – string value to add as prefix of given id. Prefix can be the site name. Default is S.

  • how (str) – Mode to index the stations. Default is ‘Python indexing’, i.e. the counting of stations starts at 0. Any other mode starts the counting at 1.

  • savefig (str, Path-like object,) – savefigure’s name, default is None

  • fig_dpi (float,) – dots-per-inch resolution of the figure. default is 300

  • fig_num (int,) – size of figure in inches (width, height). default is [5, 5]

  • fig_size (Tuple (int, int) or inch) – size of figure in inches (width, height). default is [5, 5]

  • fig_orientation (str,) – figure orientation. default is landscape

  • fig_title (str,) – figure title. default is None

  • fs (float,) – size of font of axis tick labels, axis labels are fs+2. default is 6

  • ls (str,) – line style, it can be [ ‘-’ | ‘.’ | ‘:’ ] . default is ‘-’

  • lc (str, Optional,) – line color of the plot, default is k

  • lw (float, Optional,) – line weight of the plot, default is 1.5

  • alpha (float, 0 < alpha < 1) – transparency number, default is 0.5

  • font_weight (str, Optional) – weight of the font , default is bold.

  • font_style (str, Optional) – style of the font. default is italic

  • font_size (float, Optional) – size of font in inches (width, height). default is 3.

  • ms (float, Optional) – size of marker in points. default is 5

  • marker (str, Optional) – marker of stations default is o.

  • marker_style (str, Optional) – facecolor of the marker. default is yellow

  • marker_edgecolor (str, Optional) – edge color of the marker. default is yellow

  • marker_edgewidth (float, Optional) – width of the marker. default is 3.

  • xminorticks (float, Optional) – minor ticks according to x-axis size. default is 1.

  • yminorticks (float, Optional) – minor ticks according to y-axis size. default is 1.

  • bins – histogram element separation between two bars. default is 10.

  • xlim (tuple (int, int), Optional) – limit of x-axis in plot.

  • ylim (tuple (int, int), Optional) – limit of y-axis in plot.

  • xlabel (str, Optional,) – label name of x-axis in plot.

  • ylabel (str, Optional,) – label name of y-axis in plot.

  • rotate_xlabel (float, Optional) – angle to rotate xlabel in plot.

  • rotate_ylabel (float, Optional) – angle to rotate ylabel in plot.

  • leg_kws (dict, Optional) – keyword arguments of legend. default is empty dict

  • plt_kws (dict, Optional) – keyword arguments of plot. default is empty dict

  • glc (str, Optional) – line color of the grid plot, default is k

  • glw (float, Optional) – line weight of the grid plot, default is 2

  • galpha (float, Optional,) – transparency number of grid, default is 0.5

  • gaxis (str ('x', 'y', 'both')) – type of axis to hold the grid, default is both

  • gwhich (str, Optional) – kind of grid in the plot. default is major

  • tp_axis (bool,) – axis to apply the ticks params. default is both

  • tp_labelsize (str, Optional) – labelsize of ticks params. default is italic

  • tp_bottom (bool,) – position at bottom of ticks params. default is True.

  • tp_labelbottom (bool,) – put label on the bottom of the ticks. default is False

  • tp_labeltop (bool,) – put label on the top of the ticks. default is True

  • cb_orientation (str , ('vertical', 'horizontal')) – orientation of the colorbar, default is vertical

  • cb_aspect (float, Optional) – aspect of the colorbar. default is 20.

  • cb_shrink (float, Optional) – shrink size of the colorbar. default is 1.0

  • cb_pad (float,) – pad of the colorbar of plot. default is .05

  • cb_anchor (tuple (float, float)) – anchor of the colorbar. default is (0.0, 0.5)

  • cb_panchor (tuple (float, float)) – proportionality anchor of the colorbar. default is (1.0, 0.5)

  • cb_label (str, Optional) – label of the colorbar.

  • cb_spacing (str, Optional) – spacing of the colorbar. default is uniform

  • cb_drawedges (bool,) – draw edges inside of the colorbar. default is False

  • sns_orient ('v' | 'h', optional) – Orientation of the plot (vertical or horizontal). This is usually inferred based on the type of the input variables, but it can be used to resolve ambiguity when both x and y are numeric or when plotting wide-form data. default is v which refer to ‘vertical’

  • sns_style (dict, or one of {darkgrid, whitegrid, dark, white, ticks}) – A dictionary of parameters or the name of a preconfigured style.

  • sns_palette (seaborn color palette | matplotlib colormap | hls | husl) – Palette definition. Should be something color_palette() can process. The palette generates the points with different colors.

  • sns_height (float,) – Proportion of axes extent covered by each rug element. Can be negative. default is 4.

  • sns_aspect (scalar (float, int)) – Aspect ratio of each facet, so that aspect * height gives the width of each facet in inches. default is .7
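The distance, prefix and how parameters listed above determine where stations sit along the profile and how they are named. The sketch below shows one way such a layout could be derived. The helper station_layout and the two-digit zero padding of the names are assumptions made for illustration, not the watex naming scheme.

```python
def station_layout(n_stations, distance=50.0, prefix="S", how="py"):
    """Return station positions (in meters) and station names.

    Counting starts at 0 when how == 'py' and at 1 for any other value.
    """
    start = 0 if how == "py" else 1
    positions = [i * distance for i in range(n_stations)]
    # Zero-padded numbering is a hypothetical naming choice.
    names = [f"{prefix}{i:02d}" for i in range(start, start + n_stations)]
    return positions, names


positions, names = station_layout(3)
print(positions, names)
```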

Returns:

self – returns self for easy method chaining.

Return type:

Baseclass instance
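The window_size and mode parameters describe a smoothing window whose borders are either trimmed ('valid') or padded with the first and last values ('same'). The sketch below is a plain moving average written under that reading. It is not the filter used by watex, and the helper name moving_average is hypothetical.

```python
def moving_average(values, window_size=5, mode="same"):
    """Smooth a sequence with a flat window.

    'valid' keeps only positions where the window fits entirely;
    any other mode pads both ends with the edge values so that the
    output has the same length as the input.
    """
    if window_size < 2:
        raise ValueError("window_size must be greater than 1")
    data = list(values)
    if not data:
        return []
    if mode != "valid":
        left = window_size // 2
        right = window_size - 1 - left
        data = [data[0]] * left + data + [data[-1]] * right
    n_out = len(data) - window_size + 1
    return [sum(data[i:i + window_size]) / window_size
            for i in range(n_out)]


rho = [1, 2, 3, 4, 5]
trimmed = moving_average(rho, window_size=3, mode="valid")
padded = moving_average(rho, window_size=3, mode="same")
print(trimmed, len(padded))
```

With 'valid' the output is shorter than the input by window_size - 1 samples, which is why an odd window is convenient for symmetric padding in 'same' mode.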

Examples

>>> from watex.view.plot import TPlot
>>> from watex.datasets import load_edis
>>> plot_kws = dict( ylabel = '$Log_{10}Frequency [Hz]$',
...                 xlabel = '$Distance(m)$',
...                 cb_label = r'$Log_{10}Rhoa[\Omega.m]$',
...                 fig_size =(6, 3),
...                 font_size =7.,
...                 rotate_xlabel=45,
...                 imshow_interp='bicubic',
...                 )
>>> edi_data =load_edis (return_data= True, samples=7 )
>>> t= TPlot(**plot_kws ).fit(edi_data)
>>> t.fit(edi_data ).plot_tensor2d (to_log10=True )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|Data collected =  7      |EDI success. read=  7      |Rate     =  100.0  %|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<AxesSubplot:xlabel='$Distance(m)$', ylabel='$Log_{10}Frequency [Hz]$'>
fit(data)[source]#

Fit data and populate attributes.

Parameters:

data (str, or list or pycsamt.core.edi.Edi object) – Full path to EDI files or collection of EDI-objects

Returns:

self – returns self for chaining methods.

Return type:

watex.view.plot.TPlot instantiated object

property inspect#

Inspect whether the object is fitted or not.

plotSkew(method='Bahr', view='skew', mode=None, threshold_line=None, show_average_sensistivity=True, suppress_outliers=True, **plot_kws)[source]#

Plot phase-sensitive skew visualization.

‘Skew’ is also known as the conventional asymmetry parameter based on the Z magnitude.

Mostly, the EM signal is influenced by several factors such as the dimensionality of the propagation medium and the physical anomalies, which can distort the EM field both locally and regionally. The distortion of Z is determined from the quantification of its asymmetry and the deviation from the conditions that define its dimensionality. The parameters used for this purpose are all rotationally invariant because the Z components involved in their definition are independent of the orientation system used. The conventional asymmetry parameter based on the Z magnitude is the skew defined by Swift (1967) [1] and Bahr (1991) [2].

Parameters:
  • method (str, default='Bahr') –

    Kind of correction. Can be:

    • swift for the distortion removal proposed by Swift in 1967. A value close to 0 indicates 1D and 2D structures, and 3D otherwise. However, in the general case, an electrical structure with \(\eta < 0.4\) can be treated as a 2D medium.

    • bahr for the distortion removal proposed by Bahr in 1991. The latter threshold is set to 0.3. Above this value the structure is 3D.

  • view (str, default='skew') – Phase-sensitive visualization. Can also be the rotational invariant, invariant. In fact, setting view to mu or invariant does not change the interpretation, since the distortions of Z are all rotationally invariant whether the Bahr or Swift method is used.

  • mode (str, optional) – X-axis coordinates for visualization. Plot either 'frequency' or 'periods'. The default is 'frequency'.

  • threshold_line (float, optional) –

    Visualize the threshold line. Can be [‘bahr’, ‘swift’, ‘both’]:

    • Note that when method is set to swift, a value close to \(0.\) indicates 1D and 2D structures (\(\eta <0.4\)), and 3D otherwise (\(\eta >0.4\)). The threshold line for swift is set to \(0.4\).

    • When method is set to Bahr, \(\eta > 0.3\) indicates 3D structures, values in \([0.1 - 0.3]\) indicate modified 3D/2D structures, whereas \(<0.1\) indicates 1D, 2D or distorted 2D.

  • show_average_sensistivity (bool, default=True) – Display the averaged value of the skew data over all frequencies. This value can help with dimensionality interpretation.

  • suppress_outliers (bool, default=True) – Remove the outliers in the data, if any. It uses the Inter Quartile Range (IQR) approach. See the documentation of watex.utils.remove_outliers(). This is useful for a clear interpretation using the skew threshold value.
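The thresholds quoted above translate directly into a small decision rule, and the IQR approach mentioned for suppress_outliers can be sketched with the standard library. Both helpers below are hypothetical illustrations: the 1.5 x IQR fences and the handling of values exactly on a threshold are assumptions, and none of this reproduces watex.utils.remove_outliers().

```python
from statistics import quantiles


def classify_skew(eta, method="bahr"):
    """Map a skew value to the dimensionality classes quoted in the text."""
    method = method.lower()
    if method == "swift":
        # Threshold line at 0.4; the boundary value itself is treated as 3D.
        return "1D/2D" if eta < 0.4 else "3D"
    if method == "bahr":
        if eta > 0.3:
            return "3D"
        if eta >= 0.1:
            return "modified 3D/2D"
        return "1D, 2D or distorted 2D"
    raise ValueError("method must be 'swift' or 'bahr'")


def drop_outliers_iqr(values, factor=1.5):
    """Drop values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - factor * spread, q3 + factor * spread
    return [v for v in values if low <= v <= high]


skew = [0.10, 0.12, 0.11, 0.13, 0.12, 5.0]  # made-up values, one outlier
cleaned = drop_outliers_iqr(skew)
average = sum(cleaned) / len(cleaned)
print(cleaned, classify_skew(average, "bahr"))
```

Averaging after outlier removal mirrors the idea of reading a single representative skew value against the threshold line.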

See also

watex.methods.Processing.skew

For the mathematical formulation of the Bahr and Swift skews.

watex.utils.plot_skew

For phase-sensitive skew visualization (naive plot).

Examples

>>> import watex
>>> test_data = watex.fetch_data ('edis', samples =37, return_data =True )
>>> watex.TPlot(fig_size =(10,  4), marker ='x').fit(
    test_data).plotSkew(method ='swift', threshold_line=True)

References

[1]

Swift, C., 1967. A magnetotelluric investigation of an electrical conductivity anomaly in the southwestern United States. Ph.D. Thesis, MIT Press. Cambridge.

[2]

Bahr, K., 1991. Geological noise in magnetotelluric data: a classification of distortion types. Physics of the Earth and Planetary Interiors 66 (1–2), 24–38.

plot_corrections(fltr='ama', ss_fx=None, ss_fy=None, r=1000.0, nfreq=21, skipfreq=5, tol=0.12, rotate=0.0, distortion=None, distortion_err=None, mode='TE', scale='period', sites=None, seed=None, how='py', show_site=True, survey=None, style=None, errorbar=True, spad=0.5, n_sites=1, mcolors=None, markers=None, **kws)[source]#

Plot apparent resistivity/phase curves and corrections.

Changed in version 0.2.1: Can henceforth display multiple sites by providing the sites as a collection.

Parameters:
  • fltr (str , default='ama') –

    Type of filter to apply. ss removes the static shift using a spatial median filter, whereas dist removes distortion. Note that distortion must be provided when dist is used, otherwise an error is raised. Can also be [‘tma’ | ‘ama’ | ‘flma’] for the EMAP filters:

    • tma for trimming moving-average

    • ama for adaptive moving-average

    • flma for fixed-length moving-average

  • distortion (np.ndarray(2, 2, dtype=real)) – Real 2x2 distortion tensor.

  • distortion_err (np.ndarray(2, 2, dtype=real), Optional) – Errors/uncertainties of the distortion tensor, included for error propagation.

  • ss_fx (float, Optional) – Static shift factor applied to the x components (i.e. z[:, 0, :]). It is assumed to be in resistivity scale. If None, it is computed automatically using the spatial median filter.

  • ss_fy (float, optional) – Static shift factor applied to the y components (i.e. z[:, 1, :]). It is assumed to be in resistivity scale. If None, it is computed automatically using the spatial median filter.

  • r (float, default=1000.) – radius to look for nearby stations, in meters.

  • nfreq (int, default=21) – Number of frequencies used to calculate the median static shift. This assumes that the first frequency is the highest one, because the highest frequencies usually sample a 1D earth.

  • skipfreq (int, default=5) – number of frequencies to skip from the highest frequency. Sometimes the highest frequencies are not reliable due to noise or low signal in the AMT deadband. This allows you to skip those frequencies.

  • tol (float, default=0.12) – Tolerance on the median static shift correction. If the data is noisy, the correction factor can be biased away from 1; tol is used to stop that bias. If 1-tol < correction < 1+tol, the correction factor is set to 1.

  • rotate (float, default=0.) – Rotate the Z array by angle alpha in degrees. All angles are referenced to geographic North, positive in the clockwise direction (mathematically negative). In the non-rotated state, X refers to the North and Y to the East direction.

  • mode (str, default='TE') – Electromagnetic mode. Can also be [‘TM’ | ‘both’]. If both, components xy and yx are expected in the data.

  • scale (str, default='period') – Quantity shown on the axis label. Can also be 'frequency'.

  • sites (int, str, optional) – Index or name of the site to plot. A site name must contain a position number, for instance 'S13'. If not provided, a random station is selected instead.

  • seed (int, optional) – Seed used to select a random site when sites is not provided. Setting it returns the same site on every call.

  • how (str, default='py') – The way the site is fetched for plot. For instance, in Python indexing (default), the site is numbered from 0. For instance ‘site05’ will fetch the data at index 4. If this positioning is not wished, set to ‘None’.

  • show_site (bool, default=True,) – Display the number of site.

  • survey (str, optional) – Method used for the survey. e.g., ‘AMT’ for Audio-Magnetotellurics.

  • style (str, default='default') – Matplotlib style.

  • errorbar (bool, default=True) – display the error bar.

  • spad (float, default=.5,) –

    pad to display the station in the top of each section plot.

    New in version 0.2.1.

  • n_sites (int, default =1.) – Number of random sites to select for visualization. It has no effect if the names of the sites are given.

  • mcolors (str, list, optional) – The list of colors for resistivity and phase.

  • markers (str, list, optional) – The list of markers for resistivity and phase.

  • kws (dict) – Additional keyword arguments passed to Matplotlib.Axes.Scatter plots.
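
The interplay of nfreq, skipfreq and tol described above can be sketched as follows. This is a minimal illustration under assumptions, not the implementation behind plot_corrections: the function median_static_shift and its ratios input (one hypothetical resistivity ratio per frequency, ordered from the highest to the lowest frequency) are invented for the example.

```python
from statistics import median


def median_static_shift(ratios, nfreq=21, skipfreq=5, tol=0.12):
    """Return a static shift correction factor from per-frequency ratios.

    `ratios` is assumed to be ordered from the highest to the lowest
    frequency. The first `skipfreq` entries are skipped, the median is
    taken over the next `nfreq` entries, and a factor within `tol` of 1
    is snapped to exactly 1.
    """
    window = ratios[skipfreq:skipfreq + nfreq]
    if not window:
        raise ValueError("not enough frequencies after skipping")
    correction = median(window)
    if 1 - tol < correction < 1 + tol:
        return 1.0
    return correction


# Nearly unshifted data: the small bias is ignored.
quiet = median_static_shift([1.05] * 30)

# Noisy top frequencies are skipped, then a clear shift of 1.5 is kept.
shifted = median_static_shift([5.0] * 5 + [1.5] * 21)
```

Snapping near-unity factors to 1 avoids rescaling a site because of noise alone, which is the purpose of tol as described in the parameter list.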

Examples

>>> import numpy as np
>>> import watex as wx
>>> edi_data = wx.fetch_data ('edis', return_data =True, samples =27)
>>> wx.TPlot(show_grid=True).fit(edi_data).plot_corrections (
    seed =52, )
>>> distortion = np.array([[1.1 , 0.6 ],[0.23, 1.9 ]])
>>> wx.TPlot(show_grid=True).fit(edi_data).plot_corrections (
     seed =52, mode ='tm', fltr ='dist', distortion =distortion
     )
plot_ctensor2d(tensor='res', ffilter='tma', sites=None, to_log10=False)[source]#

Plot filtered tensors

Parameters:
  • tensor (str , ['res','phase', 'z'], default='res') – kind of tensor to plot. Can be resistivity or phase. If phase, customize your plot to not fit the default ‘res’ behaviour.

  • ffilter (str ['ama', 'flma', 'tma'], default='tma') – Kind of filter used to correct the tensor data.

  • to_log10 (bool, default=False) – Convert the resistivity data and the frequency to log10.

  • sites (list of str, optional) – List of station/site names. If given, it must have the same length as the positions in the EDI data, i.e. it must match the number of EDI files successfully read.

Returns:

  • arr2d: 2D filtered tensor array from the component

  • freqs: array-like 1d of frequency in the survey.

  • positions: Sites/stations positions. It is equal to the distance between stations times the number of sites.

  • sites: list of the names of the station/sites

  • base_plot_kws: plot keyword arguments inherited from watex.property.BasePlot. It composes the last parameters for customizing the plot as a decorated return function.

Return type:

( arr2d , freqs, positions , sites , base_plot_kws)

Examples

>>> from watex.view.plot import TPlot
>>> from watex.datasets import load_edis
>>> # get some 3 samples of EDI for demo
>>> edi_data = load_edis (return_data =True, samples =3 )
>>> # customize plot by adding plot_kws
>>> plot_kws = dict( ylabel = '$Log_{10}Frequency [Hz]$',
                    xlabel = '$Distance(m)$',
                    cb_label = r'$Log_{10}Rhoa[\Omega.m]$',
                    fig_size =(6, 3),
                    font_size =7.
                    )
>>> t= TPlot(**plot_kws ).fit(edi_data)
>>> # plot filtered tensor using the log10 resistivity
>>> t.plot_ctensor2d (to_log10=True)
<AxesSubplot:xlabel='$Distance(m)$', ylabel='$Log_{10}Frequency [Hz]$'>
plot_multi_recovery(sites, colors=None, **kws)[source]#

Plots multiple sites/stations with signal recovery.

Parameters:
  • sites (list) – list of sites to visualize. Can also be the index of the sites

  • colors (list of str) – matplotlib colors to customize the raw signal and recovery signal

Returns:

ax

Return type:

Matplotlib subplot axes

Examples

>>> from watex.view.plot import TPlot
>>> from watex.datasets import load_edis
>>> # takes the 03 samples of EDIs
>>> edi_data = load_edis (return_data= True, samples =3 )
>>> TPlot(fig_size =(5, 3)).fit(edi_data).plot_multi_recovery (
    sites =['S00'], colors =['o', 'ok--'])
<AxesSubplot:title={'center':'Recovered tensor $|Z_{xy}|$'},
xlabel='$Frequency [H_z]$', ylabel='$ App.resistivity \quad xy \quad [ \Omega.m]$'>
plot_phase_tensors(mode='frequency', stretch=(7000, 20), linedir='ns', tensor='phimin', ellipse_dict=None, **kws)[source]#

Plot phase tensor pseudosection and skew ellipse visualization.

Method plots the phase tensor ellipses in a pseudo section format. It uses mtpy as dependency.

Parameters:
  • mode (str, default ='frequency') – Temporal scale on the y-axis. Can be [‘frequency’ | ‘period’]

  • stretch (float or tuple (xstretch, ystretch), default=(7000, 20)) – Factor that scales the distance from one station to the next to make the plot readable. It determines the (x, y) aspect ratio of the plot.

  • linedir (str [ 'ns' | 'ew' ], default='ns') –

    The predominant direction of profile line. It can be [‘ns’ | ‘ew’] where:

    • ’ns’ refers to a North-South line, or a line that is closer to north-south

    • ’ew’ refer to East-West line or line is closer to east-west

    Default is ‘ns’

  • tensor (str, default='phimin') –

    Quantity used to color the tensor, skew or ellipse visualization. Tensor can be:

    [ ‘phimin’ | ‘phimax’ | ‘skew’ | ‘skew_seg’ | ‘normalized_skew’ | ‘normalized_skew_seg’ | ‘phidet’ | ‘ellipticity’ ]

    where:

    • ’phimin’ -> colors by minimum phase

    • ’phimax’ -> colors by maximum phase

    • ’skew’ -> colors by skew

    • ’skew_seg’ -> colors by skew in discrete segments defined by the range

    • ’normalized_skew’ -> colors by normalized skew, see [Booker, 2014]

    • ’normalized_skew_seg’ -> colors by normalized skew in discrete segments defined by the range

    • ’phidet’ -> colors by determinant of the phase tensor

    • ’ellipticity’ -> colors by ellipticity

    Default is ‘phimin’.

  • ellipse_dict (dict, optional) –

    Dictionary of parameters for the phase tensor ellipses with keys:

    • ’size’: float, default=2. Size of the ellipse in points.

    • ’colorby’: str, default=’phimin’

      Quantity used to color the ellipses. It accepts the same values as the tensor parameter: [ ‘phimin’ | ‘phimax’ | ‘skew’ | ‘skew_seg’ | ‘phidet’ | ‘ellipticity’ ]

    • ’range’: tuple (min, max, step), default=’colorby’

      At least min and max must be given. When using ‘skew_seg’ to plot discrete values, give step as well.

    • ’cmap’: [ ‘mt_yl2rd’ | ‘mt_bl2yl2rd’ | ‘mt_wh2bl’ | ‘mt_rd2bl’ | ‘mt_bl2wh2rd’ | ‘mt_bl2gr2rd’ | ‘mt_rd2gr2bl’ | ‘mt_seg_bl2wh2rd’ ]

      • ’mt_yl2rd’ -> yellow to red

      • ’mt_bl2yl2rd’ -> blue to yellow to red

      • ’mt_wh2bl’ -> white to blue

      • ’mt_rd2bl’ -> red to blue

      • ’mt_bl2wh2rd’ -> blue to white to red

      • ’mt_bl2gr2rd’ -> blue to green to red

      • ’mt_rd2gr2bl’ -> red to green to blue

      • ’mt_seg_bl2wh2rd’ -> discrete blue to white to red

  • kws (dict) – Additional keyword arguments passed to the MTpy pseudosection phase tensor class: PlotPhaseTensorPseudoSection
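
As an illustration of the ellipse_dict keys listed above, the sketch below builds one dictionary and checks it against the documented options. The validator check_ellipse_dict is a hypothetical helper written for this example, and the range values (-9, 9, 3) are arbitrary; neither comes from watex or MTpy.

```python
# Allowed values copied from the parameter description above.
COLORBY = {
    "phimin", "phimax", "skew", "skew_seg", "normalized_skew",
    "normalized_skew_seg", "phidet", "ellipticity",
}
CMAPS = {
    "mt_yl2rd", "mt_bl2yl2rd", "mt_wh2bl", "mt_rd2bl", "mt_bl2wh2rd",
    "mt_bl2gr2rd", "mt_rd2gr2bl", "mt_seg_bl2wh2rd",
}


def check_ellipse_dict(ellipse_dict):
    """Hypothetical helper: validate the documented ellipse_dict keys."""
    colorby = ellipse_dict.get("colorby", "phimin")
    if colorby not in COLORBY:
        raise ValueError(f"unknown colorby: {colorby!r}")
    cmap = ellipse_dict.get("cmap")
    if cmap is not None and cmap not in CMAPS:
        raise ValueError(f"unknown cmap: {cmap!r}")
    rng = ellipse_dict.get("range")
    if rng is not None:
        if len(rng) not in (2, 3):
            raise ValueError("range must be (min, max) or (min, max, step)")
        # Assumption: every segmented ('*_seg') coloring needs the step.
        if colorby.endswith("_seg") and len(rng) != 3:
            raise ValueError("segmented coloring needs (min, max, step)")
    return ellipse_dict


ellipse_dict = check_ellipse_dict({
    "size": 2,
    "colorby": "skew_seg",
    "range": (-9, 9, 3),        # illustrative values only
    "cmap": "mt_seg_bl2wh2rd",
})
```

A dictionary built this way would then be passed through the ellipse_dict argument of plot_phase_tensors.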

See also

mtpy.imaging.phase_tensor_pseudosection.PlotPhaseTensorPseudoSection

Phase tensor pseudosection plot from the MTpy package.

watex.utils.plot_skew

Phase sensitive skew visualization.

Examples

>>> import watex as wx
>>> edi_data = wx.fetch_data ('edis', key='edi', return_data =True , samples =17 )
>>> tplot = wx.TPlot ().fit(edi_data )
>>> tplot.plot_phase_tensors (tensor ='skew')
plot_recovery(site='S00')[source]#

Visualize the restored tensor per site.

Parameters:

site (str, int, default ="S00") – Name or index of the site/station to plot.

Returns:

self – returns self for chaining methods.

Return type:

watex.view.plot.TPlot instantiated object

Examples

>>> from watex.view import TPlot
>>> from watex.datasets import load_edis
>>> edi_data = load_edis (return_data =True, samples =7)
>>> plot_kws = dict( ylabel = '$Log_{10}Frequency [Hz]$',
            xlabel = '$Distance(m)$',
            cb_label = r'$Log_{10}Rhoa[\Omega.m]$',
            fig_size =(7, 4),
            font_size =7.
            )
>>> t= TPlot(**plot_kws ).fit(edi_data)
>>> # plot recovery of site 'S01'
>>> t.plot_recovery ('S01')
plot_rhoa(mode='TE', scale='period', site=None, seed=None, how='py', show_site=True, survey=None, style=None, errorbar=True, suppress_outliers=False, **kws)[source]#

Plot apparent resistivity and phase curves

Parameters:
  • mode (str, default='TE') – Electromagnetic mode. Can also be [‘TM’ | ‘both’]. If both, components xy and yx are expected in the data.

  • scale (str, default='period') – Quantity shown on the axis label. Can also be 'frequency'.

  • site (int, str, optional) – Index or name of the site to plot. A site name must contain a position number, for instance 'S13'. If not provided, a random station is selected instead.

  • seed (int, optional) – If site is not provided, a site is selected at random. To fetch the same site every time, set the seed value.

  • how (str, default='py') – The way the site is fetched for plot. For instance, in Python indexing (default), the site is numbered from 0. For instance ‘site05’ will fetch the data at index 4. If this positioning is not wished, set to ‘None’.

  • show_site (bool, default=True,) – Display the number of site.

  • survey (str, optional) – Method used for the survey. e.g., ‘AMT’ for Audio-Magnetotellurics.

  • style (str, default='default') – Matplotlib style.

  • errorbar (bool, default=True) – display the error bar.

  • suppress_outliers (bool, default=False,) – Remove outliers in the data before plotting

  • kws (dict) – Additional keyword arguments passed to Matplotlib.Axes.Scatter plots.

Examples

>>> import watex as wx
>>> edi_data = wx.fetch_data ('edis', return_data =True, samples =27)
>>> wx.TPlot(show_grid=True).fit(edi_data).plot_rhoa (
    seed =52, mode ='*')
plot_rhophi(sites=None, mode='TE', scale='period', seed=None, how='py', show_site=True, survey=None, style=None, errorbar=True, suppress_outliers=False, n_sites=1, spad=0.5, **kws)[source]#

Plot resistivities and phases from multiple stations.

Parameters:
  • mode (str, default='TE') – Electromagnetic mode. Can also be [‘TM’ | ‘both’]. If both, components xy and yx are expected in the data.

  • sites (int, str, or list, optional) – A collection of site indexes or names. Each site name must contain a position number, for instance 'S13'. If not provided, random sites are selected instead, using the n_sites parameter.

  • scale (str, default='period') – Quantity shown on the axis label. Can also be 'frequency'.

  • seed (int, optional) – If sites is not provided, the sites are selected at random. To fetch the same sites every time, set the seed value.

  • how (str, default='py') – The way the site is fetched for plot. For instance, in Python indexing (default), the site is numbered from 0. For instance ‘site05’ will fetch the data at index 4. If this positioning is not wished, set to ‘None’.

  • show_site (bool, default=True,) – Display the number of site.

  • survey (str, optional) – Method used for the survey. e.g., ‘AMT’ for Audio-Magnetotellurics.

  • style (str, default='default') – Matplotlib style.

  • errorbar (bool, default=True) – display the error bar.

  • suppress_outliers (bool, default=False,) – Remove outliers in the data before plotting

  • n_sites (int, default =1.) – Number of random sites to select for visualization. It has no effect if the names of the sites are given.

  • spad (float, default=.5,) –

    pad to display the station in the top of each section plot.

    New in version 0.2.1.

  • kws (dict) – Additional keyword arguments passed to Matplotlib.Axes.Scatter plots.
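
The way sites, n_sites and seed interact can be sketched with the standard library. This is a hedged illustration only, not the selection code used by plot_rhophi: pick_sites and the station names are hypothetical.

```python
import random


def pick_sites(site_names, sites=None, n_sites=1, seed=None):
    """Return the requested sites, or a reproducible random selection."""
    if sites is not None:
        # Explicit sites win; n_sites and seed are ignored in that case.
        return [sites] if isinstance(sites, (str, int)) else list(sites)
    rng = random.Random(seed)  # a fixed seed gives the same draw every time
    return rng.sample(list(site_names), n_sites)


all_sites = [f"S{i:02d}" for i in range(27)]  # made-up station names

first = pick_sites(all_sites, n_sites=3, seed=52)
second = pick_sites(all_sites, n_sites=3, seed=52)
explicit = pick_sites(all_sites, sites="S13", n_sites=3)
```

With the same seed the two random draws are identical, which is why setting seed is recommended when a figure has to be reproduced.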

Examples

>>> import watex as wx
>>> edi_data = wx.fetch_data ('edis', return_data =True, samples =27)
>>> wx.TPlot(show_grid=True).fit(edi_data).plot_rhophi (
    seed =52, mode ='*', n_sites =3 )
plot_tensor2d(tensor='res', sites=None, to_log10=False)[source]#

Plot two dimensional tensor.

Parameters:
  • freqs (array-like) – Y-coordinates: the frequency array. It should have the same length as the rows of arr2d and contain the complete set of frequencies used during the survey.

  • tensor (str , ['res','phase', 'z'], default='res') – kind of tensor to plot. Can be resistivity or phase. If phase, customize your plot to not fit the default ‘res’ behaviour.

  • to_log10 (bool, default=False) – Convert the resistivity data and the frequency to log10.

  • sites (list of str, optional) – List of station/site names. If given, it must have the same length as the positions in the EDI data, i.e. it must match the number of EDI files successfully read.

Returns:

  • arr2d: 2D resistivity array from the tensor component

  • freqs: array-like 1d of frequency in the survey.

  • positions: Sites/stations positions. It is equal to the distance between stations times the number of sites.

  • sites: list of the names of the station/sites

  • base_plot_kws: plot keyword arguments inherited from watex.property.BasePlot. It composes the last parameters for customizing the plot as a decorated return function.

Return type:

( arr2d , freqs, positions , sites , base_plot_kws)

Examples

>>> from watex.view.plot import TPlot
>>> from watex.datasets import load_edis
>>> # get some 3 samples of EDI for demo
>>> edi_data = load_edis (return_data =True, samples =3 )
>>> # customize plot by adding plot_kws
>>> plot_kws = dict( ylabel = '$Log_{10}Frequency [Hz]$',
                    xlabel = '$Distance(m)$',
                    cb_label = r'$Log_{10}Rhoa[\Omega.m]$',
                    fig_size =(6, 3),
                    font_size =7.
                    )
>>> t= TPlot(**plot_kws ).fit(edi_data)
>>> # plot recovery2d using the log10 resistivity
>>> t.plot_tensor2d (to_log10=True)
<AxesSubplot:xlabel='$Distance(m)$', ylabel='$Log_{10}Frequency [Hz]$'>
watex.view.biPlot(self, Xr, components, y, classes=None, markers=None, colors=None)[source]#

The biplot is a convenient all-in-one visualization following a PCA analysis.

There is an implementation in R but there is no standard implementation in Python.

Parameters:
  • self (watex.property.BasePlot.) –

    Matplotlib property from BasePlot instances. Default BasePlot instance is given as a pobj instance and can be loaded for plotting purpose as:

    >>> from watex.view import pobj
    

    To change some default plot properties like line width or style, both can be set before running the script as follows:

    >>> pobj.lw = 2. ; pobj.ls=':' # and so on
    

  • Xr (NDArray of transformed X.) – The PCA projected data scores on the n given components. It is the reduced dimension of the train set ‘X’, with maximum ratio as sorted eigenvectors from the first to the last component.

  • components (NDArray, shape (n_components, n_eigenvectors ),) – The eigenvectors of the PCA. The shape along axis 0 must match the number of components computed using PCA. If shape 1 of Xr equals shape 0 of the component matrix components, the matrix is transposed to fit shape 1 of Xr.

  • y (Array-like,) – the target composing the class labels.

  • classes (list or int,) – class categories or class labels

  • markers (str,) – Matplotlib list of markers for plotting classes.

  • colors (str,) – Matplotlib list of colors to customize plots
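
The transposition rule stated for components can be sketched with plain lists. This is an assumption-laden illustration rather than the biPlot source: align_components is a hypothetical helper and the loading values are invented.

```python
def align_components(xr_n_cols, components):
    """Return the components with one row per feature.

    If the number of rows of `components` equals the number of columns
    of Xr (i.e. the number of PCA components), the matrix is transposed;
    otherwise it is returned unchanged.
    """
    if len(components) == xr_n_cols:
        return [list(column) for column in zip(*components)]
    return [list(row) for row in components]


# Two principal components over three features (made-up loadings).
components = [
    [0.8, 0.6, 0.0],
    [-0.6, 0.8, 0.0],
]
loadings = align_components(xr_n_cols=2, components=components)

# Each row is now the (PC1, PC2) arrow tip of one feature in the biplot.
arrows = [tuple(row) for row in loadings]
```

After the transposition each feature contributes one arrow whose coordinates live in the same two-dimensional space as the projected scores Xr.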

Examples

>>> from watex.analysis import nPCA
>>> from watex.datasets import fetch_data
>>> from watex.view import biPlot, pobj  # pobj is Baseplot instance
>>> X, y = fetch_data ('bagoue pca' )  # fetch pca data
>>> pca= nPCA (X, n_components= 2 , return_X= False ) # return PCA object
>>> components = pca.components_ [:2, :] # for two components
>>> biPlot (pobj, pca.X, components , y ) # pca.X is the reduced dim X
>>> # to change for instance line width (lw) or style (ls)
>>> # just use the baseplotobject (pobj)

References

Originally written by Serafeim Loukas, serafeim.loukas@epfl.ch and was edited to fit the watex package API.

watex.view.plot2d(ar, y=None, x=None, distance=50.0, stnlist=None, prefix='S', how='py', to_log10=False, plot_contours=False, top_label='', **baseplot_kws)[source]#

Two-dimensional template for visualizing matrices.

It is a wrapper that can plot any matrix by customizing the positions X and y. By default, X is considered as the stations and y as the resistivity log data.

Parameters:
  • ar (Array-like 2D, shape (M, N)) – 2D array for plotting. For instance, it can be a 2D resistivity array collected at all stations (N) and all frequencies (M).

  • y (array-like, default=None) – Y-coordinates. It should have length M, the number of rows of ar.

  • x (array-like, default=None) – X-coordinates. It should have length N, the number of columns of ar. Note that if x is given, distance is not needed.

  • distance (float) – The step between two stations. If given, it creates an array of positions for plotting. Default value is 50 meters.

  • stnlist (list of str) – List of station names. If given, it should have the same length as the number of columns, N, of ar.

  • prefix (str) – String value to add as prefix of the given id. The prefix can be the site name. Default is S.

  • how (str) – Mode to index the stations. Default is ‘Python indexing’, i.e. the counting of stations starts at 0. Any other mode starts the counting at 1.

  • to_log10 (bool, default=False) – Recompute ar in base-10 logarithm values. Note that when True, y should also be in log10.

  • plot_contours (bool, default=False) – Plot the contour map. It is available only if plt_style is set to pcolormesh.

  • baseplot_kws (dict,) – All the keyword arguments passed to the property watex.property.BasePlot class.
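
How distance, prefix and how translate into x-coordinates and station labels can be sketched as below. The two helpers are hypothetical, and the two-digit zero padding is an assumption suggested by names such as 'S00' in the examples of this page; the real naming is handled inside watex.

```python
def station_positions(n_stations, distance=50.0):
    """X-coordinates of evenly spaced stations, starting at 0."""
    return [i * distance for i in range(n_stations)]


def station_names(n_stations, prefix="S", how="py"):
    """Station labels; counting starts at 0 for how='py', else at 1."""
    start = 0 if how == "py" else 1
    # Two-digit zero padding is assumed for the illustration.
    return [f"{prefix}{i:02d}" for i in range(start, start + n_stations)]


positions = station_positions(3)          # one position per column of ar
python_style = station_names(3)           # counting from 0
one_based = station_names(3, how=None)    # counting from 1
```

When x is supplied explicitly, these generated positions are not needed, as noted in the description of x.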

Returns:

axe

Return type:

<AxesSubplot> object

Examples

>>> import numpy as np
>>> import watex
>>> np.random.seed (42)
>>> data = np.random.randn ( 15, 20 )
>>> data_nan = data.copy()
>>> data_nan [2, 1] = np.nan; data_nan[4, 2]= np.nan;  data_nan[6, 3]=np.nan
>>> watex.view.mlplot.plot2d (data )
<AxesSubplot:xlabel='Distance(m)', ylabel='log10(Frequency)[Hz]'>
>>> watex.view.mlplot.plot2d (data_nan ,  plt_style = 'imshow',
                              fig_size = (10, 4))
watex.view.plotDendrogram(df, columns=None, labels=None, metric='euclidean', method='complete', kind=None, return_r=False, verbose=False, **kwd)[source]#

Visualizes the linkage matrix as a dendrogram.

Note that categorical features, if any exist in the dataframe, are automatically discarded.

Parameters:
  • df (dataframe or NDArray of (n_samples, n_features)) – Dataframe or Ndarray. If an array is given, the column names must be specified to match shape 1 of the array.

  • columns (list) – List of labels to name each column of the array of (n_samples, n_features). If a dataframe is given, there is no need to specify the columns.

  • kind (str, ['squareform'|'condense'|'design'], default is {'design'}) – Kind of approach used to build the linkage matrix. A condensed distance matrix is a flat array containing the upper triangle of the distance matrix. This is the form that pdist returns. Alternatively, a collection of \(m\) observation vectors in \(n\) dimensions may be passed as an \(m\) by \(n\) array. All elements of the condensed distance matrix must be finite, i.e., no NaNs or infs. Alternatively, the squareform distance matrix could be used, but it yields different distance values than expected. The design approach uses the complete input example matrix, also called the ‘design matrix’, to produce a correct linkage matrix similar to squareform and condense.

  • metric (str or callable, default is {'euclidean'}) – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise.pairwise_distances(). If X is the distance array itself, use “precomputed” as the metric. Precomputed distance matrices must have 0 along the diagonal.

  • method (str, optional, default is {'complete'}) – The linkage algorithm to use. See the Linkage Methods section below for full descriptions in watex.utils.exmath.linkage_matrix()

  • labels (ndarray, optional) – By default, labels is None so the index of the original observation is used to label the leaf nodes. Otherwise, this is an \(n\)-sized sequence, with n == Z.shape[0] + 1. The labels[i] value is the text to put under the \(i\) th leaf node only if it corresponds to an original observation and not a non-singleton cluster.

  • return_r (bool, default='False',) – Return the r dictionary if set to ‘True’, otherwise return nothing.

  • verbose (int, bool, default='False') – If True, output message of the name of categorical features dropped.

  • kwd (dict) – additional keywords arguments passes to scipy.cluster.hierarchy.dendrogram()
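
The difference between the condensed and the squareform distance matrices mentioned under kind can be illustrated without any third-party library. The helpers below are written for this example only; they mimic the row-by-row upper-triangle ordering that the description attributes to pdist, and the three observations are invented.

```python
from itertools import combinations
from math import dist


def condensed_distances(X):
    """Flat upper triangle of the Euclidean distance matrix, row by row."""
    return [dist(a, b) for a, b in combinations(X, 2)]


def to_squareform(condensed, m):
    """Expand a condensed vector of length m*(m-1)/2 to an m x m matrix."""
    square = [[0.0] * m for _ in range(m)]
    for (i, j), d in zip(combinations(range(m), 2), condensed):
        square[i][j] = square[j][i] = d
    return square


# Three observations in two dimensions (made-up data).
X = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
condensed = condensed_distances(X)
square = to_squareform(condensed, len(X))
```

For m observations the condensed form holds m(m-1)/2 values, while the square form repeats each distance twice and carries a zero diagonal.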

Returns:

r – A dictionary of data structures computed to render the dendrogram. It has the following keys:

'color_list'

A list of color names. The k’th element represents the color of the k’th link.

'icoord' and 'dcoord'

Each of them is a list of lists. Let icoord = [I1, I2, ..., Ip] where Ik = [xk1, xk2, xk3, xk4] and dcoord = [D1, D2, ..., Dp] where Dk = [yk1, yk2, yk3, yk4], then the k’th link painted is (xk1, yk1) - (xk2, yk2) - (xk3, yk3) - (xk4, yk4).

'ivl'

A list of labels corresponding to the leaf nodes.

'leaves'

For each i, H[i] == j, cluster node j appears in position i in the left-to-right traversal of the leaves, where \(j < 2n-1\) and \(i < n\). If j is less than n, the i-th leaf node corresponds to an original observation. Otherwise, it corresponds to a non-singleton cluster.

'leaves_color_list'

A list of color names. The k’th element represents the color of the k’th leaf.

Return type:

dict
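
The relation between 'icoord', 'dcoord' and the painted links can be made concrete with a short sketch. The dictionary r below is hand-made to mimic the documented structure and does not come from a real clustering; dendrogram_links is a hypothetical helper.

```python
def dendrogram_links(r):
    """Pair icoord and dcoord into the four (x, y) points of each link."""
    return [list(zip(xs, ys)) for xs, ys in zip(r["icoord"], r["dcoord"])]


# Hand-made example with two links (three leaves).
r = {
    "icoord": [[5.0, 5.0, 15.0, 15.0], [10.0, 10.0, 25.0, 25.0]],
    "dcoord": [[0.0, 1.0, 1.0, 0.0], [1.0, 2.5, 2.5, 0.0]],
}
links = dendrogram_links(r)
first_link = links[0]  # (xk1, yk1) - (xk2, yk2) - (xk3, yk3) - (xk4, yk4)
```

Each link is an inverted U: two vertical segments joined by a horizontal one at the merge height.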

Examples

>>> from watex.datasets import load_iris
>>> from watex.view import plotDendrogram
>>> data = load_iris ()
>>> X =data.data[:, :2]
>>> plotDendrogram (X, columns =['X1', 'X2' ] )
watex.view.plotDendroheat(df, columns=None, labels=None, metric='euclidean', method='complete', kind='design', cmap='hot_r', fig_size=(8, 8), facecolor='white', **kwd)[source]#

Attaches dendrogram to a heat map.

Hierarchical dendrograms are often used in combination with a heat map, which allows us to represent the individual values in the data array or matrix containing our training examples with a color code.

Parameters:
  • df (dataframe or NDArray of (n_samples, n_features)) – Dataframe or Ndarray. If an array is given, the column names must be specified to match shape 1 of the array.

  • columns (list) – List of labels to name each column of the array of (n_samples, n_features). If a dataframe is given, there is no need to specify the columns.

  • kind (str, ['squareform'|'condense'|'design'], default is {'design'}) – Kind of approach used to build the linkage matrix. A condensed distance matrix is a flat array containing the upper triangle of the distance matrix. This is the form that pdist returns. Alternatively, a collection of \(m\) observation vectors in \(n\) dimensions may be passed as an \(m\) by \(n\) array. All elements of the condensed distance matrix must be finite, i.e., no NaNs or infs. Alternatively, the squareform distance matrix could be used, but it yields different distance values than expected. The design approach uses the complete input example matrix, also called the ‘design matrix’, to produce a correct linkage matrix similar to squareform and condense.

  • metric (str or callable, default is {'euclidean'}) – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise.pairwise_distances(). If X is the distance array itself, use “precomputed” as the metric. Precomputed distance matrices must have 0 along the diagonal.

  • method (str, optional, default is {'complete'}) – The linkage algorithm to use. See the Linkage Methods section below for full descriptions in watex.utils.exmath.linkage_matrix()

  • labels (ndarray, optional) – By default, labels is None so the index of the original observation is used to label the leaf nodes. Otherwise, this is an \(n\)-sized sequence, with n == Z.shape[0] + 1. The labels[i] value is the text to put under the \(i\) th leaf node only if it corresponds to an original observation and not a non-singleton cluster.

  • cmap (str , default is {'hot_r'}) – matplotlib color map

  • fig_size (str , Tuple , default is {(8, 8)}) – the size of the figure

  • facecolor (str , default is {"white"}) – Matplotlib facecolor

  • kwd (dict) – additional keywords arguments passes to scipy.cluster.hierarchy.dendrogram()

Examples

>>> # (1) -> Use random data
>>> import numpy as np
>>> import pandas as pd
>>> from watex.view.mlplot import plotDendroheat
>>> np.random.seed(123)
>>> variables =['X', 'Y', 'Z'] ; labels =['ID_0', 'ID_1', 'ID_2',
                                         'ID_3', 'ID_4']
>>> X= np.random.random_sample ([5,3]) *10
>>> df =pd.DataFrame (X, columns =variables, index =labels)
>>> plotDendroheat (df)
>>> # (2) -> Use Bagoue data
>>> from watex.datasets import load_bagoue
>>> X, y = load_bagoue (as_frame=True )
>>> X =X[['magnitude', 'power', 'sfi']].astype(float) # convert to float
>>> plotDendroheat (X )
watex.view.plotLearningInspection(model, X, y, axes=None, ylim=None, cv=5, n_jobs=None, train_sizes=None, display_legend=True, title=None)[source]#

Inspect model from its learning curve.

Generate 3 plots: the test and training learning curve, the training samples vs fit times curve, the fit times vs score curve.

Parameters:
  • model (estimator instance) – An estimator instance implementing fit and predict methods which will be cloned for each validation.

  • title (str) – Title for the chart.

  • X (array-like of shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y (array-like of shape (n_samples) or (n_samples, n_features)) – Target relative to X for classification or regression; None for unsupervised learning.

  • axes (array-like of shape (3,), default=None) – Axes to use for plotting the curves.

  • ylim (tuple of shape (2,), default=None) – Defines minimum and maximum y-values plotted, e.g. (ymin, ymax).

  • cv (int, cross-validation generator or an iterable, default=5) –

    Determines the cross-validation splitting strategy. Possible inputs for cv are:

    • None, to use the default 5-fold cross-validation,

    • integer, to specify the number of folds.

    • CV splitter,

    • An iterable yielding (train, test) splits as arrays of indices.

    For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, KFold is used.

    Refer to the scikit-learn User Guide for the various cross-validators that can be used here.

  • n_jobs (int or None, default=None) – Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • train_sizes (array-like of shape (n_ticks,)) – Relative or absolute numbers of training examples that will be used to generate the learning curve. If the dtype is float, it is regarded as a fraction of the maximum size of the training set (that is determined by the selected validation method), i.e. it has to be within (0, 1]. Otherwise it is interpreted as absolute sizes of the training sets. Note that for classification the number of samples usually have to be big enough to contain at least one sample from each class. (default: np.linspace(0.1, 1.0, 5))

  • display_legend (bool, default ='True') – display the legend
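
The two interpretations of train_sizes (fractions versus absolute counts) can be sketched as follows. This is a simplified illustration, not the scikit-learn or watex implementation: resolve_train_sizes is a hypothetical helper, and the maximum training size assumes 100 samples split evenly into 5 folds.

```python
def resolve_train_sizes(train_sizes, n_max_train):
    """Turn fractions in (0, 1] or absolute counts into training-set sizes."""
    sizes = list(train_sizes)
    if all(isinstance(s, float) for s in sizes):
        # Floats are fractions of the largest available training set.
        if any(s <= 0.0 or s > 1.0 for s in sizes):
            raise ValueError("fractions must lie within (0, 1]")
        return [max(1, int(s * n_max_train)) for s in sizes]
    # Anything else is read as absolute numbers of training examples.
    if any(s <= 0 or s > n_max_train for s in sizes):
        raise ValueError("absolute sizes must lie within [1, n_max_train]")
    return [int(s) for s in sizes]


n_samples, cv = 100, 5
n_max_train = n_samples - n_samples // cv   # 80 samples train each fold

from_fractions = resolve_train_sizes([0.25, 0.5, 0.75, 1.0], n_max_train)
from_counts = resolve_train_sizes([10, 40, 80], n_max_train)
```

Because one fold is always held out, a fraction of 1.0 maps to the largest training set the splitter can provide, not to the full dataset.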

Returns:

axes

Return type:

Matplotlib axes

Examples

>>> from watex.datasets import fetch_data
>>> from watex.models import p
>>> from watex.view.mlplot import plotLearningInspection
>>> # import sparse  matrix from Bagoue datasets
>>> X, y = fetch_data ('bagoue prepared')
>>> # import the  pretrained Radial Basis Function (RBF) from SVM
>>> plotLearningInspection (p.SVM.rbf.best_estimator_  , X, y )
watex.view.plotLearningInspections(models, X, y, fig_size=(22, 18), cv=None, savefig=None, titles=None, subplot_kws=None, **kws)[source]#

Inspect multiple models from their learning curves.

Multiple inspection plots that generate the test and training learning curves, the training samples vs. fit times curve, and the fit times vs. score curve for each model.

Parameters:
  • models (list of estimator instances) – Each estimator instance implements fit and predict methods which will be cloned for each validation.

  • X (array-like of shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y (array-like of shape (n_samples) or (n_samples, n_features)) – Target relative to X for classification or regression; None for unsupervised learning.

  • cv (int, cross-validation generator or an iterable, default=None) –

    Determines the cross-validation splitting strategy. Possible inputs for cv are:

    • None, to use the default 5-fold cross-validation,

    • integer, to specify the number of folds.

    • CV splitter,

    • An iterable yielding (train, test) splits as arrays of indices.

    For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, KFold is used.

    Refer to the scikit-learn User Guide for the various cross-validators that can be used here.

  • savefig (str, default=None) – The path used to save the figure. The argument is passed to the matplotlib.figure.Figure class.

  • titles (str, list) – List of model names if changes are needed. If None, model names are used by default.

  • kws (dict) – Additional keyword arguments passed to plotLearningInspection().

Returns:

axes

Return type:

Matplotlib axes

See also

plotLearningInspection

Inspect single model

Examples

>>> from watex.datasets import fetch_data
>>> from watex.models.premodels import p
>>> from watex.view.mlplot import plotLearningInspections
>>> # import sparse  matrix from Bagoue dataset
>>> X, y = fetch_data ('bagoue prepared')
>>> # import the two pretrained models from SVM
>>> models = [p.SVM.rbf.best_estimator_ , p.SVM.poly.best_estimator_]
>>> plotLearningInspections (models , X, y, ylim=(0.7, 1.01) )
watex.view.plotModel(yt, ypred=None, *, clf=None, Xt=None, predict=False, prefix=None, index=None, fill_between=False, labels=None, return_ypred=False, **baseplot_kws)[source]#
Plot model ‘y’ (true labels) versus ‘ypred’ (predicted) from test data.

The plot shows where the estimator/classifier fails to predict the target correctly.

Parameters:
yt:array-like, shape (M, ) ``M=m-samples``,

test target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

ypred:array-like, shape (M, ) ``M=m-samples``

Array of the predicted labels. It has the same number of samples as the test data ‘Xt’

clf :callable, always as a function, classifier estimator

A supervised predictor with a finite set of discrete possible output values. A classifier must support modeling some target types, such as binary targets. It must store a classes_ attribute after fitting.

Xt: Ndarray ( M x N matrix where ``M=m-samples``, & ``N=n-features``)

Shorthand for “test set”; data that is observed at testing and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix.

prefix: str, optional

literal string used to prefix the samples/examples considered as tick labels on the abscissa. For instance:

index =[0, 2, 4, 7]
prefix ='b' --> index =['b0', 'b2', 'b4', 'b7']
predict: bool, default=False,

Expected to be ‘True’ when the user wants to predict the array ‘ypred’ and plot at the same time. Otherwise, it can be set to ‘False’ to use the ‘ypred’ data already predicted. Note that if ‘True’, an estimator/classifier must be provided as well as the test data ‘Xt’, otherwise an error will occur.

index: array_like, optional

List of integer or string values expected to be the index of ‘Xt’ and ‘yt’ once turned into a pandas dataframe and series respectively. Note that if one of them already has an index and a new index is given, the latter must be consistent with it. This is useful when data are provided as ndarrays rather than dataframes.

fill_between: bool

Fill a line between the actual classes i.e the true labels.

labels: list of str or int, Optional

list of label names, one for each category.

return_ypred: bool,

return the predicted ‘ypred’ if ‘True’, otherwise nothing.

baseplot_kws: dict,

All the keyword arguments passed to the watex.property.BasePlot class.

(2) -> Prepare our demo estimator and plot the model predictions

>>> svc_clf = SVC(C=100, gamma=1e-2, kernel='rbf', random_state =42)
>>> base_plot_params ={
                    'lw' :3.,                  # line width
                    'lc':(.9, 0, .8),
                    'ms':7.,
                    'yp_marker' :'o',
                    'fig_size':(12, 8),
                    'font_size':15.,
                    'xlabel': 'Test examples',
                    'ylabel':'Flow categories' ,
                    'marker':'o',
                    'markeredgecolor':'k',
                    'markerfacecolor':'b',
                    'markeredgewidth':3,
                    'yp_markerfacecolor' :'k',
                    'yp_markeredgecolor':'r',
                    'alpha' :1.,
                    'yp_markeredgewidth':2.,
                    'show_grid' :True,
                    'galpha' :0.2,
                    'glw':.5,
                    'rotate_xlabel' :90.,
                    'fs' :3.,
                    's' :20 ,
               }
>>> plotModel(yt= ytest ,
               Xt=Xtest ,
               predict =True , # predict the result (estimator fit)
               clf=svc_clf ,
               fill_between= False,
               prefix ='b',
               labels=['FR0', 'FR1', 'FR2', 'FR3'], # replace 'y' labels.
               **base_plot_params
               )
>>> # the plot shows where the model failed to predict the target 'yt'
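
The prefix and index parameters, and the idea of locating failed predictions, can be sketched without any plotting. The helper prefixed_index below is hypothetical and not part of watex; the label arrays are made-up values chosen only to illustrate the comparison of ‘yt’ with ‘ypred’.

```python
import numpy as np


def prefixed_index(index, prefix=None):
    """Build tick labels from an index (illustrative helper, not watex API)."""
    if prefix is None:
        return list(index)
    return [f"{prefix}{i}" for i in index]


# Made-up true and predicted class labels for five test samples.
yt = np.array([0, 1, 2, 1, 3])
ypred = np.array([0, 2, 2, 1, 1])
index = [0, 2, 4, 7, 9]

# prefix='b' turns [0, 2, 4, 7, 9] into ['b0', 'b2', 'b4', 'b7', 'b9'].
ticks = prefixed_index(index, prefix="b")

# Samples where the prediction differs from the true label.
missed = [tick for tick, true, pred in zip(ticks, yt, ypred) if true != pred]
print(missed)  # ['b2', 'b9']
```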
watex.view.plotProjection(X, Xt=None, *, columns=None, test_kws=None, **baseplot_kws)[source]#

Visualize train and test dataset based on the geographical coordinates.

Since there is geographical information (latitude/longitude or easting/northing), it is a good idea to create a scatterplot of all instances to visualize the data.

Parameters:
  • X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample.

  • Xt (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Shorthand for “test set”; data that is observed at testing and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix.

  • columns (list of str or index, optional) – columns is useful when a dataframe with more than two columns is given. If such data is passed to X or Xt, columns must hold the names to be considered as ‘easting’ and ‘northing’ when UTM coordinates are given, or ‘latitude’ and ‘longitude’ when latlon coordinates are given. If the dimension size is greater than 2 and columns is None, an error is raised to prompt the user to provide the index for the ‘y’ and ‘x’ coordinate retrieval.

  • test_kws (dict) – keyword arguments passed to matplotlib.pyplot.scatter() to set the marker and color properties of the test locations.

  • baseplot_kws (dict) – All the keyword arguments passed to the watex.property.BasePlot class.

Examples

>>> from watex.datasets import fetch_data
>>> from watex.view.mlplot import plotProjection
>>> # Discard all the non-numeric data
>>> # then impute numerical data
>>> from watex.utils import to_numeric_dtypes, naive_imputer
>>> X, Xt, *_ = fetch_data ('bagoue', split_X_y =True, as_frame =True)
>>> X =to_numeric_dtypes(X, pop_cat_features=True )
>>> X= naive_imputer(X)
>>> Xt = to_numeric_dtypes(Xt, pop_cat_features=True )
>>> Xt= naive_imputer(Xt)
>>> plot_kws = dict (fig_size=(8, 12),
                 lc='k',
                 marker='o',
                 lw =3.,
                 font_size=15.,
                 xlabel= 'easting (m) ',
                 ylabel='northing (m)' ,
                 markerfacecolor ='k',
                 markeredgecolor='r',
                 alpha =1.,
                 markeredgewidth=2.,
                 show_grid =True,
                 galpha =0.2,
                 glw=.5,
                 rotate_xlabel =90.,
                 fs =3.,
                 s =None )
>>> plotProjection( X, Xt , columns= ['east', 'north'],
                    trainlabel='train location',
                    testlabel='test location', **plot_kws
                   )
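
For readers without the Bagoue data at hand, the same kind of projection can be approximated with plain matplotlib. This is only a sketch of the idea (two scatter layers over coordinate columns), not the watex implementation; the coordinates are random placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)

# Placeholder UTM-like coordinates (easting, northing) in metres.
low, high = [790_000, 1_090_000], [800_000, 1_100_000]
train_xy = rng.uniform(low, high, size=(40, 2))
test_xy = rng.uniform(low, high, size=(10, 2))

fig, ax = plt.subplots(figsize=(8, 12))
# One scatter layer per subset so each gets its own legend entry.
ax.scatter(train_xy[:, 0], train_xy[:, 1], c="k", edgecolors="r",
           label="train location")
ax.scatter(test_xy[:, 0], test_xy[:, 1], c="b", marker="^",
           label="test location")
ax.set_xlabel("easting (m)")
ax.set_ylabel("northing (m)")
ax.grid(True, alpha=0.2, linewidth=0.5)
ax.legend()
```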
watex.view.plotSilhouette(X, labels=None, prefit=True, n_clusters=3, n_init=10, max_iter=300, random_state=None, tol=10000.0, metric='euclidean', **kwd)[source]#

Quantify the quality of clustering samples.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – Training instances to cluster. It must be noted that the data will be converted to C ordering, which will cause a memory copy if the given data is not C-contiguous. If a sparse matrix is passed, a copy will be made if it’s not in CSR format.

  • labels (array-like 1d of shape (n_samples,)) – Label values for each sample.

  • n_clusters (int, default=3) – The number of clusters to form as well as the number of centroids to generate.

  • prefit (bool, default=True) – Whether prefit labels are expected to be passed into the function directly. If True, labels must be the target values predicted by an already fitted estimator. If False, labels are fitted and updated from X by calling the fit_predict method, and any other value passed to labels is discarded.

  • n_init (int, default=10) – Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.

  • max_iter (int, default=300) – Maximum number of iterations of the k-means algorithm for a single run.

  • tol (float, default=1e-4) – Relative tolerance with regards to Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence.

  • verbose (int, default=0) – Verbosity mode.

  • random_state (int, RandomState instance or None, default=None) – Determines random number generation for centroid initialization. Use an int to make the randomness deterministic.

  • metric (str or callable, default='euclidean') – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise.pairwise_distances(). If X is the distance array itself, use “precomputed” as the metric. Precomputed distance matrices must have 0 along the diagonal.

  • **kwds (optional keyword parameters) – Any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.

Note

The silhouette coefficient is bounded between -1 and 1.
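
The quantities behind a silhouette plot can be computed directly with scikit-learn. The sketch below mirrors the prefit=False case described above: labels are obtained with KMeans.fit_predict and then scored. It assumes synthetic blob data and explicit k-means settings; it does not reproduce the watex plotting code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Synthetic data with three well-separated groups (placeholder data).
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Fit k-means and predict a cluster label for every sample.
km = KMeans(n_clusters=3, n_init=10, max_iter=300, tol=1e-4, random_state=0)
labels = km.fit_predict(X)

# One coefficient per sample, each within [-1, 1].
sample_values = silhouette_samples(X, labels, metric="euclidean")

# The overall score is the mean of the per-sample coefficients.
mean_value = silhouette_score(X, labels, metric="euclidean")
```

Values close to 1 indicate samples far from neighbouring clusters, values near 0 indicate samples on a cluster boundary, and negative values suggest a sample may have been assigned to the wrong cluster.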

watex.view.plot_matshow(arr, /, labelx=None, labely=None, matshow_kws=None, **baseplot_kws)[source]#

Quick matrix visualization using matplotlib.pyplot.matshow.

Parameters:
  • arr (2D ndarray) – matrix of n rows and m columns

  • matshow_kws (dict) – Additional keyword arguments for matplotlib.axes.Axes.matshow()

  • labelx (list of str, optional) – list of label names, one for each category on the x-axis. It must be consistent with the number of columns of arr.

  • labely (list of str, optional) – list of label names, one for each category on the y-axis. It must be consistent with the number of rows of arr.

Examples

>>> import numpy as np
>>> from watex.view.mlplot import plot_matshow
>>> matshow_kwargs ={
    'aspect': 'auto',
    'interpolation': None,
   'cmap':'copper_r',
        }
>>> baseplot_kws ={'lw':3,
           'lc':(.9, 0, .8),
           'font_size':15.,
            'cb_format':None,
            #'cb_label':'Rate of prediction',
            'xlabel': 'Predicted flow classes',
            'ylabel': 'Geological rocks',
            'font_weight':None,
            'tp_labelbottom':False,
            'tp_labeltop':True,
            'tp_bottom': False
            }
>>> labelx =['FR0', 'FR1', 'FR2', 'FR3', 'Rates']
>>> labely =['VOLCANO-SEDIM. SCHISTS', 'GEOSYN. GRANITES',
             'GRANITES', '1.0', 'Rates']
>>> array2d = np.array([(1. , .5, 1. ,1., .9286),
                    (.5,  .8, 1., .667, .7692),
                    (.7, .81, .7, .5, .7442),
                    (.667, .75, 1., .75, .82),
                    (.9091, 0.8064, .7, .8667, .7931)])
>>> plot_matshow(array2d, labelx, labely, matshow_kwargs,**baseplot_kws )
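
As a point of comparison, a labelled matrix view can be sketched with matplotlib alone. The small matrix and the shortened label names are placeholders, and the tick handling shown here is an assumption about what a wrapper like plot_matshow needs to do, not a description of its source.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Placeholder 3 x 3 matrix with one label per column and per row.
arr = np.array([[1.0, 0.5, 0.9],
                [0.5, 0.8, 0.7],
                [0.7, 0.81, 0.75]])
labelx = ["FR0", "FR1", "Rates"]
labely = ["SCHISTS", "GRANITES", "Rates"]

fig, ax = plt.subplots()
im = ax.matshow(arr, aspect="auto", cmap="copper_r")

# One tick per column/row so the labels line up with the cells.
ax.set_xticks(np.arange(arr.shape[1]))
ax.set_xticklabels(labelx)
ax.set_yticks(np.arange(arr.shape[0]))
ax.set_yticklabels(labely)
fig.colorbar(im, ax=ax)
```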
watex.view.plot_model_scores(models, scores=None, cv_size=None, **baseplot_kws)[source]#

Use cross-validation to estimate the generalization performance of models.

It visualizes the fine-tuned model scores against the cross-validation folds.

Parameters:
  • models (list of callables, always as functions) –

    List of estimator names; it can also be a list of pairs of estimators and validation scores. For instance, estimators and scores can be arranged as:

    models =[('SVM', scores_svm), ('LogRegress', scores_logregress), ...]
    

    If that arrangement is passed to the models parameter, there is no need to pass the score values of each estimator in scores. Note that a model is an object which manages the estimation and decoding. The model is estimated as a deterministic function of:

    • parameters provided in object construction or with set_params;

    • the global numpy.random random state if the estimator’s random_state parameter is set to None; and

    • any data or sample properties passed to the most recent call to fit, fit_transform or fit_predict, or data similarly passed in a sequence of calls to partial_fit.


  • scores (array-like) –

    List of scores on the different validation sets. If scores are given, pass only the estimator names to models, for instance:

    models =['SVM', 'LogRegress', ...]
    scores=[scores_svm, scores_logregress, ...]
    

  • cv_size (float or int) – The number of folds used for validation. If different models have different cross-validation sizes, the minimum size is used and the scores of each model are resized to match it.

  • baseplot_kws (dict) – All the keyword arguments passed to the watex.property.BasePlot class.

Examples

(1) -> Score is appended to the model

>>> from watex.exlib.sklearn import SVC
>>> from watex.view.mlplot import plot_model_scores
>>> import numpy as np
>>> svc_model = SVC()
>>> fake_scores = np.random.permutation (np.arange (0, 1, .05))
>>> plot_model_scores([(svc_model, fake_scores )])
...

(2) -> Use model and score separately

>>> plot_model_scores([svc_model],scores =[fake_scores] )
>>> # customize plot by passing keywords properties
>>> base_plot_params ={
                    'lw' :3.,
                    'lc':(.9, 0, .8),
                    'ms':7.,
                    'fig_size':(12, 8),
                    'font_size':15.,
                    'xlabel': 'samples',
                    'ylabel':'scores' ,
                    'marker':'o',
                    'alpha' :1.,
                    'yp_markeredgewidth':2.,
                    'show_grid' :True,
                    'galpha' :0.2,
                    'glw':.5,
                    'rotate_xlabel' :90.,
                    'fs' :3.,
                    's' :20 ,
                    'sns_style': 'darkgrid',
               }
>>> plot_model_scores([svc_model],scores =[fake_scores] , **base_plot_params )
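
The two accepted input arrangements and the cv_size rule can be summarized in a few lines. The helper split_models_and_scores below is hypothetical and not part of watex; in particular, resizing by truncation to the shortest score array is an assumption, since this page does not say how the resizing is done.

```python
import numpy as np


def split_models_and_scores(models, scores=None, cv_size=None):
    """Normalize both input arrangements (illustrative, not watex API)."""
    if scores is None:
        # Arrangement 1: models is a list of (name, scores) pairs.
        names = [name for name, _ in models]
        scores = [model_scores for _, model_scores in models]
    else:
        # Arrangement 2: names and scores are passed separately.
        names = list(models)
    scores = [np.asarray(s, dtype=float) for s in scores]

    # Use the smallest number of folds across models, capped by cv_size.
    min_size = min(len(s) for s in scores)
    if cv_size is not None:
        min_size = min(min_size, int(cv_size))

    # Assumption: "resizing" is done by truncating each score array.
    return names, [s[:min_size] for s in scores]


scores_svm = [0.81, 0.79, 0.84, 0.80, 0.83]   # 5-fold scores (made up)
scores_logregress = [0.75, 0.78, 0.77]        # 3-fold scores (made up)

names, resized = split_models_and_scores(
    [("SVM", scores_svm), ("LogRegress", scores_logregress)]
)
names2, resized2 = split_models_and_scores(
    ["SVM", "LogRegress"], scores=[scores_svm, scores_logregress]
)
```

Both calls yield the same names and score arrays of length 3, the smaller of the two fold counts.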
watex.view.plot_reg_scoring(reg, X, y, test_size=None, random_state=42, scoring='mse', return_errors=False, **baseplot_kws)[source]#

Plot regressor learning curves using mean squared error (MSE) or root-mean-squared error (RMSE) scoring.

Use the hold-out cross-validation technique for score evaluation [1].

Parameters:
  • reg (callable, always as a function) – A regression estimator; Estimators must provide a fit method, and should provide set_params and get_params, although these are usually provided by inheritance from base.BaseEstimator. The estimated model is stored in public and private attributes on the estimator instance, facilitating decoding through prediction and transformation methods. The core functionality of some estimators may also be available as a function.

  • X (Ndarray of shape ( M x N), \(M=m-samples\) & \(N=n-features\)) – training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like of shape (M, ), \(M=m-samples\)) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • scoring (str, ['mse'|'rmse'], default ='mse') – kind of error to visualize on the regression learning curve.

  • test_size (float or int, default=None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.

  • random_state (int, RandomState instance or None, default=42) – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.

  • return_errors (bool, default=False) – Whether to return the training and validation errors.

  • baseplot_kws (dict) – All the keyword arguments passed to the watex.property.BasePlot class.

Returns:

(train_errors, val_errors) – training score and validation scores if return_errors is set to True, otherwise returns nothing

Return type:

tuple

Examples

>>> from watex.datasets import fetch_data
>>> from watex.view.mlplot import plot_reg_scoring
>>> # Note that for the demo, we import SVC rather than LinearSVR since the
>>> # Bagoue dataset poses a classification problem rather than a regression one.
>>> # If a regressor is used instead, a convergence problem will occur.
>>> from watex.exlib.sklearn import SVC
>>> X, y = fetch_data('bagoue analysed')# got the preprocessed and imputed data
>>> svm =SVC()
>>> t_errors, v_errors =plot_reg_scoring(svm, X, y, return_errors=True)
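
The errors returned above can be understood through a hold-out learning curve: the estimator is refit on a growing portion of the training split and scored on both that portion and the held-out part. The function holdout_learning_curve below is a hypothetical sketch under that assumption, using synthetic regression data and LinearRegression; it is not the watex source.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def holdout_learning_curve(reg, X, y, test_size=0.25, random_state=42,
                           scoring="mse"):
    """Hold-out learning curve errors (illustrative, not watex API)."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    train_errors, val_errors = [], []
    # Refit on the first m training samples, for m = 1 .. len(X_train).
    for m in range(1, len(X_train) + 1):
        reg.fit(X_train[:m], y_train[:m])
        train_errors.append(
            mean_squared_error(y_train[:m], reg.predict(X_train[:m]))
        )
        val_errors.append(mean_squared_error(y_val, reg.predict(X_val)))
    train_errors = np.array(train_errors)
    val_errors = np.array(val_errors)
    if scoring == "rmse":
        # RMSE is the square root of MSE, in the units of the target.
        train_errors, val_errors = np.sqrt(train_errors), np.sqrt(val_errors)
    return train_errors, val_errors


X, y = make_regression(n_samples=100, n_features=3, noise=10.0, random_state=0)
t_errors, v_errors = holdout_learning_curve(
    LinearRegression(), X, y, scoring="rmse"
)
```

Plotting t_errors and v_errors against the training set size gives the two curves; a persistent gap between them is the usual sign of overfitting.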

Notes

The hold-out technique is the classic and most popular approach for estimating the generalization performance of a machine learning model. The dataset is split into training and test sets. The former is used for model training whereas the latter is used for model performance evaluation. However, in typical machine learning we are also interested in tuning and comparing different parameter settings to further improve the performance on the given classification or regression problem, that is, in finding the optimal hyperparameter values. Reusing the same test dataset over and over again during model selection is not recommended, since it effectively becomes part of the training data and the model is then more likely to overfit. For this reason, the plain hold-out method is not a good practice for model selection. A better way to use the hold-out method is to separate the data into three parts: the training set, the validation set and the test set. See more in [2].
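
The three-part split recommended above can be obtained by applying scikit-learn's train_test_split twice. The 60/20/20 proportions and the synthetic data below are illustrative choices, not values prescribed by watex.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=3, random_state=0)

# First split: hold out 20% as the final test set.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Second split: 25% of the remaining 80% becomes the validation set,
# which leaves 60% of the original data for training.
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

Hyperparameters are then tuned against the validation set, and the test set is used once, at the end, for the final performance estimate.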

References

[1]

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., et al. (2011) Scikit-learn: Machine learning in Python. J. Mach. Learn. Res., 12, 2825–2830.

[2]

Raschka, S. & Mirjalili, V. (2019) Python Machine Learning. (J. Malysiak, S. Jain, J. Lovell, C. Nelson, S. D’silva & R. Atitkar, Eds.), 3rd ed., Packt.

watex.view.pobj#

alias of Plot

watex.view.viewtemplate(y, /, xlabel=None, ylabel=None, **kws)[source]#

Quick view template

Parameters:
  • y (Arraylike , shape (N, )) –

  • xlabel (str, Optional) – Label for the x-axis (abscissa).

  • ylabel (str, Optional) – Label for the y-axis (ordinate).

  • kws (dict) – keyword arguments passed to matplotlib.pyplot.plot()

Submodules#