watex.view package
View is the visualization sub-package. It is divided into the learning plot module (mlplot) and the data analysis and exploratory module (plot).
- class watex.view.EvalPlot(tname=None, encode_labels=False, scale=None, cv=None, objective=None, prefix=None, label_values=None, litteral_classes=None, **kws)
Bases: BasePlot
Metrics, dimensionality and model evaluation plots.
Inherited from BasePlot. Dimensional reduction and metric plots. The class works only with numerical features.
Discouraged
Using continuous target values to plot classification metrics is discouraged. Users are encouraged to prepare their dataset before calling the EvalPlot methods, which gives full control over the expected results. Most of the metric plots implemented here apply to supervised methods, and to classification problems in particular, so the convenient approach is to discretize the target into class labels before the fit. Otherwise, as in the demonstration examples under each method, the continuous labels must first be categorized. There are two options: provide the individual class labels as a list of integers using the method EvalPlot._cat_codes_y(), or specify the number of clusters that the target must hold. The latter is mostly useful for tests or academic purposes. On a real dataset this kind of target partition is discouraged, since it is far from reality and can lead to misinterpretation.
- Parameters
X (Ndarray of shape (M x N), \(M=m-samples\) & \(N=n-features\)) – training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
y (array-like of shape (M, ), \(M=m-samples\)) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
tname (str,) – A target name or label. In supervised learning the target name is considered as the reference name of y or label variable.
objective (str, default=None) – The purpose of the dataset, i.e. the problem we intend to solve. Originally the package was designed for flow rate prediction. Thus, if the objective is set to flow, the plot behaves as for flow rate prediction, and in that case some conditions on the target values need to be fulfilled. Furthermore, if the objective is set to flow, the label_values as well as the litteral_classes parameters need to be supplied to correctly encode the target according to the hydraulic system requirements during the campaign for drinking water supply. For any other purpose, keep the objective set to None. Default is None.
encode_labels (bool, default=False) –
Label encoding works with the label_values parameter. If y holds continuous numerical values, the regression can be turned into a classification by setting encode_labels to True. If encode_labels is True and label_values is not given, a unique identifier is created, which may not fit the exact needs of the user. It is therefore recommended to set this parameter in combination with label_values. For instance:
encode_labels=True ; label_values=3
indicates that the target y values should be categorized to hold the integer identifiers [0, 1, 2]. y is split into three subsets where:
classes (c) = [c{0} <= y.min(), y.min() < c{1} < y.max(), c{2} >= y.max()]
This auto-splitting may not fit the exact classification of the target, so it is recommended to set label_values as a list of class labels, for instance label_values=[0, 1, 2].
scale (str, ['StandardScaler'|'MinMaxScaler'], default='StandardScaler') – kind of feature scaling to apply on numerical features. Note that when using PCA, it is recommended to set scale to True and to call fit_transform rather than only fit. Note that the transform method also handles the missing (NaN) values in the data, where the default filling strategy is most_frequent.
cv (float,) –
A cross-validation splitting strategy. It is used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another, so as not to overfit the training supervision. Possible inputs for cv are usually:
* An integer, specifying the number of folds in K-fold cross validation. K-fold will be stratified over classes if the estimator is a classifier (determined by base.is_classifier) and the targets may represent a binary or multiclass (but not multioutput) classification problem (determined by utils.multiclass.type_of_target).
* A cross-validation splitter instance. Refer to the User Guide for splitters available within Scikit-learn.
* An iterable yielding train/test splits.
With some exceptions (especially where not using cross validation at all is an option), the default is 4-fold.
prefix (str, optional) – literal string used to prefix the integer class labels.
label_values (list of int, optional) – works with the encode_labels parameter. It indicates the different class labels. Refer to the explanation of encode_labels.
litteral_classes (list or str, optional) –
Works when objective is flow. Replaces the integer class names by their literal strings. For instance:
label_values = [0, 1, 3, 6]
litteral_classes = ['rate0', 'rate1', 'rate2', 'rate3']
yp_ls (str, default='-') – Line style of the Predicted label. Can be [ '-' | '.' | ':' ]
yp_lw (str, default=3) – Line weight of the Predicted plot.
yp_lc (str or matplotlib.cm(), default='k') – Line color of the Prediction plot. default is k
rs (str, default='--') – Line style of the Recall metric.
rc (str, default=(.6,.6,.6)) – Recall metric colors.
pc (str or matplotlib.cm(), default='k') – Precision colors from Matplotlib colormaps.
yp_marker (str or matplotlib.markers(), default='o') – Style of marker of the Prediction points.
yp_markerfacecolor (str or matplotlib.cm(), default='k') – Facecolor of the Predicted label marker.
yp_markeredgecolor (str or matplotlib.cm(), default='r') – Edgecolor of the Predicted label marker.
yp_markeredgewidth (int, default=2) – Width of the Predicted label marker.
savefig (str, Path-like object) – name of the saved figure. default is None
fig_dpi (float) – dots-per-inch resolution of the figure. default is 300
fig_num (int) – size of figure in inches (width, height). default is [5, 5]
fig_size (Tuple (int, int) or inch) – size of figure in inches (width, height). default is [5, 5]
fig_orientation (str) – figure orientation. default is landscape
fig_tile (str) – figure title. default is None
fs (float) – size of font of axis tick labels; axis labels are fs+2. default is 6
ls (str) – line style, it can be [ '-' | '.' | ':' ]. default is '-'
lc (str, Optional) – line color of the plot. default is k
lw (float, Optional) – line weight of the plot. default is 1.5
alpha (float between 0 < alpha < 1) – transparency number. default is 0.5
font_weight (str, Optional) – weight of the font. default is bold.
font_style (str, Optional) – style of the font. default is italic
font_size (float, Optional) – size of font in inches (width, height). default is 3.
ms (float, Optional) – size of marker in points. default is 5
marker (str, Optional) – marker of stations. default is o.
marker_style (str, Optional) – facecolor of the marker. default is yellow
marker_edgecolor (str, Optional) – edgecolor of the marker. default is yellow
marker_edgewidth (float, Optional) – width of the marker. default is 3.
xminorticks (float, Optional) – minor ticks according to the x-axis size. default is 1.
yminorticks (float, Optional) – minor ticks according to the y-axis size. default is 1.
bins – histogram element separation between two bars. default is 10.
xlim (tuple (int, int), Optional) – limit of x-axis in plot.
ylim (tuple (int, int), Optional) – limit of y-axis in plot.
xlabel (str, Optional) – label name of x-axis in plot.
ylabel (str, Optional) – label name of y-axis in plot.
rotate_xlabel (float, Optional) – angle to rotate xlabel in plot.
rotate_ylabel (float, Optional) – angle to rotate ylabel in plot.
leg_kws (dict, Optional) – keyword arguments of legend. default is empty dict
plt_kws (dict, Optional) – keyword arguments of plot. default is empty dict
glc (str, Optional) – line color of the grid plot. default is k
glw (float, Optional) – line weight of the grid plot. default is 2
galpha (float, Optional) – transparency number of grid. default is 0.5
gaxis (str ('x', 'y', 'both')) – type of axis to hold the grid. default is both
gwhich (str, Optional) – kind of grid in the plot. default is major
tp_axis (bool) – axis to apply the ticks params. default is both
tp_labelsize (str, Optional) – labelsize of ticks params. default is italic
tp_bottom (bool) – position at bottom of ticks params. default is True.
tp_labelbottom (bool) – put label on the bottom of the ticks. default is False
tp_labeltop (bool) – put label on the top of the ticks. default is True
cb_orientation (str, ('vertical', 'horizontal')) – orientation of the colorbar. default is vertical
cb_aspect (float, Optional) – aspect of the colorbar. default is 20.
cb_shrink (float, Optional) – shrink size of the colorbar. default is 1.0
cb_pad (float) – pad of the colorbar of plot. default is .05
cb_anchor (tuple (float, float)) – anchor of the colorbar. default is (0.0, 0.5)
cb_panchor (tuple (float, float)) – proportionality anchor of the colorbar. default is (1.0, 0.5)
cb_label (str, Optional) – label of the colorbar.
cb_spacing (str, Optional) – spacing of the colorbar. default is uniform
cb_drawedges (bool) – draw edges inside of the colorbar. default is False
Notes
This module works with numerical data only, i.e. the data must contain numerical features only. If categorical values are included in the dataset, they should be removed, and the size of the data reduced accordingly, during the fit methods.
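The target categorization described for encode_labels and label_values can be sketched with NumPy alone. This is a minimal illustration and not the watex implementation: the helper name categorize_target, the sample flow values and the bin edges are all assumptions made for the example.

```python
import numpy as np

def categorize_target(y, edges):
    """Map continuous values to integer class codes (hypothetical helper).

    Values <= edges[0] get code 0, values in (edges[i-1], edges[i]] get
    code i, and values > edges[-1] get code len(edges).
    """
    y = np.asarray(y, dtype=float)
    # right=True makes each bin closed on its right edge.
    return np.digitize(y, bins=np.asarray(edges, dtype=float), right=True)

# Illustrative flow-rate values, not taken from any watex dataset.
flow = [0.0, 0.5, 1.0, 2.5, 3.0, 7.2]
codes = categorize_target(flow, edges=[0.0, 1.0, 3.0])

# Replace the integer codes by literal class names.
names = ["rate0", "rate1", "rate2", "rate3"]
labels = [names[c] for c in codes]
```

Choosing the edges explicitly, rather than asking for a number of clusters, keeps the classes tied to thresholds that mean something for the problem at hand.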
- fit(X=None, y=None, **fit_params)
Fit data and populate the attributes for plotting purposes.
There is no conventional procedure for checking if a method is fitted. However, a class that is not fitted should raise watex.exceptions.NotFittedError when a method is called.
- Parameters
X (Ndarray (M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
y (array-like, shape (M, ), M=m-samples) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
data (filepath or pandas.DataFrame of shape (M, N)) – Dataframe containing samples M and features N.
fit_params (dict) – Additional keyword arguments from watex.utils.coreutils._is_readable.
- Returns
self – returns self for easy method chaining.
- Return type
EvalPlot instance
- fit_transform(X, y=None, **fit_params)
Fit and transform at once.
- Parameters
X (Ndarray (M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
- Returns
X – The transformed array or dataframe with numerical features.
- Return type
NDArray | Dataframe, shape (M x N)
- property inspect
Inspect data and trigger plot after checking the data entry. Raises NotFittedError if EvalPlot is not fitted yet.
- plotConfusionMatrix(clf, *, kind=None, labels=None, matshow_kws=None, **conf_mx_kws)
Plot confusion matrix for error evaluation.
A representation of the confusion matrix for error visualization. If kind is set to map, the plot gives the number of confused instances/items. However, when kind is set to error, the number of confused items is expressed as a percentage.
- Parameters
clf (callable, always as a function, classifier estimator) – A supervised predictor with a finite set of discrete possible output values. A classifier must support modeling binary targets, among others. It must store a classes_ attribute after fitting.
labels (int, or list of int, optional) – Specific class to evaluate the tradeoff of precision and recall. label needs to be specified and a value within the target.
kind (str) – can be map or error to visualize the matshow of prediction and errors respectively.
matshow_kws (dict) – matplotlib additional keyword arguments.
conf_mx_kws (dict) – Additional confusion matrix keyword arguments.
ylabel (list) – list of label names to hold the name of each category.
- Returns
self – returns self for easy method chaining.
- Return type
EvalPlot instance
Examples
>>> from watex.datasets import fetch_data
>>> from watex.utils.mlutils import cattarget
>>> from watex.exlib.sklearn import SVC
>>> from watex.view.mlplot import EvalPlot
>>> X, y = fetch_data('bagoue', return_X_y=True, as_frame=True)
>>> # partition the target into 4 clusters -> just for demo
>>> b = EvalPlot(scale=True, label_values=4)
>>> b.fit_transform(X, y)
>>> # prepare our estimator
>>> svc_clf = SVC(C=100, gamma=1e-2, kernel='rbf', random_state=42)
>>> matshow_kwargs = {
...     'aspect': 'auto',  # 'auto' or 'equal'
...     'interpolation': None,
...     'cmap': 'jet',
... }
>>> plot_kws = {
...     'lw': 3,
...     'lc': (.9, 0, .8),
...     'font_size': 15.,
...     'cb_format': None,
...     'xlabel': 'Predicted classes',
...     'ylabel': 'Actual classes',
...     'font_weight': None,
...     'tp_labelbottom': False,
...     'tp_labeltop': True,
...     'tp_bottom': False,
... }
>>> b.plotConfusionMatrix(clf=svc_clf, matshow_kws=matshow_kwargs, **plot_kws)
>>> svc_clf = SVC(C=100, gamma=1e-2, kernel='rbf', random_state=42)
>>> # replace the integer identifiers with literal strings
>>> b.litteral_classes = ['FR0', 'FR1', 'FR2', 'FR3']
>>> b.plotConfusionMatrix(svc_clf, matshow_kws=matshow_kwargs, kind='error', **plot_kws)
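The difference between the map and error kinds can be illustrated independently of watex. The sketch below builds a count matrix and then a row-normalized error matrix with the diagonal zeroed, which is one common way to show confusion as a proportion. Whether plotConfusionMatrix computes kind='error' exactly this way is an assumption, and the function names and sample labels are hypothetical.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Count matrix: rows are actual classes, columns are predicted classes."""
    mx = np.zeros((n_classes, n_classes), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        mx[actual, predicted] += 1
    return mx

def error_rates(mx):
    """Row-normalize the counts and zero the diagonal to keep errors only."""
    row_sums = mx.sum(axis=1, keepdims=True)
    # Guard against empty classes to avoid a division by zero.
    rates = mx / np.where(row_sums == 0, 1, row_sums)
    np.fill_diagonal(rates, 0.0)
    return rates

# Illustrative labels for a three-class problem.
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2, 1]
counts = confusion_counts(y_true, y_pred, n_classes=3)
errors = error_rates(counts)
```

Normalizing by row keeps large classes from dominating the picture, since each cell then reads as the share of an actual class that was sent to the wrong predicted class.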
- plotPCA(n_components=None, *, n_axes=None, biplot=False, pc1_label='Axis 1', pc2_label='Axis 2', plot_dict=None, **pca_kws)
Plot PCA component analysis using decomposition.
PCA identifies the axis that accounts for the largest amount of variance in the train set X. It also finds a second axis, orthogonal to the first one, that accounts for the largest amount of remaining variance.
- Parameters
n_components – Number of dimensions to preserve. If n_components is a float between 0. and 1., it indicates the variance ratio to preserve. If None (the default), the variance ratio to preserve is 95%.
n_axes – Number of most important components from which to retrieve the variance ratio. Default is 2, i.e. the first two components with the largest variance ratio.
biplot (bool) – biplot plots the PCA feature importance (pc1 and pc2) and visualizes the level of variance and the direction of components for the different variables. Refer to Serafeim Loukas.
pc1_label (str, default='Axis 1') – the first component, with the most variance, held in 'Axis 1'. Can be set to any other axis, for instance 'Axis 3', to replace the component in 'Axis 1' with the one in Axis 3, and so on. This allows visualizing the position of each level of variance for each variable.
pc2_label (str, default='Axis 2') – the second component, with the most remaining variance, held in 'Axis 2'. Can be set to any other axis, for instance 'Axis 6', to replace the component in 'Axis 2' with the one in Axis 6, and so on.
plot_dict (dict) – dictionary of font and marker properties for each sample, corresponding to the label_values.
pca_kws (dict) – additional keyword arguments passed to watex.analysis.dimensionality.nPCA
- Returns
self – returns self for easy method chaining.
- Return type
EvalPlot instance
Notes
By default, the nPCA method plots the first two principal components, named pc1_label for axis 1 and pc2_label for axis 2. To plot the first component pc1 against the third component, set pc2_label to Axis 3 and set n_components to 3, which is the maximum number of reduced columns to retrieve; otherwise a user warning is displayed. The algorithm should automatically detect the digit 3 in a literal label that includes Axis (e.g. 'Axis 3') and treat it as the third component pc3. The same process applies to the other axes.
Examples
>>> from watex.datasets import load_bagoue
>>> from watex.view.mlplot import EvalPlot
>>> X, y = load_bagoue(as_frame=True)
>>> b = EvalPlot(tname='flow', encode_labels=True, scale=True)
>>> b.fit_transform(X, y)
>>> b.plotPCA(n_components=2)
...
>>> # pc1 and pc2 labels > n_components -> raises user warnings
>>> b.plotPCA(n_components=2, biplot=False, pc1_label='Axis 3', pc2_label='axis 4')
...
UserWarning: Number of components and axes might be consistent; '2'and '4 are given; default two components are used.
>>> # works fine since n_components is greater than the number of axes
>>> b.plotPCA(n_components=8, biplot=False, pc1_label='Axis3', pc2_label='axis4')
...
EvalPlot(tname= None, objective= None, scale= True, ... , sns_height= 4.0, sns_aspect= 0.7, verbose= 0)
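The idea of preserving a variance ratio, such as the 95% mentioned for n_components, can be sketched with a plain singular value decomposition. This is an illustration under stated assumptions and not the nPCA implementation: the function names and the synthetic data are hypothetical.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of the total variance carried by each principal axis of X."""
    Xc = X - X.mean(axis=0)                  # PCA works on centered data
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, largest first
    var = s ** 2
    return var / var.sum()

def n_components_for(ratio, threshold=0.95):
    """Smallest number of axes whose cumulative ratio reaches the threshold."""
    k = int(np.searchsorted(np.cumsum(ratio), threshold)) + 1
    return min(k, len(ratio))  # guard against floating-point round-off

rng = np.random.default_rng(42)
# Synthetic data: one dominant direction plus two small noise columns.
t = rng.normal(size=(200, 1))
X = np.hstack([10 * t, rng.normal(size=(200, 2))])

ratio = explained_variance_ratio(X)
k = n_components_for(ratio, threshold=0.95)
```

Because the ratios come out sorted from largest to smallest, the first two entries correspond to what the plot labels as Axis 1 and Axis 2.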
- plotPR(clf, label, kind=None, method=None, cvp_kws=None, **prt_kws)
Precision/recall (PR) and tradeoff plots.
PR computes a score based on the decision function and plots the result as a score versus threshold.
- Parameters
clf (callable, always as a function, classifier estimator) – A supervised predictor with a finite set of discrete possible output values. A classifier must support modeling binary targets, among others. It must store a classes_ attribute after fitting.
label (int) – Specific class to evaluate the tradeoff of precision and recall. label needs to be specified and a value within the target.
kind (str, ['threshold'|'recall'], default='threshold') – kind of PR plot. If kind is 'recall', the method plots the precision versus the recall scores; otherwise the PR tradeoff is plotted against the threshold.
method (str) – Method to get scores from each instance in the trainset. Could be decision_function or predict_proba. A scikit-learn classifier generally has one of these methods. Default is decision_function.
cvp_kws (dict, optional) – Additional keyword arguments of sklearn.model_selection.cross_val_predict().
prt_kws (dict) – Additional keyword arguments passed to watex.exlib.sklearn.precision_recall_tradeoff
- Returns
self – returns self for easy method chaining.
- Return type
EvalPlot instance
Examples
>>> from watex.exlib.sklearn import SGDClassifier
>>> from watex.datasets.dload import load_bagoue
>>> from watex.utils import cattarget
>>> from watex.view.mlplot import EvalPlot
>>> X, y = load_bagoue(as_frame=True)
>>> sgd_clf = SGDClassifier(random_state=42)  # our estimator
>>> b = EvalPlot(scale=True, encode_labels=True)
>>> b.fit_transform(X, y)
>>> # binarize the label b.y
>>> ybin = cattarget(b.y, labels=2)  # can also use labels=[0, 1]
>>> b.y = ybin
>>> # plot the precision-recall tradeoff
>>> b.plotPR(sgd_clf, label=1)  # class=1
...
EvalPlot(tname= None, objective= None, scale= True, ... , sns_height= 4.0, sns_aspect= 0.7, verbose= 0)
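The tradeoff that this plot displays can be reproduced by hand on a few scores. The sketch below assumes decision-function style scores, where a sample is predicted positive when its score reaches the threshold. The helper name and the sample values are hypothetical, and returning a precision of 1.0 when nothing is predicted positive is a convention chosen for this example.

```python
import numpy as np

def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(scores, dtype=float) >= threshold
    tp = int(np.sum(y_pred & y_true))    # true positives
    fp = int(np.sum(y_pred & ~y_true))   # false positives
    fn = int(np.sum(~y_pred & y_true))   # false negatives
    # Convention for this sketch: no positive prediction gives precision 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative binary labels and decision scores.
y_true = [0, 0, 1, 0, 1, 1]
scores = [-2.0, -0.5, 0.1, 0.4, 1.2, 3.0]

low = precision_recall_at(y_true, scores, threshold=0.0)
high = precision_recall_at(y_true, scores, threshold=1.0)
```

Raising the threshold from 0.0 to 1.0 removes the false positive but also drops one true positive, which is the tradeoff the curve makes visible over the whole range of thresholds.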
- plotROC(clfs, label, method=None, cvp_kws=None, **roc_kws)
Plot receiver operating characteristic (ROC) curves of classifiers.
Can plot multiple classifiers at once. If multiple classifiers are given, each classifier must be a tuple of (<name>, <classifier>, <method>). For instance, to plot both the sklearn.ensemble.RandomForestClassifier and sklearn.linear_model.SGDClassifier classifiers, they must be arranged as follows:
clfs = [('sgd', SGDClassifier(), "decision_function"), ('forest', RandomForestClassifier(), "predict_proba")]
It is important to know whether the method 'predict_proba' is valid for the scikit-learn classifier whose ROC curve is to be plotted.
- Parameters
clfs (callables, always as a function, classifier estimators) – A supervised predictor with a finite set of discrete possible output values. A classifier must support modeling binary targets, among others. It must store a classes_ attribute after fitting.
label (int) – Specific class to evaluate the tradeoff of precision and recall. label needs to be specified and a value within the target.
kind (str, ['threshold'|'recall'], default='threshold') – kind of PR plot. If kind is 'recall', the method plots the precision versus the recall scores; otherwise the PR tradeoff is plotted against the threshold.
method (str) – Method to get scores from each instance in the trainset. Could be decision_function or predict_proba. A scikit-learn classifier generally has one of these methods. Default is decision_function.
cvp_kws (dict, optional) – Additional keyword arguments of sklearn.model_selection.cross_val_predict().
prt_kws (dict) – Additional keyword arguments passed to watex.exlib.sklearn.precision_recall_tradeoff
roc_kws (dict) – roc_curve additional keyword arguments.
- Returns
self – returns self for easy method chaining.
- Return type
EvalPlot instance
Examples
(1)-> Plot ROC for single classifier
>>> from watex.exlib.sklearn import (SGDClassifier, RandomForestClassifier)
>>> from watex.datasets.dload import load_bagoue
>>> from watex.utils import cattarget
>>> from watex.view.mlplot import EvalPlot
>>> X, y = load_bagoue(as_frame=True)
>>> sgd_clf = SGDClassifier(random_state=42)  # our estimator
>>> b = EvalPlot(scale=True, encode_labels=True)
>>> b.fit_transform(X, y)
>>> # binarize the label b.y
>>> ybin = cattarget(b.y, labels=2)  # can also use labels=[0, 1]
>>> b.y = ybin
>>> # plot the ROC
>>> b.plotROC(sgd_clf, label=1)  # class=1
...
EvalPlot(tname= None, objective= None, scale= True, ... , sns_height= 4.0, sns_aspect= 0.7, verbose= 0)
(2)-> Plot ROC for multiple classifiers
>>> b = EvalPlot(scale=True, encode_labels=True, lw=3., lc=(.9, 0, .8), font_size=7)
>>> sgd_clf = SGDClassifier(random_state=42)
>>> forest_clf = RandomForestClassifier(random_state=42)
>>> b.fit_transform(X, y)
>>> # binarize the label b.y
>>> ybin = cattarget(b.y, labels=2)  # can also use labels=[0, 1]
>>> b.y = ybin
>>> clfs = [('sgd', sgd_clf, "decision_function"), ('forest', forest_clf, "predict_proba")]
>>> b.plotROC(clfs=clfs, label=1)
...
EvalPlot(tname= None, objective= None, scale= True, ... , sns_height= 4.0, sns_aspect= 0.7, verbose= 0)
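What a ROC curve represents can be shown on a handful of scores without any estimator. The sketch below sorts the samples from highest to lowest score and accumulates the true and false positive rates, then integrates with the trapezoidal rule. It assumes both classes are present and that no two scores are tied. The function names and sample values are hypothetical, and this is not the code path used by plotROC.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr), one point per score cut-off, highest score first."""
    y_true = np.asarray(y_true, dtype=bool)
    order = np.argsort(scores)[::-1]  # indices from highest to lowest score
    hits = y_true[order]
    # Prepend the (0, 0) origin, then accumulate positives and negatives.
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~hits) / (~hits).sum()])
    return fpr, tpr

def area_under_curve(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Illustrative binary labels and scores.
y_true = [0, 0, 1, 0, 1, 1]
scores = [-2.0, -0.5, 0.1, 0.4, 1.2, 3.0]

fpr, tpr = roc_points(y_true, scores)
area = area_under_curve(fpr, tpr)
```

The area equals the fraction of (positive, negative) pairs that the scores rank in the right order, which is why it is a convenient single number for comparing several classifiers on one figure.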
- transform(X, **t_params)
Transform the data and impute the numerical features.
Using transform is not appropriate if the user wants to keep categorical values in the array.
- Parameters
X (Ndarray (M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
t_params (dict) – Keyword arguments passed to sklearn.impute.SimpleImputer for imputing the missing data (the default strategy is 'most_frequent'), or keyword arguments passed to watex.utils.funcutils.to_numeric_dtypes.
- Returns
X – The transformed array or dataframe with numerical features.
- Return type
NDArray | Dataframe, shape (M x N)
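The 'most_frequent' filling strategy mentioned for transform can be sketched in a few lines of NumPy. This is a simplified stand-in written for illustration, not the SimpleImputer code: the function name and the sample array are hypothetical, and ties are resolved by taking the smallest of the most frequent values.

```python
import numpy as np

def impute_most_frequent(X):
    """Replace NaNs in each column by that column's most frequent value."""
    X = np.array(X, dtype=float)  # copy so the caller's data is left untouched
    for j in range(X.shape[1]):
        col = X[:, j]             # view: assignments below modify X in place
        missing = np.isnan(col)
        if missing.all():
            continue              # nothing to learn from, leave the column as-is
        values, counts = np.unique(col[~missing], return_counts=True)
        # np.unique sorts values, so argmax picks the smallest value on ties.
        col[missing] = values[np.argmax(counts)]
    return X

# Illustrative array with one missing entry per column.
raw = [[1.0, 10.0],
       [np.nan, 20.0],
       [1.0, np.nan],
       [3.0, 20.0]]
filled = impute_most_frequent(raw)
```

Filling with the most frequent value works for discrete-valued columns as well as continuous ones, which may be why it suits a mixed numerical dataset better than a mean-based strategy.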
- class watex.view.ExPlot(tname=None, inplace=False, **kws)
Bases: BasePlot
Exploratory plot for data analysis.
ExPlot is a shadow class. Exploring the data is needed before creating a model, since it gives a feel for the data and is also a great excuse to meet and discuss issues with the business units that control the data. ExPlot methods return an instantiated object that inherits from the watex.property.Baseplots ABC (Abstract Base Class) for visualization.
- Parameters
savefig (str, Path-like object) – name of the saved figure. default is None
fig_dpi (float) – dots-per-inch resolution of the figure. default is 300
fig_num (int) – size of figure in inches (width, height). default is [5, 5]
fig_size (Tuple (int, int) or inch) – size of figure in inches (width, height). default is [5, 5]
fig_orientation (str) – figure orientation. default is landscape
fig_tile (str) – figure title. default is None
fs (float) – size of font of axis tick labels; axis labels are fs+2. default is 6
ls (str) – line style, it can be [ '-' | '.' | ':' ]. default is '-'
lc (str, Optional) – line color of the plot. default is k
lw (float, Optional) – line weight of the plot. default is 1.5
alpha (float between 0 < alpha < 1) – transparency number. default is 0.5
font_weight (str, Optional) – weight of the font. default is bold.
font_style (str, Optional) – style of the font. default is italic
font_size (float, Optional) – size of font in inches (width, height). default is 3.
ms (float, Optional) – size of marker in points. default is 5
marker (str, Optional) – marker of stations. default is o.
marker_style (str, Optional) – facecolor of the marker. default is yellow
marker_edgecolor (str, Optional) – edgecolor of the marker. default is yellow
marker_edgewidth (float, Optional) – width of the marker. default is 3.
xminorticks (float, Optional) – minor ticks according to the x-axis size. default is 1.
yminorticks (float, Optional) – minor ticks according to the y-axis size. default is 1.
bins – histogram element separation between two bars. default is 10.
xlim (tuple (int, int), Optional) – limit of x-axis in plot.
ylim (tuple (int, int), Optional) – limit of y-axis in plot.
xlabel (str, Optional) – label name of x-axis in plot.
ylabel (str, Optional) – label name of y-axis in plot.
rotate_xlabel (float, Optional) – angle to rotate xlabel in plot.
rotate_ylabel (float, Optional) – angle to rotate ylabel in plot.
leg_kws (dict, Optional) – keyword arguments of legend. default is empty dict
plt_kws (dict, Optional) – keyword arguments of plot. default is empty dict
glc (str, Optional) – line color of the grid plot. default is k
glw (float, Optional) – line weight of the grid plot. default is 2
galpha (float, Optional) – transparency number of grid. default is 0.5
gaxis (str ('x', 'y', 'both')) – type of axis to hold the grid. default is both
gwhich (str, Optional) – kind of grid in the plot. default is major
tp_axis (bool) – axis to apply the ticks params. default is both
tp_labelsize (str, Optional) – labelsize of ticks params. default is italic
tp_bottom (bool) – position at bottom of ticks params. default is True.
tp_labelbottom (bool) – put label on the bottom of the ticks. default is False
tp_labeltop (bool) – put label on the top of the ticks. default is True
cb_orientation (str, ('vertical', 'horizontal')) – orientation of the colorbar. default is vertical
cb_aspect (float, Optional) – aspect of the colorbar. default is 20.
cb_shrink (float, Optional) – shrink size of the colorbar. default is 1.0
cb_pad (float) – pad of the colorbar of plot. default is .05
cb_anchor (tuple (float, float)) – anchor of the colorbar. default is (0.0, 0.5)
cb_panchor (tuple (float, float)) – proportionality anchor of the colorbar. default is (1.0, 0.5)
cb_label (str, Optional) – label of the colorbar.
cb_spacing (str, Optional) – spacing of the colorbar. default is uniform
cb_drawedges (bool) – draw edges inside of the colorbar. default is False
sns_orient ('v' | 'h', optional) – Orientation of the plot (vertical or horizontal). This is usually inferred based on the type of the input variables, but it can be used to resolve ambiguity when both x and y are numeric or when plotting wide-form data. default is v, which refers to 'vertical'
sns_style (dict, or one of {darkgrid, whitegrid, dark, white, ticks}) – A dictionary of parameters or the name of a preconfigured style.
sns_palette (seaborn color palette | matplotlib colormap | hls | husl) – Palette definition. Should be something color_palette() can process. The palette generates the points with different colors.
sns_height (float) – Proportion of axes extent covered by each rug element. Can be negative. default is 4.
sns_aspect (scalar (float, int)) – Aspect ratio of each facet, so that aspect * height gives the width of each facet in inches. default is .7
- Returns
self – returns self for easy method chaining.
- Return type
Baseclass instance
Examples
>>> import pandas as pd
>>> from watex.view import ExPlot
>>> data = pd.read_csv('data/geodata/main.bagciv.data.csv')
>>> ExPlot(fig_size=(12, 4)).fit(data).missing(kind='corr')
...
<watex.view.plot.ExPlot at 0x21162a975e0>
- fit(data, **fit_params)
Fit data and populate the arguments for plotting purposes.
There is no conventional procedure for checking if a method is fitted. However, a class that is not fitted should raise exceptions.NotFittedError when a method is called.
- Parameters
data (filepath or pandas.DataFrame of shape (M, N)) – Dataframe containing samples M and features N.
fit_params (dict) – Additional keyword arguments for reading the data when it is given as a path-like object, passed from watex.utils.coreutils._is_readable.
- Returns
self – returns self for easy method chaining.
- Return type
Plot instance
- property inspect
Inspect data and trigger plot after checking the data entry. Raises NotFittedError if ExPlot is not fitted yet.
- msg = "{expobj.__class__.__name__} instance is not fitted yet. Call 'fit' with appropriate arguments before using this method."
- plotbv(xname=None, yname=None, kind='box', **kwd)
Visualize distributions using box, boxen or violin plots.
- Parameters
xname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider. Should be items in the dataframe columns. Raises an error if the elements do not exist.
yname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider. Should be items in the dataframe columns. Raises an error if the elements do not exist.
kind (str) – style of the plot. Can be ['box'|'boxen'|'violin']. default is box
kwd (dict) – Other keyword arguments are passed down to seaborn.boxplot.
- Returns
self – returns self for easy method chaining.
- Return type
ExPlot instance
Example
>>> from watex.datasets import fetch_data
>>> from watex.view import ExPlot
>>> data = fetch_data('bagoue original').get('data=dfy1')
>>> p = ExPlot(tname='flow').fit(data)
>>> p.plotbv(xname='flow', yname='sfi', kind='violin')
- plotcutcomparison(xname=None, yname=None, q=10, bins=3, cmap='viridis', duplicates='drop', **kws)[source]#
Compare the cut or q quantiles values of ordinal categories.
It simulates that the the bining of ‘xname’ into a q quantiles, and ‘yname’into bins. Plot is normalized so its fills all the vertical area. which makes easy to see that in the 4*q % quantiles.
- Parameters
xname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider. Shoud be items in the dataframe columns. Raise an error if elements do not exist.
yname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider. Shoud be items in the dataframe columns. Raise an error if elements do not exist.
q (int or list-like of float) – Number of quantiles. 10 for deciles, 4 for quartiles, etc. Alternately array of quantiles, e.g. [0, .25, .5, .75, 1.] for quartiles.
bins (int, sequence of scalars, or IntervalIndex) –
The criteria to bin by.
- int: Defines the number of equal-width bins in the range of x. The range of x is extended by .1% on each side to include the minimum and maximum values of x.
- sequence of scalars: Defines the bin edges allowing for non-uniform width. No extension of the range of x is done.
- IntervalIndex: Defines the exact bins to be used. Note that IntervalIndex for bins must be non-overlapping.
labels (array or False, default None) – Used as labels for the resulting bins. Must be of the same length as the resulting bins. If False, return only integer indicators of the bins. If True, raises an error.
cmap (str, color or list of color, optional) – The matplotlib colormap of the bar faces.
duplicates ({'raise', 'drop'}, optional) – If bin edges are not unique, raise ValueError or drop non-uniques. Default is ‘drop’.
kws (dict,) – Other keyword arguments are passed down to pandas.qcut.
- Returns
self
- Return type
ExPlot instance; returns self for easy method chaining.
Examples
>>> from watex.datasets import fetch_data
>>> from watex.view import ExPlot
>>> data = fetch_data('bagoue original').get('data=dfy1')
>>> p = ExPlot(tname='flow').fit(data)
>>> p.plotcutcomparison(xname='sfi', yname='ohmS')
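The two binning schemes named in this method (q quantiles for ‘xname’, equal-width bins for ‘yname’) behave differently on skewed data. The sketch below illustrates the idea with the standard library only; it is not the code path used by ExPlot, and the helper names equal_width_bins and quantile_bins are hypothetical. It assumes the values are not all equal, and unlike pandas.qcut it may split tied values across groups.

```python
def equal_width_bins(values, bins):
    """Assign each value to one of `bins` equal-width intervals (0-based)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    # The maximum sits on the last edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def quantile_bins(values, q):
    """Assign each value to one of `q` groups of (nearly) equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * q // len(values)
    return labels


data = [1, 2, 3, 4, 5, 6, 7, 100]
print(equal_width_bins(data, 4))  # the outlier pushes all other values into bin 0
print(quantile_bins(data, 4))     # two values per group
```

With one large outlier, equal-width binning leaves the middle bins empty, while quantile binning keeps the groups balanced.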
- plothist(xname=None, *, kind='hist', **kws)[source]#
A histogram visualization of numerical data.
- Parameters
xname (str , xlabel) – feature name in the dataframe; it is also the label on the x-axis. Raises an error if it does not exist in the dataframe.
kind (str) – Mode of pandas series plotting. Default is hist.
kws (dict,) – additional keyword arguments from pandas.DataFrame.plot
- Returns
self – returns self for easy method chaining.
- Return type
ExPlot instance
- plothistvstarget(xname, c=None, *, posilabel=None, neglabel=None, kind='binarize', **kws)[source]#
Plot a histogram of a continuous feature against a binarized target.
- Parameters
xname (str,) – the column name to consider on the x-axis. Should be an item in the dataframe columns. Raises an error if the element does not exist.
c (str or int) – the class value in y to consider. Raises an error if not in y. The value c can be considered as the binary positive class.
posilabel (str, Optional) – the label of c considered as the positive class
neglabel (str, Optional) – the label of other classes (categories) except c considered as the negative class
kind (str, Optional, (default, 'binarize')) – the kind of plot of features against target. binarize plots the positive class (‘c’) vs the negative class (‘not c’).
kws (dict,) – Additional keyword arguments of seaborn.displot
- Returns
self – returns self for easy method chaining.
- Return type
ExPlot instance
Examples
>>> from watex.utils import read_data
>>> from watex.view import ExPlot
>>> data = read_data('data/geodata/main.bagciv.data.csv')
>>> p = ExPlot(tname='flow').fit(data)
>>> p.fig_size = (7, 5)
>>> p.savefig = 'bbox.png'
>>> p.plothistvstarget(xname='sfi', c=0, kind='binarize', kde=True,
...                    posilabel='dried borehole (m3/h)',
...                    neglabel='accept. boreholes')
<'ExPlot':xname='sfi', yname=None , tname='flow'>
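The binarize kind splits the target into the positive class c and everything else. A minimal sketch of that split alone, assuming the target is a plain list of class values; binarize_target and the sample values are hypothetical and not part of watex:

```python
def binarize_target(y, c, posilabel=None, neglabel=None):
    """Relabel `y` as positive (equal to `c`) or negative (anything else)."""
    if c not in y:
        raise ValueError(f"class {c!r} not found in the target")
    pos = posilabel if posilabel is not None else str(c)
    neg = neglabel if neglabel is not None else f"not {c}"
    return [pos if value == c else neg for value in y]


flow_classes = [0, 1, 0, 3, 2, 0]  # made-up target values
labels = binarize_target(flow_classes, c=0,
                         posilabel="dried borehole (m3/h)",
                         neglabel="accept. boreholes")
print(labels)
```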
- plotjoint(xname, yname=None, corr='pearson', kind='scatter', pkg='sns', yb_kws=None, **kws)[source]#
A fancier scatterplot that includes histograms on the edges as well as a regression line, called a joint plot.
- Parameters
xname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider. Should be items in the dataframe columns. Raises an error if elements do not exist.
yname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider. Should be items in the dataframe columns. Raises an error if elements do not exist.
pkg (str, Optional,) – kind or library to use for visualization. Can be [‘sns’|’yb’] for ‘seaborn’ or ‘yellowbrick’. Default is sns.
kind (str in {'scatter', 'hex'}, default: 'scatter') – The type of plot to render in the joint axes. Note that when kind=’hex’ the target cannot be plotted by color.
corr (str, default: 'pearson') – The algorithm used to compute the relationship between the variables in the joint plot, one of: ‘pearson’, ‘covariance’, ‘spearman’, ‘kendalltau’.
yb_kws (dict,) – Additional keyword arguments from yellowbrick.JointPlotVisualizer
kws (dict,) – Other keyword arguments are passed down to seaborn.jointplot.
- Returns
self
- Return type
ExPlot instance; returns self for easy method chaining.
Notes
When using the yellowbrick library, the arrays, i.e. the (x, y) variables in the columns as well as the target array, must not contain inf or NaN values. A ValueError is raised if that is the case.
- plotmissing(*, kind=None, sample=None, **kwd)[source]#
Visualize patterns in the missing data.
- Parameters
data (Dataframe of shape (M, N) from pandas.DataFrame) – Dataframe containing M samples and N features.
kind (str, Optional) –
kind of visualization. Can be dendrogramm, mbar or bar plot for dendrogram, msno bar and plt visualization respectively:
- bar plot counts the nonmissing data using pandas.
- mbar uses the msno package to count the number of nonmissing data.
- dendrogram shows the clusterings of where the data is missing. Leaves that are at the same level predict one another's presence (empty or filled). The vertical arms are used to indicate how different the clusters are. Short arms mean that the branches are similar.
- corr creates a heat map showing if there are correlations where the data is missing. In this case, it does look like the locations where data are missing are correlated.
- mpatterns is the default visualization. It is useful for viewing contiguous areas of the missing data, which would indicate that the missing data is not random. The matrix function includes a sparkline along the right side. Patterns here would also indicate non-random missing data. It is recommended to limit the number of samples to be able to see the patterns.
Any other value will raise an error.
sample (int, Optional) – Number of rows to visualize. This is useful when the data is composed of many rows. Shrinking the data to keep only some samples for visualization is recommended. None plots all the samples (or examples) in the data.
kws (dict) – Additional keyword arguments of the msno.matrix plot.
- Returns
self – returns self for easy method chaining.
- Return type
ExPlot instance
Example
>>> import pandas as pd
>>> from watex.view import ExPlot
>>> data = pd.read_csv('data/geodata/main.bagciv.data.csv')
>>> p = ExPlot().fit(data)
>>> p.fig_size = (12, 4)
>>> p.plotmissing(kind='corr')
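The bar kind counts the non-missing values per column. That counting step can be sketched with the standard library, assuming a table stored as a dict of columns in which None marks a missing cell; the table and the helper name count_nonmissing are made up for illustration and do not reflect how watex stores its data:

```python
def count_nonmissing(table):
    """Return, for each column, how many cells are not None."""
    return {name: sum(cell is not None for cell in column)
            for name, column in table.items()}


table = {
    "sfi": [0.4, None, 1.2, 0.9],
    "ohmS": [1135.5, 980.0, None, None],
    "flow": [0.0, 1.0, 2.0, 3.0],
}
counts = count_nonmissing(table)
print(counts)
```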
- plotpairgrid(xname=None, yname=None, vars=None, **kwd)[source]#
Create a pair grid.
It is a matrix of columns and kernel density estimations. To color by a column from the dataframe, use the ‘hue’ parameter.
- Parameters
xname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider. Should be items in the dataframe columns. Raises an error if elements do not exist.
yname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider. Should be items in the dataframe columns. Raises an error if elements do not exist.
vars (list, str) – list of items in the dataframe columns. Raises an error if items do not exist in the dataframe columns.
kws (dict,) – Other keyword arguments are passed down to seaborn.jointplot.
- Returns
self
- Return type
ExPlot instance; returns self for easy method chaining.
Example
>>> from watex.datasets import fetch_data
>>> from watex.view import ExPlot
>>> data = fetch_data('bagoue original').get('data=dfy1')
>>> p = ExPlot(tname='flow').fit(data)
>>> p.plotpairgrid(vars=['magnitude', 'power', 'ohmS'])
<'ExPlot':xname=(None,), yname=None , tname='flow'>
- plotpairwisecomparison(corr='pearson', pkg='sns', **kws)[source]#
Create pairwise comparisons between features.
The plot shows a [‘pearson’|’spearman’|’covariance’] correlation.
- Parameters
corr (str, ['pearson'|'spearman'|'covariance']) – Method of correlation to perform. Note that ‘pearson’ and ‘covariance’ do not support string values. If such data is given, set corr to spearman. Default is pearson.
pkg (str, Optional,) – kind or library to use for visualization. Can be [‘sns’|’yb’] for ‘seaborn’ or ‘yellowbrick’ respectively. Default is sns.
kws (dict,) – Additional keyword arguments are passed down to yellowbrick.Rank2D and seaborn.heatmap
- Returns
self
- Return type
ExPlot instance; returns self for easy method chaining.
Example
>>> from watex.datasets import fetch_data
>>> from watex.view import ExPlot
>>> data = fetch_data('bagoue original').get('data=dfy1')
>>> p = ExPlot(tname='flow').fit(data)
>>> p.plotpairwisecomparison(fmt='.2f', corr='spearman', pkg='yb',
...                          annot=True, cmap='RdBu_r', vmin=-1, vmax=1)
<'ExPlot':xname='sfi', yname='ohmS' , tname='flow'>
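The corr option switches between correlation measures. As a reminder of how ‘pearson’ and ‘spearman’ differ on numeric data, the sketch below computes both with the standard library. It is an illustration only, not the implementation used by ExPlot; the function names are hypothetical and the ranking helper assumes there are no tied values.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Rank values from 1 to n; assumes no ties to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Because y grows monotonically with x, the rank-based measure reports a perfect association while the linear one stays below 1.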
- plotparallelcoords(classes=None, pkg='pd', rxlabel=45, **kwd)[source]#
Use parallel coordinates on multivariate data for clustering visualization.
- Parameters
classes (list, default: None) –
a list of class names for the legend. The class labels for each class in y, ordered by sorted class index. These names act as a label encoder for the legend, identifying integer classes or renaming string labels. If omitted, the class labels will be taken from the unique values in y.
Note that the length of this list must match the number of unique values in y, otherwise an exception is raised.
pkg (str, Optional,) – kind or library to use for visualization. Can be [‘yb’|’pd’] for ‘yellowbrick’ or ‘pandas’ respectively. Default is pd.
rxlabel (int, default is 45) – rotation angle of the xlabel when pkg is set to pd.
kws (dict,) – Additional keyword arguments are passed down to yellowbrick.ParallelCoordinates and pandas.plotting.parallel_coordinates()
- Returns
self
- Return type
ExPlot instance; returns self for easy method chaining.
Examples
>>> from watex.datasets import fetch_data
>>> from watex.view import ExPlot
>>> data = fetch_data('original data').get('data=dfy1')
>>> p = ExPlot(tname='flow').fit(data)
>>> p.plotparallelcoords(pkg='yb')
<'ExPlot':xname=None, yname=None , tname='flow'>
- plotradviz(classes=None, pkg='pd', **kwd)[source]#
Plot each sample on a circle or square, with the features on the circumference, to visualize the separation between targets.
Values are normalized and each feature acts as a spring that pulls samples towards it based on the value.
- Parameters
classes (list of int | float, [categorized classes]) – must be values in the target. The specified classes must match the number of unique values in the target, otherwise an error occurs. The default behaviour, i.e. None, detects all classes from the unique values in the target.
pkg (str, Optional,) – kind or library to use for visualization. Can be [‘sns’|’pd’] for ’yellowbrick’ or ‘pandas’ respectively. Default is pd.
kws (dict,) – Additional keyword arguments are passed down to yellowbrick.RadViz and pandas.plotting.radviz()
- Returns
self
- Return type
ExPlot instance; returns self for easy method chaining.
Examples
(1) Using yellowbrick RadViz
>>> from watex.datasets import fetch_data
>>> from watex.view import ExPlot
>>> data0 = fetch_data('bagoue original').get('data=dfy1')
>>> p = ExPlot(tname='flow').fit(data0)
>>> p.plotradviz(classes=[0, 1, 2, 3])  # can set to None
(2) Using pandas radviz plot
>>> data2 = fetch_data('bagoue original').get('data=dfy2')
>>> p = ExPlot(tname='flow').fit(data2)
>>> p.plotradviz(classes=None, pkg='pd')
<'ExPlot':xname=None, yname=None , tname='flow'>
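The spring analogy in the description corresponds to a simple projection: each feature gets an anchor on the unit circle, and a sample lands at the average of the anchors weighted by its normalized feature values. The sketch below shows that general RadViz idea with the standard library; it is not taken from watex, yellowbrick or pandas, and radviz_points and the sample rows are hypothetical.

```python
import math


def radviz_points(rows):
    """Project rows of numeric features onto the unit disc, RadViz style."""
    n_features = len(rows[0])
    # One anchor per feature, evenly spaced on the unit circle.
    anchors = [(math.cos(2 * math.pi * j / n_features),
                math.sin(2 * math.pi * j / n_features))
               for j in range(n_features)]
    # Min-max normalize each feature to [0, 1]; constant columns get span 1.
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    points = []
    for row in rows:
        weights = [(v - lo) / span for v, lo, span in zip(row, lows, spans)]
        total = sum(weights) or 1.0  # an all-zero row stays at the centre
        x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
        y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
        points.append((x, y))
    return points


rows = [(10.0, 0.0, 0.0), (0.0, 5.0, 0.0), (10.0, 5.0, 2.0)]
points = radviz_points(rows)
print(points)
```

A sample dominated by one feature sits on that feature's anchor, while a sample that is high on every feature is pulled equally in all directions and ends up near the centre.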
- plotscatter(xname=None, yname=None, c=None, s=None, **kwd)[source]#
Shows the relationship between two numeric columns.
- Parameters
xname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider. Should be items in the dataframe columns. Raises an error if elements do not exist.
yname (vectors or keys in data) – Variables that specify positions on the x and y axes. Both are the column names to consider. Should be items in the dataframe columns. Raises an error if elements do not exist.
c (str, int or array_like, Optional) –
The color of each point. Possible values are:
- A single color string referred to by name, RGB or RGBA code, for instance ‘red’ or ‘#a98d19’.
- A sequence of color strings referred to by name, RGB or RGBA code, which will be used for each point’s color recursively. For instance, with [‘green’,’yellow’] all points will be filled in green or yellow, alternately.
- A column name or position whose values will be used to color the marker points according to a colormap.
s (scalar or array_like, Optional,) –
The size of each point. Possible values are:
- A single scalar so all points have the same size.
- A sequence of scalars, which will be used for each point’s size recursively. For instance, when passing [2,14] all point sizes will be either 2 or 14, alternately.
kwd (dict,) – Other keyword arguments are passed down to seaborn.scatterplot.
- Returns
self – returns self for easy method chaining.
- Return type
ExPlot instance
Example
>>> from watex.datasets import fetch_data
>>> from watex.view import ExPlot
>>> data = fetch_data('bagoue original').get('data=dfy1')
>>> p = ExPlot(tname='flow').fit(data).plotscatter(xname='sfi', yname='ohmS')
>>> p
<'ExPlot':xname='sfi', yname='ohmS' , tname='flow'>
References
Scatterplot: https://seaborn.pydata.org/generated/seaborn.scatterplot.html
Pd.scatter plot: https://www.w3resource.com/pandas/dataframe/dataframe-plot-scatter.php
- class watex.view.QuickPlot(classes=None, tname=None, mapflow=False, **kws)[source]#
Bases: BasePlot
Special class dealing with analysis modules for quick diagrams, histograms and bar visualizations.
It was originally designed for flow rate prediction. However, it still works with any other dataset, provided the parameter details below are followed.
- Parameters
data (str, filepath_or_buffer or pandas.core.DataFrame) – Path-like object or Dataframe. If data is given as a path-like object, data is read, asserted and validated. Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be a file://localhost/path/to/table.csv. If you want to pass in a path object, pandas accepts any os.PathLike. By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.
y (array-like of shape (M,), M = m-samples) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
tname (str,) – A target name or label. In supervised learning the target name is considered as the reference name of y or label variable.
classes (list of int | float, [categorized classes]) –
list of the categorical values encoded to numerical. For instance, for flow data analysis in the Bagoue dataset, the classes could be [0., 1., 3.] which means:
* 0 m3/h --> FR0
* > 0 to 1 m3/h --> FR1
* > 1 to 3 m3/h --> FR2
* > 3 m3/h --> FR3
mapflow (bool,) –
Refers to the flow rate prediction using DC-resistivity features and works when tname is set to flow. If set to True, the values in the target column are mapped to categorical values. Commonly the flow rate values are given as a trend of numerical values. For classification purposes, the flow rate must be converted to categorical values, which mainly refer to the types of hydraulic system. The type of hydraulic system is in turn mostly tied to the size of the population living in a specific area. For instance, flow classes can be ranged as follows:
- FR = 0 is for dry boreholes
- 0 < FR ≤ 3m3/h for village hydraulic (≤2000 inhabitants)
- 3 < FR ≤ 6m3/h for improved village hydraulic(>2000-20 000inhbts)
- 6 <FR ≤ 10m3/h for urban hydraulic (>200 000 inhabitants).
Note that the flow range from mapflow is not exhaustive and can be modified according to the type of hydraulic required on the project.
savefig (str, Path-like object,) – savefigure’s name, default is None
fig_dpi (float,) – dots-per-inch resolution of the figure. default is 300
fig_num (int,) – size of figure in inches (width, height). default is [5, 5]
fig_size (Tuple (int, int) or inch) – size of figure in inches (width, height). default is [5, 5]
fig_orientation (str,) – figure orientation. default is landscape
fig_title (str,) – figure title. default is None
fs (float,) – size of font of axis tick labels, axis labels are fs+2. default is 6
ls (str,) – line style, it can be [ ‘-’ | ‘.’ | ‘:’ ]. default is ‘-’
lc (str, Optional,) – line color of the plot, default is k
lw (float, Optional,) – line weight of the plot, default is 1.5
alpha (float between 0 < alpha < 1,) – transparency number, default is 0.5
font_weight (str, Optional) – weight of the font, default is bold.
font_style (str, Optional) – style of the font. default is italic
font_size (float, Optional) – size of font in inches (width, height). default is 3.
ms (float, Optional) – size of marker in points. default is 5
marker (str, Optional) – marker of stations. default is o.
marker_style (str, Optional) – facecolor of the marker. default is yellow
marker_edgecolor (str, Optional) – edge color of the marker. default is yellow
marker_edgewidth (float, Optional) – width of the marker. default is 3.
xminorticks (float, Optional) – minor ticks according to x-axis size. default is 1.
yminorticks (float, Optional) – minor ticks according to y-axis size. default is 1.
bins – histogram element separation between two bars. default is 10.
xlim (tuple (int, int), Optional) – limit of x-axis in plot.
ylim (tuple (int, int), Optional) – limit of y-axis in plot.
xlabel (str, Optional,) – label name of x-axis in plot.
ylabel (str, Optional,) – label name of y-axis in plot.
rotate_xlabel (float, Optional) – angle to rotate xlabel in plot.
rotate_ylabel (float, Optional) – angle to rotate ylabel in plot.
leg_kws (dict, Optional) – keyword arguments of legend. default is empty dict
plt_kws (dict, Optional) – keyword arguments of plot. default is empty dict
glc (str, Optional) – line color of the grid plot, default is k
glw (float, Optional) – line weight of the grid plot, default is 2
galpha (float, Optional,) – transparency number of grid, default is 0.5
gaxis (str ('x', 'y', 'both')) – type of axis to hold the grid, default is both
gwhich (str, Optional) – kind of grid in the plot. default is major
tp_axis (bool,) – axis to apply the ticks params. default is both
tp_labelsize (str, Optional) – labelsize of ticks params. default is italic
tp_bottom (bool,) – position at bottom of ticks params. default is True.
tp_labelbottom (bool,) – put label on the bottom of the ticks. default is False
tp_labeltop (bool,) – put label on the top of the ticks. default is True
cb_orientation (str , ('vertical', 'horizontal')) – orientation of the colorbar, default is vertical
cb_aspect (float, Optional) – aspect of the colorbar. default is 20.
cb_shrink (float, Optional) – shrink size of the colorbar. default is 1.0
cb_pad (float,) – pad of the colorbar of plot. default is .05
cb_anchor (tuple (float, float)) – anchor of the colorbar. default is (0.0, 0.5)
cb_panchor (tuple (float, float)) – proportionality anchor of the colorbar. default is (1.0, 0.5)
cb_label (str, Optional) – label of the colorbar.
cb_spacing (str, Optional) – spacing of the colorbar. default is uniform
cb_drawedges (bool,) – draw edges inside of the colorbar. default is False
sns_orient ('v' | 'h', optional) – Orientation of the plot (vertical or horizontal). This is usually inferred based on the type of the input variables, but it can be used to resolve ambiguity when both x and y are numeric or when plotting wide-form data. default is v, which refers to ‘vertical’
sns_style (dict, or one of {darkgrid, whitegrid, dark, white, ticks}) – A dictionary of parameters or the name of a preconfigured style.
sns_palette (seaborn color palette | matplotlib colormap | hls | husl) – Palette definition. Should be something color_palette() can process. The palette generates the points with different colors.
sns_height (float,) – Proportion of axes extent covered by each rug element. Can be negative. default is 4.
sns_aspect (scalar (float, int)) – Aspect ratio of each facet, so that aspect * height gives the width of each facet in inches. default is .7
- Returns
self – returns self for easy method chaining.
- Return type
Baseclass instance
Examples
>>> from watex.view.plot import QuickPlot
>>> data = 'data/geodata/main.bagciv.data.csv'
>>> qkObj = QuickPlot(leg_kws=dict(loc='upper right'),
...                   fig_title='`sfi` vs`ohmS|`geol`',
...                   )
>>> qkObj.tname = 'flow'  # target the DC-flow rate prediction dataset
>>> qkObj.mapflow = True  # to hold category FR0, FR1 etc..
>>> qkObj.fit(data)
>>> sns_pkws = dict(aspect=2,
...                 height=2,
...                 )
>>> map_kws = dict(edgecolor="w")
>>> qkObj.discussingfeatures(features=['ohmS', 'sfi', 'geol', 'flow'],
...                          map_kws=map_kws, **sns_pkws
...                          )
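The classes description above maps numeric flow rates to the labels FR0 to FR3 using the bounds [0., 1., 3.]. A minimal sketch of such a mapping, assuming ascending bounds with closed upper limits as in that list; map_flow is a hypothetical helper for illustration and not the function QuickPlot calls internally:

```python
from bisect import bisect_left


def map_flow(rate, bounds=(0.0, 1.0, 3.0)):
    """Return the FR label for a flow rate in m3/h, given ascending bounds."""
    if rate < 0:
        raise ValueError("flow rate must be non-negative")
    # bisect_left counts the bounds strictly below `rate`, so a rate equal
    # to a bound stays in the lower class (1 m3/h is still FR1).
    return f"FR{bisect_left(bounds, rate)}"


labels = [map_flow(r) for r in (0.0, 0.5, 1.0, 2.0, 3.0, 7.5)]
print(labels)
```

Passing other bounds, for example (0.0, 3.0, 6.0), gives the village, improved village and urban hydraulic split listed under mapflow.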
- barcatdist(basic_plot=True, groupby=None, **kws)[source]#
Bar plot distribution.
Plots a categorical distribution according to the occurrence of the target in the data.
- Parameters
basic_plot (bool,) – Plot only the occurrence of the targeted columns, using the matplotlib.pyplot.bar function.
groupby (list or dict, optional) –
Group features for plotting. For instance, it plots other features located in the dataframe columns. The features can be given as a list, in which case the default plot properties are used. To customize the plot, provide the features as a dict with convenient properties like:
* groupby = ['shape', 'type']  # or {'type': {'color': 'b', 'width': 0.25, 'sep': 0.}, 'shape': {'color': 'g', 'width': 0.25, 'sep': 0.25}}
kws (dict,) – Additional keyword arguments from seaborn.countplot
data (str or pd.core.DataFrame) – Path-like object or Dataframe. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and possibly the classes for data inspection. Both a str and a dataframe require the name of the target.
- Returns
Returns self for easy method chaining.
- Return type
QuickPlot instance
Notes
The data argument must be passed to the fit method; the data parameter is not allowed in the other QuickPlot methods. The description of the data parameter is only meant to give a synopsis of the kind of data the plot expects. An error is raised if data is passed as a keyword argument.
Examples
>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue().frame
>>> qplotObj = QuickPlot(xlabel='Anomaly type',
...                      ylabel='Number of occurence (%)',
...                      lc='b', tname='flow')
>>> qplotObj.sns_style = 'darkgrid'
>>> qplotObj.fit(data)
>>> qplotObj.barcatdist(basic_plot=False,
...                     groupby=['shape'])
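The y-axis in the example above is an occurrence expressed in percent. The underlying tally can be sketched with collections.Counter; this is an illustration under the assumption that the target is a plain sequence of class labels, and class_percentages and the sample labels are hypothetical rather than part of QuickPlot:

```python
from collections import Counter


def class_percentages(labels):
    """Return {label: percentage of occurrence} for a sequence of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100.0 * n / total for label, n in counts.items()}


target = ["FR0", "FR1", "FR1", "FR2", "FR1", "FR3", "FR0", "FR2"]
shares = class_percentages(target)
print(shares)
```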
- corrmatrix(cortype='num', features=None, method='pearson', min_periods=1, **sns_kws)[source]#
Quickly plot the correlation matrix of the numerical or categorical features.
Set features by providing the names of features for visualization.
- Parameters
cortype (str,) – The type of features whose correlations are visualized. Can be num for numerical features and cat for categorical features. Default is num for quantitative values.
method (str,) – the correlation method. Can be ‘spearman’ or ‘pearson’. Default is pearson.
features (List, optional) – list of the names of features for correlation analysis. If given, make sure that the names belong to the dataframe columns, otherwise an error will occur. If the features are valid, the dataframe is shrunk to these features before the correlation plot.
min_periods – Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation. For more details refer to https://www.geeksforgeeks.org/python-pandas-dataframe-corr/
sns_kws (dict) – Other seaborn heatmap arguments. Refer to https://seaborn.pydata.org/generated/seaborn.heatmap.html
data (str or pd.core.DataFrame) – Path-like object or Dataframe. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and possibly the classes for data inspection. Both a str and a dataframe require the name of the target.
- Returns
Returns self for easy method chaining.
- Return type
QuickPlot instance
Notes
The data argument must be passed to the fit method; the data parameter is not allowed in the other QuickPlot methods. The description of the data parameter is only meant to give a synopsis of the kind of data the plot expects. An error is raised if data is passed as a keyword argument.
Example
>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue().frame
>>> qplotObj = QuickPlot().fit(data)
>>> sns_kwargs = {'annot': False,
...               'linewidth': .5,
...               'center': 0,
...               # 'cmap': 'jet_r',
...               'cbar': True}
>>> qplotObj.corrmatrix(cortype='cat', **sns_kwargs)
- property data#
- discussingfeatures(features, *, map_kws=None, map_func=None, **sns_kws)[source]#
Provide at least four (04) feature names and discuss their distribution.
This method maps a dataset onto multiple axes arrayed in a grid of rows and columns that correspond to levels of features in the dataset. The plots produced are often called “lattice”, “trellis”, or ‘small-multiple’ graphics.
- Parameters
features (list) –
List of features for discussing. The recommended number of features for better analysis is four (04), arranged as below:
features_disposal = [‘x’, ‘y’, ‘col’, ‘target|hue’]
where:
- x is the feature held on the x-axis, default is ohmS
- y is the feature located on the y-axis, default is sfi
- col is the feature on column subset, default is col
- target or hue for targeted examples, default is flow
If 03 features are given, the latter is considered as a target.
map_kws (dict, optional) – Extra keyword arguments for mapping plot.
map_func (callable, Optional) – callable object, is a plot style function. Can be a ‘matplotlib-pyplot’ function like plt.scatter or ‘seaborn-scatterplot’ like sns.scatterplot. The default is sns.scatterplot.
sns_kws (dict, optional) – keyword arguments to control what visual semantics are used to identify the different subsets. For more details, please consult <http://seaborn.pydata.org/generated/seaborn.FacetGrid.html>.
data (str or pd.core.DataFrame) – Path-like object or Dataframe. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and possibly the classes for data inspection. Both a str and a dataframe require the name of the target.
- Returns
Returns self for easy method chaining.
- Return type
QuickPlot instance
Notes
The data argument must be passed to the fit method; the data parameter is not allowed in the other QuickPlot methods. The description of the data parameter is only meant to give a synopsis of the kind of data the plot expects. An error is raised if data is passed as a keyword argument.
Examples
>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue().frame
>>> qkObj = QuickPlot(leg_kws={'loc': 'upper right'},
...                   fig_title='`sfi` vs`ohmS|`geol`',
...                   )
>>> qkObj.tname = 'flow'  # target the DC-flow rate prediction dataset
>>> qkObj.mapflow = True  # to hold category FR0, FR1 etc..
>>> qkObj.fit(data)
>>> sns_pkws = {'aspect': 2,
...             "height": 2,
...             }
>>> map_kws = {'edgecolor': "w"}
>>> qkObj.discussingfeatures(features=['ohmS', 'sfi', 'geol', 'flow'],
...                          map_kws=map_kws, **sns_pkws
...                          )
- fit(data, y=None)[source]#
Fit data and populate the attributes for plotting purposes.
- Parameters
data (str or pd.core.DataFrame) – Path-like object or Dataframe. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and possibly the classes for data inspection. Both a str and a dataframe require the name of the target.
y (array-like, optional) – array of the target. Must be the same length as the data. If y is provided and data is given as str or DataFrame, all the data is considered as the X data for analysis.
- Returns
self – Returns self for easy method chaining.
- Return type
QuickPlot instance
Examples
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue().frame
>>> from watex.view.plot import QuickPlot
>>> qplotObj = QuickPlot(xlabel='Flow classes in m3/h',
...                      ylabel='Number of occurence (%)')
>>> qplotObj.tname = None  # with the name of the target set to None
>>> qplotObj.fit(data)
>>> qplotObj.data.iloc[1:2, :]
   num name      east      north  ...         ohmS        lwi      geol flow
1  2.0   b2  791227.0  1159566.0  ...  1135.551531  21.406531  GRANITES  0.0
>>> qplotObj.tname = 'flow'
>>> qplotObj.mapflow = True  # map the flow from num. values to categ. values
>>> qplotObj.fit(data)
>>> qplotObj.data.iloc[1:2, :]
   num name      east      north  ...         ohmS        lwi      geol flow
1  2.0   b2  791227.0  1159566.0  ...  1135.551531  21.406531  GRANITES  FR0
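Every method in this class returns self so that calls can be chained after fit. The pattern itself is small enough to sketch in isolation; MiniPlot below is a hypothetical stand-in written for illustration and shares no code with QuickPlot:

```python
class MiniPlot:
    """Hypothetical stand-in showing the fit-then-plot chaining pattern."""

    def __init__(self, tname=None):
        self.tname = tname
        self.data_ = None

    def fit(self, data):
        # Store the data and hand the instance back to the caller.
        self.data_ = list(data)
        return self

    def describe(self):
        # Methods that need data refuse to run on an unfitted object.
        if self.data_ is None:
            raise RuntimeError("call fit() before plotting")
        self.n_rows_ = len(self.data_)
        return self


p = MiniPlot(tname="flow").fit([0.0, 1.5, 3.2]).describe()
print(p.n_rows_)
```

Returning self from fit is what allows the one-line form QuickPlot().fit(data) used in several examples of this page.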
- histcatdist(stacked=False, **kws)[source]#
Histogram plot distribution.
Plots a distribution of categorized classes according to the percentage of occurrence.
- Parameters
stacked (bool) – Pile the bins on top of one another as cumulative values. Default is False.
bins (int, optional) – contains the integer or sequence or string
range (list, optional) – is the lower and upper range of the bins
density (bool, optional) – contains the boolean values
weights (array-like, optional) – is an array of weights, of the same shape as data
bottom (float, optional) – is the location of the bottom baseline of each bin
histtype (str, optional) – is used to draw the type of histogram. {‘bar’, ‘barstacked’, ‘step’, ‘stepfilled’}
align (str, optional) – controls how the histogram is plotted. {‘left’, ‘mid’, ‘right’}
rwidth (float, optional,) – is a relative width of the bars as a fraction of the bin width
log (bool, optional) – is used to set histogram axis to a log scale
color (str, optional) – is a color spec or sequence of color specs, one per dataset
label (str , optional) – is a string, or sequence of strings to match multiple datasets
normed (bool, optional) – an optional parameter and it contains the boolean values. It uses the density keyword argument instead.
data (str or pd.core.DataFrame) – Path-like object or Dataframe. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and possibly the classes for data inspection. Both a str and a dataframe require the name of the target.
- Returns
Returns self for easy method chaining.
- Return type
QuickPlot instance
Notes
The data argument must be passed to the fit method; the data parameter is not allowed in the other QuickPlot methods. The description of the data parameter is only meant to give a synopsis of the kind of data the plot expects. An error is raised if data is passed as a keyword argument.
Examples
>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue().frame
>>> qplotObj = QuickPlot(xlabel='Flow classes',
...                      ylabel='Number of occurence (%)',
...                      lc='b', tname='flow')
>>> qplotObj.sns_style = 'darkgrid'
>>> qplotObj.fit(data)
>>> qplotObj.histcatdist()
- property inspect#
Inspect whether the object is fitted or not.
- joint2features(features, *, join_kws=None, marginals_kws=None, **sns_kws)[source]#
Joint method to visualize the correlation of two features.
Draw a plot of two features with bivariate and univariate graphs.
- Parameters
features (list) – List of numerical features to plot for correlation analysis. An error is raised if a feature does not exist in the data.
join_kws (dict, optional) – Additional keyword arguments passed to the function used to draw the plot on the joint Axes, superseding items in the joint_kws dictionary.
marginals_kws (dict, optional) – Additional keyword arguments passed to the function used to draw the plot on the marginal Axes.
sns_kws (dict, optional) – keyword arguments of the seaborn jointplot function. Refer to <http://seaborn.pydata.org/generated/seaborn.jointplot.html> for more details about useful kwargs to customize plots.
data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, possibly, the classes for data inspection. Whether a str or a DataFrame is passed, the name of the target must be provided.
- Returns
self for easy method chaining.
- Return type
QuickPlot instance
Notes
The data argument must be passed to the fit method; the data parameter is not accepted by any other QuickPlot method. It is described here only to give a synopsis of the kind of data the plot expects. An error is raised if data is passed as a keyword argument.
Examples
>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue().frame
>>> qkObj = QuickPlot(lc='b', sns_style='darkgrid',
...                   fig_title='Quantitative features correlation'
...                   ).fit(data)
>>> sns_pkws = {'kind': 'reg',  # 'kde', 'hex'
...             # "hue": 'flow',
...             }
>>> joinpl_kws = {"color": "r", 'zorder': 0, 'levels': 6}
>>> plmarg_kws = {'color': "r", 'height': -.15, 'clip_on': False}
>>> qkObj.joint2features(features=['ohmS', 'lwi'],
...                      join_kws=joinpl_kws, marginals_kws=plmarg_kws,
...                      **sns_pkws,
...                      )
- multicatdist(*, x=None, col=None, hue=None, targets=None, x_features=None, y_features=None, kind='count', **kws)[source]#
Figure-level interface for drawing multiple categorical distribution plots onto a FacetGrid.
Multiple categorical plots from targeted pd.Series.
- Parameters
x, y, hue (list, optional) – names of variables in data. Inputs for plotting long-form data. See examples for interpretation. Here they can correspond to x_features, y_features and targets from the dataframe. Note that each column item can serve as an element of x, y or hue. For instance, x_features refers to the x-axis features; it must be non-empty and given as a list. y_features should match the column names passed to sns.catplot. If there is more than one feature, putting them all in a list is recommended. y should fit the sns.catplot argument hue. Like the others, it should be a list when there is more than one feature.
row, col (names of variables in data, optional) – Categorical variables that will determine the faceting of the grid.
col_wrap (int) – “Wrap” the column variable at this width, so that the column facets span multiple rows. Incompatible with a row facet.
estimator (string or callable that maps vector -> scalar, optional) – Statistical function to estimate within each categorical bin.
errorbar (string, (string, number) tuple, or callable) – Name of errorbar method (either “ci”, “pi”, “se”, or “sd”), or a tuple with a method name and a level parameter, or a function that maps from a vector to a (min, max) interval.
n_boot (int, optional) – Number of bootstrap samples used to compute confidence intervals.
units (name of variable in data or vector data, optional) – Identifier of sampling units, which will be used to perform a multilevel bootstrap and account for repeated measures design.
seed (int, numpy.random.Generator, or numpy.random.RandomState, optional) – Seed or random number generator for reproducible bootstrapping.
order (lists of strings, optional) – Order to plot the categorical levels in; otherwise the levels are inferred from the data objects.
hue_order (lists of strings, optional) – Order to plot the categorical levels in; otherwise the levels are inferred from the data objects.
row_order (lists of strings, optional) – Order to organize the rows and/or columns of the grid in, otherwise the orders are inferred from the data objects.
col_order (lists of strings, optional) – Order to organize the rows and/or columns of the grid in, otherwise the orders are inferred from the data objects.
height (scalar) – Height (in inches) of each facet. See also: aspect.
aspect (scalar) – Aspect ratio of each facet, so that aspect * height gives the width of each facet in inches.
kind (str, optional) – The kind of plot to draw, corresponds to the name of a categorical axes-level plotting function. Options are: “strip”, “swarm”, “box”, “violin”, “boxen”, “point”, “bar”, or “count”.
native_scale (bool, optional) – When True, numeric or datetime values on the categorical axis will maintain their original scaling rather than being converted to fixed indices.
formatter (callable, optional) – Function for converting categorical data into strings. Affects both grouping and tick labels.
orient ("v" | "h", optional) – Orientation of the plot (vertical or horizontal). This is usually inferred based on the type of the input variables, but it can be used to resolve ambiguity when both x and y are numeric or when plotting wide-form data.
color (matplotlib color, optional) – Single color for the elements in the plot.
palette (palette name, list, or dict) – Colors to use for the different levels of the hue variable. Should be something that can be interpreted by color_palette(), or a dictionary mapping hue levels to matplotlib colors.
hue_norm (tuple or matplotlib.colors.Normalize object) – Normalization in data units for colormap applied to the hue variable when it is numeric. Not relevant if hue is categorical.
legend (str or bool, optional) – Set to False to disable the legend. With strip or swarm plots, this also accepts a string, as described in the axes-level docstrings.
legend_out (bool) – If True, the figure size will be extended, and the legend will be drawn outside the plot on the center right.
sharex, sharey (bool, 'col', or 'row', optional) – If true, the facets will share y axes across columns and/or x axes across rows.
margin_titles (bool) – If True, the titles for the row variable are drawn to the right of the last column. This option is experimental and may not work in all cases.
facet_kws (dict, optional) – Dictionary of other keyword arguments to pass to FacetGrid.
kwargs (key, value pairings) – Other keyword arguments are passed through to the underlying plotting function.
data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, possibly, the classes for data inspection. Whether a str or a DataFrame is passed, the name of the target must be provided.
- Returns
self for easy method chaining.
- Return type
QuickPlot instance
Notes
The data argument must be passed to the fit method; the data parameter is not accepted by any other QuickPlot method. It is described here only to give a synopsis of the kind of data the plot expects. An error is raised if data is passed as a keyword argument.
Examples
>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue().frame
>>> qplotObj = QuickPlot(lc='b', tname='flow')
>>> qplotObj.sns_style = 'darkgrid'
>>> qplotObj.mapflow = True  # to categorize the flow rate
>>> qplotObj.fit(data)
>>> fdict = {
...     'x': ['shape', 'type', 'type'],
...     'col': ['type', 'geol', 'shape'],
...     'hue': ['flow', 'flow', 'geol'],
... }
>>> qplotObj.multicatdist(**fdict)
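One way to read the dictionary passed in the example is position by position: the first plot would use x='shape', col='type' and hue='flow', the second x='type', col='geol' and hue='flow', and so on. The sketch below only illustrates that pairing in plain Python. That multicatdist combines the lists this way is an assumption made for illustration, and expand_plot_specs is a hypothetical helper, not part of watex.

```python
def expand_plot_specs(spec):
    """Turn {'x': [...], 'col': [...], 'hue': [...]} into one dict per plot."""
    lengths = {len(v) for v in spec.values()}
    if len(lengths) != 1:
        raise ValueError("all lists must have the same length")
    keys = list(spec)
    # zip(*values) walks the lists in parallel, one tuple per plot.
    return [dict(zip(keys, combo)) for combo in zip(*spec.values())]


fdict = {
    'x': ['shape', 'type', 'type'],
    'col': ['type', 'geol', 'shape'],
    'hue': ['flow', 'flow', 'geol'],
}
plot_specs = expand_plot_specs(fdict)
for spec in plot_specs:
    print(spec)
```

Keeping the lists the same length avoids silently dropping a plot, which is why the sketch raises on a mismatch instead of truncating.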
- naiveviz(x=None, y=None, kind='scatter', s_col='lwi', leg_kws={}, **pd_kws)[source]#
Creates a plot to visualize the sample distribution according to the geographical coordinates x and y.
- Parameters
x (str) – Column name holding the x-axis values.
y (str) – Column name holding the y-axis values.
s_col (str) – Column used to size the scatter points. Default is fs times the feature column lwi.
pd_kws (dict, optional) – Pandas plot keyword arguments.
leg_kws (dict) – Matplotlib legend keyword arguments.
data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, possibly, the classes for data inspection. Whether a str or a DataFrame is passed, the name of the target must be provided.
- Returns
self for easy method chaining.
- Return type
QuickPlot instance
Notes
The data argument must be passed to the fit method; the data parameter is not accepted by any other QuickPlot method. It is described here only to give a synopsis of the kind of data the plot expects. An error is raised if data is passed as a keyword argument.
Examples
>>> import matplotlib.pyplot as plt
>>> from watex.transformers import StratifiedWithCategoryAdder
>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> df = load_bagoue().frame
>>> stratifiedNumObj = StratifiedWithCategoryAdder('flow')
>>> strat_train_set, *_ = stratifiedNumObj.fit_transform(X=df)
>>> pd_kws = {'alpha': 0.4,
...           'label': 'flow m3/h',
...           'c': 'flow',
...           'cmap': plt.get_cmap('jet'),
...           'colorbar': True}
>>> qkObj = QuickPlot(fs=25.)
>>> qkObj.fit(strat_train_set)
>>> qkObj.naiveviz(x='east', y='north', **pd_kws)
- numfeatures(features=None, coerce=False, map_lower_kws=None, **sns_kws)[source]#
Plots the distribution of quantitative features together with their pairwise correlations. Be sure to provide numerical features as data arguments.
- Parameters
features (list) – List of numerical features to plot for correlation analysis. An error is raised if a feature does not exist in the data.
coerce (bool) – Constrain the data reader to read all features and keep only the numerical values. An error occurs if False and the data contains some non-numerical features. Default is False.
map_lower_kws (dict, Optional) – a way to customize the plot. It is a dictionary of sns.pairplot map_lower keyword arguments. If the diagonal kind is kde, the plot is customized with the provided map_lower_kws arguments. If None, checks whether the diag_kind argument in sns_kws is kde before triggering the plotting map.
sns_kws (dict) – Keyword arguments of seaborn pairplot. Refer to http://seaborn.pydata.org/generated/seaborn.pairplot.html for further details.
data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, possibly, the classes for data inspection. Whether a str or a DataFrame is passed, the name of the target must be provided.
- Returns
self for easy method chaining.
- Return type
QuickPlot instance
Notes
The data argument must be passed to the fit method; the data parameter is not accepted by any other QuickPlot method. It is described here only to give a synopsis of the kind of data the plot expects. An error is raised if data is passed as a keyword argument.
Examples
>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue().frame
>>> qkObj = QuickPlot(mapflow=False, tname='flow').fit(data)
>>> qkObj.sns_style = 'darkgrid'
>>> qkObj.fig_title = 'Quantitative features correlation'
>>> sns_pkws = {'aspect': 2,
...             "height": 2,
...             # 'markers': ['o', 'x', 'D', 'H', 's',
...             #             '^', '+', 'S'],
...             'diag_kind': 'kde',
...             'corner': False,
...             }
>>> marklow = {'level': 4,
...            'color': ".2"}
>>> qkObj.numfeatures(coerce=True, map_lower_kws=marklow, **sns_pkws)
- scatteringfeatures(features, *, relplot_kws=None, **sns_kws)[source]#
Draw a scatter plot with possibility of several semantic features groupings.
Scattering-feature analysis is a process of understanding how features in a dataset relate to each other and how those relationships depend on other features. Visualization can be a core component of this process because, when data are visualized properly, the human visual system can see trends and patterns that indicate a relationship.
- Parameters
features (list) – List of numerical features to plot for correlation analysis. An error is raised if a feature does not exist in the data.
relplot_kws (dict, optional) – Extra keyword arguments to show the relationship between two features with semantic mappings of subsets. Refer to <http://seaborn.pydata.org/generated/seaborn.relplot.html#seaborn.relplot> for more details.
sns_kws (dict, optional) – keyword arguments to control what visual semantics are used to identify the different subsets. For more details, please consult <http://seaborn.pydata.org/generated/seaborn.scatterplot.html>.
data (str or pd.core.DataFrame) – Path-like object or DataFrame. Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation. If data is given as a path-like object, QuickPlot reads and sanitizes the data before plotting. In this case, be sure to provide the target name and, possibly, the classes for data inspection. Whether a str or a DataFrame is passed, the name of the target must be provided.
- Returns
self for easy method chaining.
- Return type
QuickPlot instance
Notes
The data argument must be passed to the fit method; the data parameter is not accepted by any other QuickPlot method. It is described here only to give a synopsis of the kind of data the plot expects. An error is raised if data is passed as a keyword argument.
Examples
>>> from watex.view.plot import QuickPlot
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue().frame
>>> qkObj = QuickPlot(lc='b', sns_style='darkgrid',
...                   fig_title='geol vs level of water inflow',
...                   xlabel='Level of water inflow (lwi)',
...                   ylabel='Flow rate in m3/h'
...                   )
>>> qkObj.tname = 'flow'  # target the DC-flow rate prediction dataset
>>> qkObj.mapflow = True  # to hold category FR0, FR1 etc.
>>> qkObj.fit(data)
>>> marker_list = ['o', 's', 'P', 'H']
>>> markers_dict = {key: mv for key, mv in zip(
...     list(dict(qkObj.data['geol'].value_counts(
...         normalize=True)).keys()),
...     marker_list)}
>>> sns_pkws = {'markers': markers_dict,
...             'sizes': (20, 200),
...             "hue": 'geol',
...             'style': 'geol',
...             "palette": 'deep',
...             'legend': 'full',
...             # "hue_norm": (0, 7)
...             }
>>> regpl_kws = {'col': 'flow',
...              'hue': 'lwi',
...              'style': 'geol',
...              'kind': 'scatter'
...              }
>>> qkObj.scatteringfeatures(features=['lwi', 'flow'],
...                          relplot_kws=regpl_kws,
...                          **sns_pkws,
...                          )
- class watex.view.TPlot(survey_area=None, distance=50.0, prefix='S', how='py', window_size=5, component='xy', mode='same', method='slinear', out='srho', c=2, **kws)[source]#
Bases: BasePlot
Tensor plot from EMAP or AMT processing data.
TPlot is a tensor (impedance, resistivity and phase) plot class. It explores SEG (Society of Exploration Geophysicists) class data and plots recovery tensors. TPlot methods return an instantiated object that inherits from the watex.property.Baseplots ABC (Abstract Base Class) for visualization.
- Parameters
window_size (int) – the length of the window. Must be greater than 1 and preferably an odd integer. Default is 5.
component (str) – field tensor direction. It can be xx, xy, yx or yy. If arr2d is provided, there is no need to give an argument. It becomes useful when a collection of EDI-objects is provided. If not specified, the resistivity and phase values at component xy are fetched for correction by default. Change the component value to get the appropriate data for correction. Default is xy.
mode (str, ['valid', 'same'], default='same') – mode of the border trimming. Should be ‘valid’ or ‘same’. ‘valid’ is used for regular trimming whereas ‘same’ is used for appending the first and last values of resistivity. Any argument other than ‘valid’ is treated as ‘same’. Default is same.
method (str, default slinear) – Interpolation technique to use. Can be nearest or pad. Refer to the documentation of ~.interpolate2d.
out (str) – Value to export. Can be sfactor or tensor, for the correction factors and the impedance tensor respectively. Any other value exports the static-corrected resistivity srho.
c (int) – A window-width expansion factor that must be input to the filter adaptation process to control the roll-off characteristics of the applied Hanning window. It is recommended to select c between 1 and 4. Default is 2.
distance (float) – The step between two stations/sites. If given, it creates an array of positions for plotting purposes. Default value is 50 meters.
prefix (str) – string value to add as a prefix of a given id. The prefix can be the site name. Default is S.
how (str) – Mode to index the stations. Default is ‘Python indexing’, i.e. the counting of stations starts at 0. Any other mode starts the counting at 1.
savefig (str, Path-like object) – name of the saved figure. Default is None.
fig_dpi (float) – dots-per-inch resolution of the figure. Default is 300.
fig_num (int) – size of figure in inches (width, height). Default is [5, 5].
fig_size (Tuple (int, int) or inch) – size of figure in inches (width, height). Default is [5, 5].
fig_orientation (str) – figure orientation. Default is landscape.
fig_title (str) – figure title. Default is None.
fs (float) – font size of the axis tick labels; axis labels are fs+2. Default is 6.
ls (str) – line style, it can be [ ‘-’ | ‘.’ | ‘:’ ]. Default is ‘-’.
lc (str, Optional) – line color of the plot. Default is k.
lw (float, Optional) – line weight of the plot. Default is 1.5.
alpha (float, 0 < alpha < 1) – transparency number. Default is 0.5.
font_weight (str, Optional) – weight of the font. Default is bold.
font_style (str, Optional) – style of the font. Default is italic.
font_size (float, Optional) – size of font in inches (width, height). Default is 3.
ms (float, Optional) – size of marker in points. Default is 5.
marker (str, Optional) – marker of stations. Default is o.
marker_style (str, Optional) – facecolor of the marker. Default is yellow.
marker_edgecolor (str, Optional) – edgecolor of the marker. Default is yellow.
marker_edgewidth (float, Optional) – width of the marker edge. Default is 3.
xminorticks (float, Optional) – minor ticks according to x-axis size. Default is 1.
yminorticks (float, Optional) – minor ticks according to y-axis size. Default is 1.
bins – histogram element separation between two bars. Default is 10.
xlim (tuple (int, int), Optional) – limit of x-axis in plot.
ylim (tuple (int, int), Optional) – limit of y-axis in plot.
xlabel (str, Optional,) – label name of x-axis in plot.
ylabel (str, Optional,) – label name of y-axis in plot.
rotate_xlabel (float, Optional) – angle to rotate xlabel in plot.
rotate_ylabel (float, Optional) – angle to rotate ylabel in plot.
leg_kws (dict, Optional) – keyword arguments of legend. Default is an empty dict.
plt_kws (dict, Optional) – keyword arguments of plot. Default is an empty dict.
glc (str, Optional) – line color of the grid plot. Default is k.
glw (float, Optional) – line weight of the grid plot. Default is 2.
galpha (float, Optional) – transparency number of grid. Default is 0.5.
gaxis (str ('x', 'y', 'both')) – type of axis to hold the grid. Default is both.
gwhich (str, Optional) – kind of grid in the plot. Default is major.
tp_axis (bool) – axis to apply the ticks params. Default is both.
tp_labelsize (str, Optional) – labelsize of ticks params. Default is italic.
tp_bottom (bool) – position at bottom of ticks params. Default is True.
tp_labelbottom (bool) – put label on the bottom of the ticks. Default is False.
tp_labeltop (bool) – put label on the top of the ticks. Default is True.
cb_orientation (str, ('vertical', 'horizontal')) – orientation of the colorbar. Default is vertical.
cb_aspect (float, Optional) – aspect of the colorbar. Default is 20.
cb_shrink (float, Optional) – shrink size of the colorbar. Default is 1.0.
cb_pad (float) – pad of the colorbar of plot. Default is .05.
cb_anchor (tuple (float, float)) – anchor of the colorbar. Default is (0.0, 0.5).
cb_panchor (tuple (float, float)) – proportionality anchor of the colorbar. Default is (1.0, 0.5).
cb_label (str, Optional) – label of the colorbar.
cb_spacing (str, Optional) – spacing of the colorbar. Default is uniform.
cb_drawedges (bool) – draw edges inside of the colorbar. Default is False.
sns_orient ('v' | 'h', optional) – Orientation of the plot (vertical or horizontal). This is usually inferred based on the type of the input variables, but it can be used to resolve ambiguity when both x and y are numeric or when plotting wide-form data. Default is v, which refers to ‘vertical’.
sns_style (dict, or one of {darkgrid, whitegrid, dark, white, ticks}) – A dictionary of parameters or the name of a preconfigured style.
sns_palette (seaborn color palette | matplotlib colormap | hls | husl) – Palette definition. Should be something color_palette() can process. The palette generates the points with different colors.
sns_height (float) – Proportion of axes extent covered by each rug element. Can be negative. Default is 4.
sns_aspect (scalar (float, int)) – Aspect ratio of each facet, so that aspect * height gives the width of each facet in inches. Default is .7.
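To make the interplay of distance, prefix and how concrete, here is a small sketch of how station positions and names could be derived from them. Only the meaning of the three parameters comes from the descriptions above; the helper make_stations and the zero-padded naming scheme are assumptions made for illustration and may differ from what TPlot does internally.

```python
def make_stations(n_sites, distance=50.0, prefix="S", how="py"):
    """Return (positions, names) for n_sites evenly spaced stations."""
    # 'py' counts stations from 0; any other mode counts from 1.
    start = 0 if how == "py" else 1
    # Positions are multiples of the station spacing, starting at 0 m.
    positions = [i * distance for i in range(n_sites)]
    # Zero-pad the index so that names sort naturally (assumed convention).
    width = max(2, len(str(n_sites)))
    names = [f"{prefix}{i:0{width}d}" for i in range(start, start + n_sites)]
    return positions, names


positions, names = make_stations(3)
print(positions, names)
positions_1, names_1 = make_stations(3, how="mat")
print(names_1)
```

The positions are the same in both indexing modes; only the labels shift by one.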
- Returns
self – returns self for easy method chaining.
- Return type
Baseclass instance
Examples
>>> from watex.view.plot import TPlot
>>> from watex.datasets import load_edis
>>> plot_kws = dict(
...     ylabel='$Log_{10}Frequency [Hz]$',
...     xlabel='$Distance(m)$',
...     cb_label=r'$Log_{10}Rhoa[\Omega.m$]',
...     fig_size=(6, 3),
...     font_size=7.,
...     rotate_xlabel=45,
...     imshow_interp='bicubic',
... )
>>> edi_data = load_edis(return_data=True, samples=7)
>>> t = TPlot(**plot_kws).fit(edi_data)
>>> t.fit(edi_data).plot_tensor2d(to_log10=True)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|Data collected = 7 |EDI success. read= 7 |Rate = 100.0 %|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<AxesSubplot:xlabel='$Distance(m)$', ylabel='$Log_{10}Frequency [Hz]$'>
- fit(data)[source]#
Fit data and populate attributes.
- Parameters
data (str, list or pycsamt.core.edi.Edi object) – Full path to EDI files or collection of EDI-objects.
- Returns
self – returns self for chaining methods.
- Return type
watex.view.plot.TPlot instantiated object
- property inspect#
Inspect whether the object is fitted or not.
- plotSkew(method='Bahr', view='skew', mode=None, threshold_line=None, show_average_sensistivity=True, suppress_outliers=True, **plot_kws)[source]#
Plot phase-sensitive skew visualization.
‘Skew’ is also known as the conventional asymmetry parameter based on the Z magnitude.
Mostly, the EM signal is influenced by several factors such as the dimensionality of the propagation medium and the physical anomalies, which can distort the EM field both locally and regionally. The distortion of Z is determined by quantifying its asymmetry and its deviation from the conditions that define its dimensionality. The parameters used for this purpose are all rotationally invariant because the Z components involved in their definition are independent of the orientation system used. The conventional asymmetry parameter based on the Z magnitude is the skew defined by Swift (1967) [1] and Bahr (1991) [2].
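For orientation, both skew parameters are usually written with the modified impedances \(S_1 = Z_{xx} + Z_{yy}\), \(S_2 = Z_{xy} + Z_{yx}\), \(D_1 = Z_{xx} - Z_{yy}\) and \(D_2 = Z_{xy} - Z_{yx}\). Swift's skew is \(|S_1| / |D_2|\) and Bahr's phase-sensitive skew is \(\sqrt{|[D_1, S_2] - [S_1, D_2]|} / |D_2|\), with \([A, B] = \mathrm{Re}A\,\mathrm{Im}B - \mathrm{Im}A\,\mathrm{Re}B\). The sketch below evaluates both for a single 2x2 impedance tensor in plain Python. It follows these textbook definitions rather than the watex source (watex.methods.EMAP.skew holds the library's own formulation), and the function names and sample tensors are hypothetical.

```python
import math


def _commutator(a, b):
    """[A, B] = Re(A) * Im(B) - Im(A) * Re(B) for complex A and B."""
    return a.real * b.imag - a.imag * b.real


def swift_skew(zxx, zxy, zyx, zyy):
    """Swift (1967) skew: |Zxx + Zyy| / |Zxy - Zyx|."""
    return abs(zxx + zyy) / abs(zxy - zyx)


def bahr_skew(zxx, zxy, zyx, zyy):
    """Bahr (1991) phase-sensitive skew for one impedance tensor."""
    s1, s2 = zxx + zyy, zxy + zyx
    d1, d2 = zxx - zyy, zxy - zyx
    numerator = abs(_commutator(d1, s2) - _commutator(s1, d2))
    return math.sqrt(numerator) / abs(d2)


# Ideal 2D tensor in strike coordinates: the diagonal terms vanish.
z_2d = (0j, 10 + 8j, -4 - 5j, 0j)
# Same off-diagonal terms with small, made-up diagonal terms added.
z_3d = (1 + 1j, 10 + 8j, -4 - 5j, 0.5 + 0j)

print(swift_skew(*z_2d), bahr_skew(*z_2d))
print(swift_skew(*z_3d), bahr_skew(*z_3d))
```

Both quantities vanish for the ideal 2D tensor and grow as the diagonal terms become significant, which is what the thresholds discussed for this method are compared against.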
- Parameters
method (str, default='Bahr') –
Kind of correction. Can be:
swift for the distortion removal proposed by Swift in 1967. A value close to 0 assumes 1D and 2D structures, and 3D otherwise. However, in the general case, an electrical structure with \(\eta < 0.4\) can be treated as a 2D medium.
bahr for the distortion removal proposed by Bahr in 1991. The threshold is set to 0.3. Above this value the structure is 3D.
view (str, default='skew') – phase-sensitive visualization. Can be the rotational invariant, invariant. In fact, setting it to mu or invariant does not change the interpretation, since the distortions of Z are all rotationally invariant whether using the Bahr or swift method.
mode (str, optional) – X-axis coordinates for visualization. Plot either 'frequency' or 'periods'. The default is 'frequency'.
threshold_line (float, optional) –
Visualize the threshold line. Can be [‘bahr’, ‘swift’, ‘both’]:
When method is set to swift, a value close to \(0.\) assumes 1D and 2D structures (\(\eta <0.4\)), and 3D otherwise (\(\eta >0.4\)). The threshold line for swift is set to \(0.4\).
When method is set to Bahr, \(\eta > 0.3\) indicates 3D structures, values in \([0.1 - 0.3]\) indicate modified 3D/2D structures, whereas \(<0.1\) indicates 1D, 2D or distorted 2D.
show_average_sensistivity (bool, default=True) – Display the averaged value of the skew data over all frequencies. The value can help with dimensionality interpretation.
suppress_outliers (bool, default=True) – Remove the outliers in the data if any exist. It uses the Inter Quartile Range (IQR) approach. See the documentation of watex.utils.remove_outliers(). This is useful for a clear interpretation using the skew threshold value.
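The thresholds quoted for threshold_line can be summarized as a small lookup. The sketch below is an illustrative helper in plain Python, not a watex function; the name classify_skew is hypothetical, and the handling of values that fall exactly on a boundary is an assumption because the text does not specify it.

```python
def classify_skew(eta, method="bahr"):
    """Map a skew value to the dimensionality classes quoted above."""
    method = method.lower()
    if method == "swift":
        # Below 0.4: 1D/2D. A value of exactly 0.4 is treated as 3D (assumption).
        return "1D/2D" if eta < 0.4 else "3D"
    if method == "bahr":
        if eta > 0.3:
            return "3D"
        if eta >= 0.1:
            # The interval [0.1, 0.3] is taken as closed on both ends.
            return "modified 3D/2D"
        return "1D, 2D or distorted 2D"
    raise ValueError("method must be 'bahr' or 'swift'")


for value in (0.05, 0.2, 0.35, 0.5):
    print(value, classify_skew(value, "bahr"), classify_skew(value, "swift"))
```

Note that the same skew value can fall into different classes depending on the method, for example 0.35 is 3D under the Bahr threshold but still 1D/2D under the Swift threshold.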
See also
watex.methods.EMAP.skew
For the mathematical formulation of the Bahr and Swift skew concepts.
watex.utils.plot_skew
For phase-sensitive skew visualization (naive plot).
Examples
>>> import watex
>>> test_data = watex.fetch_data('edis', samples=37, return_data=True)
>>> watex.TPlot(fig_size=(10, 4), marker='x').fit(
...     test_data).plotSkew(method='swift', threshold_line=True)
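The suppress_outliers option relies on the interquartile range. A common form of that rule keeps only the values inside \([Q_1 - 1.5\,\mathrm{IQR},\ Q_3 + 1.5\,\mathrm{IQR}]\). The sketch below applies it with the standard library only. Whether watex.utils.remove_outliers() uses the same factor, the same quantile method, or drops rather than replaces the outliers is not stated here, so the helper remove_outliers_iqr and its defaults are assumptions.

```python
from statistics import quantiles


def remove_outliers_iqr(values, factor=1.5):
    """Keep the values lying within factor * IQR of the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # first, second and third quartiles
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if low <= v <= high]


# Made-up skew values with one obvious spike.
skew = [0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 3.0]
cleaned = remove_outliers_iqr(skew)
print(cleaned)
```

Removing such spikes keeps the y-axis range close to the threshold values, so the threshold line remains readable.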
References
- 1
Swift, C., 1967. A magnetotelluric investigation of an electrical conductivity anomaly in the southwestern United States. Ph.D. Thesis, MIT Press. Cambridge.
- 2
Bahr, K., 1991. Geological noise in magnetotelluric data: a classification of distortion types. Physics of the Earth and Planetary Interiors 66 (1–2), 24–38.
- plot_corrections(fltr='ama', ss_fx=None, ss_fy=None, r=1000.0, nfreq=21, skipfreq=5, tol=0.12, rotate=0.0, distortion=None, distortion_err=None, mode='TE', scale='period', sites=None, seed=None, how='py', show_site=True, survey=None, style=None, errorbar=True, spad=0.5, n_sites=1, mcolors=None, markers=None, **kws)[source]#
Plot apparent resistivity/phase curves and corrections.
Changed in version 0.2.1: Can henceforth display multiple sites by providing the sites as a collection.
- Parameters
fltr (str, default='ama') –
Type of filter to apply. ss is used to remove the static shift using a spatial median filter, whereas dist is for distortion removal. Note that distortion must be provided in that case, otherwise an error is raised. Can also be [‘tma’|‘ama’|‘flma’] for EMAP filters:
tma for trimming moving-average
ama for adaptive moving-average
flma for fixed-length moving-average
distortion_tensor (np.ndarray(2, 2, dtype=real)) – Real distortion tensor as a 2x2
error (np.ndarray(2, 2, dtype=real), Optional) – Propagation of errors/uncertainties included
ss_fx (float, Optional) – static shift factor to be applied to the x components (i.e. z[:, 0, :]). This is assumed to be in resistivity scale. If None, it is automatically computed using the spatial median filter.
ss_fy (float, optional) – static shift factor to be applied to the y components (i.e. z[:, 1, :]). This is assumed to be in resistivity scale. If None, it is computed using the spatial median filter.
r (float, default=1000.) – radius to look for nearby stations, in meters.
nfreq (int, default=21) – number of frequencies used to calculate the median static shift. This assumes the first frequency is the highest frequency, because the highest frequencies usually sample a 1D earth.
skipfreq (int, default=5) – number of frequencies to skip from the highest frequency. Sometimes the highest frequencies are not reliable due to noise or low signal in the AMT deadband. This allows you to skip those frequencies.
tol (float, default=0.12) – Tolerance on the median static shift correction. If the data is noisy the correction factor can be biased away from 1. Therefore tol is used to stop that bias. If 1-tol < correction < 1+tol then the correction factor is set to 1.
rotate (float, default=0.) – Rotate the Z array by angle alpha in degrees. All angles are referenced to geographic North, positive in clockwise direction (mathematically negative!). In the non-rotated state, X refers to the North and Y to the East direction.
mode (str, default='TE') – Electromagnetic mode. Can be [‘TM’ | ’both’]. If both, components xy and yx are expected in the data.
scale (str, default='period') – Visualization scale on the axis label. Can be 'frequency'.
sites (int, str, optional) – index or name of the site to plot. A site name must include a position number, for instance 'S13'. If not provided, a random station is selected instead.
seed (int, optional) – Get the same site when sites is not provided. seed fixes the random selection of the site.
how (str, default='py') – The way the site is fetched for plotting. For instance, in Python indexing (default), the sites are numbered from 0. For instance ‘site05’ will fetch the data at index 4. If this positioning is not wished, set to ‘None’.
show_site (bool, default=True) – Display the site number.
survey (str, optional) – Method used for the survey. e.g., ‘AMT’ for Audio-Magnetotellurics.
style (str, default='default') – Matplotlib style.
errorbar (bool, default=True) – display the error bar.
spad (float, default=.5) –
Pad to display the station at the top of each section plot.
New in version 0.2.1.
n_sites (int, default=1) – Number of random sites to select for visualizing. It does not apply if the names of the sites are given.
mcolors (str, list, optional) – The list of colors for resistivity and phase.
markers (str, list, optional) – The list of markers for resistivity and phase.
kws (dict) – Additional keyword arguments passed to Matplotlib.Axes.Scatter plots.
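The tol rule described above is easy to state in code: a static shift correction that lies strictly within tol of 1 is treated as noise and reset to 1, and anything outside that band is kept. The sketch below shows only that rule; clamp_static_shift is a hypothetical helper name, and how the correction itself is estimated (the spatial median filter) is not reproduced here.

```python
def clamp_static_shift(correction, tol=0.12):
    """Reset a static shift correction to 1 when it is within tol of 1."""
    if 1.0 - tol < correction < 1.0 + tol:
        return 1.0
    return correction


# Made-up correction factors: two inside the tolerance band, two outside.
factors = [1.05, 0.90, 1.30, 0.50]
clamped = [clamp_static_shift(f) for f in factors]
print(clamped)
```

With the default tol=0.12, only corrections outside the open interval (0.88, 1.12) change the data.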
Examples
>>> import numpy as np
>>> import watex as wx
>>> edi_data = wx.fetch_data('edis', return_data=True, samples=27)
>>> wx.TPlot(show_grid=True).fit(edi_data).plot_corrections(
...     seed=52,
... )
>>> distortion = np.array([[1.1, 0.6], [0.23, 1.9]])
>>> wx.TPlot(show_grid=True).fit(edi_data).plot_corrections(
...     seed=52,
...     mode='tm',
...     fltr='dist',
...     distortion=distortion
... )
- plot_ctensor2d(tensor='res', ffilter='tma', sites=None, to_log10=False)[source]#
Plot filtered tensors
- Parameters
tensor (str, ['res', 'phase', 'z'], default='res') – Kind of tensor to plot. Can be resistivity or phase. If phase, customize your plot so that it does not follow the default ‘res’ behaviour.
ffilter (str, ['ama', 'flma', 'tma'], default='tma') – Kind of filter used to correct the tensor data.
to_log10 (bool, default=False) – Convert the resistivity data and frequency to log10.
sites (list of str, optional) – List of station/site names. If given, it must have the same length as the positions in the EDI data, i.e. it must match the number of EDI files successfully read.
- Returns
arr2d: 2D filtered tensor array from the component.
freqs: array-like 1d of frequencies in the survey.
positions: Sites/stations positions. It is equal to the distance between stations times the number of sites.
sites: list of the names of the stations/sites.
base_plot_kws: plot keyword arguments inherited from watex.property.BasePlot. They compose the last parameters for customizing the plot as a decorated return function.
- Return type
( arr2d , freqs, positions , sites , base_plot_kws)
Examples
>>> from watex.view.plot import TPlot
>>> from watex.datasets import load_edis
>>> # get 3 samples of EDI for the demo
>>> edi_data = load_edis(return_data=True, samples=3)
>>> # customize the plot by adding plot_kws
>>> plot_kws = dict(
...     ylabel='$Log_{10}Frequency [Hz]$',
...     xlabel='$Distance(m)$',
...     cb_label='$Log_{10}Rhoa[\Omega.m$]',
...     fig_size=(6, 3),
...     font_size=7.)
>>> t = TPlot(**plot_kws).fit(edi_data)
>>> # plot the filtered tensor using the log10 resistivity
>>> t.plot_ctensor2d(to_log10=True)
<AxesSubplot:xlabel='$Distance(m)$', ylabel='$Log_{10}Frequency [Hz]$'>
- plot_multi_recovery(sites, colors=None, **kws)[source]#
Plots multiple sites/stations with signal recovery.
- Parameters
sites (list) – list of sites to visualize. Can also be the index of the sites
colors (list of str) – matplotlib colors to customize the raw signal and recovery signal
- Returns
ax
- Return type
Matplotlib subplot axes
Examples
>>> from watex.view.plot import TPlot
>>> from watex.datasets import load_edis
>>> # take 3 samples of EDIs
>>> edi_data = load_edis(return_data=True, samples=3)
>>> TPlot(fig_size=(5, 3)).fit(edi_data).plot_multi_recovery(
...     sites=['S00'], colors=['o', 'ok--'])
<AxesSubplot:title={'center':'Recovered tensor $|Z_{xy}|$'}, xlabel='$Frequency [H_z]$', ylabel='$ App.resistivity \quad xy \quad [ \Omega.m]$'>
- plot_phase_tensors(mode='frequency', stretch=(7000, 20), linedir='ns', tensor='phimin', ellipse_dict=None, **kws)[source]#
Plot phase tensor pseudosection and skew ellipsis visualization.
Method plots the phase tensor ellipses in a pseudo section format. It uses mtpy as dependency.
- Parameters
mode (str, default='frequency') – Temporal scale on the y-axis. Can be [‘frequency’ | ‘period’].
stretch (float or tuple (xstretch, ystretch), default=(7000, 20)) – Factor that scales the distance from one station to the next to make the plot readable. It determines the (x, y) aspect ratio of the plot.
linedir (str [ 'ns' | 'ew' ], default='ns') –
The predominant direction of the profile line. It can be [‘ns’ | ‘ew’] where:
’ns’ refers to a North-South line (or a line closer to north-south)
’ew’ refers to an East-West line (or a line closer to east-west)
Default is ‘ns’
tensor (str, default='phimin') –
The quantity used to color the tensor skew or ellipse visualization. Tensor can be:
’phimin’ -> colors by minimum phase
’phimax’ -> colors by maximum phase
’skew’ -> colors by skew
’skew_seg’ -> colors by skew in discrete segments defined by the range
’normalized_skew’ -> colors by skew, see [Booker, 2014]
’normalized_skew_seg’ -> colors by normalized skew in discrete segments defined by the range
’phidet’ -> colors by determinant of the phase tensor
’ellipticity’ -> colors by ellipticity
Default is ‘phimin’.
ellipse_dict (dict, optional) –
Dictionary of parameters for the phase tensor ellipses with keys:
’size’: float, default=2, the size of the ellipse in points
’range’: tuple (min, max, step), default=’colorby’. At least the min and max must be given; when using ‘skew_seg’ to plot discrete values, give the step as well
’cmap’: [ ‘mt_yl2rd’ | ‘mt_bl2yl2rd’ | ’mt_wh2bl’ | ‘mt_rd2bl’ | ’mt_bl2wh2rd’ | ‘mt_seg_bl2wh2rd’ | ’mt_rd2gr2bl’ ]
’mt_yl2rd’ -> yellow to red
’mt_bl2yl2rd’ -> blue to yellow to red
’mt_wh2bl’ -> white to blue
’mt_rd2bl’ -> red to blue
’mt_bl2wh2rd’ -> blue to white to red
’mt_bl2gr2rd’ -> blue to green to red
’mt_rd2gr2bl’ -> red to green to blue
’mt_seg_bl2wh2rd’ -> discrete blue to white to red
kws (dict) – Additional keyword arguments passed to the |MTpy| pseudosection phase tensor class PlotPhaseTensorPseudoSection.
See also
mtpy.imaging.phase_tensor_pseudosection.PlotPhaseTensorPseudoSection: Plot phase pseudo section tensor from the |MTpy| package.
watex.utils.plot_skew: Phase sensitive skew visualization.
Examples
>>> import watex as wx
>>> edi_data = wx.fetch_data('edis', key='edi', return_data=True, samples=17)
>>> tplot = wx.TPlot().fit(edi_data)
>>> tplot.plot_phase_tensors(tensor='skew')
- plot_recovery(site='S00')[source]#
Visualize the restored tensor per site.
- Parameters
site (str, int, default ="S00") – Site/station name for
- Returns
self – returns self for chaining methods.
- Return type
watex.view.plot.TPlot instantiated object
Examples
>>> from watex.view import TPlot
>>> from watex.datasets import load_edis
>>> edi_data = load_edis(return_data=True, samples=7)
>>> plot_kws = dict(
...     ylabel='$Log_{10}Frequency [Hz]$',
...     xlabel='$Distance(m)$',
...     cb_label='$Log_{10}Rhoa[\Omega.m$]',
...     fig_size=(7, 4),
...     font_size=7.)
>>> t = TPlot(**plot_kws).fit(edi_data)
>>> # plot recovery of site 'S01'
>>> t.plot_recovery('S01')
- plot_rhoa(mode='TE', scale='period', site=None, seed=None, how='py', show_site=True, survey=None, style=None, errorbar=True, suppress_outliers=False, **kws)[source]#
Plot apparent resistivity and phase curves
- Parameters
mode (str, default='TE') – Electromagnetic mode. Can be [‘TM’ | ’both’]. If both, components xy and yx are expected in the data.
scale (str, default='period') – Quantity displayed on the axis label. Can be 'frequency'.
site (int, str, optional) – Index or name of the site to plot. A site name must contain a position number, for instance 'S13'. If not provided, a random station is selected instead.
seed (int, optional) – If site is not provided, seed is used to fetch a site randomly. Set the seed value to fetch the same site every time.
how (str, default='py') – The way the site is fetched for plot. For instance, in Python indexing (default), the site is numbered from 0. For instance ‘site05’ will fetch the data at index 4. If this positioning is not wished, set to ‘None’.
show_site (bool, default=True,) – Display the number of site.
survey (str, optional) – Method used for the survey. e.g., ‘AMT’ for Audio-Magnetotellurics.
style (str, default='default') – Matplotlib style.
errorbar (bool, default=True) – display the error bar.
suppress_outliers (bool, default=False,) – Remove outliers in the data before plotting
kws (dict) – Additional keyword arguments passed to Matplotlib.Axes.Scatter plots.
Examples
>>> import watex as wx
>>> edi_data = wx.fetch_data('edis', return_data=True, samples=27)
>>> wx.TPlot(show_grid=True).fit(edi_data).plot_rhoa(seed=52, mode='*')
- plot_rhophi(sites=None, mode='TE', scale='period', seed=None, how='py', show_site=True, survey=None, style=None, errorbar=True, suppress_outliers=False, kind='2', n_sites=1, spad=0.5, **kws)[source]#
Plot resistivities and phases from multiple stations.
- Parameters
mode (str, default='TE') – Electromagnetic mode. Can be [‘TM’ | ’both’]. If both, components xy and yx are expected in the data.
sites (int, str, or list, optional) – A collection of indexes or names of the sites. Each site name must contain a position number, for instance 'S13'. If not provided, random sites are selected instead using the n_sites parameter.
scale (str, default='period') – Quantity displayed on the axis label. Can be 'frequency'.
seed (int, optional) – If sites is not provided, seed is used to fetch sites randomly. Set the seed value to fetch the same sites every time.
how (str, default='py') – The way the site is fetched for plot. For instance, in Python indexing (default), the site is numbered from 0. For instance ‘site05’ will fetch the data at index 4. If this positioning is not wished, set to ‘None’.
show_site (bool, default=True,) – Display the number of site.
survey (str, optional) – Method used for the survey. e.g., ‘AMT’ for Audio-Magnetotellurics.
style (str, default='default') – Matplotlib style.
errorbar (bool, default=True) – display the error bar.
suppress_outliers (bool, default=False,) – Remove outliers in the data before plotting
n_sites (int, default =1.) – Number of random sites to select for visualizing. It cannot work if the names of sites are given.
spad (float, default=.5,) –
pad to display the station in the top of each section plot.
New in version 0.2.1.
kws (dict) – Additional keyword arguments passed to Matplotlib.Axes.Scatter plots.
Examples
>>> import watex as wx
>>> edi_data = wx.fetch_data('edis', return_data=True, samples=27)
>>> wx.TPlot(show_grid=True).fit(edi_data).plot_rhophi(
...     seed=52, mode='*', n_sites=3)
- plot_tensor2d(tensor='res', sites=None, to_log10=False)[source]#
Plot two dimensional tensor.
- Parameters
freqs (array-like) – Frequency array used as y-coordinates. It should have the length N, the same as the rows of arr2d, and should be the complete set of frequencies used during the survey.
tensor (str, ['res', 'phase', 'z'], default='res') – Kind of tensor to plot. Can be resistivity or phase. If phase, customize your plot so that it does not follow the default ‘res’ behaviour.
to_log10 (bool, default=False) – Convert the resistivity data and frequency to log10.
sites (list of str, optional) – List of station/site names. If given, it must have the same length as the positions in the EDI data, i.e. it must match the number of EDI files successfully read.
- Returns
arr2d: 2D resistivity array from the tensor component.
freqs: array-like 1d of frequencies in the survey.
positions: Sites/stations positions. It is equal to the distance between stations times the number of sites.
sites: list of the names of the stations/sites.
base_plot_kws: plot keyword arguments inherited from watex.property.BasePlot. They compose the last parameters for customizing the plot as a decorated return function.
- Return type
( arr2d , freqs, positions , sites , base_plot_kws)
Examples
>>> from watex.view.plot import TPlot
>>> from watex.datasets import load_edis
>>> # get 3 samples of EDI for the demo
>>> edi_data = load_edis(return_data=True, samples=3)
>>> # customize the plot by adding plot_kws
>>> plot_kws = dict(
...     ylabel='$Log_{10}Frequency [Hz]$',
...     xlabel='$Distance(m)$',
...     cb_label='$Log_{10}Rhoa[\Omega.m$]',
...     fig_size=(6, 3),
...     font_size=7.)
>>> t = TPlot(**plot_kws).fit(edi_data)
>>> # plot recovery2d using the log10 resistivity
>>> t.plot_tensor2d(to_log10=True)
<AxesSubplot:xlabel='$Distance(m)$', ylabel='$Log_{10}Frequency [Hz]$'>
- watex.view.biPlot(self, Xr, components, y, classes=None, markers=None, colors=None)[source]#
The biplot is the best way to visualize all-in-one following a PCA analysis.
There is an implementation in R but there is no standard implementation in Python.
- Parameters
self (watex.property.BasePlot) –
Matplotlib property from BasePlot instances. The default BasePlot instance is given as a pobj instance and can be loaded for plotting purposes as:
>>> from watex.view import pobj
To change some default plot properties like line width or style, set them before running the script as follows:
>>> pobj.lw = 2. ; pobj.ls=':' # and so on
Xr (NDArray of transformed X) – The PCA projected data scores on the n given components. It is the reduced dimension of the train set ‘X’, with eigenvectors sorted by ratio from the first to the last component.
components (NDArray, shape (n_components, n_eigenvectors)) – The eigenvectors of the PCA. The shape must match the number of components computed using PCA. If the shape 1 of Xr equals the shape 0 of the components matrix, it will be transposed to fit the shape 1 of Xr.
y (Array-like,) – the target composing the class labels.
classes (list or int,) – class categories or class labels
markers (str,) – Matplotlib list of markers for plotting classes.
colors (str,) – Matplotlib list of colors to customize plots
Examples
>>> from watex.analysis import nPCA
>>> from watex.datasets import fetch_data
>>> from watex.view import biPlot, pobj  # pobj is a BasePlot instance
>>> X, y = fetch_data('bagoue pca')  # fetch pca data
>>> pca = nPCA(X, n_components=2, return_X=False)  # return PCA object
>>> components = pca.components_[:2, :]  # for two components
>>> biPlot(pobj, pca.X, components, y)  # pca.X is the reduced dim X
>>> # to change for instance line width (lw) or style (ls)
>>> # just use the BasePlot object (pobj)
References
Originally written by Serafeim Loukas, serafeim.loukas@epfl.ch and was edited to fit the watex package API.
- watex.view.plot2d(ar, y=None, x=None, distance=50.0, stnlist=None, prefix='S', how='py', to_log10=False, plot_contours=False, top_label='', **baseplot_kws)[source]#
Two-dimensional template for visualizing matrices.
It is a wrapper that can plot any matrix by customizing the positions X and y. By default X is considered as the stations and y as the resistivity log data.
- Parameters
ar (Array-like 2D, shape (M, N)) – 2D array for plotting. For instance, it can be a 2D resistivity collected at all stations (N) and all frequencies (M).
y (array-like, default=None) – Y-coordinates. It should have the length N, the same as the rows of arr2d.
x (array-like, default=None) – X-coordinates. It should have the length M, the same as the columns of the 2D array arr2d. Note that if x is given, distance is not needed.
distance (float) – The step between two stations. If given, it creates an array of positions for plotting purposes. Default value is 50 meters.
stnlist (list of str) – List of station names. If given, it should have the same length as the columns M of arr2d.
prefix (str) – String value to add as a prefix to the given id. The prefix can be the site name. Default is S.
how (str) – Mode used to index the stations. Default is ‘Python indexing’, i.e. the counting of stations starts at 0. Any other mode starts the counting at 1.
to_log10 (bool, default=False) – Recompute ar in logarithm base 10 values. Note that when True, y should also be in log10.
plot_contours (bool, default=False) – Plot the contour map. Available only if plot_style is set to pcolormesh.
baseplot_kws (dict) – All the keyword arguments passed to the property watex.property.BasePlot class.
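As a rough illustration of how distance, prefix and how interact, the sketch below builds station positions and names in plain Python. The helper names, the positions starting at 0, and the two-digit zero padding (suggested by names such as 'S00' and 'S13') are assumptions, not the actual watex implementation.

```python
def station_positions(n_stations, distance=50.0):
    """Evenly spaced positions, one per station (assumed to start at 0)."""
    return [i * distance for i in range(n_stations)]


def station_names(n_stations, prefix="S", how="py"):
    """Names counted from 0 for 'py' indexing and from 1 for any other mode.

    The two-digit zero padding is an assumption based on names like 'S00'.
    """
    start = 0 if how == "py" else 1
    return [f"{prefix}{i:02d}" for i in range(start, start + n_stations)]


print(station_positions(3))          # three positions, 50 m apart
print(station_names(3))              # Python indexing, counting from 0
print(station_names(3, how=None))    # any other mode, counting from 1
```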
- Returns
axe
- Return type
<AxesSubplot> object
Examples
>>> import numpy as np
>>> import watex
>>> np.random.seed(42)
>>> data = np.random.randn(15, 20)
>>> data_nan = data.copy()
>>> data_nan[2, 1] = np.nan; data_nan[4, 2] = np.nan; data_nan[6, 3] = np.nan
>>> watex.view.mlplot.plot2d(data)
<AxesSubplot:xlabel='Distance(m)', ylabel='log10(Frequency)[Hz]'>
>>> watex.view.mlplot.plot2d(data_nan, plt_style='imshow', fig_size=(10, 4))
- watex.view.plotDendrogram(df, columns=None, labels=None, metric='euclidean', method='complete', kind=None, return_r=False, verbose=False, **kwd)[source]#
Visualizes the linkage matrix in the results of dendrogram.
Note that categorical features, if they exist in the dataframe, are automatically discarded.
- Parameters
df (dataframe or NDArray of (n_samples, n_features)) – Dataframe or ndarray. If an array is given, the column names must be specified to match the array shape 1.
columns (list) – List of labels to name each column of arrays of (n_samples, n_features). If a dataframe is given, there is no need to specify the columns.
kind (str, ['squareform'|'condense'|'design'], default is {'design'}) – Kind of approach used to sum up the linkage matrix. A condensed distance matrix is a flat array containing the upper triangle of the distance matrix. This is the form that pdist returns. Alternatively, a collection of \(m\) observation vectors in \(n\) dimensions may be passed as an \(m\) by \(n\) array. All elements of the condensed distance matrix must be finite, i.e., no NaNs or infs. Alternatively, the squareform distance matrix could be used, although it yields different distance values than expected. The design approach uses the complete input example matrix, also called the ‘design matrix’, to produce a correct linkage matrix similar to squareform and condense.
metric (str or callable, default is {'euclidean'}) – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise.pairwise_distances(). If X is the distance array itself, use “precomputed” as the metric. Precomputed distance matrices must have 0 along the diagonal.
method (str, optional, default is {'complete'}) – The linkage algorithm to use. See the Linkage Methods section below for full descriptions in watex.utils.exmath.linkage_matrix().
labels (ndarray, optional) – By default, labels is None so the index of the original observation is used to label the leaf nodes. Otherwise, this is an \(n\)-sized sequence, with n == Z.shape[0] + 1. The labels[i] value is the text to put under the \(i\) th leaf node only if it corresponds to an original observation and not a non-singleton cluster.
return_r (bool, default=False) – Return the r dictionary if set to True, otherwise return nothing.
verbose (int, bool, default=False) – If True, output a message with the names of the categorical features dropped.
kwd (dict) – Additional keyword arguments passed to scipy.cluster.hierarchy.dendrogram().
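To make the difference between the condensed and the square form of a distance matrix concrete, here is a small standard-library sketch. The helper names are hypothetical; in practice scipy.spatial.distance.pdist and squareform perform these conversions.

```python
import math


def condensed_distances(points):
    """Upper-triangle pairwise Euclidean distances as a flat list."""
    m = len(points)
    return [math.dist(points[i], points[j])
            for i in range(m) for j in range(i + 1, m)]


def to_square(condensed, m):
    """Expand a condensed list into a symmetric m x m matrix, zero diagonal."""
    square = [[0.0] * m for _ in range(m)]
    k = 0
    for i in range(m):
        for j in range(i + 1, m):
            square[i][j] = square[j][i] = condensed[k]
            k += 1
    return square


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
flat = condensed_distances(points)
print(flat)                          # m * (m - 1) / 2 = 3 entries
print(to_square(flat, len(points)))  # symmetric 3 x 3 matrix
```

For m observations the condensed form holds m(m-1)/2 values, which is why passing a square matrix where a condensed one is expected leads to unexpected distances.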
- Returns
r – A dictionary of data structures computed to render the dendrogram. It has the following keys:
'color_list': A list of color names. The k’th element represents the color of the k’th link.
'icoord' and 'dcoord': Each of them is a list of lists. Let icoord = [I1, I2, ..., Ip] where Ik = [xk1, xk2, xk3, xk4] and dcoord = [D1, D2, ..., Dp] where Dk = [yk1, yk2, yk3, yk4], then the k’th link painted is (xk1, yk1)-(xk2, yk2)-(xk3, yk3)-(xk4, yk4).
'ivl': A list of labels corresponding to the leaf nodes.
'leaves': For each i, H[i] == j, cluster node j appears in position i in the left-to-right traversal of the leaves, where \(j < 2n-1\) and \(i < n\). If j is less than n, the i-th leaf node corresponds to an original observation. Otherwise, it corresponds to a non-singleton cluster.
'leaves_color_list': A list of color names. The k’th element represents the color of the k’th leaf.
- Return type
dict
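The way icoord and dcoord pair up into link polylines can be sketched as follows. The dictionary below is hand-written for illustration, not an actual plotDendrogram output, and link_polylines is a hypothetical helper.

```python
def link_polylines(r):
    """Pair each Ik with its Dk to get the four (x, y) vertices of every link."""
    return [list(zip(xs, ys)) for xs, ys in zip(r["icoord"], r["dcoord"])]


# Hand-made example with a single link joining two leaves at height 2.
r = {"icoord": [[5.0, 5.0, 15.0, 15.0]], "dcoord": [[0.0, 2.0, 2.0, 0.0]]}
print(link_polylines(r))
```

Each inner list traces one inverted-U link: up from the left child, across at the merge height, and down to the right child.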
Examples
>>> from watex.datasets import load_iris
>>> from watex.view import plotDendrogram
>>> data = load_iris()
>>> X = data.data[:, :2]
>>> plotDendrogram(X, columns=['X1', 'X2'])
- watex.view.plotDendroheat(df, columns=None, labels=None, metric='euclidean', method='complete', kind='design', cmap='hot_r', fig_size=(8, 8), facecolor='white', **kwd)[source]#
Attaches dendrogram to a heat map.
Hierarchical dendrograms are often used in combination with a heat map, which allows us to represent the individual values in the data array or matrix containing our training examples with a color code.
- Parameters
df (dataframe or NDArray of (n_samples, n_features)) – Dataframe or ndarray. If an array is given, the column names must be specified to match the array shape 1.
columns (list) – List of labels to name each column of arrays of (n_samples, n_features). If a dataframe is given, there is no need to specify the columns.
kind (str, ['squareform'|'condense'|'design'], default is {'design'}) – Kind of approach used to sum up the linkage matrix. A condensed distance matrix is a flat array containing the upper triangle of the distance matrix. This is the form that pdist returns. Alternatively, a collection of \(m\) observation vectors in \(n\) dimensions may be passed as an \(m\) by \(n\) array. All elements of the condensed distance matrix must be finite, i.e., no NaNs or infs. Alternatively, the squareform distance matrix could be used, although it yields different distance values than expected. The design approach uses the complete input example matrix, also called the ‘design matrix’, to produce a correct linkage matrix similar to squareform and condense.
metric (str or callable, default is {'euclidean'}) – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise.pairwise_distances(). If X is the distance array itself, use “precomputed” as the metric. Precomputed distance matrices must have 0 along the diagonal.
method (str, optional, default is {'complete'}) – The linkage algorithm to use. See the Linkage Methods section below for full descriptions in watex.utils.exmath.linkage_matrix().
labels (ndarray, optional) – By default, labels is None so the index of the original observation is used to label the leaf nodes. Otherwise, this is an \(n\)-sized sequence, with n == Z.shape[0] + 1. The labels[i] value is the text to put under the \(i\) th leaf node only if it corresponds to an original observation and not a non-singleton cluster.
cmap (str, default is {'hot_r'}) – Matplotlib color map.
fig_size (str, Tuple, default is {(8, 8)}) – The size of the figure.
facecolor (str, default is {"white"}) – Matplotlib facecolor.
kwd (dict) – Additional keyword arguments passed to scipy.cluster.hierarchy.dendrogram().
Examples
>>> # (1) -> Use random data
>>> import numpy as np
>>> import pandas as pd
>>> from watex.view.mlplot import plotDendroheat
>>> np.random.seed(123)
>>> variables = ['X', 'Y', 'Z']
>>> labels = ['ID_0', 'ID_1', 'ID_2', 'ID_3', 'ID_4']
>>> X = np.random.random_sample([5, 3]) * 10
>>> df = pd.DataFrame(X, columns=variables, index=labels)
>>> plotDendroheat(df)
>>> # (2) -> Use Bagoue data
>>> from watex.datasets import load_bagoue
>>> X, y = load_bagoue(as_frame=True)
>>> X = X[['magnitude', 'power', 'sfi']].astype(float)  # convert to float
>>> plotDendroheat(X)
- watex.view.plotLearningInspection(model, X, y, axes=None, ylim=None, cv=5, n_jobs=None, train_sizes=None, display_legend=True, title=None)[source]#
Inspect model from its learning curve.
Generate 3 plots: the test and training learning curve, the training samples vs fit times curve, the fit times vs score curve.
- Parameters
model (estimator instance) – An estimator instance implementing fit and predict methods which will be cloned for each validation.
title (str) – Title for the chart.
X (array-like of shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples) or (n_samples, n_features)) – Target relative to X for classification or regression; None for unsupervised learning.
axes (array-like of shape (3,), default=None) – Axes to use for plotting the curves.
ylim (tuple of shape (2,), default=None) – Defines minimum and maximum y-values plotted, e.g. (ymin, ymax).
cv (int, cross-validation generator or an iterable, default=5) –
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 5-fold cross-validation,
integer, to specify the number of folds.
CV splitter,
An iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, KFold is used.
Refer to the User Guide for the various cross-validators that can be used here.
n_jobs (int or None, default=None) – Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
train_sizes (array-like of shape (n_ticks,)) – Relative or absolute numbers of training examples that will be used to generate the learning curve. If the dtype is float, it is regarded as a fraction of the maximum size of the training set (that is determined by the selected validation method), i.e. it has to be within (0, 1]. Otherwise it is interpreted as absolute sizes of the training sets. Note that for classification the number of samples usually has to be big enough to contain at least one sample from each class. (default: np.linspace(0.1, 1.0, 5))
display_legend (bool, default=True) – Display the legend.
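The float-versus-integer interpretation of train_sizes can be illustrated with a short sketch. The function resolve_train_sizes is hypothetical, and truncating with int() is an assumption about how fractions are turned into sample counts.

```python
def resolve_train_sizes(train_sizes, n_max):
    """Turn relative (float) or absolute (int) sizes into sample counts.

    Hypothetical helper: floats are fractions of n_max and must lie in (0, 1];
    any other values are taken as absolute training-set sizes.
    """
    if all(isinstance(s, float) for s in train_sizes):
        if not all(0 < s <= 1 for s in train_sizes):
            raise ValueError("float train sizes must be within (0, 1]")
        # Truncation is an assumption made for this sketch.
        return [int(s * n_max) for s in train_sizes]
    return [int(s) for s in train_sizes]


print(resolve_train_sizes([0.25, 0.5, 1.0], n_max=80))  # fractions of 80 samples
print(resolve_train_sizes([10, 30], n_max=80))          # absolute sizes
```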
- Returns
axes
- Return type
Matplotlib axes
Examples
>>> from watex.datasets import fetch_data
>>> from watex.models import p
>>> from watex.view.mlplot import plotLearningInspection
>>> # import sparse matrix from Bagoue datasets
>>> X, y = fetch_data('bagoue prepared')
>>> # import the pretrained Radial Basis Function (RBF) from SVM
>>> plotLearningInspection(p.SVM.rbf.best_estimator_, X, y)
- watex.view.plotLearningInspections(models, X, y, fig_size=(22, 18), cv=None, savefig=None, titles=None, subplot_kws=None, **kws)[source]#
Inspect multiple models from their learning curves.
Multiple inspection plots that generate the test and training learning curve, the training samples vs fit times curve, and the fit times vs score curve for each model.
- Parameters
models (list of estimator instances) – Each estimator instance implements fit and predict methods which will be cloned for each validation.
X (array-like of shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples) or (n_samples, n_features)) – Target relative to X for classification or regression; None for unsupervised learning.
cv (int, cross-validation generator or an iterable, default=None) –
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 5-fold cross-validation,
integer, to specify the number of folds.
CV splitter,
An iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, KFold is used.
Refer to the Scikit-learn User Guide for the various cross-validators that can be used here.
savefig (str, default=None) – The path to save the figures. Argument is passed to matplotlib.Figure class.
titles (str, list) – List of model names if changes are needed. If None, model names are used by default.
kws (dict) – Additional keyword arguments passed to plotLearningInspection().
- Returns
axes
- Return type
Matplotlib axes
See also
plotLearningInspection: Inspect single model
Examples
>>> from watex.datasets import fetch_data
>>> from watex.models.premodels import p
>>> from watex.view.mlplot import plotLearningInspections
>>> # import sparse matrix from Bagoue dataset
>>> X, y = fetch_data('bagoue prepared')
>>> # import the two pretrained models from SVM
>>> models = [p.SVM.rbf.best_estimator_, p.SVM.poly.best_estimator_]
>>> plotLearningInspections(models, X, y, ylim=(0.7, 1.01))
- watex.view.plotModel(yt, ypred=None, *, clf=None, Xt=None, predict=False, prefix=None, index=None, fill_between=False, labels=None, return_ypred=False, **baseplot_kws)[source]#
Plot model ‘y’ (true labels) versus ‘ypred’ (predicted) from test data.
The plot shows where the estimator/classifier fails to predict the target correctly.
- Parameters
- yt:array-like, shape (M, ) ``M=m-samples``,
test target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
- ypred:array-like, shape (M, ) ``M=m-samples``
Array of the predicted labels. It has the same number of samples as the test data ‘Xt’
- clf :callable, always as a function, classifier estimator
A supervised predictor with a finite set of discrete possible output values. A classifier must support modeling some kinds of targets, such as binary targets. It must store a classes attribute after fitting.
- Xt: Ndarray ( M x N matrix where ``M=m-samples``, & ``N=n-features``)
Shorthand for “test set”; data that is observed at testing and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix.
- prefix: str, optional
Literal string used to prefix the samples/examples shown as tick labels on the abscissa. For instance:
index =[0, 2, 4, 7] prefix ='b' --> index =['b0', 'b2', 'b4', 'b7']
- predict: bool, default=False,
Expected to be ‘True’ when the user wants to predict the array ‘ypred’ and plot at the same time. Otherwise, it can be set to ‘False’ to use ‘ypred’ data already predicted. Note that if ‘True’, an estimator/classifier must be provided as well as the test data ‘Xt’, otherwise an error will occur.
- index: array_like, optional
List of integer values or strings expected to be the index of ‘Xt’ and ‘yt’ turned into a pandas dataframe and series respectively. Note that if one of them already has an index and a new index is given, the latter must be consistent. This is useful when data are provided as ndarray rather than as a dataframe.
- fill_between: bool
Fill a line between the actual classes i.e the true labels.
- labels: list of str or int, Optional
list of labels names to hold the name of each category.
- return_ypred: bool,
return predicted ‘ypred’ if ‘True’ else nothing.
- baseplot_kws: dict,
All the keyword arguments passed to the property watex.property.BasePlot class.
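The effect of prefix on index, as described in the parameters above, amounts to a simple string concatenation. The helper below is a hypothetical sketch of that behaviour, not the watex implementation.

```python
def prefix_index(index, prefix=None):
    """Prepend `prefix` to every index value; leave the index as is if None."""
    if prefix is None:
        return list(index)
    return [f"{prefix}{i}" for i in index]


print(prefix_index([0, 2, 4, 7], prefix="b"))  # ['b0', 'b2', 'b4', 'b7']
```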
(2) -> prepare our demo estimator and plot the model predictions
>>> from sklearn.svm import SVC
>>> from watex.view import plotModel
>>> # Xtest and ytest are the test data prepared in step (1), not shown here
>>> svc_clf = SVC(C=100, gamma=1e-2, kernel='rbf', random_state=42)
>>> base_plot_params = {
...     'lw': 3.,  # line width
...     'lc': (.9, 0, .8),
...     'ms': 7.,
...     'yp_marker': 'o',
...     'fig_size': (12, 8),
...     'font_size': 15.,
...     'xlabel': 'Test examples',
...     'ylabel': 'Flow categories',
...     'marker': 'o',
...     'markeredgecolor': 'k',
...     'markerfacecolor': 'b',
...     'markeredgewidth': 3,
...     'yp_markerfacecolor': 'k',
...     'yp_markeredgecolor': 'r',
...     'alpha': 1.,
...     'yp_markeredgewidth': 2.,
...     'show_grid': True,
...     'galpha': 0.2,
...     'glw': .5,
...     'rotate_xlabel': 90.,
...     'fs': 3.,
...     's': 20,
... }
>>> plotModel(yt=ytest,
...           Xt=Xtest,
...           predict=True,  # predict the result (estimator fit)
...           clf=svc_clf,
...           fill_between=False,
...           prefix='b',
...           labels=['FR0', 'FR1', 'FR2', 'FR3'],  # replace 'y' labels
...           **base_plot_params)
>>> # the plot shows where the model failed to predict the target 'yt'
- watex.view.plotProjection(X, Xt=None, *, columns=None, test_kws=None, **baseplot_kws)[source]#
Visualize train and test dataset based on the geographical coordinates.
Since there is geographical information(latitude/longitude or easting/northing), it is a good idea to create a scatterplot of all instances to visualize data.
- Parameters
X (Ndarray, M x N matrix where M=m-samples and N=n-features) – Training set; denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample.
Xt (Ndarray, M x N matrix where M=m-samples and N=n-features) – Shorthand for “test set”; data that is observed at testing and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix.
columns (list of str or index, optional) – columns is useful when a dataframe is given with a dimension size greater than 2. If such data is passed to X or Xt, columns must hold the names to be considered as ‘easting’, ‘northing’ when UTM coordinates are given or ‘latitude’, ‘longitude’ when latlon are given. If the dimension size is greater than 2 and columns is None, an error is raised to prompt the user to provide the index for ‘y’ and ‘x’ coordinate retrieval.
test_kws (dict) – Keyword arguments passed to matplotlib.plot.scatter() as test location font and colors properties.
baseplot_kws (dict) – All the keyword arguments passed to the property watex.property.BasePlot class.
Examples
>>> from watex.datasets import fetch_data
>>> from watex.view.mlplot import plotProjection
>>> # Discard all the non-numeric data,
>>> # then impute the numerical data
>>> from watex.utils import to_numeric_dtypes, naive_imputer
>>> X, Xt, *_ = fetch_data('bagoue', split_X_y=True, as_frame=True)
>>> X = to_numeric_dtypes(X, pop_cat_features=True)
>>> X = naive_imputer(X)
>>> Xt = to_numeric_dtypes(Xt, pop_cat_features=True)
>>> Xt = naive_imputer(Xt)
>>> plot_kws = dict(fig_size=(8, 12),
...                 lc='k',
...                 marker='o',
...                 lw=3.,
...                 font_size=15.,
...                 xlabel='easting (m) ',
...                 ylabel='northing (m)',
...                 markerfacecolor='k',
...                 markeredgecolor='r',
...                 alpha=1.,
...                 markeredgewidth=2.,
...                 show_grid=True,
...                 galpha=0.2,
...                 glw=.5,
...                 rotate_xlabel=90.,
...                 fs=3.,
...                 s=None)
>>> plotProjection(X, Xt, columns=['east', 'north'],
...                trainlabel='train location',
...                testlabel='test location',
...                **plot_kws)
- watex.view.plotSilhouette(X, labels=None, prefit=True, n_clusters=3, n_init=10, max_iter=300, random_state=None, tol=10000.0, metric='euclidean', **kwd)[source]#
Quantifies the quality of clustering samples.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – Training instances to cluster. It must be noted that the data will be converted to C ordering, which will cause a memory copy if the given data is not C-contiguous. If a sparse matrix is passed, a copy will be made if it’s not in CSR format.
labels (array-like 1d of shape (n_samples,)) – Label values for each sample.
n_clusters (int, default=3) – The number of clusters to form as well as the number of centroids to generate.
prefit (bool, default=True) – Whether prefit labels are expected to be passed into the function directly. If True, labels must hold the predicted values of an already fitted model. If False, labels are fitted and updated from X by calling the fit_predict method; any other value passed to labels is discarded.
n_init (int, default=10) – Number of times the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia.
max_iter (int, default=300) – Maximum number of iterations of the k-means algorithm for a single run.
tol (float, default=1e-4) – Relative tolerance with regards to Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence.
verbose (int, default=0) – Verbosity mode.
random_state (int, RandomState instance or None, default=None) – Determines random number generation for centroid initialization. Use an int to make the randomness deterministic.
metric (str or callable, default='euclidean') – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise.pairwise_distances(). If X is the distance array itself, use “precomputed” as the metric. Precomputed distance matrices must have 0 along the diagonal.
**kwds (optional keyword parameters) – Any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.
Note
The silhouette coefficient is bounded between -1 and 1.
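To make the bound concrete, here is a minimal pure-Python sketch of the per-sample silhouette coefficient, s = (b - a) / max(a, b), where a is the mean distance to the other members of the sample's own cluster and b is the mean distance to the nearest other cluster. The helper name `silhouette_samples` and the toy data are illustrative assumptions; this is not how `plotSilhouette` is implemented, and it only supports the Euclidean metric.

```python
import math


def silhouette_samples(points, labels):
    """Return the silhouette coefficient of every sample (Euclidean metric)."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Group the distances from sample i to every other sample by label.
        dists = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(other, []).append(math.dist(p, q))
        own = dists.pop(lab, [])
        if not own or not dists:
            # Singleton cluster: score 0 by convention. Mapping the
            # single-cluster case to 0 as well is an assumption of this sketch.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in dists.values())  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return scores


# Two tight, well separated groups: every score is close to 1.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette_samples(points, labels)
```

Scores near 1 mean a sample sits well inside its own cluster, scores near 0 mean it lies between clusters, and negative scores suggest it may have been assigned to the wrong cluster.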
- watex.view.plot_matshow(arr, /, labelx=None, labely=None, matshow_kws=None, **baseplot_kws)[source]#
Quick matrix visualization using matplotlib.pyplot.matshow.
- Parameters
arr (2D ndarray) – matrix of n rows and m columns.
matshow_kws (dict) – Additional keyword arguments for matplotlib.axes.Axes.matshow().
labelx (list of str, optional) – list of label names for each category on the x-axis. It should be consistent with the number of columns of arr.
labely (list of str, optional) – list of label names for each category on the y-axis. It should be consistent with the number of rows of arr.
Examples
>>> import numpy as np
>>> from watex.view.mlplot import plot_matshow
>>> matshow_kwargs = {
...     'aspect': 'auto',
...     'interpolation': None,
...     'cmap': 'copper_r',
... }
>>> baseplot_kws = {
...     'lw': 3,
...     'lc': (.9, 0, .8),
...     'font_size': 15.,
...     'cb_format': None,
...     # 'cb_label': 'Rate of prediction',
...     'xlabel': 'Predicted flow classes',
...     'ylabel': 'Geological rocks',
...     'font_weight': None,
...     'tp_labelbottom': False,
...     'tp_labeltop': True,
...     'tp_bottom': False,
... }
>>> labelx = ['FR0', 'FR1', 'FR2', 'FR3', 'Rates']
>>> labely = ['VOLCANO-SEDIM. SCHISTS', 'GEOSYN. GRANITES',
...           'GRANITES', '1.0', 'Rates']
>>> array2d = np.array([(1., .5, 1., 1., .9286),
...                     (.5, .8, 1., .667, .7692),
...                     (.7, .81, .7, .5, .7442),
...                     (.667, .75, 1., .75, .82),
...                     (.9091, 0.8064, .7, .8667, .7931)])
>>> plot_matshow(array2d, labelx, labely, matshow_kwargs, **baseplot_kws)
- watex.view.plot_model_scores(models, scores=None, cv_size=None, **baseplot_kws)[source]#
Uses cross-validation to estimate how well model performance generalizes.
It visualizes the fine-tuned model scores against the cross-validation sets.
- Parameters
models (list of callables, always as functions) –
list of estimator names; it can also be a list of (estimator, validation scores) pairs. For instance, estimators and scores can be arranged as:
models =[('SVM', scores_svm), ('LogRegress', scores_logregress), ...]
If that arrangement is passed to the models parameter, there is no need to pass the score values of each estimator in scores. Note that a model is an object which manages the estimation and decoding. The model is estimated as a deterministic function of:
- parameters provided in object construction or with set_params;
- the global numpy.random random state if the estimator’s random_state parameter is set to None; and
- any data or sample properties passed to the most recent call to fit, fit_transform or fit_predict, or data similarly passed in a sequence of calls to partial_fit.
scores (array like) –
list of scores on different validation sets. If scores are given, pass only the estimator names to models, like:
models =['SVM', 'LogRegress', ...]
scores=[scores_svm, scores_logregress, ...]
cv_size (float or int) – The number of folds used for validation. If different models have different cross-validation sizes, the minimum size is used and the scores of each model are resized to match it.
baseplot_kws (dict) – All the keyword arguments passed to the property watex.property.BasePlot class.
Examples
(1) -> Score is appended to the model
>>> from watex.exlib.sklearn import SVC
>>> from watex.view.mlplot import plot_model_scores
>>> import numpy as np
>>> svc_model = SVC()
>>> fake_scores = np.random.permutation(np.arange(0, 1, .05))
>>> plot_model_scores([(svc_model, fake_scores)])
…
(2) -> Use model and score separately
>>> plot_model_scores([svc_model], scores=[fake_scores])
>>> # customize plot by passing keywords properties
>>> base_plot_params = {
...     'lw': 3.,
...     'lc': (.9, 0, .8),
...     'ms': 7.,
...     'fig_size': (12, 8),
...     'font_size': 15.,
...     'xlabel': 'samples',
...     'ylabel': 'scores',
...     'marker': 'o',
...     'alpha': 1.,
...     'yp_markeredgewidth': 2.,
...     'show_grid': True,
...     'galpha': 0.2,
...     'glw': .5,
...     'rotate_xlabel': 90.,
...     'fs': 3.,
...     's': 20,
...     'sns_style': 'darkgrid',
... }
>>> plot_model_scores([svc_model], scores=[fake_scores], **base_plot_params)
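The cv_size rule (use the smallest number of folds and resize every model's scores to match it) can be pictured with the sketch below. The helper `resize_scores` is hypothetical, and truncation is only one plausible reading of "resized"; the exact strategy used inside `plot_model_scores` is not stated in this documentation.

```python
def resize_scores(scores_by_model, cv_size=None):
    """Trim every score sequence to a common number of folds."""
    shortest = min(len(s) for s in scores_by_model.values())
    # Assumption: an explicit cv_size can only shrink the common length.
    n_folds = shortest if cv_size is None else min(int(cv_size), shortest)
    return {name: list(s)[:n_folds] for name, s in scores_by_model.items()}


# Two models validated with a different number of folds.
scores = {
    "SVM": [0.81, 0.79, 0.84, 0.80, 0.83],
    "LogRegress": [0.75, 0.78, 0.77],
}
resized = resize_scores(scores)
```

After resizing, every model contributes the same number of points, so the score curves can share one x-axis.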
- watex.view.plot_reg_scoring(reg, X, y, test_size=None, random_state=42, scoring='mse', return_errors=False, **baseplot_kws)[source]#
Plot regressor learning curves using mean squared error (MSE) or root-mean-squared error (RMSE) scoring.
Use the hold-out cross-validation technique for score evaluation [1].
- Parameters
reg (callable, always as a function) – A regression estimator; Estimators must provide a fit method, and should provide set_params and get_params, although these are usually provided by inheritance from base.BaseEstimator. The estimated model is stored in public and private attributes on the estimator instance, facilitating decoding through prediction and transformation methods. The core functionality of some estimators may also be available as a function.
X (Ndarray of shape (M x N), \(M=m-samples\) & \(N=n-features\)) – training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
y (array-like of shape (M, ), \(M=m-samples\)) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
scoring (str, ['mse'|'rmse'], default ='mse') – kind of error to visualize on the regression learning curve.
test_size (float or int, default=None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.
random_state (int, RandomState instance or None, default=42) – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
return_errors (bool, default=False) – whether to return the training and validation errors.
baseplot_kws (dict) – All the keyword arguments passed to the property watex.property.BasePlot class.
- Returns
(train_errors, val_errors) – training scores and validation scores if return_errors is set to True, otherwise returns nothing.
- Return type
Tuple
Examples
>>> from watex.datasets import fetch_data
>>> from watex.view.mlplot import plot_reg_scoring
>>> # Note that for the demo, we import SVC rather than LinearSVR since the
>>> # Bagoue dataset poses a classification rather than a regression problem.
>>> # If a regressor is used instead, a convergence problem occurs.
>>> from watex.exlib.sklearn import SVC
>>> X, y = fetch_data('bagoue analysed')  # get the preprocessed and imputed data
>>> svm = SVC()
>>> t_errors, v_errors = plot_reg_scoring(svm, X, y, return_errors=True)
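As a rough picture of what a hold-out learning curve contains, the sketch below holds out a validation part, refits a model on growing training subsets, and records the training and validation error at each size. Everything here is an assumption made for illustration: the helper names, the toy 1-D least-squares model and the synthetic data are not part of watex, and the way `plot_reg_scoring` computes its errors may differ.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on 1-D data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx


def error(model, xs, ys, scoring="mse"):
    """Mean squared error, or its square root when scoring == 'rmse'."""
    a, b = model
    mse = sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return math.sqrt(mse) if scoring == "rmse" else mse


def learning_curve(xs, ys, test_size=0.2, scoring="mse", seed=42):
    """Hold out a validation part, then refit on growing training subsets."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_val = max(1, int(len(idx) * test_size))
    val, train = idx[:n_val], idx[n_val:]
    x_val, y_val = [xs[i] for i in val], [ys[i] for i in val]
    train_errors, val_errors = [], []
    for m in range(2, len(train) + 1):
        x_tr = [xs[i] for i in train[:m]]
        y_tr = [ys[i] for i in train[:m]]
        model = fit_line(x_tr, y_tr)
        train_errors.append(error(model, x_tr, y_tr, scoring))
        val_errors.append(error(model, x_val, y_val, scoring))
    return train_errors, val_errors


# Synthetic noisy line, used only to exercise the sketch.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]
train_errors, val_errors = learning_curve(xs, ys, scoring="rmse")
```

Plotting both lists against the training-set size gives the two curves; a persistent gap between them usually points to overfitting, while two high, close curves point to underfitting.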
Notes
The hold-out technique is the classic and most popular approach for estimating the generalization performance of machine learning models. The dataset is split into training and test sets. The former is used for model training, whereas the latter is used for evaluating model performance. However, in typical machine learning applications we are also interested in tuning and comparing different parameter settings to further improve performance. This process is called model selection, the name referring to a given classification or regression problem for which we want the optimal values of the tuning parameters (hyperparameters). Reusing the same test dataset over and over again during model selection is not recommended, since it effectively becomes part of the training data and the model is then more likely to overfit. Because of this issue, hold-out cross-validation on its own is not a good learning practice. A better way to use the hold-out method is to separate the data into three parts: the training set, the validation set and the test set. See more in [2].
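The three-part split recommended above can be sketched in plain Python as follows. The helper `three_way_split` and its default proportions are illustrative assumptions, not part of watex.

```python
import random


def three_way_split(n_samples, val_size=0.2, test_size=0.2, seed=42):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = int(n_samples * test_size)
    n_val = int(n_samples * val_size)
    test = idx[:n_test]                 # touched once, for the final estimate
    val = idx[n_test:n_test + n_val]    # reused freely during model selection
    train = idx[n_test + n_val:]        # used to fit each candidate model
    return train, val, test


train, val, test = three_way_split(100)
```

Hyperparameters are compared on the validation indices only, so the test indices stay unseen until the final performance estimate.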
References
- 1
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., et al. (2011) Scikit-learn: Machine learning in Python. J. Mach. Learn. Res., 12, 2825–2830.
- 2
Raschka, S. & Mirjalili, V. (2019) Python Machine Learning. (J. Malysiak, S. Jain, J. Lovell, C. Nelson, S. D’silva & R. Atitkar, Eds.), 3rd ed., Packt.
- watex.view.pobj#
alias of
Plot
- watex.view.viewtemplate(y, /, xlabel=None, ylabel=None, **kws)[source]#
Quick view template
- Parameters
y (Arraylike , shape (N, )) –
xlabel (str, Optional) – Label for naming the x-axis (abscissa).
ylabel (str, Optional) – Label for naming the y-axis (ordinate).
kws (dict) – keyword arguments passed to matplotlib.pyplot.plot()