class watex.base.AdalineGradientDescent(eta=0.01, n_iter=50, random_state=42)[source]#

Bases: _Base

Adaptive Linear Neuron Classifier

ADAptive LInear NEuron (Adaline) was published by Bernard Widrow and his doctoral student Ted Hoff only a few years after Rosenblatt’s perceptron algorithm. It can be considered an improvement on the latter (Widrow et al., 1960).

Adaline illustrates the key concepts of defining and minimizing a continuous cost function. This lays the groundwork for understanding more advanced machine learning algorithms for classification, such as Logistic Regression and Support Vector Machines, as well as regression models.

The key difference between the Adaline rule (also known as the Widrow-Hoff rule) and Rosenblatt’s perceptron is that the weights are updated based on a linear activation function rather than a unit step function as in the perceptron. In Adaline, this linear activation function \(\phi(z)\) is simply the identity function of the net input, so that:

\[\phi (w^Tx)= w^Tx\]

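While the linear activation function is used for learning the weights, a unit step function is still applied to the net input to make the final class prediction (see predict).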
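As a rough illustration of the rule described above, the following NumPy sketch runs batch gradient descent on the sum-of-squares cost with the identity activation. It is not the watex implementation: the function name `adaline_gd_sketch`, the zero weight initialization and the toy data are assumptions made for the example.

```python
import numpy as np

def adaline_gd_sketch(X, y, eta=0.01, n_iter=50):
    """Hypothetical batch-gradient-descent Adaline; not the watex code."""
    w = np.zeros(X.shape[1] + 1)            # w[0] is the bias unit (assumed zero init)
    cost = []
    for _ in range(n_iter):
        net_input = X @ w[1:] + w[0]        # z = w^T x
        output = net_input                  # identity activation: phi(z) = z
        errors = y - output
        w[1:] += eta * (X.T @ errors)       # one update from all samples at once
        w[0] += eta * errors.sum()
        cost.append((errors ** 2).sum() / 2.0)  # sum-of-squares cost for this epoch
    return w, cost

# Toy, linearly separable data with labels in {-1, 1}
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([-1, -1, 1, 1])
w, cost = adaline_gd_sketch(X, y, eta=0.1, n_iter=20)

# Unit step applied to the net input for the final prediction
y_pred = np.where(X @ w[1:] + w[0] >= 0.0, 1, -1)
```

The cost list plays the role of the cost_ attribute documented below: plotting it against the epoch number is a common way to check that the learning rate is not too large.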

Parameters:
  • eta (float,) – Learning rate, between 0.0 and 1.0.

  • n_iter (int,) – Number of passes (epochs) over the training set.

  • random_state (int, default is 42) – Random number generator seed for random weight initialization.

w_#

Weights after fitting

Type:

Array-like

cost_#

Sum-of-squares cost function value in each epoch

Type:

list

References

[1]

Widrow, B. et al., 1960. An Adaptive “Adaline” Neuron Using Chemical “Memistors”. Technical Report Number 1553-2, Stanford Electron Labs, Stanford, CA, October 1960.

activation(X)[source]#

Compute the linear activation

Parameters:

X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

Returns:

X

Return type:

activate NDArray

fit(X, y)[source]#

Fit the training data

Parameters:
  • X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like, shape (M, ) M=m-samples,) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

Returns:

self – returns self for easy method chaining.

Return type:

AdalineGradientDescent instance

property inspect#

Inspect whether the object is fitted or not

net_input(X)[source]#

Compute the net input of X

Parameters:

X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

Return type:

weight net inputs

predict(X)[source]#

Predict the class label after unit step

Parameters:

X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

Returns:

ypred

Return type:

predicted class label after the unit step (1, or -1)

class watex.base.AdalineStochasticGradientDescent(eta=0.01, n_iter=50, shuffle=True, random_state=42)[source]#

Bases: _Base

Adaptive Linear Neuron Classifier with stochastic gradient descent

Stochastic gradient descent is a popular alternative to batch gradient descent and is sometimes also called iterative or online gradient descent [1]. Instead of updating the weights based on the sum of the accumulated errors over all training examples \(x^{(i)}\):

\[\Delta w = \eta \sum_{i} (y^{(i)} - \phi(z^{(i)})) x^{(i)}\]

the weights are updated incrementally for each training example:

\[\Delta w = \eta (y^{(i)} - \phi(z^{(i)})) x^{(i)}\]

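The per-example update can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the watex code: the function name `adaline_sgd_sketch`, the zero weight initialization, the per-epoch averaging of the cost and the toy data are all made up for the example.

```python
import numpy as np

def adaline_sgd_sketch(X, y, eta=0.01, n_iter=50, shuffle=True, random_state=42):
    """Hypothetical per-sample (stochastic) Adaline update; not the watex code."""
    rng = np.random.RandomState(random_state)
    w = np.zeros(X.shape[1] + 1)                  # w[0] is the bias unit
    cost = []
    for _ in range(n_iter):
        # Reshuffle every epoch to avoid cycling through the same order
        idx = rng.permutation(len(y)) if shuffle else np.arange(len(y))
        epoch_cost = []
        for xi, target in zip(X[idx], y[idx]):
            error = target - (xi @ w[1:] + w[0])  # y_i - phi(z_i)
            w[1:] += eta * error * xi             # eta * (y_i - phi(z_i)) * x_i
            w[0] += eta * error
            epoch_cost.append(0.5 * error ** 2)
        cost.append(float(np.mean(epoch_cost)))   # average cost over the epoch
    return w, cost

X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([-1, -1, 1, 1])
w, cost = adaline_sgd_sketch(X, y, eta=0.1, n_iter=30)
y_pred = np.where(X @ w[1:] + w[0] >= 0.0, 1, -1)
```

Because the weights change after every example, the same inner loop can be reused on new batches without resetting the weights, which is the idea behind partial_fit.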
Parameters:
  • eta (float,) – Learning rate, between 0.0 and 1.0.

  • n_iter (int,) – Number of passes (epochs) over the training set.

  • shuffle (bool,) – Shuffle the training data every epoch if True, to prevent cycles.

  • random_state (int, default is 42) – Random number generator seed for random weight initialization.

w_#

Weights after fitting

Type:

Array-like

cost_#

Sum-of-squares cost function value in each epoch

Type:

list

See also

AdalineGradientDescent

References

[1]

Widrow, B. et al., 1960. An Adaptive “Adaline” Neuron Using Chemical “Memistors”. Technical Report Number 1553-2, Stanford Electron Labs, Stanford, CA, October 1960.

activation(X)[source]#

Compute the linear activation

Parameters:

X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

Returns:

X

Return type:

activate NDArray

fit(X, y)[source]#

Fit the training data

Parameters:
  • X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like, shape (M, ) M=m-samples,) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

Returns:

self – returns self for easy method chaining.

Return type:

AdalineStochasticGradientDescent instance

property inspect#

Inspect whether the object is fitted or not

net_input(X)[source]#

Compute the net input of X

Parameters:

X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

Return type:

weight net inputs

partial_fit(X, y)[source]#

Fit training data without reinitialising the weights

Parameters:
  • X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like, shape (M, ) M=m-samples,) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

Returns:

self – returns self for easy method chaining.

Return type:

AdalineStochasticGradientDescent instance

predict(X)[source]#

Predict the class label after unit step

Parameters:

X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

Returns:

ypred

Return type:

predicted class label after the unit step (1, or -1)

class watex.base.Data(verbose=0)[source]#

Bases: object

Data base class

Typically, we train a model with a matrix of data. Note that the pandas DataFrame is the most commonly used because it is convenient to have column labels, even though NumPy arrays work as well.

For supervised learning, such as regression or classification, our intent is to have a function that transforms features into a label. Written as an algebraic formula, it would look like:

\[y = f(X)\]

X is a matrix. Each row represents a sample of data or information about an individual. Every column in X is a feature. The output of our function, y, is a vector that contains labels (for classification) or values (for regression).

In Python, by convention, we use the variable name X to hold the sample data, even though capitalizing a variable name violates the standard naming convention (see PEP 8).
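To make the X and y convention concrete, here is a minimal pandas sketch. The frame and its column names are invented for the example, with "flow" standing in for the label.

```python
import pandas as pd

# Made-up data: two feature columns and one label column ("flow")
df = pd.DataFrame({"power": [10.0, 20.0, 30.0],
                   "depth": [1.0, 2.0, 3.0],
                   "flow": [0, 1, 1]})

X = df.drop(columns="flow")   # feature matrix: one row per sample, one column per feature
y = df["flow"]                # label vector, one entry per sample
```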

Parameters:
  • data (str, filepath_or_buffer or pandas.core.DataFrame) – Path-like object or Dataframe. If data is given as a path-like object, data is read, asserted and validated. Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be file://localhost/path/to/table.csv. If you want to pass in a path object, pandas accepts any os.PathLike. By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.

  • columns (str or list of str) – Columns containing the missing data to replace. Can be used with axis set to 1.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) –

    Determine if rows or columns which contain missing values are removed.

    • 0, or ‘index’ : Drop rows which contain missing values.

    • 1, or ‘columns’ : Drop columns which contain missing values.

    Changed in version 1.0.0: Pass tuple or list to drop on multiple axes. Only a single axis is allowed.

  • sample (int, Optional,) – Number of rows to visualize, or the limit on the number of samples, to be able to see the patterns. This is useful when the data is composed of many rows. Shrinking the data to keep some samples for visualization is recommended. None plots all the samples (or examples) in the data.

  • kind (str, Optional) –

    Type of visualization. Can be dendrogram, mbar, bar or corr, for the dendrogram, msno bar, plt bar and msno correlation visualizations respectively:

    • bar plot counts the non-missing data using pandas.

    • mbar uses the msno package to count the number of non-missing data.

    • dendrogram shows the clusterings of where the data is missing. Leaves that are at the same level predict one another's presence (empty or filled). The vertical arms are used to indicate how different the clusters are. Short arms mean that branches are similar.

    • corr creates a heat map showing if there are correlations where the data is missing. In this case, it does look like the locations where data are missing are correlated.

    • None is the default visualization. It is useful for viewing contiguous areas of missing data, which would indicate that the missing data is not random. The matrix function includes a sparkline along the right side. Patterns here would also indicate non-random missing data. It is recommended to limit the number of samples to be able to see the patterns.

    Any other value will raise an error.

  • inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.

  • verbose (int, default is 0) – Control the level of verbosity. Higher values lead to more messages.

Returns:

self – returns self for easy method chaining.

Return type:

Baseclass instance

Examples

property data#

Return the verified data

property describe#

Get summary statistics as well as the count of non-null data. The default behaviour is to report only on numeric columns. To have full control, compute the summary manually.
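For a sense of what a numeric-only summary looks like and how to widen it by hand, here is a small pandas sketch. It assumes the property behaves like pandas.DataFrame.describe, which this page does not confirm; the frame and its column names are made up.

```python
import pandas as pd

# Made-up frame with one numeric and one non-numeric column
df = pd.DataFrame({"depth": [1.0, 2.0, None],
                   "geol": ["granite", "schist", "granite"]})

numeric_summary = df.describe()             # numeric columns only; count excludes nulls
full_summary = df.describe(include="all")   # manual control: also summarizes object columns
```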

drop(labels=None, columns=None, inplace=False, axis=0, **kws)[source]#

Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level.

Parameters:
  • labels (single label or list-like) – Index or column labels to drop. A tuple will be used as a single label and not treated as a list-like.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) – Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).

  • columns (single label or list-like) – Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels)

  • kws (dict,) – Additional keyword arguments passed to pd.DataFrame.drop().

Returns:

DataFrame without the removed index or column labels, or None if inplace equals True.

Return type:

DataFrame or None

fit(data=None)[source]#

Read, assert and fit the data.

Parameters:

data (Dataframe of shape (M, N) from pandas.DataFrame) – Dataframe containing samples M and features N

Returns:

Returns self for easy method chaining.

Return type:

Data instance

property inspect#

Inspect data and trigger plot after checking the data entry. Raises NotFittedError if ExPlot is not fitted yet.

merge()[source]#

Merge two series, whatever their type, with the operator &&.

When a series has dtype object because it holds non-numeric values, the dtypes should be changed to object.

profilingReport(data=None, **kwd)[source]#

Generate a report in a notebook.

It will summarize the types of the columns and allow you to view details of quantile statistics, a histogram, common values and extreme values.

Parameters:

data (Dataframe of shape (M, N) from pandas.DataFrame) – Dataframe containing samples M and features N

Returns:

Returns self for easy method chaining.

Return type:

Data instance

Examples

>>> from watex.base import Data
>>> Data().fit(data).profilingReport()

rename(data=None, columns=None, pattern=None)[source]#

Rename the columns of the dataframe, with columns in lowercase and spaces replaced by underscores.

Parameters:
  • data (Dataframe of shape (M, N) from pandas.DataFrame) – Dataframe containing samples M and features N

  • columns (str or list of str, Optional) – The specific columns in the dataframe to rename. However, all columns are put in lowercase. If a column is not in the dataframe, an error is raised.

  • pattern (str, Optional,) – Regular expression pattern to strip the data. By default, the pattern is '[ -@*#&+/]'.

Returns:

self – returns self for easy method chaining.

Return type:

Data instance
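The renaming described for rename can be approximated in plain pandas. This is a sketch only: it handles lowercase and spaces, does not reproduce the pattern argument, and uses made-up column names.

```python
import pandas as pd

df = pd.DataFrame({"Flow Rate": [1.0], "Well Depth": [20.0]})  # made-up columns

# Lowercase each column name and replace spaces with underscores
renamed = df.rename(columns=lambda c: c.lower().replace(" ", "_"))
```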

shrunk(columns, data=None, **kwd)[source]#

Reduce the data to the important features

Parameters:
  • data (Dataframe of shape (M, N) from pandas.DataFrame) – Dataframe containing samples M and features N

  • columns (str or list of str) – Columns or features to keep in the dataset

  • kwd (dict,) – Additional keyword arguments passed to watex.utils.mlutils.selectfeatures().

Returns:

Returns self for easy method chaining.

Return type:

Data instance

class watex.base.GreedyPerceptron(eta=0.01, n_iter=50, random_state=42)[source]#

Bases: _Base

Perceptron classifier

Inspired by Rosenblatt's concept of perceptron rules. Rosenblatt published the first concept of the perceptron learning rule based on the MCP (McCulloch-Pitts) neuron model. With the perceptron rule, Rosenblatt proposed an algorithm that would automatically learn the optimal weight coefficients that would then be multiplied by the input features in order to make the decision of whether a neuron fires (transmits a signal) or not. In the context of supervised learning and classification, such an algorithm could then be used to predict whether a new data point belongs to one class or the other.

Rosenblatt's initial perceptron rule and the perceptron algorithm can be summarized by the following steps:

  • Initialize the weights to 0 or small random numbers.

  • For each training example, \(x^{(i)}\):
    • Compute the output value \(\hat{y}\).

    • Update the weights.

The update of each weight \(w_j\) in the weight vector \(w\) can be formally written as:

\[w_j := w_j + \Delta w_j\]

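The steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the watex implementation: the function name `perceptron_sketch`, the scale of the random initialization, the usual choice of eta times the prediction error for the weight change, and the toy data are all assumptions.

```python
import numpy as np

def perceptron_sketch(X, y, eta=0.01, n_iter=50, random_state=42):
    """Hypothetical Rosenblatt perceptron rule; not the watex code."""
    rng = np.random.RandomState(random_state)
    # Step 1: initialize the weights to small random numbers (w[0] is the bias)
    w = rng.normal(loc=0.0, scale=0.01, size=X.shape[1] + 1)
    errors = []
    for _ in range(n_iter):
        n_updates = 0
        # Step 2: visit each training example in turn
        for xi, target in zip(X, y):
            y_hat = 1 if xi @ w[1:] + w[0] >= 0.0 else -1   # unit step output
            delta = eta * (target - y_hat)                  # zero when the prediction is right
            w[1:] += delta * xi                             # w_j := w_j + delta_w_j
            w[0] += delta
            n_updates += int(delta != 0.0)
        errors.append(n_updates)                            # misclassifications in this epoch
    return w, errors

# Toy, linearly separable data with labels in {-1, 1}
X = np.array([[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]])
y = np.array([-1, -1, 1, 1])
w, errors = perceptron_sketch(X, y, eta=0.1, n_iter=10)
y_pred = np.where(X @ w[1:] + w[0] >= 0.0, 1, -1)
```

On linearly separable data such as this, the per-epoch error count drops to zero once a separating boundary is found; on non-separable data the updates never settle.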
Parameters:
  • eta (float,) – Learning rate, between 0.0 and 1.0.

  • n_iter (int,) – Number of passes (epochs) over the training set.

  • random_state (int, default is 42) – Random number generator seed for random weight initialization.

w_#

Weights after fitting

Type:

Array-like

errors_#

Number of misclassifications (updates) in each epoch

Type:

list

References

[1]

Rosenblatt, F., 1957. The Perceptron: A Perceiving and Recognizing Automaton. Cornell Aeronautical Laboratory, 1957.

[2]

McCulloch, W.S. and W. Pitts, 1943. A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, 5(4): 115-133, 1943.

fit(X, y)[source]#

Fit the training data

Parameters:
  • X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like, shape (M, ) M=m-samples,) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

Returns:

self – returns self for easy method chaining.

Return type:

Perceptron instance

net_input(X)[source]#

Compute the net input

predict(X)[source]#

Predict the class label after unit step

Parameters:

X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

Returns:

ypred

Return type:

predicted class label after the unit step (1, or -1)

class watex.base.MajorityVoteClassifier(clfs, weights=None, vote='classlabel')[source]#

Bases: BaseEstimator, ClassifierMixin

A majority vote Ensemble classifier

Combines different classification algorithms, each associated with an individual weight for confidence. The goal is to build a stronger meta-classifier that balances out the individual classifiers' weaknesses on a particular dataset. In more precise mathematical terms, the weighted majority vote can be expressed as follows:

\[\hat{y} = \arg\max_{i} \sum_{j=1}^{m} w_j \chi_A (C_j(x)=i)\]

where \(w_j\) is a weight associated with a base classifier \(C_j\); \(\hat{y}\) is the predicted class label of the ensemble; \(A\) is the set of unique class labels; and \(\chi_A\) is the characteristic function or indicator function, which returns 1 if the predicted class of the jth classifier matches \(i\) (\(C_j(x)=i\)). For equal weights, the equation simplifies as follows:

\[\hat{y} = \operatorname{mode}\{C_1(x), C_2(x), \ldots, C_m(x)\}\]

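A small numeric illustration of the two formulas, using NumPy only. The three predicted labels and the weights are invented for the example and do not come from watex.

```python
import numpy as np

# Class labels predicted by three hypothetical base classifiers C_1..C_3
# for a single example, and their assumed weights w_j.
predictions = np.array([0, 0, 1])
weights = np.array([0.2, 0.2, 0.6])

# Weighted vote: sum the weights of the classifiers voting for each class,
# then pick the class with the largest total.
votes = np.bincount(predictions, weights=weights)
y_hat_weighted = int(np.argmax(votes))

# Equal weights reduce to the mode of the predicted labels.
y_hat_mode = int(np.argmax(np.bincount(predictions)))
```

Here class 0 collects a total weight of 0.4 and class 1 collects 0.6, so the weighted vote picks class 1 even though two of the three classifiers voted for class 0, which is what the unweighted mode returns.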
Parameters:
  • clfs ({array_like}, shape (n_classifiers)) – Different classifiers for the ensemble

  • vote (str , ['classlabel', 'probability'], default is {'classlabel'}) – If ‘classlabel’, the prediction is based on the argmax of the class labels. Otherwise, if ‘probability’, the argmax of the sum of the probabilities is used to predict the class label. Note that ‘probability’ is recommended for calibrated classifiers.

  • weights ({array-like}, shape (n_classifiers, ), Optional, default=None) – If a list of int or float values is provided, the classifiers are weighted by importance; uniform weights are used if ‘weights’ is None.

classes_#

array of classifiers with encoded class labels

Type:

array_like, shape (n_classifiers)

classifiers_#

list of fitted classifiers

Type:

list

Examples

>>> import numpy as np
>>> from watex.exlib.sklearn import (
...     LogisticRegression, DecisionTreeClassifier, KNeighborsClassifier,
...     Pipeline, cross_val_score, train_test_split, StandardScaler,
...     SimpleImputer)
>>> from watex.datasets import fetch_data
>>> from watex.base import MajorityVoteClassifier
>>> from watex.base import selectfeatures
>>> data = fetch_data('bagoue original').get('data=dfy1')
>>> X0 = data.iloc[:, :-1]; y0 = data['flow'].values
>>> # exclude the categorical values for demonstration
>>> # binarize the target y
>>> y = np.asarray(list(map(lambda x: 0 if x <= 1 else 1, y0)))
>>> X = selectfeatures(X0, include='number')
>>> X = SimpleImputer().fit_transform(X)
>>> X, Xt, y, yt = train_test_split(X, y)
>>> clf1 = LogisticRegression(penalty='l2', solver='lbfgs')
>>> clf2 = DecisionTreeClassifier(max_depth=1)
>>> clf3 = KNeighborsClassifier(p=2, n_neighbors=1)
>>> pipe1 = Pipeline([('sc', StandardScaler()),
...                   ('clf', clf1)])
>>> pipe3 = Pipeline([('sc', StandardScaler()),
...                   ('clf', clf3)])

  1. Test the results of each classifier taken individually

>>> clf_labels = ['Logit', 'DTC', 'KNN']
>>> # test the results without using the MajorityVoteClassifier
>>> for clf, label in zip([pipe1, clf2, pipe3], clf_labels):
...     scores = cross_val_score(clf, X, y, cv=10, scoring='roc_auc')
...     print("ROC AUC: %.2f (+/- %.2f) [%s]" % (scores.mean(),
...                                              scores.std(),
...                                              label))
ROC AUC: 0.91 (+/- 0.05) [Logit]
ROC AUC: 0.73 (+/- 0.07) [DTC]
ROC AUC: 0.77 (+/- 0.09) [KNN]

  2. Implement the MajorityVoteClassifier

>>> # test the results with the majority vote
>>> mv_clf = MajorityVoteClassifier(clfs=[pipe1, clf2, pipe3])
>>> clf_labels += ['Majority voting']
>>> all_clfs = [pipe1, clf2, pipe3, mv_clf]
>>> for clf, label in zip(all_clfs, clf_labels):
...     scores = cross_val_score(clf, X, y, cv=10, scoring='roc_auc')
...     print("ROC AUC: %.2f (+/- %.2f) [%s]" % (scores.mean(),
...                                              scores.std(), label))
ROC AUC: 0.91 (+/- 0.05) [Logit]
ROC AUC: 0.73 (+/- 0.07) [DTC]
ROC AUC: 0.77 (+/- 0.09) [KNN]
ROC AUC: 0.92 (+/- 0.06) [Majority voting]  # gives a good score & fewer errors

fit(X, y)[source]#

Fit classifiers

Parameters:
  • X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like, shape (M, ) M=m-samples) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

Returns:

self – returns self for easy method chaining.

Return type:

MajorityVoteClassifier instance

get_params(deep=True)[source]#

Override get_params from the _Base class and get the classifiers' parameters for GridSearch.

property inspect#

Inspect whether the object is fitted or not

predict(X)[source]#

Predict the class label of X

Parameters:

X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

Returns:

maj_vote – Predicted class label array

Return type:

{array_like}, shape (n_examples, )

predict_proba(X)[source]#

Predict the class probabilities and return the average probabilities, which is useful when computing the receiver operating characteristic area under the curve (ROC AUC).

Parameters:

X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

Returns:

avg_proba – weighted average probabilities for each class per example.

Return type:

{array_like }, shape (n_examples, n_classes)
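As a worked example of a weighted average of class probabilities, the NumPy sketch below combines the outputs of three hypothetical classifiers for a single example. The probabilities and weights are made up, and the snippet only illustrates the averaging step, not how watex computes it internally.

```python
import numpy as np

# Rows: assumed predict_proba outputs of three classifiers for one example;
# columns: classes 0 and 1.
probas = np.array([[0.9, 0.1],
                   [0.8, 0.2],
                   [0.4, 0.6]])
weights = [0.2, 0.2, 0.6]   # assumed classifier weights

avg_proba = np.average(probas, axis=0, weights=weights)  # weighted mean per class
y_hat = int(np.argmax(avg_proba))                        # class with the highest average
```

The averages come out to 0.58 for class 0 and 0.42 for class 1, so the probability-based vote picks class 0 here.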

set_score_request(*, sample_weight='$UNCHANGED$')#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class watex.base.Missing(in_percent=False, sample=None, kind=None, drop_columns=None, **kws)[source]#

Bases: Data

Deal with missing values in Data

Most algorithms will not work with missing data. Notable exceptions are the recent boosting libraries such as XGBoost (watex.documentation.xgboost.__doc__), CatBoost and LightGBM. As with many things in machine learning, there are no hard answers for how to treat missing data. Also, missing data could represent different situations. There are various ways to handle missing data:

  • Remove any row with missing data

  • Remove any column with missing data

  • Impute missing values

  • Create an indicator column to indicate that data was missing

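The four strategies listed above can be sketched with plain pandas. This does not use the Missing class itself; the frame, the column names and the choice of mean imputation are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy frame; column names are made up for the example.
df = pd.DataFrame({"power": [10.0, np.nan, 30.0],
                   "depth": [1.0, 2.0, 3.0],
                   "flow": [np.nan, np.nan, 5.0]})

rows_dropped = df.dropna(axis=0)        # remove any row with missing data
cols_dropped = df.dropna(axis=1)        # remove any column with missing data
imputed = df.fillna(df.mean())          # impute with the column mean (one possible choice)
flagged = df.assign(power_missing=df["power"].isna())  # indicator column
```

Dropping rows or columns is the simplest option but can discard a lot of data, as the toy frame shows: only one row and one column survive.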
Parameters:
  • in_percent (bool,) – Give the statistics of missing data as a percentage if set to True.

  • sample (int, Optional,) – Number of rows to visualize, or the limit on the number of samples, to be able to see the patterns. This is useful when the data is composed of many rows. Shrinking the data to keep some samples for visualization is recommended. None plots all the samples (or examples) in the data.

  • kind (str, Optional) –

    Type of visualization. Can be dendrogram, mbar, bar or corr, for the dendrogram, msno bar, plt bar and msno correlation visualizations respectively:

    • bar plot counts the non-missing data using pandas.

    • mbar uses the msno package to count the number of non-missing data.

    • dendrogram shows the clusterings of where the data is missing. Leaves that are at the same level predict one another's presence (empty or filled). The vertical arms are used to indicate how different the clusters are. Short arms mean that branches are similar.

    • corr creates a heat map showing if there are correlations where the data is missing. In this case, it does look like the locations where data are missing are correlated.

    • None is the default visualization. It is useful for viewing contiguous areas of missing data, which would indicate that the missing data is not random. The matrix function includes a sparkline along the right side. Patterns here would also indicate non-random missing data. It is recommended to limit the number of samples to be able to see the patterns.

    Any other value will raise an error.

Examples

>>> from watex.base import Missing
>>> data ='data/geodata/main.bagciv.data.csv'
>>> ms= Missing().fit(data)
>>> ms.plot_.fig_size = (12, 4 )
>>> ms.plot()

drop(data=None, columns=None, inplace=False, axis=1, **kwd)[source]#

Remove missing data

Parameters:
  • data (Dataframe of shape (M, N) from pandas.DataFrame) – Dataframe containing samples M and features N

  • columns (str or list of str) – Columns containing the missing data to drop. Can be used with axis set to 1.

  • axis ({0 or 'index', 1 or 'columns'}, default 1) –

    Determine if rows or columns which contain missing values are removed.

    • 0, or ‘index’ : Drop rows which contain missing values.

    • 1, or ‘columns’ : Drop columns which contain missing values.

    Changed in version 1.0.0: Pass tuple or list to drop on multiple axes. Only a single axis is allowed.

  • how ({'any', 'all'}, default 'any') –

    Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

    • ’any’: If any NA values are present, drop that row or column.

    • ’all’ : If all values are NA, drop that row or column.

  • thresh (int, optional) – Require that many non-NA values. Cannot be combined with how.

  • subset (column label or sequence of labels, optional) – Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.

  • inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.

Returns:

self – returns self for easy method chaining.

Return type:

Missing instance

property get_missing_columns#

Return the columns with NaN values

property isnull#

Check the mean of the missing values in the data, as a percentage

plot(figsize=None, **kwd)[source]#

Visualize patterns in the missing data.

Parameters:
  • data (Dataframe of shape (M, N) from pandas.DataFrame) – Dataframe containing samples M and features N

  • kind (str, Optional) –

    Kind of visualization. Can be dendrogram, mbar or bar for the dendrogram, msno bar and plt visualizations respectively:

    • bar counts the non-missing data using pandas.

    • mbar uses the msno package to count the non-missing data.

    • dendrogram shows the clusterings of where the data is missing. Leaves that are at the same level predict one another's presence (empty or filled). The vertical arms indicate how different the clusters are; short arms mean that the branches are similar.

    • corr creates a heat map showing whether there are correlations where the data is missing. In this case, it does look like the locations of the missing data are correlated.

    • None is the default visualization. It is useful for viewing contiguous areas of missing data, which would indicate that the missing data is not random. The matrix function includes a sparkline along the right side. Patterns here would also indicate non-random missing data. It is recommended to limit the number of samples to be able to see the patterns.

    Any other value will raise an error.

  • sample (int, Optional) – Number of rows to visualize. This is useful when the data is composed of many rows. Shrinking the data to keep only a few samples for visualization is recommended. None plots all the samples (or examples) in the data.

  • kws (dict) – Additional keyword arguments passed to the msno.matrix plot.

Returns:

``self`` – returns self for easy method chaining.

Return type:

Missing instance

Examples

>>> from watex.base import Missing
>>> data ='data/geodata/main.bagciv.data.csv'
>>> ms= Missing().fit(data)
>>> ms.plot(figsize = (12, 4 ) )
replace(data=None, columns=None, fill_value=None, new_column_name=None, return_non_null=False, **kwd)[source]#

Replace the missing values to consider.

Use the coalesce function of pyjanitor. It takes a dataframe and a list of columns to consider. This is similar to functionality found in Excel and SQL databases. It returns the first non-null value of each row.

Parameters:
  • data (Dataframe of shape (M, N) from pandas.DataFrame) – Dataframe containing samples M and features N

  • columns (str or list of str) – Columns containing missing data in which values are replaced. Can be used with axis set to ‘1’.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) –

    Determine if rows or columns which contain missing values are removed.

    • 0, or ‘index’ : Drop rows which contain missing values.

    • 1, or ‘columns’ : Drop columns which contain missing values.

    Changed in version 1.0.0: Pass tuple or list to drop on multiple axes. Only a single axis is allowed.

Returns:

``self`` – returns self for easy method chaining.

Return type:

Missing instance
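
The "first non-null value of each row" behaviour described for `replace` can be illustrated with plain pandas. This is a minimal sketch of the coalesce idea, not pyjanitor's implementation. The helper `first_non_null`, the column names, and the way `fill_value` is applied are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical column names.
df = pd.DataFrame(
    {
        "s1": [1.0, np.nan, np.nan],
        "s2": [np.nan, 5.0, np.nan],
        "s3": [3.0, 6.0, np.nan],
    }
)


def first_non_null(frame, columns, fill_value=None):
    """Return, for each row, the first non-null value among `columns`."""
    # Back-filling along each row pulls the first valid value into the
    # left-most column, which is then selected.
    result = frame[columns].bfill(axis=1).iloc[:, 0]
    if fill_value is not None:
        # Rows where every listed column is null fall back to fill_value.
        result = result.fillna(fill_value)
    return result


merged = first_non_null(df, ["s1", "s2", "s3"], fill_value=0.0)
```

The order of `columns` matters: the left-most column with a valid value wins.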

property sanity_check#

Ensure that all missing values have been dealt with. Returns a single boolean that tells whether any cell is missing in the DataFrame.
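
A single boolean of this kind can be obtained in plain pandas by reducing the null mask twice. This is a sketch under the assumption that the property performs an equivalent check. The frame is a made-up example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

# isna() gives a boolean frame; the first any() reduces it per column,
# the second any() reduces the result to a single boolean.
has_missing = bool(df.isna().any().any())

# After dropping the incomplete rows, no missing cell remains.
has_missing_after_drop = bool(df.dropna().isna().any().any())
```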

class watex.base.SequentialBackwardSelection(estimator=None, k_features=1, scoring='accuracy', test_size=0.25, random_state=42)[source]#

Bases: _Base

Sequential Backward Selection (SBS) is a feature selection algorithm which aims to reduce the dimensionality of the initial feature subspace with a minimum decay in the performance of the classifier, in order to improve computational efficiency. In certain cases, SBS can even improve the predictive power of the model if the model suffers from overfitting.

The idea behind SBS is simple: it sequentially removes features from the full feature subset until the new feature subspace contains the desired number of features. To determine which feature is to be removed at each stage, a criterion function \(J\) to be minimized is needed [1]. The criterion calculated by the criterion function can simply be the difference in performance of the classifier before and after the removal of a particular feature. The feature to be removed at each stage can then be defined as the feature that maximizes this criterion; in simpler terms, at each stage the feature whose removal causes the least performance loss is eliminated. Based on the preceding definition of SBS, the algorithm can be outlined in a few steps:

  • Initialize the algorithm with \(k=d\), where \(d\) is the dimensionality of the full feature space, \(X_d\).

  • Determine the feature \(x^{-}\) that maximizes the criterion: \(x^{-}= \arg\max J(X_k-x)\), where \(x\in X_k\).

  • Remove the feature \(x^{-}\) from the feature set: \(X_{k-1}= X_k -x^{-}; k=k-1\).

  • Terminate if \(k\) equals the number of desired features; otherwise go to step 2. [2]
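
The steps above can be sketched in pure Python. This is an illustrative outline, not the class's implementation: `sequential_backward_selection`, `criterion`, `toy_criterion` and `USEFUL` are hypothetical names, and the toy criterion stands in for the validation score of a fitted classifier.

```python
from itertools import combinations


def sequential_backward_selection(n_features, k_features, criterion):
    """Greedy SBS over feature indices.

    `criterion` is a hypothetical callable that maps a tuple of feature
    indices to a score, where higher is better.
    """
    current = tuple(range(n_features))           # step 1: start with k = d
    history = [(current, criterion(current))]
    while len(current) > k_features:             # step 4: stop at k_features
        # Step 2: score every subset obtained by removing one feature.
        candidates = list(combinations(current, len(current) - 1))
        scores = [criterion(subset) for subset in candidates]
        best = max(range(len(candidates)), key=scores.__getitem__)
        # Step 3: keep the best subset, i.e. remove the feature x^-.
        current = candidates[best]
        history.append((current, scores[best]))
    return current, history


# Toy criterion: features 1 and 3 are informative, and every feature
# kept in the subset costs one point.
USEFUL = {1: 40, 3: 30}


def toy_criterion(subset):
    return sum(USEFUL.get(i, 0) for i in subset) - len(subset)


selected, history = sequential_backward_selection(5, 2, toy_criterion)
```

With this toy criterion the uninformative features are removed first, and `history` records one subset and score per stage, which is the kind of information the `subsets_` and `scores_` attributes are described as holding.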

Parameters:
  • estimator (callable or instantiated object) – callable or instance object that has a fit method.

  • k_features (int, default=1) – the desired number of features at which the selection stops. It must be less than the number of features in the training set, otherwise the selection does not make sense.

  • scoring (callable or str, default='accuracy') – metric for scoring. Available metrics are ‘precision’, ‘recall’, ‘roc_auc’ or ‘accuracy’. Any other metric will raise an error.

  • test_size (float or int, default=0.25) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.

  • random_state (int, RandomState instance or None, default=42) – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.

References

[1]

Raschka, S., Mirjalili, V., 2019. Python Machine Learning, 3rd ed. Packt.

[2]

Ferri F., Pudil P., Hatef M., and Kittler J., 1994. Comparative study of techniques for large-scale feature selection, pages 403-413.

feature_names_in_#

Names of features seen during fit. Defined only when X has feature names that are all strings.

Type:

ndarray of shape (n_features_in_,)

indices_#

Indices of the feature subset of the best validated model.

Type:

tuple of dimensionality X

subsets_#

List of indices_.

Type:

list

scores_#

Collection of the scores of the best model obtained during cross-validation.

Type:

list

k_score_#

The score obtained with the desired number of features.

Type:

float

Examples

>>> from watex.exlib.sklearn import KNeighborsClassifier , train_test_split
>>> from watex.datasets import fetch_data
>>> from watex.base import SequentialBackwardSelection
>>> X, y = fetch_data('bagoue analysed') # data already standardized
>>> Xtrain, Xt, ytrain,  yt = train_test_split(X, y)
>>> knn = KNeighborsClassifier(n_neighbors=5)
>>> sbs= SequentialBackwardSelection (knn)
>>> sbs.fit(Xtrain, ytrain )
fit(X, y)[source]#

Fit the training data

Note that SBS splits the dataset into test and training sets inside the fit function. \(X\) is still fed to the algorithm in full. SBS then creates new subsets for testing (validation) and training, which is why this test set is also called the validation dataset. This approach is necessary to prevent the original test set from becoming part of the training data.

Parameters:
  • X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like, shape (M, ) M=m-samples,) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

Returns:

self – returns self for easy method chaining.

Return type:

SequentialBackwardSelection instance

transform(X)[source]#

Transform the training set

Parameters:

X (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

Returns:

X – New transformed training set with selected features columns

Return type:

Ndarray ( M x N matrix where M=m-samples, & N=n-features)
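
Selecting the retained columns amounts to NumPy fancy indexing on the second axis. This is a minimal sketch, assuming the selected columns are stored as a tuple of integer indices as described for `indices_`. The array and the indices are made up.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)   # 3 samples, 4 features
indices = (1, 3)                  # hypothetical selected column indices

# Keep every row, but only the selected feature columns.
X_selected = X[:, list(indices)]
```

The result keeps all M samples and has one column per selected feature.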

watex.base.existfeatures(df, features, error='raise')[source]#

Check whether the features exist in the dataframe.

Parameters:
  • df – a dataframe for features selections

  • features – list of features to select. The features must be in the dataframe, otherwise an error occurs.

  • error – str, ‘raise’ or ‘ignore’. If ‘raise’ (the default), an error is raised when the features don’t exist in the dataframe; otherwise the check is silent.

Returns:

bool, whether the features exist

watex.base.get_params(obj)[source]#

Get object parameters.

The object can be a callable or an instance.

Parameters:

obj – object; can be a callable or an instance

Returns:

dict of parameters values

Examples:

>>> from sklearn.svm import SVC
>>> from watex.base import get_params
>>> sigmoid = SVC(
...     **{
...         'C': 512.0,
...         'coef0': 0,
...         'degree': 1,
...         'gamma': 0.001953125,
...         'kernel': 'sigmoid',
...         'tol': 1.0
...     }
... )
>>> pvalues = get_params(sigmoid)
>>> pvalues
{'decision_function_shape': 'ovr',
 'break_ties': False,
 'kernel': 'sigmoid',
 'degree': 1,
 'gamma': 0.001953125,
 'coef0': 0,
 'tol': 1.0,
 'C': 512.0,
 'nu': 0.0,
 'epsilon': 0.0,
 'shrinking': True,
 'probability': False,
 'cache_size': 200,
 'class_weight': None,
 'verbose': False,
 'max_iter': -1,
 'random_state': None}
watex.base.selectfeatures(df, features=None, include=None, exclude=None, coerce=False, **kwd)[source]#

Select features and return new dataframe.

Parameters:
  • df – a dataframe for features selections

  • features – list of features to select. The features must be in the dataframe, otherwise an error occurs.

  • include – the type of data to retrieve from the dataframe df. Can be number.

  • exclude – the type of data to exclude from the dataframe df. Can be number, i.e. only non-numeric data will be kept in the returned data.

  • coerce – return the whole dataframe after converting the numeric columns. Be aware that no selection is done and no error is raised. Default is False.

  • kwd – additional keyword arguments for the pandas.DataFrame.astype function

Ref:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html
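
The effect of `features`, `include` and `exclude` can be reproduced with plain pandas. This sketch is an analogy rather than the function's own code: it assumes that type-based selection behaves like `DataFrame.select_dtypes`, and the frame and its column names are invented.

```python
import pandas as pd

# Toy frame with hypothetical column names.
df = pd.DataFrame(
    {
        "depth": [10.5, 20.0],
        "flow": [1, 3],
        "geology": ["granite", "schist"],
    }
)

# Selection by name, comparable to passing features=[...].
by_name = df[["depth", "geology"]]

# Selection by type, comparable to include='number' and exclude='number'.
numeric_only = df.select_dtypes(include="number")
non_numeric = df.select_dtypes(exclude="number")
```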