watex.view.plot_reg_scoring

watex.view.plot_reg_scoring(reg, X, y, test_size=None, random_state=42, scoring='mse', return_errors=False, **baseplot_kws)

Plot regressor learning curves using mean squared error (MSE) or root mean squared error (RMSE) scoring.

Scores are evaluated with the hold-out validation technique [1].

Parameters
  • reg (estimator instance) – A regression estimator. Estimators must provide a fit method and should provide set_params and get_params, although these are usually inherited from base.BaseEstimator. The fitted model is stored in public and private attributes on the estimator instance, which the prediction and transformation methods then use. The core functionality of some estimators may also be available as a function.

  • X (ndarray of shape (M, N), where \(M\) is the number of samples and \(N\) the number of features) – Training set. Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like of shape (M,), where \(M\) is the number of samples) – Training target. Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • scoring (str, {'mse', 'rmse'}, default='mse') – Kind of error to visualize on the regression learning curve.

  • test_size (float or int, default=None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.

  • random_state (int, RandomState instance or None, default=42) – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.

  • return_errors (bool, default=False) – If True, return the training and validation errors.

  • baseplot_kws (dict) – Additional keyword arguments passed to the watex.property.BasePlot class.

Returns

(train_errors, val_errors) – Training and validation errors, returned only if return_errors is set to True; otherwise nothing is returned.

Return type

tuple

Examples

>>> from watex.datasets import fetch_data
>>> from watex.view.mlplot import plot_reg_scoring
>>> # For this demo we import SVC rather than LinearSVR, because the Bagoue
>>> # dataset poses a classification problem rather than a regression one;
>>> # using a regressor here would lead to convergence problems.
>>> from watex.exlib.sklearn import SVC
>>> X, y = fetch_data('bagoue analysed')  # preprocessed and imputed data
>>> svm = SVC()
>>> t_errors, v_errors = plot_reg_scoring(svm, X, y, return_errors=True)
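To make the returned (train_errors, val_errors) pair more concrete, here is a minimal sketch of how a hold-out learning curve can be computed with scikit-learn. It is not the watex implementation: the helper name holdout_learning_curve, the synthetic data and the choice of fitting on training subsets that grow one sample at a time are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def holdout_learning_curve(reg, X, y, test_size=0.25, random_state=42,
                           scoring="mse"):
    """Hypothetical helper: errors of `reg` on growing training subsets."""
    if scoring not in ("mse", "rmse"):
        raise ValueError("scoring must be 'mse' or 'rmse'")
    # Hold-out split: one part for fitting, one part for validation.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    train_errors, val_errors = [], []
    for m in range(1, len(X_train) + 1):
        # Fit on the first m training samples only.
        reg.fit(X_train[:m], y_train[:m])
        train_err = mean_squared_error(y_train[:m], reg.predict(X_train[:m]))
        val_err = mean_squared_error(y_val, reg.predict(X_val))
        if scoring == "rmse":
            # RMSE is the square root of MSE, in the units of the target.
            train_err, val_err = np.sqrt(train_err), np.sqrt(val_err)
        train_errors.append(train_err)
        val_errors.append(val_err)
    return np.array(train_errors), np.array(val_errors)


# Synthetic linear data (an assumption for the demo, not a watex dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(80, 1))
y_demo = 2.0 * X_demo[:, 0] + 1.0 + rng.normal(scale=0.5, size=80)
train_err, val_err = holdout_learning_curve(
    LinearRegression(), X_demo, y_demo, scoring="rmse")
```

Plotting both arrays against the training-subset size gives the kind of learning curve this function draws: the training error typically rises from near zero while the validation error falls, and the gap between them indicates how well the model generalizes.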

Notes

The hold-out technique is the classic and most popular approach for estimating the generalization performance of a machine learning model. The dataset is split into a training set and a test set: the former is used to train the model, the latter to evaluate its performance. In typical machine learning work, however, we are also interested in tuning and comparing different parameter settings to further improve performance. This process is called model selection, the name referring to a given classification or regression problem for which we want to select the optimal hyperparameter values. Reusing the same test set over and over during model selection is not recommended, because it effectively becomes part of the training data and the model is then more likely to overfit. For this reason, the plain two-way hold-out split is not good practice for model selection. A better way to use the hold-out method is to separate the data into three parts: a training set, a validation set and a test set. See [2] for more.
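The three-way variant of the hold-out method can be sketched as follows. This is an illustrative example only: the helper name three_way_split, the 60/20/20 proportions and the toy arrays are assumptions, not part of watex.

```python
import numpy as np


def three_way_split(X, y, val_size=0.2, test_size=0.2, random_state=42):
    """Hypothetical helper: shuffle once, then cut into train/validation/test."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(X)
    n_test = int(round(n * test_size))
    n_val = int(round(n * val_size))
    # One shuffled index array guarantees the three parts are disjoint.
    idx = np.random.default_rng(random_state).permutation(n)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))


# Toy data: 100 samples with 2 features each.
X_all = np.arange(200).reshape(100, 2)
y_all = np.arange(100)
(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = three_way_split(X_all, y_all)
```

The training set is used to fit candidate models, the validation set to compare hyperparameter settings, and the test set only once at the end for the final performance estimate. Two successive calls to scikit-learn's train_test_split achieve the same partition.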

References

[1] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., et al. (2011) Scikit-learn: Machine learning in Python. J. Mach. Learn. Res., 12, 2825–2830.

[2] Raschka, S. & Mirjalili, V. (2019) Python Machine Learning. (J. Malysiak, S. Jain, J. Lovell, C. Nelson, S. D’silva & R. Atitkar, Eds.), 3rd ed., Packt.