watex.base.MajorityVoteClassifier
- class watex.base.MajorityVoteClassifier(clfs, weights=None, vote='classlabel')
A majority-vote ensemble classifier.
Combines different classification algorithms, each associated with an individual weight that reflects confidence. The goal is to build a stronger meta-classifier that balances out the weaknesses of the individual classifiers on a particular dataset. In more precise mathematical terms, the weighted majority vote can be expressed as follows:
\[\hat{y} = \arg\max_{i} \sum_{j=1}^{m} w_j \, \chi_A\big(C_j(x) = i\big)\]
where \(m\) is the number of base classifiers; \(w_j\) is the weight associated with the base classifier \(C_j\); \(\hat{y}\) is the predicted class label of the ensemble; \(A\) is the set of unique class labels; and \(\chi_A\) is the characteristic (indicator) function, which returns 1 if the class predicted by the jth classifier matches \(i\), that is, if \(C_j(x) = i\). With equal weights, the equation simplifies to:
\[\hat{y} = \operatorname{mode}\{C_1(x), C_2(x), \ldots, C_m(x)\}\]
- Parameters:
clfs (array_like, shape (n_classifiers,)) – Different classifiers to combine in the ensemble.
vote (str, {‘classlabel’, ‘probability’}, default=‘classlabel’) – If ‘classlabel’, the prediction is the argmax of the (weighted) votes on the class labels. If ‘probability’, the argmax of the sum of the predicted class probabilities is used to predict the class label; this option is recommended for calibrated classifiers.
weights (array-like, shape (n_classifiers,), optional, default=None) – If a list of int or float values is provided, the classifiers are weighted by importance; uniform weights are used if ‘weights’ is None.
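The `vote` and `weights` options map onto the formulas above. The following NumPy sketch illustrates both rules on made-up predictions; the helper names `vote_classlabel` and `vote_probability` are hypothetical and not part of watex, and class labels are assumed to be already encoded as integers 0 to k-1.

```python
import numpy as np

def vote_classlabel(labels, weights=None, n_classes=None):
    """Weighted majority vote on predicted class labels (hypothetical helper).

    labels  : int array, shape (n_classifiers, n_samples), values in 0..k-1
    weights : array-like, shape (n_classifiers,), or None for uniform weights
    """
    labels = np.asarray(labels)
    if n_classes is None:
        n_classes = int(labels.max()) + 1
    # For each sample (one column), add w_j to the bin of the class predicted
    # by classifier C_j, then pick the class with the largest total.
    return np.array([
        np.argmax(np.bincount(col, weights=weights, minlength=n_classes))
        for col in labels.T
    ])

def vote_probability(probas, weights=None):
    """Argmax of the weighted average of class probabilities (hypothetical helper).

    probas : array, shape (n_classifiers, n_samples, n_classes)
    """
    # Averaging instead of summing rescales every class by the same factor,
    # so the argmax is the same as for the weighted sum.
    avg = np.average(np.asarray(probas), axis=0, weights=weights)
    return np.argmax(avg, axis=1)

# Three classifiers predicting four samples (binary labels).
labels = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 0],
                   [1, 1, 1, 0]])
print(vote_classlabel(labels))                           # [0 1 1 0], the mode
print(vote_classlabel(labels, weights=[0.2, 0.2, 0.6]))  # [1 1 1 0]

# First sample only, now with the class probabilities of the same classifiers.
probas = np.array([[[0.9, 0.1]], [[0.8, 0.2]], [[0.4, 0.6]]])
print(vote_probability(probas, weights=[0.2, 0.2, 0.6]))  # [0]
```

For the first sample, the weighted label vote picks class 1 because the third classifier carries 60% of the weight, while the weighted probability vote picks class 0 (0.58 vs 0.42) because that classifier is only moderately confident. The probability rule therefore depends on the probabilities being meaningful, which is why calibrated classifiers are recommended for it.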
- classes_
array of classifiers with encoded class labels
- Type:
array_like, shape (n_classifiers)
- classifiers_
list of fitted classifiers
- Type:
list
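The `classes_` attribute stores encoded class labels. A label vote built on counting, such as the one in the formula above, is easiest to implement on integer labels 0 to k-1, so one common approach is to encode the targets before fitting and decode the winning index after voting. The sketch below shows that round trip with `np.unique(..., return_inverse=True)` on toy labels; it is an assumption about how such encoding can work, not a description of the watex internals.

```python
import numpy as np

# Toy (non-encoded) target labels, e.g. strings.
y = np.array(["dry", "wet", "wet", "dry", "wet"])

# classes_ holds the sorted unique labels; y_enc maps each label to its index.
classes_, y_enc = np.unique(y, return_inverse=True)
print(classes_)  # ['dry' 'wet']
print(y_enc)     # [0 1 1 0 1]

# Hypothetical encoded predictions of three classifiers for two samples.
pred_enc = np.array([[1, 0],
                     [1, 1],
                     [0, 0]])
# Unweighted majority vote per sample on the encoded labels ...
winner = np.array([np.argmax(np.bincount(col, minlength=len(classes_)))
                   for col in pred_enc.T])
# ... then decode the winning indices back to the original labels.
print(classes_[winner])  # ['wet' 'dry']
```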
Examples
>>> import numpy as np
>>> from watex.exlib.sklearn import (LogisticRegression, DecisionTreeClassifier,
...     KNeighborsClassifier, Pipeline, cross_val_score, train_test_split,
...     StandardScaler, SimpleImputer)
>>> from watex.datasets import fetch_data
>>> from watex.base import MajorityVoteClassifier
>>> from watex.base import selectfeatures
>>> data = fetch_data('bagoue original').get('data=dfy1')
>>> X0 = data.iloc[:, :-1]; y0 = data['flow'].values
>>> # binarize the target y
>>> y = np.asarray(list(map(lambda x: 0 if x <= 1 else 1, y0)))
>>> # exclude the categorical values for demonstration
>>> X = selectfeatures(X0, include='number')
>>> X = SimpleImputer().fit_transform(X)
>>> X, Xt, y, yt = train_test_split(X, y)
>>> clf1 = LogisticRegression(penalty='l2', solver='lbfgs')
>>> clf2 = DecisionTreeClassifier(max_depth=1)
>>> clf3 = KNeighborsClassifier(p=2, n_neighbors=1)
>>> pipe1 = Pipeline([('sc', StandardScaler()), ('clf', clf1)])
>>> pipe3 = Pipeline([('sc', StandardScaler()), ('clf', clf3)])
-> Test each classifier individually
>>> clf_labels = ['Logit', 'DTC', 'KNN']
>>> # test the results without using the MajorityVoteClassifier
>>> for clf, label in zip([pipe1, clf2, pipe3], clf_labels):
...     scores = cross_val_score(clf, X, y, cv=10, scoring='roc_auc')
...     print("ROC AUC: %.2f (+/- %.2f) [%s]" % (scores.mean(), scores.std(), label))
...
ROC AUC: 0.91 (+/- 0.05) [Logit]
ROC AUC: 0.73 (+/- 0.07) [DTC]
ROC AUC: 0.77 (+/- 0.09) [KNN]
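The individual scores above differ considerably. Why combining such classifiers can still help is easiest to see under an idealized assumption that rarely holds in practice: n binary classifiers that each err with the same rate ε, independently of each other. A majority vote with odd n is then wrong only when more than half of the members are wrong, which is a binomial tail probability. The function name `ensemble_error` below is hypothetical.

```python
import math

def ensemble_error(n_classifiers, error):
    """Probability that a majority vote of n independent binary classifiers,
    each wrong with probability `error`, is wrong (n assumed odd)."""
    k_start = n_classifiers // 2 + 1  # smallest number of wrong votes that wins
    return sum(
        math.comb(n_classifiers, k) * error**k * (1 - error)**(n_classifiers - k)
        for k in range(k_start, n_classifiers + 1)
    )

print(round(ensemble_error(11, 0.25), 3))  # 0.034, versus 0.25 for one member
```

Under this assumption, eleven members with ε = 0.25 give a vote that errs in about 3.4% of cases, but when ε exceeds 0.5 the vote becomes worse than a single member, so each base classifier should at least beat random guessing.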
-> Implement the MajorityVoteClassifier
>>> # test the results with majority vote
>>> mv_clf = MajorityVoteClassifier(clfs=[pipe1, clf2, pipe3])
>>> clf_labels += ['Majority voting']
>>> all_clfs = [pipe1, clf2, pipe3, mv_clf]
>>> for clf, label in zip(all_clfs, clf_labels):
...     scores = cross_val_score(clf, X, y, cv=10, scoring='roc_auc')
...     print("ROC AUC: %.2f (+/- %.2f) [%s]" % (scores.mean(), scores.std(), label))
...
ROC AUC: 0.91 (+/- 0.05) [Logit]
ROC AUC: 0.73 (+/- 0.07) [DTC]
ROC AUC: 0.77 (+/- 0.09) [KNN]
ROC AUC: 0.92 (+/- 0.06) [Majority voting]
The majority-vote ensemble reaches the highest mean ROC AUC (0.92) of the four models.
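A toy case shows concretely how a vote can balance out individual weaknesses. In the sketch below, three hypothetical classifiers each reach 80% accuracy on five samples but make their single mistake on different samples, so the unweighted majority vote recovers every label. The predictions are made up for illustration.

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1])
# Made-up predictions: each classifier is wrong on exactly one sample,
# and never on the same sample as another classifier.
preds = np.array([[1, 1, 1, 0, 1],   # wrong on sample 0
                  [0, 0, 1, 0, 1],   # wrong on sample 1
                  [0, 1, 1, 1, 1]])  # wrong on sample 3

# Unweighted majority vote for binary 0/1 labels: class 1 wins when
# at least 2 of the 3 classifiers predict it.
vote = (preds.sum(axis=0) >= 2).astype(int)

print((preds == y_true).mean(axis=1))  # [0.8 0.8 0.8]
print((vote == y_true).mean())         # 1.0
```

If the classifiers made the same mistakes, the vote would inherit them, so the gain depends on how differently the base classifiers err.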