
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "glr_examples/applications/plot_data_exploratory_quick_view.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_glr_examples_applications_plot_data_exploratory_quick_view.py:


=====================================================
Data exploration: Quick view
=====================================================

Real-world examples for data exploration, visualization, ...

.. GENERATED FROM PYTHON SOURCE LINES 8-11

.. code-block:: Python


    # Author: L.Kouadio
    # Licence: BSD-3-clause


.. GENERATED FROM PYTHON SOURCE LINES 12-13

Import required modules

.. GENERATED FROM PYTHON SOURCE LINES 13-19

.. code-block:: Python


    import matplotlib.pyplot as plt
    from watex.view import ExPlot, QuickPlot, TPlot
    from watex.datasets import fetch_data, load_bagoue, load_edis
    from watex.transformers import StratifiedWithCategoryAdder


.. GENERATED FROM PYTHON SOURCE LINES 20-30

Data exploration with :class:`~watex.view.ExPlot`
---------------------------------------------------
Explore data for analysis purposes.

`ExPlot` is a shadow class. Exploring data before building a model
gives a feel for the data and is also a great excuse to meet and discuss
issues with the business units that control the data. `ExPlot` methods
return an instantiated object that inherits from the
:class:`~watex.property.Baseplots` ABC (Abstract Base Class) for
visualization. The class provides some data exploration tricks. Here are
a few examples for analysis and visualization.

.. GENERATED FROM PYTHON SOURCE LINES 32-34

* Use parallel coordinates on multivariate data for clustering visualization
  (yellowbrick must be installed if the 'pkg' argument is set to 'yb')

.. GENERATED FROM PYTHON SOURCE LINES 34-37

.. code-block:: Python

    data = fetch_data('original data').get('data=dfy1')
    p = ExPlot(tname='flow').fit(data)
    p.plotparallelcoords(pkg='pd')



.. image-sg:: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_001.png
   :alt: plot data exploratory quick view
   :srcset: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <'ExPlot':xname=None, yname=None , tname='flow'>



.. GENERATED FROM PYTHON SOURCE LINES 38-40

* Plot each sample on a circle or square, with features on the circumference,
  to visualize the separation between targets.

.. GENERATED FROM PYTHON SOURCE LINES 40-44

.. code-block:: Python

    data2 = fetch_data('bagoue original').get('data=dfy2')
    p = ExPlot(tname='flow').fit(data2)
    p.plotradviz(classes=None, pkg='pd')



.. image-sg:: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_002.png
   :alt: plot data exploratory quick view
   :srcset: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <'ExPlot':xname=None, yname=None , tname='flow'>



.. GENERATED FROM PYTHON SOURCE LINES 45-46

* Create pairwise comparisons between features.

.. GENERATED FROM PYTHON SOURCE LINES 46-55

.. code-block:: Python

    # The plot shows a ['pearson'|'spearman'|'covariance'] correlation.
    data = fetch_data('bagoue original').get('data=dfy1')
    p = ExPlot(tname='flow').fit(data)
    p.plotpairwisecomparison(fmt='.2f', corr='spearman',
                             annot=True,
                             cmap='RdBu_r',
                             vmin=-1,
                             vmax=1)



.. image-sg:: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_003.png
   :alt: plot data exploratory quick view
   :srcset: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_003.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/watex/checkouts/0.3.2/watex/view/plot.py:2662: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.

    <'ExPlot':xname=None, yname=None , tname='flow'>



.. GENERATED FROM PYTHON SOURCE LINES 56-57

* Create a pair grid.

.. GENERATED FROM PYTHON SOURCE LINES 57-64

.. code-block:: Python

    # A pair grid is a matrix of columns and kernel density estimations.
    # To colorize by columns from a data frame, use the 'hue' parameter.
    data = fetch_data('bagoue original').get('data=dfy1')
    p = ExPlot(tname='flow').fit(data)
    p.plotpairgrid(vars=['magnitude', 'power', 'ohmS'])



.. image-sg:: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_004.png
   :alt: plot data exploratory quick view
   :srcset: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_004.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <'ExPlot':xname=None, yname=None , tname='flow'>



.. GENERATED FROM PYTHON SOURCE LINES 65-72

Feature analysis with :class:`~watex.view.QuickPlot`
---------------------------------------------------------
A special class dealing with analysis modules for quick diagrams,
histograms, and bar visualizations.

It was originally designed for flow rate prediction; however, it works
with any other dataset provided the parameters are set as documented.
Here are some quick feature analysis examples.

.. GENERATED FROM PYTHON SOURCE LINES 74-75

* Create a naive visualization plot.

.. GENERATED FROM PYTHON SOURCE LINES 75-88

.. code-block:: Python

    df = load_bagoue().frame
    stratifiedNumObj = StratifiedWithCategoryAdder('flow')
    strat_train_set, *_ = stratifiedNumObj.fit_transform(X=df)
    pd_kws = {'alpha': 0.4,
              'label': 'flow m3/h',
              'c': 'flow',
              'cmap': plt.get_cmap('jet'),
              'colorbar': True}
    qkObj = QuickPlot(fs=25.)
    qkObj.fit(strat_train_set)
    qkObj.naiveviz(x='east', y='north', **pd_kws)




.. image-sg:: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_005.png
   :alt: plot data exploratory quick view
   :srcset: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_005.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    QuickPlot(savefig= None, fig_num= 1, fig_size= (12, 8), ... , classes= None, tname= None, mapflow= False)



.. GENERATED FROM PYTHON SOURCE LINES 89-94

* Provide the names of at least four features and discuss their
  distribution. This method maps a dataset onto multiple axes arrayed in a
  grid of rows and columns that correspond to levels of features in the
  dataset. The plots it produces are often called “lattice”, “trellis”, or
  “small-multiple” graphics.

.. GENERATED FROM PYTHON SOURCE LINES 94-108

.. code-block:: Python

    data = load_bagoue().frame
    qkObj = QuickPlot(leg_kws={'loc': 'upper right'},
                      fig_title='`sfi` vs`ohmS|`geol`',
                      )
    qkObj.tname = 'flow'   # target the DC-flow rate prediction dataset
    qkObj.mapflow = True   # to hold the categories FR0, FR1, etc.
    qkObj.fit(data)
    sns_pkws = {'aspect': 2,
                "height": 2,
                }
    map_kws = {'edgecolor': "w"}
    qkObj.discussingfeatures(features=['ohmS', 'sfi', 'geol', 'flow'],
                             map_kws=map_kws, **sns_pkws
                             )



.. image-sg:: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_006.png
   :alt: geol = VOLCANO-SEDIM. SCHISTS, geol = GRANITES, geol = GEOSYN. GRANITES, geol = GNEISS-MIGMATITES
   :srcset: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_006.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    QuickPlot(savefig= None, fig_num= 1, fig_size= (12, 8), ... , classes= None, tname= flow, mapflow= True)



.. GENERATED FROM PYTHON SOURCE LINES 109-111

* The joint method visualizes the correlation between two features.
  It draws a plot of two features with bivariate and univariate graphs.

.. GENERATED FROM PYTHON SOURCE LINES 111-127

.. code-block:: Python

    data = load_bagoue().frame
    qkObj = QuickPlot(lc='b', sns_style='darkgrid',
                      fig_title='Quantitative features correlation'
                      ).fit(data)
    sns_pkws = {
        'kind': 'reg',  # 'kde', 'hex'
        # "hue": 'flow',
    }
    joinpl_kws = {"color": "r",
                  'zorder': 0, 'levels': 6}
    plmarg_kws = {'color': "r", 'height': -.15, 'clip_on': False}
    qkObj.joint2features(features=['ohmS', 'lwi'],
                         join_kws=joinpl_kws, marginals_kws=plmarg_kws,
                         **sns_pkws,
                         )



.. image-sg:: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_007.png
   :alt: plot data exploratory quick view
   :srcset: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_007.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    QuickPlot(savefig= None, fig_num= 1, fig_size= (12, 8), ... , classes= None, tname= None, mapflow= False)



.. GENERATED FROM PYTHON SOURCE LINES 128-136

Tensor recovery with :class:`~watex.view.TPlot`
---------------------------------------------------------
Tensor plots from EM processing data.

`TPlot` is a tensor (impedance, resistivity, and phase) plot class.
It explores SEG (Society of Exploration Geophysicists) class data and
plots recovered tensors. `TPlot` methods return an instantiated object that
inherits from the :class:`watex.property.Baseplots` ABC (Abstract Base
Class) for visualization. Here are a few demonstration examples.

.. GENERATED FROM PYTHON SOURCE LINES 138-140

* Plot multiple sites/stations with signal recovery.
  This example takes three EDI samples.

.. GENERATED FROM PYTHON SOURCE LINES 140-143

.. code-block:: Python

    edi_data = load_edis(return_data=True, samples=3)
    TPlot(fig_size=(5, 3), font_size=7., sns_style='ticks').fit(edi_data).plot_multi_recovery(
        sites=['S00'], colors=['o', 'ok--'])



.. image-sg:: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_008.png
   :alt: Recovered tensor $|Z_{xy}|$
   :srcset: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_008.png
   :class: sphx-glr-single-img



.. GENERATED FROM PYTHON SOURCE LINES 144-145

* Plot a two-dimensional recovery tensor.

.. GENERATED FROM PYTHON SOURCE LINES 145-159

.. code-block:: Python

    # get 12 EDI samples for the demo
    edi_data = load_edis(return_data=True, samples=12)
    # customize the plot by adding plot_kws
    plot_kws = dict(
        ylabel='$Log_{10}Frequency [Hz]$',
        xlabel='$Distance(m)$',
        # raw string so that \Omega is not read as an escape sequence
        cb_label=r'$Log_{10}Rhoa[\Omega.m]$',
        fig_size=(7, 4),
        font_size=7.
        )
    t = TPlot(**plot_kws).fit(edi_data)
    # plot recovery2d using the log10 resistivity
    t.plot_tensor2d(to_log10=True)



.. image-sg:: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_009.png
   :alt: plot data exploratory quick view
   :srcset: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_009.png
   :class: sphx-glr-single-img



.. GENERATED FROM PYTHON SOURCE LINES 160-161

* Plot two-dimensional filtered tensors using the default trimming moving-average (TMA) filter.

.. GENERATED FROM PYTHON SOURCE LINES 161-175

.. code-block:: Python

    # take 12 EDI samples and plot the corrected tensors
    edi_data = load_edis(return_data=True, samples=12)
    # customize the plot by adding plot_kws
    plot_kws = dict(
        ylabel='$Log_{10}Frequency [Hz]$',
        xlabel='$Distance(m)$',
        # raw string so that \Omega is not read as an escape sequence
        cb_label=r'$Log_{10}Rhoa[\Omega.m]$',
        fig_size=(7, 4),
        font_size=7.
        )
    t = TPlot(**plot_kws).fit(edi_data)
    # plot the filtered tensor using the log10 resistivity
    t.plot_ctensor2d(to_log10=True)



.. image-sg:: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_010.png
   :alt: plot data exploratory quick view
   :srcset: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_010.png
   :class: sphx-glr-single-img



.. GENERATED FROM PYTHON SOURCE LINES 176-181

Model evaluation with :class:`~watex.view.EvalPlot`
---------------------------------------------------------
Metric and dimensionality evaluation plots.

`EvalPlot` inherits from :class:`BasePlot` and provides dimensionality
reduction and metric plots. The class works only with numerical features.

.. GENERATED FROM PYTHON SOURCE LINES 183-184

* Plot the ROC curve for a RandomForest classifier.

.. GENERATED FROM PYTHON SOURCE LINES 184-201

.. code-block:: Python

    from watex.exlib.sklearn import RandomForestClassifier
    from watex.datasets.dload import load_bagoue
    from watex.utils import cattarget
    from watex.view.mlplot import EvalPlot
    X, y = load_bagoue(as_frame=True)
    rdf_clf = RandomForestClassifier(random_state=42)  # our estimator
    b = EvalPlot(scale=True, encode_labels=True)
    b.fit_transform(X, y)
    # binarize the label b.y
    ybin = cattarget(b.y, labels=2)  # can also use labels=[0, 1]
    b.y = ybin
    b.font_size = 7.
    b.lc = 'r'
    b.lw = 7.
    b.sns_style = 'ticks'
    b.plotROC(rdf_clf, label=1, method="predict_proba")  # class=1



.. image-sg:: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_011.png
   :alt: plot data exploratory quick view
   :srcset: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_011.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    EvalPlot(tname= None, objective= None, scale= True, ... , sns_height= 4.0, sns_aspect= 0.7, verbose= 0)



.. GENERATED FROM PYTHON SOURCE LINES 202-204

* Plot a confusion matrix
  with a customized plot.

.. GENERATED FROM PYTHON SOURCE LINES 204-244

.. code-block:: Python

    matshow_kwargs = {
        'aspect': 'auto',  # 'auto' or 'equal'
        'interpolation': None,
        'cmap': 'cool'}
    plot_kws = {'lw': 3,
                'lc': (.9, 0, .8),
                'font_size': 15.,
                'cb_format': None,
                'xlabel': 'Predicted classes',
                'ylabel': 'Actual classes',
                'font_weight': None,
                'tp_labelbottom': False,
                'tp_labeltop': True,
                'tp_bottom': False
                }
    # replace the integer identifiers with literal strings
    b.litteral_classes = ['FR0', 'FR1']  # 'FR2', 'FR3']
    b.plotConfusionMatrix(clf=rdf_clf, matshow_kws=matshow_kwargs,
                          **plot_kws)



.. image-sg:: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_012.png
   :alt: plot data exploratory quick view
   :srcset: /glr_examples/applications/images/sphx_glr_plot_data_exploratory_quick_view_012.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    EvalPlot(tname= None, objective= None, scale= True, ... , sns_height= 4.0, sns_aspect= 0.7, verbose= 0)



.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 13.111 seconds)


.. _sphx_glr_download_glr_examples_applications_plot_data_exploratory_quick_view.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/watex/watex/0.3.X?urlpath=lab/tree/notebooks/glr_examples/applications/plot_data_exploratory_quick_view.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_data_exploratory_quick_view.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_data_exploratory_quick_view.py `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_