watex.analysis.iPCA

watex.analysis.iPCA(X, n_components=None, *, view=False, n_batches=None, return_X=True, store_in_binary_file=False, filename=None, **ipca_kws)

Incremental PCA

iPCA splits the training set into mini-batches and feeds them to the algorithm one mini-batch at a time.

One problem with the preceding implementation of PCA is that it requires the whole training set to fit in memory for the SVD algorithm to run. Incremental PCA avoids this requirement, which is useful for large training sets and for applying PCA online (i.e., on the fly, as new instances arrive).
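The mini-batch idea can be sketched with scikit-learn's IncrementalPCA directly. This is a minimal illustration on synthetic data, not the internals of iPCA; the array sizes and the values of n_batches and n_components are assumptions chosen for the example.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Synthetic stand-in for a training set (assumption: 1000 samples, 20 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))

n_batches = 10                         # illustrative value
ipca = IncrementalPCA(n_components=5)  # illustrative value

# Feed the estimator one mini-batch at a time.
for X_batch in np.array_split(X, n_batches):
    ipca.partial_fit(X_batch)

# Project the data onto the components learned incrementally.
X_reduced = ipca.transform(X)
```

Each mini-batch must contain at least n_components samples, which is why the batch count is kept small relative to the number of rows.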

Parameters
  • X (Ndarray (M x N matrix, where M=m-samples and N=n-features)) – Training set. Denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • n_components (int, optional) – Number of dimensions to preserve. If `n_components` is a float between 0. and 1., it indicates the ratio of variance to preserve. If None (the default), 95% of the variance is preserved.

  • n_batches (int, optional) – Number of batches into which the training set is split.

  • store_in_binary_file (bool, default=False) – Alternatively, use NumPy's `memmap` class to manipulate a large array stored in a binary file on disk as if it were entirely in memory. The class loads only the data it needs into memory, when it needs it.

  • filename (str, optional) – Name of the binary file on disk in which to store the data.

  • return_X (bool, default=True) – Return the training set transformed with the most representative variance ratio.

  • view (bool, default=False) – Plot the explained variance as a function of the number of dimensions.

  • ipca_kws (dict) – Additional keyword arguments passed to sklearn.decomposition.IncrementalPCA.
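The memmap approach described for store_in_binary_file can be sketched as follows. This is a hedged illustration using NumPy and scikit-learn only; the file name X_train.dat, the temporary directory, the array shape, and the batch settings are all assumptions made for the example, and the way iPCA itself writes or reads the file may differ.

```python
import os
import tempfile

import numpy as np
from sklearn.decomposition import IncrementalPCA

# Synthetic stand-in for a large training set (assumption: 600 x 12, float32).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 12)).astype(np.float32)

# Hypothetical file location, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "X_train.dat")

# Write the array to a binary file on disk.
X_out = np.memmap(path, dtype="float32", mode="w+", shape=X.shape)
X_out[:] = X
X_out.flush()
del X_out

# Re-open the file read-only; slices are loaded into memory only when accessed.
X_mm = np.memmap(path, dtype="float32", mode="r", shape=X.shape)

n_batches = 6                          # illustrative value
batch_size = X_mm.shape[0] // n_batches
ipca = IncrementalPCA(n_components=4)  # illustrative value

# Fit on one slice of the memory-mapped array at a time.
for start in range(0, X_mm.shape[0], batch_size):
    ipca.partial_fit(X_mm[start:start + batch_size])

# Transform batch by batch as well, then stack the results.
X_reduced = np.vstack([
    ipca.transform(X_mm[start:start + batch_size])
    for start in range(0, X_mm.shape[0], batch_size)
])
```

A raw memmap file stores no dtype or shape information, so both must be supplied again when the file is re-opened.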

Returns

The transformed training set or the iPCA container attributes for plotting purposes.

Return type

X (NDArray) or iPCA object

Examples

>>> from watex.analysis.dimensionality import iPCA
>>> from watex.datasets import fetch_data
>>> X, _ = fetch_data('Bagoue analysed data')
>>> Xtransf = iPCA(X, n_components=None, n_batches=100, view=True)
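The 95% variance default described for n_components can be illustrated with plain scikit-learn. IncrementalPCA accepts only an integer (or None) for n_components, so one possible approach, assumed here and not confirmed as what iPCA does internally, is to fit with all components and keep the smallest number whose cumulative explained variance ratio reaches the target. The data below are synthetic.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Synthetic data: 3 latent factors mixed into 10 features, plus small noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

# Fit with all components kept (n_components=None).
ipca = IncrementalPCA(n_components=None, batch_size=100)
ipca.fit(X)

# Smallest number of components whose cumulative ratio reaches 95%.
target = 0.95
cumulative = np.cumsum(ipca.explained_variance_ratio_)
n_keep = int(np.argmax(cumulative >= target)) + 1

# Keep only the leading n_keep columns of the projection.
X_reduced = ipca.transform(X)[:, :n_keep]
```

The cumulative array is also the quantity one would plot against the number of dimensions to obtain the kind of curve that view=True refers to.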