watex.utils.interpolate1d
- watex.utils.interpolate1d(arr, kind='slinear', method=None, order=None, fill_value='extrapolate', limit=None, **kws)
Interpolate an array containing invalid (NaN) values.
Useful for interpolating the missing frequency values in the tensor components.
- Parameters
arr (array_like) – Array to interpolate, containing invalid values. The invalid value here is NaN.
kind (str or int, optional) – Specifies the kind of interpolation as a string, or as an integer specifying the order of the spline interpolator to use. The string has to be one of ``linear``, ``nearest``, ``nearest-up``, ``zero``, ``slinear``, ``quadratic``, ``cubic``, ``previous``, or ``next``. ``zero``, ``slinear``, ``quadratic`` and ``cubic`` refer to a spline interpolation of zeroth, first, second or third order; ``previous`` and ``next`` simply return the previous or next value of the point; ``nearest-up`` and ``nearest`` differ when interpolating half-integers (e.g. 0.5, 1.5) in that ``nearest-up`` rounds up and ``nearest`` rounds down. If method is set to ``pd``, which refers to the pandas interpolate method, kind can also be set to ``polynomial`` or ``pad``. Note that ``polynomial`` requires an order, while ``pad`` requires a limit. Default is ``slinear``.
method (str, optional, default='mean') – Method of interpolation. Can be ``base`` for scipy.interpolate.interp1d, ``mean`` or ``bff`` for scaling methods, and ``pd`` for pandas interpolation methods. The ``base`` method is fast and efficient when the array contains relatively few NaN values, but it is less accurate when the data contain many missing values. The scaling methods are proposed as a more efficient alternative in that case. When ``mean`` is set, the function replaces the NaN values with the nonzero values of the raw array and then uses the mean to fit the data. The fit produces a smooth curve, and each NaN in the raw array is replaced by the value of the fit at the same index. The ``bff`` method follows the same approach but, rather than averaging the nonzero values, it uses a backward and forward strategy to fill the NaN values before scaling. When the method is set to ``pd``, the function uses the pandas interpolation and then finishes with forward/backward NaN filling, since pandas interpolation does not handle all NaN values at the beginning or at the end of the array. Default is ``base``.
fill_value (array-like or (array-like, array_like) or ``extrapolate``, optional) – If a ndarray (or float), this value will be used to fill in for requested points outside of the data range. In scipy.interpolate.interp1d itself, the default when not provided is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value. Using a two-element tuple or ndarray requires bounds_error=False. Default is ``extrapolate``.
kws (dict) – Additional keyword arguments for ``spi.interp1d``.
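The ``pd`` strategy described under method (pandas interpolation followed by forward/backward filling of the ends) can be illustrated with the sketch below. It is not the watex implementation: ``fill_nan_pandas`` is a hypothetical helper written only for illustration, and it assumes a 1-D float array.

```python
import numpy as np
import pandas as pd


def fill_nan_pandas(arr, kind="linear", order=None, limit=None):
    """Hypothetical helper (not part of watex): interpolate NaN values
    with pandas, then pad whatever is still missing at the ends."""
    s = pd.Series(np.asarray(arr, dtype=float))
    # `order` is only forwarded when given, e.g. for kind="polynomial"
    extra = {} if order is None else {"order": order}
    s = s.interpolate(method=kind, limit=limit, **extra)
    # Forward fill handles remaining trailing NaN, backward fill the
    # leading NaN that interpolation alone cannot reach.
    return s.ffill().bfill().to_numpy()


z = np.array([np.nan, 1.0, np.nan, 3.0, np.nan])
filled = fill_nan_pandas(z)
print(filled)
```

The interior NaN is interpolated from its neighbours, while the leading and trailing NaN values take the nearest valid value, which is why the final fill step is needed.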
- Return type
array_like - New interpolated array. NaN values are interpolated.
Notes
When interpolating over the complete set of frequencies, i.e. all the frequency values, with the ``base`` method, the missing data in arr can fall outside the range of arr. So, for consistency and to keep all values within the frequency range, it is better to set the fill_value parameter in the kws argument of ``spi.interp1d`` to ``extrapolate``. This avoids raising an error when the value to interpolate lies outside the bounds of arr.
References
https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html
https://www.askpython.com/python/examples/interpolation-to-fill-missing-entries
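The out-of-range behaviour described in the Notes can be reproduced directly with scipy.interpolate.interp1d. The sketch below is an assumption about how a ``base``-like fill could work, not the watex source: ``fill_nan_interp1d`` is a hypothetical helper that fits interp1d on the valid samples (using array indices as the x axis) and evaluates it at the NaN positions.

```python
import numpy as np
from scipy.interpolate import interp1d


def fill_nan_interp1d(arr, kind="slinear", fill_value="extrapolate", **kws):
    """Hypothetical helper (not part of watex): fill NaN values by fitting
    interp1d on the valid samples and evaluating it at the NaN indices."""
    arr = np.asarray(arr, dtype=float)
    x = np.arange(arr.size)
    valid = ~np.isnan(arr)
    f = interp1d(x[valid], arr[valid], kind=kind, fill_value=fill_value, **kws)
    out = arr.copy()
    out[~valid] = f(x[~valid])
    return out


# The trailing NaN sits outside the range covered by the valid samples.
z = np.array([0.0, 1.0, np.nan, 3.0, np.nan])

# With fill_value="extrapolate" the trailing NaN is extrapolated.
filled = fill_nan_interp1d(z)

# With strict bounds checking, the same out-of-range point raises.
try:
    fill_nan_interp1d(z, fill_value=np.nan, bounds_error=True)
    raised = False
except ValueError:
    raised = True

# A two-element tuple gives separate fill values below and above the
# valid range; this requires bounds_error=False.
z2 = np.array([np.nan, 1.0, np.nan, 3.0, np.nan])
padded = fill_nan_interp1d(z2, fill_value=(-1.0, 9.0), bounds_error=False)
```

Here the NaN at the end of ``z`` is only recoverable because extrapolation is enabled; the tuple form is an alternative when fixed boundary values are preferred over extrapolated ones.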
Examples
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from watex.utils.exmath import interpolate1d
>>> z = np.random.randn(17) * 10  # assume 17 freq for 17 values of tensor Z
>>> z[[7, 10, 16]] = np.nan  # replace some indexes by NaN values
>>> zit = interpolate1d(z, kind='linear')
>>> z
array([ -1.97732415, -16.5883156 ,   8.44484348,   0.24032979,
         8.30863276,   4.76437029, -15.45780568,          nan,
        -4.11301794, -10.94003412,          nan,   9.22228383,
       -15.40298253,  -7.24575491,  -7.15149205, -20.9592011 ,
                nan])
>>> zit
array([ -1.97732415, -16.5883156 ,   8.44484348,   0.24032979,
         8.30863276,   4.76437029, -15.45780568,  -4.11301794,
       -10.94003412,   9.22228383, -15.40298253,  -7.24575491,
        -7.15149205, -20.9592011 , -34.76691014, -48.57461918,
       -62.38232823])
>>> zmean = interpolate1d(z, method='mean')
>>> zbff = interpolate1d(z, method='bff')
>>> zpd = interpolate1d(z, method='pd')
>>> x = np.arange(len(z))  # sample index used as the x axis
>>> plt.plot(x, zit, 'v--',
...          x, zmean, 'ok-',
...          x, zbff, '^g:',
...          x, zpd, '<b:',
...          x, z, 'o')
>>> plt.legend(['interp1d', 'mean strategy', 'bff strategy',
...             'pandas strategy', 'data'], loc='best')