watex.utils.interpolate1d

watex.utils.interpolate1d(arr, kind='slinear', method=None, order=None, fill_value='extrapolate', limit=None, **kws)

Interpolate an array containing invalid (NaN) values.

Useful for interpolating the missing frequency values in the tensor components.

Parameters:
  • arr (array_like) – Array to interpolate, containing invalid values. The invalid value here is NaN.

  • kind (str or int, optional) – Specifies the kind of interpolation as a string or as an integer specifying the order of the spline interpolator to use. The string has to be one of linear, nearest, nearest-up, zero, slinear, quadratic, cubic, previous, or next. zero, slinear, quadratic and cubic refer to a spline interpolation of zeroth, first, second or third order; previous and next simply return the previous or next value of the point; nearest-up and nearest differ when interpolating half-integers (e.g. 0.5, 1.5) in that nearest-up rounds up and nearest rounds down. If method is set to pd, which refers to the pandas interpolation method, kind can also be set to polynomial or pad. Note that polynomial requires you to specify an order, while pad requires you to specify the limit. Default is slinear.

  • method (str, optional, default='mean') – Method of interpolation. Can be base for scipy.interpolate.interp1d, mean or bff for the scaling methods, and pd for the pandas interpolation methods. Note that the first method is fast and efficient when the number of NaN values in the array is relatively small; the base interpolation is less accurate when the data contain many missing values. The scaled methods (mean and bff) are proposed as a more efficient alternative in that case. When mean is set, the function replaces the NaN values by the nonzero values in the raw array and then uses the mean to fit the data. The fit produces a smooth curve, and each NaN in the raw array is replaced by the value of the fit at the same index. The bff method follows the same approach but, rather than averaging the nonzero values, it uses a backward and forward strategy to fill the NaN values before scaling. When method is set to pd, the function uses the pandas interpolation and then finishes with forward/backward NaN filling, since pandas interpolation does not handle all NaN values at the beginning or at the end of the array. Default is base.

  • fill_value (array-like or (array-like, array_like) or extrapolate, optional) – If a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value. Using a two-element tuple or ndarray requires bounds_error=False. Default is extrapolate.

  • kws (dict) – Additional keyword arguments of spi.interp1d (scipy.interpolate.interp1d).
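The differences between some of the kind options listed above are easiest to see at half-integer points. The following is a small sketch that calls scipy.interpolate.interp1d directly rather than interpolate1d; the sample arrays are made up for illustration.

```python
import numpy as np
from scipy.interpolate import interp1d

# Illustrative data: three samples at integer positions.
x = np.array([0.0, 1.0, 2.0])
y = np.array([10.0, 20.0, 30.0])
half = np.array([0.5, 1.5])  # points exactly halfway between samples

# 'nearest' rounds half-integers down, 'nearest-up' rounds them up.
nearest = interp1d(x, y, kind="nearest")(half)
nearest_up = interp1d(x, y, kind="nearest-up")(half)

# 'previous' and 'next' return the value of the sample before or after the point.
previous = interp1d(x, y, kind="previous")(half)
following = interp1d(x, y, kind="next")(half)

print(nearest, nearest_up, previous, following)
```

For points that are not exactly halfway between two samples, nearest and nearest-up agree.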

Return type:

array_like - New interpolated array in which the NaN values have been interpolated.

Notes

When interpolating over the complete set of frequencies (i.e. all the frequency values) with the base method, the missing data in arr can fall outside the range of arr. For consistency, and to keep all values within the frequency range, it is better to set the fill_value parameter of spi.interp1d to extrapolate. This avoids the error that is raised when a value to interpolate lies outside the bounds of arr.
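The note above can be illustrated with scipy.interpolate.interp1d alone. This is a minimal sketch under stated assumptions, not the watex implementation: the helper name fill_nan_interp1d is hypothetical, and array indices stand in for the frequency axis. The interpolator is fitted on the valid samples only, so a NaN at the end of the array lies outside the fitted range and needs fill_value='extrapolate'.

```python
import numpy as np
from scipy.interpolate import interp1d


def fill_nan_interp1d(arr, kind="slinear", fill_value="extrapolate"):
    """Hypothetical helper: fill NaN entries from an interp1d fit on the valid samples."""
    arr = np.asarray(arr, dtype=float)
    x = np.arange(arr.size)          # indices play the role of the frequency axis
    valid = ~np.isnan(arr)
    # Fit on valid samples only; 'extrapolate' allows evaluation outside their range.
    f = interp1d(x[valid], arr[valid], kind=kind, fill_value=fill_value)
    out = arr.copy()
    out[~valid] = f(x[~valid])       # replace only the NaN positions
    return out


# The last NaN (index 5) lies beyond the last valid sample (index 4).
z = np.array([0.0, 1.0, np.nan, 3.0, 4.0, np.nan])
filled = fill_nan_interp1d(z)
print(filled)
```

With a first-order spline, the interior gap is filled linearly and the trailing value continues the slope of the last segment.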

References

https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html
https://www.askpython.com/python/examples/interpolation-to-fill-missing-entries

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from watex.utils.exmath import interpolate1d
>>> z = np.random.randn(17) * 10  # assume 17 freq for 17 values of tensor Z
>>> z[[7, 10, 16]] = np.nan  # replace some indexes by NaN values
>>> zit = interpolate1d(z, kind='linear')
>>> z
array([ -1.97732415, -16.5883156 ,   8.44484348,   0.24032979,
         8.30863276,   4.76437029, -15.45780568,          nan,
        -4.11301794, -10.94003412,          nan,   9.22228383,
       -15.40298253,  -7.24575491,  -7.15149205, -20.9592011 ,
                nan])
>>> zit
array([ -1.97732415, -16.5883156 ,   8.44484348,   0.24032979,
         8.30863276,   4.76437029, -15.45780568,  -4.11301794,
       -10.94003412,   9.22228383, -15.40298253,  -7.24575491,
        -7.15149205, -20.9592011 , -34.76691014, -48.57461918,
       -62.38232823])
>>> zmean = interpolate1d(z, method='mean')
>>> zbff = interpolate1d(z, method='bff')
>>> zpd = interpolate1d(z, method='pd')
>>> plt.plot(np.arange(len(z)), zit, 'v--',
...          np.arange(len(z)), zmean, 'ok-',
...          np.arange(len(z)), zbff, '^g:',
...          np.arange(len(z)), zpd, '<b:',
...          np.arange(len(z)), z, 'o',
...          )
>>> plt.legend(['interp1d', 'mean strategy', 'bff strategy',
...             'pandas strategy', 'data'], loc='best')
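The pd strategy plotted above is described as pandas interpolation followed by forward/backward NaN filling. A rough approximation in plain pandas is sketched below. The helper name fill_nan_pandas is hypothetical and the code is an assumption about the general idea, not the watex source.

```python
import numpy as np
import pandas as pd


def fill_nan_pandas(arr, kind="linear", **kws):
    """Hypothetical helper: pandas interpolation, then forward/backward fill at the edges."""
    s = pd.Series(np.asarray(arr, dtype=float))
    # Interior gaps are handled by Series.interpolate.
    s = s.interpolate(method=kind, **kws)
    # NaN values left at the start or end of the array are filled from the
    # nearest valid neighbour.
    return s.ffill().bfill().to_numpy()


# NaN values at both ends and in the middle of the array.
z = np.array([np.nan, 1.0, np.nan, 3.0, np.nan])
filled = fill_nan_pandas(z)
print(filled)
```

The edge values are copies of the nearest valid sample rather than extrapolations, which is the main practical difference from using interp1d with fill_value='extrapolate'.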