BUG: interpolate method with 'index' and 'slinear' methods give inconsistent results when there are duplicates in x-axis #42585

Open
@scd75

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.



Code Sample, a copy-pastable example

import pandas as pd
import numpy as np

# Example 1: x is ascending; both methods work (we expect the NaN value to be filled with -50)
df = pd.DataFrame(
    {
        'x': [-79, -79, 0, 61, 2783],
        'y': [-200, -50, np.nan, -50, 1000],
    }
).set_index(['x'])

print('Example 1')
print(df.head())

df1 = df.interpolate('index')
df2 = df.interpolate('slinear')

print(df1.head())
print(df2.head())


# Example 2: x is descending; both methods fail (it seems the point at position 1 is not taken into account)
df = pd.DataFrame(
    {
        'x': [79, 79, 0, -61, -2783],
        'y': [-200, -50, np.nan, -50, 1000],
    }
).set_index(['x'])

print('Example 2')
print(df.head())

df1 = df.interpolate('index')
df2 = df.interpolate('slinear')

print(df1.head())
print(df2.head())


# Example 3: same kind of df as Example 2 (descending x values), but with 19 elements. It works with 'index', not with 'slinear'
df = pd.DataFrame(
    {
        'x': [79, 79, 0, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -2783],
        'y': [-200, -50, np.nan, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, 1000]
    }
).set_index(['x'])

print('Example 3')
print(df.head())

df1 = df.interpolate('index')
df2 = df.interpolate('slinear')

print(df1.head())
print(df2.head())


# Example 4: same as Example 3, with the elements at positions -2 and -3 removed from both x and y. Both methods fail
df = pd.DataFrame(
    {
        'x': [79, 79, 0, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -2783],
        'y': [-200, -50, np.nan, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, 1000]
    }
).set_index(['x'])

print('Example 4')
print(df.head())

df1 = df.interpolate('index')
df2 = df.interpolate('slinear')

print(df1.head())
print(df2.head())

Output:

Example 1
            y
x
-79    -200.0
-79     -50.0
 0        NaN
 61     -50.0
 2783  1000.0
            y
x
-79    -200.0
-79     -50.0
 0      -50.0
 61     -50.0
 2783  1000.0
            y
x
-79    -200.0
-79     -50.0
 0      -50.0
 61     -50.0
 2783  1000.0
Example 2
            y
x
 79    -200.0
 79     -50.0
 0        NaN
-61     -50.0
-2783  1000.0
                 y
x
 79    -200.000000
 79     -50.000000
 0     -115.357143
-61     -50.000000
-2783  1000.000000
                 y
x
 79    -200.000000
 79     -50.000000
 0     -115.357143
-61     -50.000000
-2783  1000.000000
Example 3
         y
x
 79 -200.0
 79  -50.0
 0     NaN
-61  -50.0
-61  -50.0
         y
x
 79 -200.0
 79  -50.0
 0   -50.0
-61  -50.0
-61  -50.0
              y
x
 79 -200.000000
 79  -50.000000
 0  -115.357143
-61  -50.000000
-61  -50.000000
Example 4
         y
x
 79 -200.0
 79  -50.0
 0     NaN
-61  -50.0
-61  -50.0
              y
x
 79 -200.000000
 79  -50.000000
 0  -115.357143
-61  -50.000000
-61  -50.000000
              y
x
 79 -200.000000
 79  -50.000000
 0  -115.357143
-61  -50.000000
-61  -50.000000

Problem description

I am trying to use pandas to interpolate "curves" that can have "steps" (e.g. two consecutive points with the same x-coordinate but different y-coordinates, or the other way around). There can be an arbitrary number of NaN values between two points.
I think this is quite a common use case, and it would be cool to have pandas able to perform this kind of "point-to-point" interpolation.
I have been using either the "index" or the "slinear" method to do so. While it works in most cases, there seem to be inconsistencies around:

  • whether the x-axis is sorted in ascending order (it works, see Example 1) or in descending order (it fails, see Example 2; more precisely, it seems to ignore the second of the two points that share the same x, i.e. the point at position 1)
  • in some cases with a descending x-axis, 'slinear' and 'index' do not yield the same result (Example 3 works with 'index' but fails with 'slinear', while Example 4 fails with both). I could not figure out the root cause; the only difference between the two examples is the length of the DataFrame.
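One possible explanation, sketched with plain NumPy and not checked against the pandas source: if the valid points are sorted by ascending x before an np.interp-style lookup, the result for Example 2 depends entirely on how the tie between the two x == 79 points is broken. The two orderings below are written by hand to illustrate this; which one pandas actually ends up with is an assumption.

```python
import numpy as np

# Valid (non-NaN) points of Example 2, in their original row order.
x = np.array([79.0, 79.0, -61.0, -2783.0])
y = np.array([-200.0, -50.0, -50.0, 1000.0])

# Two equally valid ascending orderings of x; they differ only in
# how the tie between the two x == 79 points is broken.
order_a = np.array([3, 2, 0, 1])  # tie kept in original row order
order_b = np.array([3, 2, 1, 0])  # tie reversed

# Interpolate at x == 0, the position of the NaN in Example 2.
value_a = np.interp(0.0, x[order_a], y[order_a])
value_b = np.interp(0.0, x[order_b], y[order_b])
print(value_a, value_b)
```

With the first ordering the right-hand neighbour of x == 0 is the point (79, -200), which gives about -115.357, the value seen in the failing outputs. With the second ordering the neighbour is (79, -50) and the result is -50, the expected value.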

Expected Output

I would like all NaN values to be linearly interpolated locally, based on the 'previous' and 'next' points. (Side note: for extrapolation, a 'nearest'-like approach seems the most sensible outcome.)
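To make the expected behaviour concrete, here is a minimal sketch of what I mean. The helper name interpolate_between_neighbours is hypothetical and not part of pandas. It picks the previous and next valid rows by position, not by sorted x, so duplicates and descending order do not matter. It also assumes a numeric index, and that a NaN sitting between two points with the same x should take the previous point's value.

```python
import numpy as np
import pandas as pd


def interpolate_between_neighbours(s: pd.Series) -> pd.Series:
    """Hypothetical helper: fill NaNs from the previous and next valid rows."""
    x = s.index.to_numpy(dtype=float)
    y = s.to_numpy(dtype=float)
    out = y.copy()
    valid = np.flatnonzero(~np.isnan(y))  # row positions holding a value
    if valid.size == 0:
        return s.copy()
    for i in np.flatnonzero(np.isnan(y)):
        k = np.searchsorted(valid, i)  # number of valid rows before row i
        if k == 0:
            out[i] = y[valid[0]]  # leading NaN: nearest valid value
        elif k == valid.size:
            out[i] = y[valid[-1]]  # trailing NaN: nearest valid value
        else:
            p, n = valid[k - 1], valid[k]
            if x[n] == x[p]:
                out[i] = y[p]  # assumption: vertical step, keep previous value
            else:
                t = (x[i] - x[p]) / (x[n] - x[p])
                out[i] = y[p] + t * (y[n] - y[p])
    return pd.Series(out, index=s.index, name=s.name)


# Example 2 from above: descending x with a duplicated first x value.
df = pd.DataFrame(
    {
        'x': [79, 79, 0, -61, -2783],
        'y': [-200, -50, np.nan, -50, 1000],
    }
).set_index(['x'])
filled = interpolate_between_neighbours(df['y'])
print(filled)
```

On Example 2 this fills the NaN at x == 0 with -50, because the previous valid row is (79, -50) and the next one is (-61, -50). The Python loop is only there for readability; it is not meant as an efficient implementation.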

Output of pd.show_versions()

commit : f00ed8f
python : 3.7.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : None.None

pandas : 1.3.0
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None


Labels: Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
