Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
Code Sample, a copy-pastable example
import pandas as pd
import numpy as np
# Example 1: x is ascending; both methods work (we expect the NaN value to be filled with -50)
df = pd.DataFrame(
{
'x': [-79, -79, 0, 61, 2783],
'y': [-200, -50, np.nan, -50, 1000],
}
).set_index(['x'])
print('Example 1')
print(df.head())
df1 = df.interpolate('index')
df2 = df.interpolate('slinear')
print(df1.head())
print(df2.head())
# Example 2: x is descending; neither method works (it seems the point at position 1 is not taken into account)
df = pd.DataFrame(
{
'x': [79, 79, 0, -61, -2783],
'y': [-200, -50, np.nan, -50, 1000],
}
).set_index(['x'])
print('Example 2')
print(df.head())
df1 = df.interpolate('index')
df2 = df.interpolate('slinear')
print(df1.head())
print(df2.head())
# Example 3: same kind of df as Example 2 (descending x values), but with 19 elements. It works with 'index', but not with 'slinear'
df = pd.DataFrame(
{
'x': [79, 79, 0, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -2783],
'y': [-200, -50, np.nan, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, 1000]
}
).set_index(['x'])
print('Example 3')
print(df.head())
df1 = df.interpolate('index')
df2 = df.interpolate('slinear')
print(df1.head())
print(df2.head())
# Example 4: same as Example 3, with the elements at positions -2 and -3 removed from both x and y. It fails with both methods
df = pd.DataFrame(
{
'x': [79, 79, 0, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -61, -2783],
'y': [-200, -50, np.nan, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, -50, 1000]
}
).set_index(['x'])
print('Example 4')
print(df.head())
df1 = df.interpolate('index')
df2 = df.interpolate('slinear')
print(df1.head())
print(df2.head())
Output:
Example 1
y
x
-79 -200.0
-79 -50.0
0 NaN
61 -50.0
2783 1000.0
y
x
-79 -200.0
-79 -50.0
0 -50.0
61 -50.0
2783 1000.0
y
x
-79 -200.0
-79 -50.0
0 -50.0
61 -50.0
2783 1000.0
Example 2
y
x
79 -200.0
79 -50.0
0 NaN
-61 -50.0
-2783 1000.0
y
x
79 -200.000000
79 -50.000000
0 -115.357143
-61 -50.000000
-2783 1000.000000
y
x
79 -200.000000
79 -50.000000
0 -115.357143
-61 -50.000000
-2783 1000.000000
Example 3
y
x
79 -200.0
79 -50.0
0 NaN
-61 -50.0
-61 -50.0
y
x
79 -200.0
79 -50.0
0 -50.0
-61 -50.0
-61 -50.0
y
x
79 -200.000000
79 -50.000000
0 -115.357143
-61 -50.000000
-61 -50.000000
Example 4
y
x
79 -200.0
79 -50.0
0 NaN
-61 -50.0
-61 -50.0
y
x
79 -200.000000
79 -50.000000
0 -115.357143
-61 -50.000000
-61 -50.000000
y
x
79 -200.000000
79 -50.000000
0 -115.357143
-61 -50.000000
-61 -50.000000
Problem description
I am trying to use pandas to interpolate "curves" that can have "steps" (e.g. two consecutive points with the same x-coordinate but different y-coordinates, or the other way around). There can be an arbitrary number of NaN values between two points.
I think this is quite a common use case, and it would be great if pandas could perform this kind of "point-to-point" interpolation.
I have been using either the "index" or the "slinear" method to do so. While this seems to work in most cases, there are inconsistencies:
- The result depends on whether the x-axis is sorted in ascending order (it works, see Example 1) or in descending order (it fails, see Example 2). More precisely, the second of two points sharing the same x value (the point at position 1) seems to be ignored.
- In some cases, for an x-axis sorted in descending order, 'slinear' and 'index' do not yield the same result (it works with 'index' while failing with 'slinear' in Example 3, and fails with both in Example 4). I could not figure out the root cause; the only difference between Examples 3 and 4 is the length of the DataFrame.
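The suspicion that the point at position 1 is skipped can be checked by hand. The sketch below is plain arithmetic and does not call pandas; `lerp` is a hypothetical helper introduced only for this illustration. It interpolates at x = 0 once between positions 0 and 3 of Example 2 and once between positions 1 and 3.

```python
def lerp(x, x0, y0, x1, y1):
    """Linear interpolation at x between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Example 2: positions 0, 1 and 3 hold (79, -200), (79, -50) and (-61, -50).
# The NaN sits at x = 0.

# Interpolating between positions 0 and 3, i.e. skipping position 1:
skipping_position_1 = lerp(0, 79, -200, -61, -50)

# Interpolating between positions 1 and 3, the neighbours in row order:
using_position_1 = lerp(0, 79, -50, -61, -50)

print(round(skipping_position_1, 6))  # -115.357143, the value reported above
print(using_position_1)               # -50.0, the expected value
```

The first result matches the value printed for Examples 2 to 4, which is consistent with the point (79, -50) being left out. It does not by itself show where in pandas this happens.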
Expected Output
I would like all NA values to be linearly interpolated locally, based on the 'previous' and 'next' valid points in row order. (Side note: for extrapolation, a 'nearest'-like approach seems to be the most sensible outcome.)
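To make the expected behaviour concrete, here is a minimal sketch of such a point-to-point interpolation, written as a workaround rather than as a description of how pandas works or should be fixed. The function name `interpolate_point_to_point` is hypothetical. It assumes a numeric index, takes the previous and next valid points by row position (not by sorted x value), and fills leading or trailing NaN values with the nearest valid value. The choice to reuse the previous y when both neighbours share the same x is also an assumption.

```python
import numpy as np
import pandas as pd


def interpolate_point_to_point(s: pd.Series) -> pd.Series:
    """Fill NaN values from the previous and next valid rows (hypothetical helper)."""
    x = s.index.to_numpy(dtype=float)
    y = s.to_numpy(dtype=float)
    out = y.copy()
    valid = np.flatnonzero(~np.isnan(y))  # row positions holding a value
    if valid.size == 0:
        return s.copy()  # nothing to interpolate from
    for i in np.flatnonzero(np.isnan(y)):
        pos = np.searchsorted(valid, i)  # number of valid rows before row i
        if pos == 0:
            out[i] = y[valid[0]]  # leading NaN: nearest valid value
        elif pos == valid.size:
            out[i] = y[valid[-1]]  # trailing NaN: nearest valid value
        else:
            lo, hi = valid[pos - 1], valid[pos]
            dx = x[hi] - x[lo]
            if dx == 0:
                out[i] = y[lo]  # assumption: inside a step, keep the previous y
            else:
                out[i] = y[lo] + (y[hi] - y[lo]) * (x[i] - x[lo]) / dx
    return pd.Series(out, index=s.index, name=s.name)


# Data from Example 2 (descending x with a duplicated first x value).
s = pd.Series([-200, -50, np.nan, -50, 1000], index=[79, 79, 0, -61, -2783], name="y")
result = interpolate_point_to_point(s)
print(result)  # the NaN at x = 0 becomes -50.0
```

Because the neighbours are chosen by row position, the result does not depend on whether x is ascending or descending, and duplicated x values are never reordered.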
Output of pd.show_versions()
commit : f00ed8f
python : 3.7.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : None.None
pandas : 1.3.0
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None