Description
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd

# length-10 index
idx = pd.date_range('2017-01-02', '2017-01-13', freq='B')

def create_series(exclude):
    subset = pd.Index([idx[i] for i in range(len(idx)) if i not in exclude])
    return pd.Series(np.arange(len(subset)), index=subset)

TESTS = [
    [3, 4],  # delete Thursday and Friday of first week - PASS
    [4, 5],  # delete Friday first week and Monday second week - PASS
    [5, 6],  # delete Monday and Tuesday of second week - PASS
    [6],     # delete Tuesday of second week - PASS
    [7],     # delete Wednesday of second week - PASS
    [6, 7],  # delete Tuesday and Wednesday of second week - FAIL
]

for exclude in TESTS:
    print('TEST: {}'.format(exclude))
    s = create_series(exclude)
    print('IN:\n{}'.format(s))
    t = s.resample('D').ffill()  # always works
    u = s.resample('B').last()   # always works
    v = s.resample('B').ffill()  # fails on [6, 7]
    print('OUT:\n{}'.format(v))
Problem description
s.resample('B').ffill() raises an exception, "ValueError: Length mismatch: Expected axis has 8 elements, new values have 10 elements". In the cases I tested, this occurs only when the index is missing two consecutive mid-week days (Tuesday and Wednesday of the second week). Single-day gaps, two-day gaps elsewhere in the week, and a gap spanning the weekend all upsample without error; I am not sure exactly what determines whether the problem occurs. Other resampling methods, e.g. last(), work as expected, and forward filling at a daily ('D') frequency also works. I cannot see any reason why this particular case should not be supported, so I believe it is a bug.
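To help narrow down which gaps trigger the error, a sweep along the following lines could be run against an affected version. This is only a sketch: the helper name `find_failures` and its `max_gap` parameter are made up for illustration, and it only tries contiguous interior gaps, so it may not cover every failing pattern.

```python
import numpy as np
import pandas as pd

# Same length-10 business-day index as in the code sample above.
idx = pd.date_range('2017-01-02', '2017-01-13', freq='B')

def find_failures(max_gap=2):
    """Hypothetical helper: try every contiguous interior gap of up to
    max_gap days and record the ones where resample('B').ffill() raises."""
    failures = []
    for size in range(1, max_gap + 1):
        # Keep the first and last dates so the resampled range is unchanged.
        for start in range(1, len(idx) - size):
            exclude = list(range(start, start + size))
            subset = idx.delete(exclude)
            s = pd.Series(np.arange(len(subset)), index=subset)
            try:
                s.resample('B').ffill()
            except ValueError as exc:
                failures.append((exclude, str(exc)))
    return failures

failures = find_failures()
for exclude, message in failures:
    print('FAIL {}: {}'.format(exclude, message))
print('{} failing gap(s) found'.format(len(failures)))
```

On a version without the bug the list should simply come back empty.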
Expected Output
I expect the last statement not to raise an exception, and instead to produce the same output as s.resample('D').ffill().resample('B').first(), i.e. I expect:
> s = create_series([6, 7])
> s
2017-01-02 0
2017-01-03 1
2017-01-04 2
2017-01-05 3
2017-01-06 4
2017-01-09 5
2017-01-12 6
2017-01-13 7
> s.resample('B').ffill()
2017-01-02 0
2017-01-03 1
2017-01-04 2
2017-01-05 3
2017-01-06 4
2017-01-09 5
2017-01-10 5
2017-01-11 5
2017-01-12 6
2017-01-13 7
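Until this is fixed, the equivalent expression mentioned above can serve as a workaround. The snippet below is a minimal sketch rather than a proposed fix; the names `workaround` and `reindexed` are mine, and the `reindex` variant is an alternative that I am assuming is acceptable for this kind of data (a sorted index with no duplicate dates).

```python
import numpy as np
import pandas as pd

# Rebuild the failing input: drop Tuesday and Wednesday of the second week.
idx = pd.date_range('2017-01-02', '2017-01-13', freq='B')
subset = idx.delete([6, 7])
s = pd.Series(np.arange(len(subset)), index=subset)

# Workaround 1: forward fill at daily frequency, then downsample to business days.
workaround = s.resample('D').ffill().resample('B').first()

# Workaround 2 (assumption: index is sorted and unique): reindex onto the
# full business-day range and forward fill the missing dates.
full = pd.date_range(s.index[0], s.index[-1], freq='B')
reindexed = s.reindex(full, method='ffill')

print(workaround)
```

Both variants avoid calling ffill() on a 'B' resampler, which is the call that raises.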
Output of pd.show_versions()
I have tried this on both the latest stable version 0.20.2 and the master branch, Git revision 10c17d4, and both exhibit the same problem.
pandas: 0.21.0.dev+136.g10c17d4
pytest: 3.0.1
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.19.0
xarray: None
IPython: 5.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.2.0-b1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
Stable version
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.10-100.fc24.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.utf8
LOCALE: en_GB.UTF-8
pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None