Upsampling with resample('B').ffill() fails with "ValueError: Length mismatch" #16624

Closed
@AdamGleave

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

# length-10 index
idx = pd.date_range('2017-01-02', '2017-01-13', freq='B')

def create_series(exclude):
  subset = pd.Index([idx[i] for i in range(len(idx)) if i not in exclude])
  return pd.Series(np.arange(len(subset)), index=subset)

TESTS = [
  [3, 4], # delete Thursday and Friday of first week - PASS
  [4, 5], # delete Friday first week and Monday second week - PASS
  [5, 6], # delete Monday and Tuesday of second week - PASS
  [6], # delete Tuesday of second week - PASS
  [7], # delete Wednesday of second week - PASS
  [6, 7], # delete Tuesday and Wednesday of second week - FAIL
]

for exclude in TESTS:
  print('TEST: {}'.format(exclude))
  s = create_series(exclude)
  print('IN:\n{}'.format(s))
  t = s.resample('D').ffill()  # always works
  u = s.resample('B').last()  # always works
  v = s.resample('B').ffill()  # fails on [6, 7]
  print('OUT:\n{}'.format(v))

Problem description

s.resample('B').ffill() raises "ValueError: Length mismatch: Expected axis has 8 elements, new values have 10 elements". In the tests above, this happens only when the index has a multi-day gap in the middle of a week (Tuesday and Wednesday of the second week both missing); single-day gaps and two-day gaps adjacent to or spanning a weekend pass. I am not sure exactly what predicts whether the problem occurs. Other resampling methods, e.g. last(), work as expected, and forward filling at a daily ('D') frequency also works. I cannot see any reason why this particular case should not be supported, so I believe it is a bug.

Expected Output

I expect the last statement not to raise an exception, and instead to produce the same output as s.resample('D').ffill().resample('B').first(), i.e. I expect:

> s = create_series([6, 7])
> s
2017-01-02    0
2017-01-03    1
2017-01-04    2
2017-01-05    3
2017-01-06    4
2017-01-09    5
2017-01-12    6
2017-01-13    7
> s.resample('B').ffill()
2017-01-02    0
2017-01-03    1
2017-01-04    2
2017-01-05    3
2017-01-06    4
2017-01-09    5
2017-01-10    5
2017-01-11    5
2017-01-12    6
2017-01-13    7
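
Until the direct call works, the expected result can be obtained without s.resample('B').ffill(). The following is a minimal sketch, not a fix for the underlying bug: the first route is the expression quoted above, and the second, reindexing onto a business-day range with method='ffill', is an alternative assumed here rather than something proposed in this report. The names via_daily, via_reindex and target are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the failing input: ten business days with Tue/Wed of week two removed.
idx = pd.date_range('2017-01-02', '2017-01-13', freq='B')
subset = idx.delete([6, 7])
s = pd.Series(np.arange(len(subset)), index=subset)

# Route 1 (from the report): forward fill at daily frequency,
# then downsample to business days.
via_daily = s.resample('D').ffill().resample('B').first()

# Route 2 (assumption, not from the report): reindex onto an explicit
# business-day range and forward fill the missing days.
target = pd.date_range(s.index[0], s.index[-1], freq='B')
via_reindex = s.reindex(target, method='ffill')

print(via_reindex)
```

Both routes fill 2017-01-10 and 2017-01-11 with the value from 2017-01-09, which is the behaviour expected from s.resample('B').ffill().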

Output of pd.show_versions()

I have tried this on both the latest stable version 0.20.2 and the master branch, Git revision 10c17d4, and both exhibit the same problem.

Development branch

INSTALLED VERSIONS

commit: 10c17d4
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.10-100.fc24.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.utf8
LOCALE: en_GB.UTF-8

pandas: 0.21.0.dev+136.g10c17d4
pytest: 3.0.1
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.19.0
xarray: None
IPython: 5.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.2.0-b1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Stable version

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.10-100.fc24.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.utf8
LOCALE: en_GB.UTF-8

pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None
