
DataFrame.drop(Timestamp) on MultiIndex with NaT incorrectly drops the NaT row #18853

Closed
@jeremywhelchel


Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame(
    index=pd.MultiIndex.from_tuples([('blah', pd.NaT)],
                                    names=['name', 'date']),
)
print(df)

Empty DataFrame
Columns: []
Index: [(blah, NaT)]

print(df.drop(pd.Timestamp('2001'), level='date'))

Empty DataFrame
Columns: []
Index: []

Problem description

Timestamp('2001') isn't actually in the index, so the drop() call should raise an error rather than drop the NaT row. For example, on a single-level datetime index the same call raises:

df = pd.DataFrame(index=[pd.NaT])
df.drop(pd.Timestamp('2001'))
ValueError: labels [Timestamp('2001-01-01 00:00:00')] not contained in axis

I would expect the same behavior with a MultiIndex. Silently dropping the NaT row caused a hard-to-find bug in my code.
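In the meantime, a possible guard is to check that the label is actually present in the level before dropping. This is only a minimal sketch: `drop_strict` is a hypothetical helper name, not part of pandas, and the second row of the example frame is added purely for illustration.

```python
import pandas as pd


def drop_strict(df, label, level):
    """Hypothetical helper (not a pandas API): drop `label` from `level`,
    but raise KeyError first if the label is not among the level's values."""
    values = df.index.get_level_values(level)
    if label not in values:
        raise KeyError("label %r not found in level %r" % (label, level))
    return df.drop(label, level=level)


# Illustrative frame: one row with NaT and one row with a real date.
df = pd.DataFrame(
    index=pd.MultiIndex.from_tuples(
        [('blah', pd.NaT), ('foo', pd.Timestamp('2002'))],
        names=['name', 'date'],
    ),
)

try:
    drop_strict(df, pd.Timestamp('2001'), level='date')
except KeyError as exc:
    # The guard refuses to drop a label that is not in the 'date' level.
    print("refused:", exc)
```

The membership test runs against `get_level_values`, so the check is made on the values actually present in the rows and the NaT entry is never treated as a match for an unrelated Timestamp.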

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-97-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.21.0
pytest: None
pip: None
setuptools: None
Cython: None
numpy: 1.11.1
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 2.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.5
feather: None
matplotlib: 1.5.2
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: 3.4.4
bs4: 4.5.3
html5lib: 1.0b8
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.0

Metadata


Labels

Bug; Datetime (Datetime data dtype); Indexing (Related to indexing on series/frames, not to indexes themselves)
