BUG: Boolean 2d array for __setitem__ with DataFrame incorrect results #18582

Closed
@dhirschfeld

Description

According to my understanding of how boolean indexing should work, the test below should pass:

import numpy as np
from numpy.random import randn
import pandas as pd

df = pd.DataFrame(randn(10, 5))
df1 = df.copy()
df2 = df.copy()
# Flag every value outside [-1, 1]
INVALID = (df < -1) | (df > 1)
df1[INVALID.values] = np.nan  # mask as a plain 2d boolean ndarray
df2[INVALID] = np.nan         # mask as a boolean DataFrame
assert df1.equals(df2)

That is, it shouldn't matter whether the boolean mask is a DataFrame or a plain ndarray.

Problem description

As can be seen in the picture below, when a DataFrame is indexed with a boolean ndarray, entire rows
are set if any value in the row is True, rather than only the individual True elements.

That is, the result is what I would expect from df[INVALID.any(axis=1)] = np.nan rather than from df[INVALID] = np.nan.
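The difference between the two outcomes can be illustrated on a small fixed frame. This is only a sketch: the data and the names `elementwise` and `rowwise` are made up for illustration and are not from the report. The row-wise case is produced explicitly with `.loc` and `any(axis=1)`, so the example does not depend on how a particular pandas version treats a 2d ndarray key.

```python
import numpy as np
import pandas as pd

# Small deterministic frame (illustrative data, not from the report).
df = pd.DataFrame([[0.0, 2.0, 0.5],
                   [0.1, 0.2, 0.3],
                   [-3.0, 0.4, 0.0]])

# Boolean DataFrame: True only at positions (0, 1) and (2, 0).
invalid = (df < -1) | (df > 1)

# Element-wise assignment: only the flagged cells become NaN.
elementwise = df.copy()
elementwise[invalid] = np.nan

# Row-wise assignment: every row with at least one flagged cell becomes NaN.
rowwise = df.copy()
rowwise.loc[invalid.any(axis=1)] = np.nan

print(int(elementwise.isna().sum().sum()))  # 2 cells set
print(int(rowwise.isna().sum().sum()))      # 6 cells set (rows 0 and 2)
```

The second result is the one described in this report as appearing when the mask is passed as an ndarray.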

image

This is different from how indexing an ndarray works and, to me at least, is very surprising and entirely unexpected. This behaviour was the cause of a very subtle bug in my code :(
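For comparison, here is a sketch of the NumPy behaviour referred to above, together with one possible way to get the same element-wise result in pandas when the mask starts out as a plain ndarray. The data and the names `arr`, `labelled` and `masked` are illustrative assumptions. Wrapping the mask in a DataFrame that carries the frame's own index and columns is just one option, not something proposed in this report.

```python
import numpy as np
import pandas as pd

# Illustrative data, not from the report.
values = np.array([[0.0, 2.0, 0.5],
                   [0.1, 0.2, 0.3],
                   [-3.0, 0.4, 0.0]])
mask = (values < -1) | (values > 1)  # plain 2d boolean ndarray

# NumPy: a 2d boolean mask addresses individual elements.
arr = values.copy()
arr[mask] = np.nan

# pandas: give the mask the frame's labels so it is handled as a
# boolean DataFrame, then replace the flagged cells with NaN.
df = pd.DataFrame(values)
labelled = pd.DataFrame(mask, index=df.index, columns=df.columns)
masked = df.mask(labelled)  # returns a new frame; df itself is unchanged

# Both approaches flag exactly the same cells.
print(np.array_equal(np.isnan(arr), masked.isna().values))
```

Because `DataFrame.mask` returns a new object, the original frame is left untouched, which can make this kind of discrepancy easier to spot when debugging.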

Output of pd.show_versions()


INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.21.0
pytest: 3.2.5
pip: 9.0.1
setuptools: 36.7.2
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.7.1
xarray: 0.9.6
IPython: 6.2.1
sphinx: 1.6.5
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.0
openpyxl: 2.5.0b1
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: 0.1.2
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: Bug; Dtype Conversions (unexpected or buggy dtype conversions); Error Reporting (incorrect or improved errors from pandas); Indexing (related to indexing on series/frames, not to indexes themselves)