BUG: Inconsistent behavior of pd.DataFrame.drop  #33438

Closed
@FSpanhel

Problem description
Until an hour ago, I thought I could safely avoid inplace = True (which is not recommended; see, e.g., #30484) and instead use inplace = False and assign the result directly.
For example, I thought that

df.drop(columns = ['a'], inplace = True)

can be replaced by

df = df.drop(columns = ['a']) 

where df is a pd.DataFrame.

However, this does not seem to be the case when I use these operations within a function.

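# 1) direct assignment, .drop
import pandas as pd

df = pd.DataFrame([1])
def tfun(df):
    df['a'] = 2                    # adds column 'a' to the DataFrame that was passed in
    df = df.drop(columns = ['a'])  # assigns the result of .drop to the local name df
tfun(df)
print(df.columns)
>>> Index([0, 'a'], dtype='object')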

Compare this with

# 2) direct assignment, .drop with inplace (or del)
df = pd.DataFrame([1])
def tfun(df):
    df['a'] = 2
    df.drop(columns = ['a'], inplace = True) # using del df['a'] leads to the same result
tfun(df)
print(df.columns)
>>> Index([0], dtype='object')

The result of 2) is as expected (we add column 'a' and immediately remove it). However, I am very confused by the result of 1): the removal of column 'a' inside tfun is not reflected in the outer df after tfun has been called.
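A minimal sketch of what seems to be going on in 1), assuming ordinary Python name-binding rules: the item assignment modifies the object the caller passed in, whereas assigning the result of .drop only rebinds the local name inside the function. The names add_then_drop, frame and result are hypothetical and used only for illustration.

```python
import pandas as pd


def add_then_drop(frame):
    # Item assignment changes the object the caller passed in.
    frame["a"] = 2
    # .drop without inplace returns a new DataFrame; this line only
    # rebinds the local name `frame`, the caller's object is unchanged.
    frame = frame.drop(columns=["a"])
    # Returning the new object is the only way the caller can see it.
    return frame


df = pd.DataFrame([1])
result = add_then_drop(df)

print(list(df.columns))      # the caller's object still has column 'a'
print(list(result.columns))  # the returned object does not
```

Under this reading, writing df = tfun(df) with a function that returns the dropped frame would be the non-inplace counterpart of 2).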

It gets even stranger when we use .assign to add column 'a' to df inside tfun:

# 3) .assign, .drop
df = pd.DataFrame([1])
def tfun(df):
    df = df.assign(a = 2)
    df = df.drop(columns = ['a'])
tfun(df)
print(df.columns)
>>> RangeIndex(start=0, stop=1, step=1)

Now column 'a' is gone, although df.columns is now reported as a RangeIndex instead of an Index with dtype object.
I would definitely expect the result of 3) to equal the result of 1). What is going on here?
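A possible explanation for 3), sketched under the assumption that .assign, like .drop without inplace, returns a new DataFrame: the object passed to the function is never modified at all, so its columns remain the original RangeIndex. The names assign_then_drop, frame and result are hypothetical.

```python
import pandas as pd


def assign_then_drop(frame):
    # .assign returns a new DataFrame; the caller's object is not touched.
    frame = frame.assign(a=2)
    # .drop also returns a new DataFrame bound to the local name only.
    frame = frame.drop(columns=["a"])
    return frame


df = pd.DataFrame([1])
result = assign_then_drop(df)

# The caller's DataFrame never received column 'a' in the first place.
print(isinstance(df.columns, pd.RangeIndex))
print("a" in df.columns)
print(list(result.columns))
```

Seen this way, 1) and 3) differ only in the first statement of the function: df['a'] = 2 changes the passed object, while df.assign(a = 2) does not.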

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : 0.29.15
pytest : 5.3.5
hypothesis : 5.5.4
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.48.0
