Problem description
Until an hour ago I thought that I could safely avoid inplace=True (which is not recommended, see e.g. #30484) by using the default inplace=False and assigning the result directly.
For example, I thought that
df.drop(columns=['a'], inplace=True)
can be replaced by
df = df.drop(columns=['a'])
where df is a pd.DataFrame.
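As a minimal sketch of that assumed equivalence outside any function: the two-column frame and the names df_inplace and df_assigned are hypothetical and only serve as illustration.

```python
import pandas as pd

# Variant A: drop the column in place.
df_inplace = pd.DataFrame({'a': [1], 'b': [2]})
df_inplace.drop(columns=['a'], inplace=True)

# Variant B: keep the default inplace=False and assign the result back.
df_assigned = pd.DataFrame({'a': [1], 'b': [2]})
df_assigned = df_assigned.drop(columns=['a'])

# At module level both variants end up with the same content.
print(df_inplace.equals(df_assigned))  # True
```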
However, this does not seem to be the case when I use these operations inside a function.
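# 1) direct assignment, .drop
import pandas as pd

df = pd.DataFrame([1])

def tfun(df):
    df['a'] = 2                   # add column 'a' to the DataFrame passed in
    df = df.drop(columns=['a'])   # assign the result of .drop to the name df

tfun(df)
print(df.columns)
>>> Index([0, 'a'], dtype='object')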
Compare this with
# 2) direct assignment, .drop with inplace (or del)
df = pd.DataFrame([1])

def tfun(df):
    df['a'] = 2
    df.drop(columns=['a'], inplace=True)  # using del df['a'] leads to the same result

tfun(df)
print(df.columns)
>>> Index([0], dtype='object')
The result of 2) is as expected: column 'a' is added and immediately removed. The result of 1) confuses me, though. The removal of column 'a' inside tfun is not reflected in the outer df after tfun has been called.
It gets even stranger when we use .assign to add column 'a' to df inside tfun:
# 3) .assign, .drop
df = pd.DataFrame([1])

def tfun(df):
    df = df.assign(a=2)
    df = df.drop(columns=['a'])

tfun(df)
print(df.columns)
>>> RangeIndex(start=0, stop=1, step=1)
Now column 'a' is gone, although df.columns is reported as a RangeIndex instead of the Index([0], dtype='object') seen in 2).
I would definitely expect the result of 3) to be the same as the result of 1). What is going on here?
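One way to narrow this down, offered as a sketch rather than a definitive explanation: the snippet below assumes the difference comes from plain Python name rebinding inside the function, not from anything specific to .drop. The helper names setitem_then_drop and assign_then_drop are hypothetical variants of tfun that return the local df so it can be compared with the caller's object.

```python
import pandas as pd

def setitem_then_drop(df):
    # Same steps as 1): item assignment changes the object that was passed in.
    df['a'] = 2
    # .drop returns a new DataFrame; the assignment rebinds only the local name.
    df = df.drop(columns=['a'])
    return df

def assign_then_drop(df):
    # Same steps as 3): .assign already returns a new DataFrame,
    # so the caller's object is never modified.
    df = df.assign(a=2)
    df = df.drop(columns=['a'])
    return df

outer1 = pd.DataFrame([1])
result1 = setitem_then_drop(outer1)
print(result1 is outer1)              # False
print(list(outer1.columns))           # [0, 'a']
print(list(result1.columns))          # [0]

outer3 = pd.DataFrame([1])
result3 = assign_then_drop(outer3)
print(result3 is outer3)              # False
print(type(outer3.columns).__name__)  # RangeIndex
```

Under that assumption, returning the frame from the function and writing df = tfun(df) at the call site would make the inplace=False style behave as I originally expected.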
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : None.None
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : 0.29.15
pytest : 5.3.5
hypothesis : 5.5.4
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.48.0