Description
Code Sample
import pandas as pd
print(pd.__version__) # '0.25.3'
# simple frame with a name attribute for columns
df = pd.DataFrame({'a': [1.1], 'b': [2.2]})
df.columns.name = 'x'
print(df)
# x a b
# 0 1.1 2.2
# tile rows to a new frame, apparently using the copy of the first frame
df2 = pd.concat([df] * 2, copy=True)
print(df2)
# x a b
# 0 1.1 2.2
# 0 1.1 2.2
# clear the column name for the first frame
df.columns.name = None
# and observe the name for the second
assert df2.columns.name is None
print(df2)
# a b
# 0 1.1 2.2
# 0 1.1 2.2
# aha, it's the same object
assert df.columns is df2.columns
Problem description
It is expected that copy=True
should make a copy of the columns
index. However, the demonstration shows it's the same object. Modifications, such as the name
property, are made to the source frame and the output frame created by pd.concat
.
Expected Output
It is expected that df2.columns
is a copy of the original df.columns
. This would prevent (e.g.) changing the name
property for both frames.
A workaround is to manually copy this property after pd.concat
:
df2 = pd.concat([df] * 2, copy=True)
df2.columns = df2.columns.copy()
Output of pd.show_versions()
pandas : 0.25.3
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.1.post20191125
Cython : 0.29.14
pytest : 5.3.0
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : 1.2.6
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.11
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.6