Description
Code Sample, a copy-pastable example if possible
In [1]: from collections import OrderedDict
In [2]: import pandas as pd
In [3]: pd.__version__
Out[3]: u'0.21.0'
# the following works as expected:
In [4]: df1 = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'c'])
In [5]: df2 = pd.DataFrame(columns=['a', 'b', 'c'])
In [6]: pd.concat([df1, df2])
Out[6]:
a b c
0 1 2 3
# however, the following seems like it does an identical thing but throws an error:
In [7]: od3 = OrderedDict([('a', [1]), ('b', [2]), ('c', [3])])
In [8]: od4 = OrderedDict([('a', []), ('b', []), ('c', [])])
In [9]: df3 = pd.DataFrame(od3)
In [10]: df4 = pd.DataFrame(od4)
In [11]: pd.concat([df3, df4])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-11-ac4fae34c928> in <module>()
----> 1 pd.concat([df3, df4])
#...
ValueError: Shape of passed values is (3, 1), indices imply (3, 0)
Rewriting it in copy/pastable form...
This works:
df1 = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'c'])
df2 = pd.DataFrame(columns=['a', 'b', 'c'])
pd.concat([df1, df2])
And this does not:
od3 = OrderedDict([('a', [1]), ('b', [2]), ('c', [3])])
od4 = OrderedDict([('a', []), ('b', []), ('c', [])])
df3 = pd.DataFrame(od3)
df4 = pd.DataFrame(od4)
pd.concat([df3, df4])
Problem description
This is also described here.
In the above code, df1
equals df3
, and df2
and df4
are both empty dataframes with the same column names (although, strangely, df2
and df4
aren't equal according to df2.equals(df4)
) but pd.concat([df1, df2])
works while pd.concat([df3, df4])
results in ValueError
. This did not happen in previous versions of Pandas, but when I upgraded to 0.21.0, it started happening.
Oddly, as the stackoverflow link notes, using the drop_duplicates()
method on either df3
or df4
(or both) results in the concat()
working, even though neither of them contains any duplicates.
Expected Output
The expected output of pd.concat([df3, df4])
is
a b c
0 1 2 3
Output of pd.show_versions()
[paste the output of pd.show_versions()
here below this line]
INSTALLED VERSIONS
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.21.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 32.1.0
Cython: 0.23.4
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 5.3.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.14
pymysql: 0.7.11.None
psycopg2: 2.7.3.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0