Closed
Description
I ran into this issue today, and it seems like it should be a fairly common situation. I have imported two dataframes of categorical data (using pandas.read_stata) that I want to concatenate. One of them might not contain an instance of every category that the other one has, so pandas refuses to concatenate them. It would be more flexible if concatenation added any missing categories to the result.
I know that this restriction is documented, but I wonder why it exists. Is there a good reason why pandas shouldn't automatically append new categories as they are encountered?
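For the dataframe case described above, one possible workaround is to give the categorical column the same category set in every frame before concatenating. This is only a minimal sketch: the column name `grade` and the helper `align_categories` are hypothetical and not part of pandas, and it assumes unordered categoricals.

```python
import pandas as pd


def align_categories(frames, column):
    """Return copies of `frames` whose `column` shares one category set.

    Hypothetical helper, not a pandas function. Categories are collected
    in first-seen order across the frames.
    """
    all_cats = []
    for frame in frames:
        for cat in frame[column].cat.categories:
            if cat not in all_cats:
                all_cats.append(cat)
    # set_categories keeps the existing values and adds the missing categories
    return [
        frame.assign(**{column: frame[column].cat.set_categories(all_cats)})
        for frame in frames
    ]


# Stand-ins for the two frames loaded with pandas.read_stata
df1 = pd.DataFrame({"grade": pd.Series(["a", "b"], dtype="category")})
df2 = pd.DataFrame({"grade": pd.Series(["a", "c"], dtype="category")})

aligned = align_categories([df1, df2], "grade")
result = pd.concat(aligned, ignore_index=True)
```

Because both columns now have identical categories, the concatenated column stays categorical instead of raising or falling back to another dtype.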
In
import pandas as pd

s = pd.Series(['a', 'b'], dtype="category")
s2 = pd.Series(['a', 'c'], dtype="category")
s.append(s2)  # raises ValueError on pandas 0.18.0
Expected Output
0    a
1    b
0    a
1    c
dtype: category
Categories (3, object): [a, b, c]
Actual Output
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-40-e1bb85501be1> in <module>()
4 s = pd.Series(['a', 'b'], dtype="category")
5 s2 = pd.Series(['a', 'c'], dtype="category")
----> 6 s.append(s2)
/Users/will/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in append(self, to_append, verify_integrity)
1575 to_concat = [self, to_append]
1576 return concat(to_concat, ignore_index=False,
-> 1577 verify_integrity=verify_integrity)
1578
1579 def _binop(self, other, func, level=None, fill_value=None):
/Users/will/anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
833 verify_integrity=verify_integrity,
834 copy=copy)
--> 835 return op.get_result()
836
837
/Users/will/anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in get_result(self)
979 # stack blocks
980 if self.axis == 0:
--> 981 new_data = com._concat_compat([x._values for x in self.objs])
982 name = com._consensus_name_attr(self.objs)
983 return (Series(new_data, index=self.new_axes[0], name=name)
/Users/will/anaconda/lib/python2.7/site-packages/pandas/core/common.pyc in _concat_compat(to_concat, axis)
2722 elif 'category' in typs:
2723 from pandas.core.categorical import _concat_compat
-> 2724 return _concat_compat(to_concat, axis=axis)
2725
2726 if not nonempty:
/Users/will/anaconda/lib/python2.7/site-packages/pandas/core/categorical.pyc in _concat_compat(to_concat, axis)
1948 for x in categoricals[1:]:
1949 if not categories.is_dtype_equal(x):
-> 1950 raise ValueError("incompatible categories in categorical concat")
1951
1952 # we've already checked that all categoricals are the same, so if their
ValueError: incompatible categories in categorical concat
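The traceback shows the failure coming from the dtype-equality check on the two categoricals. As a sketch of how the expected output could be produced explicitly, later pandas releases expose `union_categoricals` in `pandas.api.types`; this assumes a pandas version newer than the 0.18.0 reported here, and it rebuilds the index by hand because the function returns a plain `Categorical`.

```python
import pandas as pd
from pandas.api.types import union_categoricals  # assumes a newer pandas than 0.18.0

s = pd.Series(["a", "b"], dtype="category")
s2 = pd.Series(["a", "c"], dtype="category")

# Combine the values; the result's categories are the union of both inputs
combined_values = union_categoricals([s, s2])

# union_categoricals drops the index, so reattach the original labels
combined = pd.Series(combined_values, index=s.index.append(s2.index))
```

The explicit call makes the widening of the category set a deliberate choice by the caller rather than something that happens silently during concatenation.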
output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.0
nose: 1.3.4
pip: 8.1.0
setuptools: 20.2.2
Cython: 0.21
numpy: 1.10.2
scipy: 0.16.1
statsmodels: 0.5.0
xarray: None
IPython: 4.0.1
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.5.0
pytz: 2016.1
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.5.0
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.32.1