
Appending categorical data should be more flexible #12699

Closed
@wtadler

Description


I ran into this issue today, and it seems like a fairly common situation. I imported two dataframes of categorical data (using pandas.read_stata) that I want to concatenate. One of them may not contain every category that the other one has, so pandas refuses to concatenate them. It would be more flexible if pandas simply added any missing categories.

I know this restriction is documented, but I wonder why it exists. Is there a good reason why pandas shouldn't automatically add new categories as it encounters them?
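One way to get the requested behaviour by hand, sketched here under the assumption that all inputs are unordered categoricals, is to give every Series the union of all categories before concatenating. `concat_union_categories` is a hypothetical helper, not a pandas API; it relies on `Series.cat.set_categories` and on `pd.concat` keeping the `category` dtype when every input has identical categories. It uses `pd.concat` rather than `Series.append`, which later pandas versions removed.

```python
import pandas as pd


def concat_union_categories(series_list):
    """Hypothetical helper (not part of pandas): concatenate categorical
    Series after giving each one the union of all their categories."""
    # Union of categories, kept in order of first appearance across inputs.
    all_categories = list(
        dict.fromkeys(cat for s in series_list for cat in s.cat.categories)
    )
    # Assumes unordered categoricals: every input now has identical
    # categories, so pd.concat can keep the 'category' dtype.
    aligned = [s.cat.set_categories(all_categories) for s in series_list]
    return pd.concat(aligned)


s = pd.Series(['a', 'b'], dtype="category")
s2 = pd.Series(['a', 'c'], dtype="category")
result = concat_union_categories([s, s2])
print(result)
```

Missing categories are added rather than rejected, which is the behaviour asked for in this issue; the original indexes (0, 1, 0, 1) are preserved because `pd.concat` is called without `ignore_index`.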

Code Sample

import pandas as pd

s = pd.Series(['a', 'b'], dtype="category")
s2 = pd.Series(['a', 'c'], dtype="category")
s.append(s2)

Expected Output

0    a
1    b
0    a
1    c
dtype: category
Categories (3, object): [a, b, c]

Actual Output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-40-e1bb85501be1> in <module>()
      4 s = pd.Series(['a', 'b'], dtype="category")
      5 s2 = pd.Series(['a', 'c'], dtype="category")
----> 6 s.append(s2)

/Users/will/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in append(self, to_append, verify_integrity)
   1575             to_concat = [self, to_append]
   1576         return concat(to_concat, ignore_index=False,
-> 1577                       verify_integrity=verify_integrity)
   1578 
   1579     def _binop(self, other, func, level=None, fill_value=None):

/Users/will/anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    833                        verify_integrity=verify_integrity,
    834                        copy=copy)
--> 835     return op.get_result()
    836 
    837 

/Users/will/anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in get_result(self)
    979             # stack blocks
    980             if self.axis == 0:
--> 981                 new_data = com._concat_compat([x._values for x in self.objs])
    982                 name = com._consensus_name_attr(self.objs)
    983                 return (Series(new_data, index=self.new_axes[0], name=name)

/Users/will/anaconda/lib/python2.7/site-packages/pandas/core/common.pyc in _concat_compat(to_concat, axis)
   2722     elif 'category' in typs:
   2723         from pandas.core.categorical import _concat_compat
-> 2724         return _concat_compat(to_concat, axis=axis)
   2725 
   2726     if not nonempty:

/Users/will/anaconda/lib/python2.7/site-packages/pandas/core/categorical.pyc in _concat_compat(to_concat, axis)
   1948     for x in categoricals[1:]:
   1949         if not categories.is_dtype_equal(x):
-> 1950             raise ValueError("incompatible categories in categorical concat")
   1951 
   1952     # we've already checked that all categoricals are the same, so if their

ValueError: incompatible categories in categorical concat
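As a workaround in later pandas releases (not the 0.18.0 used in this report), `pandas.api.types.union_categoricals` can merge the categories explicitly. This is a minimal sketch assuming a pandas version that ships that function; it returns a bare `Categorical`, so the 0, 1, 0, 1 index from the inputs is rebuilt by hand with `Index.append`.

```python
import pandas as pd
from pandas.api.types import union_categoricals

s = pd.Series(['a', 'b'], dtype="category")
s2 = pd.Series(['a', 'c'], dtype="category")

# Returns a Categorical whose categories are the union of the inputs'
# categories (in order of appearance, since sort_categories=False).
combined = union_categoricals([s, s2])

# union_categoricals drops the Series indexes, so reattach them here.
result = pd.Series(combined, index=s.index.append(s2.index))
print(result)
```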

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.4
pip: 8.1.0
setuptools: 20.2.2
Cython: 0.21
numpy: 1.10.2
scipy: 0.16.1
statsmodels: 0.5.0
xarray: None
IPython: 4.0.1
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.5.0
pytz: 2016.1
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.5.0
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.32.1

Metadata

Assignees

No one assigned

    Labels

    API Design; Categorical (Categorical Data Type); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests
