BUG: inserting a Categorical with the wrong length into a DataFrame is allowed #9374

Closed
@shoyer

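Assigning a Categorical whose length differs from the length of the DataFrame's index is silently accepted, which leaves the DataFrame in an inconsistent state (the column reports 6 elements in a 10-row frame):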

In [33]: cat = pd.Categorical.from_codes([0, 1, 1, 0, 1, 2], ['a', 'b', 'c'])

In [34]: df = pd.DataFrame()

In [35]: df['bar'] = range(10)

In [36]: df['foo'] = cat

In [37]: df
Out[37]:
   bar foo
0    0   a
1    1   b
2    2   b
3    3   a
4    4   b
5    5   c
6    6
7    7
8    8
9    9

In [38]: df.foo.shape
Out[38]: (6,)

I was expecting a ValueError, as happens when assigning the equivalent NumPy array:

In [49]: df['foo'] = np.array(cat)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-49-2c5dea325c23> in <module>()
----> 1 df['foo'] = np.array(cat)

/Users/shoyer/dev/pandas/pandas/core/frame.py in __setitem__(self, key, value)
   2108         else:
   2109             # set column
-> 2110             self._set_item(key, value)
   2111
   2112     def _setitem_slice(self, key, value):

/Users/shoyer/dev/pandas/pandas/core/frame.py in _set_item(self, key, value)
   2185
   2186         self._ensure_valid_index(value)
-> 2187         value = self._sanitize_column(key, value)
   2188         NDFrame._set_item(self, key, value)
   2189

/Users/shoyer/dev/pandas/pandas/core/frame.py in _sanitize_column(self, key, value)
   2258         elif (isinstance(value, Index) or is_sequence(value)):
   2259             from pandas.core.series import _sanitize_index
-> 2260             value = _sanitize_index(value, self.index, copy=False)
   2261             if not isinstance(value, (np.ndarray, Index)):
   2262                 if isinstance(value, list) and len(value) > 0:

/Users/shoyer/dev/pandas/pandas/core/series.py in _sanitize_index(data, index, copy)
   2562
   2563     if len(data) != len(index):
-> 2564         raise ValueError('Length of values does not match length of '
   2565                          'index')
   2566

ValueError: Length of values does not match length of index

Tested on master.
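
For reference, the check that the traceback ends in can be sketched in isolation. This is a minimal illustration only: `check_column_length` is a hypothetical name, not a pandas function, and plain Python sequences stand in for the index and the Categorical. The expectation is that a Categorical goes through the same kind of validation before it is set as a column.

```python
def check_column_length(values, index):
    """Hypothetical helper (not part of pandas): reject values whose
    length differs from the length of the index."""
    if len(values) != len(index):
        raise ValueError('Length of values does not match length of index')
    return values


index = range(10)            # stands in for the DataFrame's 10-row index
codes = [0, 1, 1, 0, 1, 2]   # stands in for the 6-element Categorical

try:
    check_column_length(codes, index)
    outcome = "accepted"
except ValueError as exc:
    # Reached here because 6 != 10
    outcome = f"rejected: {exc}"

print(outcome)
```

With a check like this applied to the Categorical path, `df['foo'] = cat` would fail in the same way as `df['foo'] = np.array(cat)` instead of producing a column of the wrong shape.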
