unexpected behavior when assigning data to a multicolumn dataframe #5508

Closed
@twmr

Using the latest git version of pandas (705b677), I have experienced the
following problems:

In [1]: a = pd.DataFrame(index=pd.Index(xrange(1,11)))

In [2]: a['foo'] = np.zeros(10, dtype=np.float)

In [3]: a['bar'] = np.zeros(10, dtype=np.complex)

In [4]: a.ix[2:5, 'bar']
Out[4]:
2    0j
3    0j
4    0j
5    0j
Name: bar, dtype: complex128

In [5]: a.ix[2:5, 'bar'] = np.array([2.33j, 1.23+0.1j, 2.2]) 
# invalid input (RHS has wrong size) -> does not throw an exception!
# (No exception is thrown because of the different
# dtype of a['foo']; see ``In [9]``-``In [11]``.)

In [6]: a
Out[6]:
    foo    bar
1     0     0j
2     0  2.33j
3     0  2.33j
4     0  2.33j
5     0  2.33j
6     0     0j
7     0     0j
8     0     0j
9     0     0j
10    0     0j

In [7]: a.ix[2:5, 'bar'] = np.array([2.33j, 1.23+0.1j, 2.2, 1.0]) # valid

In [8]: a
Out[8]:
    foo          bar
1     0           0j
2     0        2.33j
3     0  (1.23+0.1j)
4     0     (2.2+0j)
5     0       (1+0j)
6     0           0j
7     0           0j
8     0           0j
9     0           0j
10    0           0j


In [9]: a = pd.DataFrame(index=pd.Index(xrange(1,11)))

In [10]: a['bar'] = np.zeros(10, dtype=np.complex)

In [11]: a.ix[2:5, 'bar'] = np.array([2.33j, 1.23+0.1j, 2.2]) 
# invalid RHS -> exception raised, OK!
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-bde45910cde6> in <module>()
----> 1 a.ix[2:5, 'bar'] = np.array([2.33j, 1.23+0.1j, 2.2])

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/indexing.pyc in __setitem__(self, key, value)
     92             indexer = self._convert_to_indexer(key, is_setter=True)
     93
---> 94         self._setitem_with_indexer(indexer, value)
     95
     96     def _has_valid_type(self, k, axis):

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/indexing.pyc in _setitem_with_indexer(self, indexer, value)
    387                 value = self._align_panel(indexer, value)
    388
--> 389             self.obj._data = self.obj._data.setitem(indexer,value)
    390             self.obj._maybe_update_cacher(clear=True)
    391

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in setitem(self, *args, **kwargs)
   2182
   2183     def setitem(self, *args, **kwargs):
-> 2184         return self.apply('setitem', *args, **kwargs)
   2185
   2186     def putmask(self, *args, **kwargs):

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in apply(self, f, *args, **kwargs)
   2162
   2163             else:
-> 2164                 applied = getattr(blk, f)(*args, **kwargs)
   2165
   2166             if isinstance(applied, list):

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in setitem(self, indexer, value)
    580         try:
    581             # set and return a block
--> 582             values[indexer] = value
    583
    584             # coerce and try to infer the dtypes of the result

ValueError: could not broadcast input array from shape (3) into shape (4)
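
The `ValueError` at the bottom of this traceback is raised by the plain NumPy assignment `values[indexer] = value`. The following is a minimal sketch of that step in isolation, assuming only NumPy; the helper name `try_assign` is hypothetical and it does not reproduce pandas' block handling. It shows that a three-element right-hand side cannot be broadcast into a four-element target, while a four-element one can.

```python
import numpy as np


def try_assign(target_shape, rhs):
    """Assign rhs into a zero-filled complex array of target_shape.

    Returns the ValueError message if NumPy refuses to broadcast,
    or None if the assignment succeeds. (Hypothetical helper.)
    """
    target = np.zeros(target_shape, dtype=complex)
    try:
        target[...] = rhs
    except ValueError as exc:
        return str(exc)
    return None


# Three values into four slots: NumPy cannot broadcast (3,) to (4,).
err_short = try_assign((4,), np.array([2.33j, 1.23 + 0.1j, 2.2]))

# Four values into four slots: shapes match, so no error.
err_exact = try_assign((4,), np.array([2.33j, 1.23 + 0.1j, 2.2, 1.0]))

print(err_short)
print(err_exact)
```

In this sketch the length mismatch is always rejected, which is the behavior the single-column frame shows in `In [11]`. The silent acceptance in `In [5]` therefore appears to come from a different code path inside pandas, not from NumPy itself.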

Here is the second problem:

In [1]: b = pd.DataFrame(index=pd.Index(xrange(1,11)))

In [2]: b['foo'] = np.zeros(10, dtype=np.float)

In [3]: b['bar'] = np.zeros(10, dtype=np.complex)

In [4]: b
Out[4]:
    foo  bar
1     0   0j
2     0   0j
3     0   0j
4     0   0j
5     0   0j
6     0   0j
7     0   0j
8     0   0j
9     0   0j
10    0   0j

In [5]: b[2:5]
Out[5]:
   foo  bar
3    0   0j
4    0   0j
5    0   0j

In [6]: b[2:5] = np.arange(1,4)*1j 
# invalid input (wrong size on RHS)

In [7]: b
Out[7]:
    foo  bar
1    0j   0j
2    0j   0j
3    1j   2j
4    1j   2j
5    1j   2j
6    0j   0j
7    0j   0j
8    0j   0j
9    0j   0j
10   0j   0j

# Why does the expression in ``In [6]`` change the dtype of b['foo']?
# Is this intended?

In [8]: b[2:5] = np.arange(1,4)*1j # invalid input (wrong size on RHS)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-a729140e3f7f> in <module>()
----> 1 b[2:5] = np.arange(1,4)*1j # invalid input (wrong size on RHS)

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in __setitem__(self, key, value)
   1831         indexer = _convert_to_index_sliceable(self, key)
   1832         if indexer is not None:
-> 1833             return self._setitem_slice(indexer, value)
   1834
   1835         if isinstance(key, (Series, np.ndarray, list)):

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in _setitem_slice(self, key, value)
   1842
   1843     def _setitem_slice(self, key, value):
-> 1844         self.ix._setitem_with_indexer(key, value)
   1845
   1846     def _setitem_array(self, key, value):

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/indexing.pyc in _setitem_with_indexer(self, indexer, value)
    387                 value = self._align_panel(indexer, value)
    388
--> 389             self.obj._data = self.obj._data.setitem(indexer,value)
    390             self.obj._maybe_update_cacher(clear=True)
    391

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in setitem(self, *args, **kwargs)
   2182
   2183     def setitem(self, *args, **kwargs):
-> 2184         return self.apply('setitem', *args, **kwargs)
   2185
   2186     def putmask(self, *args, **kwargs):

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in apply(self, f, *args, **kwargs)
   2162
   2163             else:
-> 2164                 applied = getattr(blk, f)(*args, **kwargs)
   2165
   2166             if isinstance(applied, list):

/home/thomas/.local/lib/python2.7/site-packages/pandas-0.12.0_1098_g705b677-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in setitem(self, indexer, value)
    580         try:
    581             # set and return a block
--> 582             values[indexer] = value
    583
    584             # coerce and try to infer the dtypes of the result

ValueError: could not broadcast input array from shape (3) into shape (3,2)
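
The shape in this message, `(3,2)`, is three rows by two columns: once both columns share a complex dtype, the row slice covers a single two-dimensional array. The sketch below, which assumes only NumPy and does not model pandas internals, shows why a flat three-element array is rejected for that target, how a column-shaped right-hand side would broadcast instead, and what dtype NumPy's promotion rules give for float64 combined with complex128.

```python
import numpy as np

rhs = np.arange(1, 4) * 1j  # shape (3,), dtype complex128

# Three rows, two columns, like the slice of a two-column frame.
block = np.zeros((3, 2), dtype=complex)

# A flat (3,) array is matched against the trailing axis (length 2),
# so NumPy refuses to broadcast it.
try:
    block[...] = rhs
    broadcast_error = None
except ValueError as exc:
    broadcast_error = str(exc)

# Reshaped to a (3, 1) column, the same values broadcast across both columns.
block[...] = rhs.reshape(3, 1)

# Storing complex values in a float64 column without losing information
# requires the common dtype of the two, which is complex128.
promoted = np.result_type(np.float64, np.complex128)

print(broadcast_error)
print(block)
print(promoted)
```

The promotion result is consistent with `b['foo']` showing complex values after `In [6]`, though whether pandas should upcast the column in that situation is exactly the question raised above.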

Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves)
