BUG: .unstack() with recarray column raises TypeError since 1.4.0 #49388

Open
@stefan-jansen

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

c = np.array([2] * 20, dtype='f8')
r = np.rec.fromarrays([c], names=['c'])

df = pd.DataFrame({'a': np.arange(20) // 5, 'b': list('ABCDE') * 4, 'c': r})
# df.info() => see GH48526
df.set_index(['a', 'b']).c.unstack()

Issue Description

Starting with 1.4.0, including a column of dtype np.record (related to #48526) as follows:

c = np.array([2] * 9, dtype='f8')
r = np.rec.fromarrays([c], names=['c'])
df = pd.DataFrame({'a': np.arange(9) // 3,
                   'b': list('ABC') * 3,
                   'c': r})
print(df.dtypes)

a                             int64
b                            object
c    (numpy.record, [('c', '<f8')])

raises a TypeError:

   df.set_index(['a', 'b']).unstack()
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/frame.py", line 9060, in unstack
    result = unstack(self, level, fill_value)
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 479, in unstack
    return _unstack_frame(obj, level, fill_value=fill_value)
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 508, in _unstack_frame
    return unstacker.get_result(
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 215, in get_result
    values, _ = self.get_new_values(values, fill_value)
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 228, in get_new_values
    sorted_values = self._make_sorted_values(values)
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 167, in _make_sorted_values
    sorted_values = algos.take_nd(values, indexer, axis=0)
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/array_algos/take.py", line 118, in take_nd
    return _take_nd_ndarray(arr, indexer, axis, fill_value, allow_fill)
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/array_algos/take.py", line 135, in _take_nd_ndarray
    dtype, fill_value, mask_info = _take_preprocess_indexer_and_fill_value(
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/array_algos/take.py", line 587, in _take_preprocess_indexer_and_fill_value
    dtype, fill_value = arr.dtype, arr.dtype.type()
TypeError: void() takes exactly 1 positional argument (0 given)
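The last frame of the traceback calls `arr.dtype.type()` with no arguments. The following is a minimal sketch, using NumPy only, of why that call fails for a record dtype. The variable names are illustrative and are not taken from the pandas source; the exact wording of the error message may differ between NumPy versions.

```python
import numpy as np

c = np.array([2.0] * 3, dtype="f8")
r = np.rec.fromarrays([c], names=["c"])

# For a recarray, dtype.type is numpy.record, a subclass of numpy.void.
scalar_type = r.dtype.type

# A plain float dtype can build a default scalar with no arguments.
float_default = np.dtype("f8").type()

# numpy.void (and so numpy.record) requires one positional argument,
# so the zero-argument call raises TypeError, as in the traceback.
try:
    scalar_type()
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")
```

This suggests the failure does not depend on the data or on the index layout, only on the scalar type of the column's dtype.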

Expected Behavior

Before 1.4.0, the same code reports dtype object for the recarray column, and unstack() does not raise an error:

a     int64
b    object
c    object
dtype: object
        c                
b       A       B       C
a                        
0  (2.0,)  (2.0,)  (2.0,)
1  (2.0,)  (2.0,)  (2.0,)
2  (2.0,)  (2.0,)  (2.0,)
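Until the behavior is restored, one possible way around the error is to avoid storing the record dtype in the frame at all. The sketch below is not from the original report; it assumes only the single float field `c` is needed and stores that field as a plain float64 column, so the unstacked cells hold `2.0` rather than the `(2.0,)` records shown above.

```python
import numpy as np
import pandas as pd

c = np.array([2] * 9, dtype="f8")
r = np.rec.fromarrays([c], names=["c"])

# Assumption: only field "c" matters, so extract it as an ordinary float64 array
# instead of passing the whole recarray as the column.
field = np.asarray(r["c"])

df = pd.DataFrame({"a": np.arange(9) // 3, "b": list("ABC") * 3, "c": field})

# With a float64 column, unstack does not touch the record scalar type.
wide = df.set_index(["a", "b"])["c"].unstack()
print(wide)
```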

Installed Versions

INSTALLED VERSIONS

commit : 91111fd
python : 3.9.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-52-generic
Version : #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.1
numpy : 1.23.4
pytz : 2022.5
dateutil : 2.8.2
setuptools : 65.5.0
pip : 22.3
Cython : 0.29.32
pytest : 6.2.5
hypothesis : None
sphinx : 5.3.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : 1.3.5
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.1
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.3
snappy : None
sqlalchemy : 1.4.42
tables : 3.7.0
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

Labels: Bug, Constructors, DataFrame
