Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd
c = np.array([2] * 20, dtype='f8')
r = np.rec.fromarrays([c], names=['c'])
df = pd.DataFrame({'a': np.arange(20) // 5, 'b': list('ABCDE') * 4, 'c': r})
# df.info() => see GH48526
df.set_index(['a', 'b']).c.unstack()
Issue Description
Starting with 1.4.0, including a column of dtype np.record
(related to #48526) as follows:
c = np.array([2] * 9, dtype='f8')
r = np.rec.fromarrays([c], names=['c'])
df = pd.DataFrame({'a': np.arange(9) // 3,
'b': list('ABC') * 3,
'c': r})
print(df.dtypes)
a int64
b object
c (numpy.record, [('c', '<f8')])
raises a TypeError:
df.set_index(['a', 'b']).unstack()
File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/frame.py", line 9060, in unstack
result = unstack(self, level, fill_value)
File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 479, in unstack
return _unstack_frame(obj, level, fill_value=fill_value)
File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 508, in _unstack_frame
return unstacker.get_result(
File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 215, in get_result
values, _ = self.get_new_values(values, fill_value)
File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 228, in get_new_values
sorted_values = self._make_sorted_values(values)
File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 167, in _make_sorted_values
sorted_values = algos.take_nd(values, indexer, axis=0)
File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/array_algos/take.py", line 118, in take_nd
return _take_nd_ndarray(arr, indexer, axis, fill_value, allow_fill)
File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/array_algos/take.py", line 135, in _take_nd_ndarray
dtype, fill_value, mask_info = _take_preprocess_indexer_and_fill_value(
File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/array_algos/take.py", line 587, in _take_preprocess_indexer_and_fill_value
dtype, fill_value = arr.dtype, arr.dtype.type()
TypeError: void() takes exactly 1 positional argument (0 given)
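The last frame of the traceback calls `arr.dtype.type()` with no arguments. A minimal NumPy-only sketch of why that fails for a record dtype, assuming the same recarray as in the example above (the variable names `scalar_type`, `float_default`, and `error` are illustrative and not part of pandas):

```python
import numpy as np

# Same record array as in the issue description.
c = np.array([2] * 9, dtype='f8')
r = np.rec.fromarrays([c], names=['c'])

# The scalar type behind a record dtype is np.record, a subclass of np.void.
scalar_type = r.dtype.type
is_void = issubclass(scalar_type, np.void)

# For a plain float dtype, calling the scalar type with no arguments works
# and yields a default value.
float_default = np.dtype('f8').type()

# For a record dtype, the same zero-argument call raises a TypeError,
# because np.void needs at least one argument.
try:
    scalar_type()
    error = None
except TypeError as exc:
    error = exc

print(scalar_type, is_void, float_default, repr(error))
```

The exact wording of the error message may differ between NumPy versions.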
Expected Behavior
Before 1.4.0, the output of the same code shows that the recarray has dtype object and the unstack does not throw an error:
a int64
b object
c object
dtype: object
c
b A B C
a
0 (2.0,) (2.0,) (2.0,)
1 (2.0,) (2.0,) (2.0,)
2 (2.0,) (2.0,) (2.0,)
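Until the regression is addressed, one possible workaround is to store the records in an object-dtype column explicitly, which approximates the pre-1.4.0 behavior. This is only a sketch under that assumption, not a fix proposed in this report; the helper variable `as_object` is hypothetical:

```python
import numpy as np
import pandas as pd

c = np.array([2] * 9, dtype='f8')
r = np.rec.fromarrays([c], names=['c'])

# Copy each record into an object array as a plain Python tuple,
# so that pandas sees an object column instead of a np.record column.
as_object = np.empty(len(r), dtype=object)
for i, rec in enumerate(r):
    as_object[i] = rec.item()

df = pd.DataFrame({'a': np.arange(9) // 3,
                   'b': list('ABC') * 3,
                   'c': as_object})

# With an object column, unstack goes through the regular object code path.
result = df.set_index(['a', 'b']).c.unstack()
print(df.dtypes)
print(result)
```

If only the field values are needed, assigning `r['c']` directly would give an ordinary float64 column instead of tuples.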
Installed Versions
INSTALLED VERSIONS
commit : 91111fd
python : 3.9.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-52-generic
Version : #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.1
numpy : 1.23.4
pytz : 2022.5
dateutil : 2.8.2
setuptools : 65.5.0
pip : 22.3
Cython : 0.29.32
pytest : 6.2.5
hypothesis : None
sphinx : 5.3.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : 1.3.5
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.1
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.3
snappy : None
sqlalchemy : 1.4.42
tables : 3.7.0
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None