Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
pd.DataFrame({'x': ['1', '2', 'x']}).to_csv('test.csv')
df = pd.read_csv('test.csv', engine='pyarrow', dtype_backend='pyarrow')
# this works
pd.to_numeric(df['x'], errors='coerce')
# this works
pd.to_numeric(df['x'].astype('str'), errors='coerce', dtype_backend='pyarrow')
# this crashes
pd.to_numeric(df['x'], errors='coerce', dtype_backend='pyarrow')
Issue Description
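The last call to pd.to_numeric, which passes the pyarrow-backed column directly with errors='coerce' and dtype_backend='pyarrow', crashes with the following error: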
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[385], line 9
7 pd.to_numeric(df['x'].astype('str'), errors='coerce', dtype_backend='pyarrow')
8 # this crashes
----> 9 pd.to_numeric(df['x'], errors='coerce', dtype_backend='pyarrow')
File ~/miniconda3/envs/mostly-data/lib/python3.9/site-packages/pandas/core/tools/numeric.py:279, in to_numeric(arg, errors, downcast, dtype_backend)
277 assert isinstance(mask, np.ndarray)
278 data = np.zeros(mask.shape, dtype=values.dtype)
--> 279 data[~mask] = values
281 from pandas.core.arrays import (
282 ArrowExtensionArray,
283 BooleanArray,
284 FloatingArray,
285 IntegerArray,
286 )
288 klass: type[IntegerArray] | type[BooleanArray] | type[FloatingArray]
ValueError: NumPy boolean array indexing assignment cannot assign 2 input values to the 3 output values where the mask is true
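The error message says that `values` held 2 elements while `~mask` selected all 3 positions. The sketch below reproduces only that NumPy-level mismatch. The arrays are hypothetical stand-ins for the locals in the traceback frame, and it does not show how pandas ends up with a mask and values that disagree.

```python
import numpy as np

# Hypothetical stand-ins for the locals in the traceback frame:
# a 3-row mask with nothing flagged as missing, but only 2 converted values.
mask = np.zeros(3, dtype=bool)
values = np.array([1.0, 2.0])

data = np.zeros(mask.shape, dtype=values.dtype)

try:
    data[~mask] = values  # 3 target slots, only 2 source values
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__, error)

# When the mask leaves exactly as many open slots as there are values,
# the same assignment succeeds.
mask_ok = np.array([False, False, True])
data_ok = np.zeros(mask_ok.shape, dtype=values.dtype)
data_ok[~mask_ok] = values
print(data_ok)
```

This suggests the coerced entry ('x') was dropped from `values` without being recorded in `mask`, though that is an inference from the message rather than something the traceback confirms.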
Expected Behavior
No crash, and the same output as pd.to_numeric(df['x'], errors='coerce').
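For reference, a minimal sketch of the values the coerced result should contain. It uses a plain object-dtype Series as a stand-in for the pyarrow-backed column (an assumption made so the snippet runs without pyarrow), so the dtype shown here is NumPy float64 rather than a pyarrow-backed type.

```python
import pandas as pd

# Stand-in for df['x']: same string values, but object dtype instead of pyarrow.
s = pd.Series(['1', '2', 'x'], name='x')

# errors='coerce' turns the unparseable 'x' into a missing value.
expected = pd.to_numeric(s, errors='coerce')
print(expected)
```

The first two entries parse to 1.0 and 2.0, and the third becomes missing. With dtype_backend='pyarrow' the same values would be expected, only stored in a pyarrow-backed dtype.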
Installed Versions
pandas : 2.0.0
numpy : 1.24.2
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.6.1
pip : 23.0.1
Cython : None
pytest : 7.2.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.6
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: None
bs4 : 4.12.1
bottleneck : None
brotli : None
fastparquet : 0.8.3
fsspec : 2023.3.0
gcsfs : 2023.3.0
matplotlib : 3.6.3
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : None
pyxlsb : None
s3fs : 2023.3.0
scipy : None
snappy : None
sqlalchemy : 2.0.9
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None