
BUG: test_str_encode[utf32] fails on big-endian machine #57373

Closed
@QuLogic

Description

The test_str_encode test here:

@pytest.mark.parametrize("errors", ["ignore", "strict"])
@pytest.mark.parametrize(
    "encoding, exp",
    [
        ["utf8", b"abc"],
        ["utf32", b"\xff\xfe\x00\x00a\x00\x00\x00b\x00\x00\x00c\x00\x00\x00"],
    ],
)
def test_str_encode(errors, encoding, exp):
    ser = pd.Series(["abc", None], dtype=ArrowDtype(pa.string()))
    result = ser.str.encode(encoding, errors)
    expected = pd.Series([exp, None], dtype=ArrowDtype(pa.binary()))
    tm.assert_series_equal(result, expected)

appears to encode to native byte order, but the expected value b"\xff\xfe\x00\x00a\x00\x00\x00b\x00\x00\x00c\x00\x00\x00" is given in little-endian order.
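For illustration, here is a minimal standard-library sketch of the byte-order behaviour described above. It assumes the pandas method ends up with the same bytes as Python's own "utf32" codec, which writes a BOM followed by the code units in the machine's native order. The helper name native_utf32 is hypothetical and not part of pandas or the test suite.

```python
import codecs
import sys


def native_utf32(text: str) -> bytes:
    """Rebuild what text.encode("utf-32") produces on this machine (hypothetical helper)."""
    # The explicit -le / -be codecs never emit a BOM, so prepend the native one.
    explicit = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"
    return codecs.BOM_UTF32 + text.encode(explicit)


# Byte-order-specific encodings; these are identical on every platform.
little = codecs.BOM_UTF32_LE + "abc".encode("utf-32-le")
big = codecs.BOM_UTF32_BE + "abc".encode("utf-32-be")

print(sys.byteorder)
print("abc".encode("utf32") == native_utf32("abc"))
```

Here little matches the value hard-coded in the test and big matches the value reported on s390x. One possible way to make the expectation platform-independent would be to compute it with "abc".encode(encoding) instead of hard-coding the bytes, though that is only a suggestion.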

This causes the test to fail on big-endian systems such as s390x:

E   AssertionError: Series are different
E   
E   Series values are different (100.0 %)
E   [index]: [0, 1]
E   [left]:  [b'\x00\x00\xfe\xff\x00\x00\x00a\x00\x00\x00b\x00\x00\x00c']
E   [right]: [b'\xff\xfe\x00\x00a\x00\x00\x00b\x00\x00\x00c\x00\x00\x00']
E   At positional index 0, first diff: b'\x00\x00\xfe\xff\x00\x00\x00a\x00\x00\x00b\x00\x00\x00c' != b'\xff\xfe\x00\x00a\x00\x00\x00b\x00\x00\x00c\x00\x00\x00'
testing.pyx:173: AssertionError
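As a quick sanity check on the failure output, the following sketch (standard library only, values copied from the assertion message) suggests that both byte strings are valid UTF-32 encodings of the same text and differ only in byte order. The variable names are illustrative.

```python
# Values copied from the [left] and [right] lines of the assertion message.
left = b"\x00\x00\xfe\xff\x00\x00\x00a\x00\x00\x00b\x00\x00\x00c"
right = b"\xff\xfe\x00\x00a\x00\x00\x00b\x00\x00\x00c\x00\x00\x00"

# The generic "utf-32" decoder reads the leading BOM to pick the byte order.
decoded_left = left.decode("utf-32")
decoded_right = right.decode("utf-32")

print(decoded_left, decoded_right)
print(left == right)
```

Both values decode to the same string even though the raw bytes differ, which is consistent with the result being correct for the platform and only the hard-coded expectation being little-endian specific.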

INSTALLED VERSIONS

commit : f538741
python : 3.12.2.final.0
python-bits : 64
OS : Linux
OS-release : 6.6.11-200.fc39.x86_64
Version : #1 SMP PREEMPT_DYNAMIC Wed Jan 10 19:25:59 UTC 2024
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : C.UTF-8

pandas : 2.2.0
numpy : 1.26.2
pytz : 2024.1
dateutil : 2.8.2
setuptools : 69.0.3
pip : 23.3.2
Cython : 3.0.8
pytest : 7.4.3
hypothesis : 6.96.1
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : 3.1.9
lxml.etree : 5.1.0
html5lib : 1.1
pymysql : 1.4.6
psycopg2 : 2.9.9
jinja2 : 3.1.3
IPython : 8.21.0
pandas_datareader : 0.10.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.3.7
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.2.0
gcsfs : 2023.6.0+1.g7cc53d9
matplotlib : 3.8.2
numba : None
numexpr : 2.8.5
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 15.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.3
sqlalchemy : 2.0.25
tables : 3.9.2
tabulate : 0.9.0
xarray : 2023.8.0
xlrd : 2.0.1
zstandard : 0.22.0
tzdata : None
qtpy : 2.4.1
pyqt5 : None

Labels: Arrow (pyarrow functionality), Testing (pandas testing functions or related to the test suite)
