BUG: failing tests in pyarrow with pandas nightly #52782

Closed
@AlenkaF

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pyarrow as pa
import pandas as pd
from datetime import datetime

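# A second-resolution, timezone-aware timestamp far outside the nanosecond range
arr = pa.array([datetime(1, 1, 1)], pa.timestamp('s', tz='America/New_York'))
table = pa.table({'a': arr})

# Converting the whole table succeeds with safe=False
table.to_pandas(safe=False)
#                                     a
# 0 1754-08-30 17:47:41.128654848-04:56

# Converting the single column raises, even with safe=False
table.column('a').to_pandas(safe=False)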
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "pyarrow/array.pxi", line 837, in pyarrow.lib._PandasConvertible.to_pandas
#     return self._to_pandas(options, categories=categories,
#   File "pyarrow/table.pxi", line 469, in pyarrow.lib.ChunkedArray._to_pandas
#     arr = pandas_dtype.__from_arrow__(self)
#   File "/Users/alenkafrim/repos/pyarrow-dev/lib/python3.10/site-packages/pandas/core/dtypes/dtypes.py", line 848, in __from_arrow__
#     array = array.cast(pyarrow.timestamp(unit=self._unit), safe=True)
#   File "pyarrow/table.pxi", line 551, in pyarrow.lib.ChunkedArray.cast
#     return _pc().cast(self, target_type, safe=safe, options=options)
#   File "/Users/alenkafrim/repos/arrow/python/pyarrow/compute.py", line 400, in cast
#     return call_function("cast", [arr], options, memory_pool)
#   File "pyarrow/_compute.pyx", line 572, in pyarrow._compute.call_function
#     return func.call(args, options=options, memory_pool=memory_pool,
#   File "pyarrow/_compute.pyx", line 367, in pyarrow._compute.Function.call
#     result = GetResultValue(
#   File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
#     return check_status(status)
#   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
#     raise ArrowInvalid(message)
# pyarrow.lib.ArrowInvalid: Casting from timestamp[s, tz=America/New_York] to timestamp[ns] would result in out of bounds timestamp: -62135596800
# /Users/alenkafrim/repos/arrow/cpp/src/arrow/compute/exec.cc:920  kernel_->exec(kernel_ctx_, input, &output)
# /Users/alenkafrim/repos/arrow/cpp/src/arrow/compute/function.cc:276  executor->Execute(input, &listener)

# Second problem: the Series loses the column name
arr = pa.array([1, 2, 3], pa.timestamp("ms", "Europe/Brussels"))
table = pa.table({"name": arr})
result = table.column("name").to_pandas()
result.name
# is None, but should be "name"

Issue Description

Some tests in PyArrow started failing about a day ago (see apache/arrow#35235). Converting a whole pyarrow table to pandas works for timestamps with timezones, but converting a single column to a Series fails when the timezone is not None.
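
For context on the first traceback, here is a minimal standard-library sketch of why a safe cast from second to nanosecond resolution rejects `datetime(1, 1, 1)`. It assumes that timestamps are stored as signed 64-bit integers and that the naive datetime is read as UTC, which matches the value `-62135596800` in the error message. The helper names `seconds_since_epoch` and `fits_in_ns` are made up for illustration and are not part of pandas or pyarrow.

```python
from datetime import datetime

# Assumption: timestamps are stored as signed 64-bit integers.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NS_PER_SECOND = 10**9


def seconds_since_epoch(dt):
    """Whole seconds between a naive datetime (read as UTC) and 1970-01-01."""
    delta = dt - datetime(1970, 1, 1)
    return delta.days * 86_400 + delta.seconds


def fits_in_ns(seconds):
    """True if a second-resolution value fits in int64 after scaling to ns."""
    return INT64_MIN <= seconds * NS_PER_SECOND <= INT64_MAX


secs = seconds_since_epoch(datetime(1, 1, 1))
print(secs)              # -62135596800, the value reported by ArrowInvalid
print(fits_in_ns(secs))  # False: scaling to nanoseconds overflows int64
print(fits_in_ns(seconds_since_epoch(datetime(2023, 4, 14))))  # True
```

This only illustrates the overflow itself. It does not explain why the column path performs a safe cast while the table path does not.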

The tests fail on the Arrow code-freeze branch created at the beginning of this week. The same tests pass with the latest pandas release but fail with the development (nightly) version.

We suspect this could be connected with #52677 but are not sure.

@mroeschke do you maybe have an idea of what could be triggering this?

Expected Behavior

The conversion from table column to pandas should not fail.

Installed Versions

INSTALLED VERSIONS

commit : d91b04c
python : 3.10.10.final.0
python-bits : 64
OS : Darwin
OS-release : 22.4.0
Version : Darwin Kernel Version 22.4.0: Mon Mar 6 21:00:41 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T8103
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.1.0.dev0+548.gd91b04c30e
numpy : 1.21.6
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 67.2.0
pip : 23.0.1
Cython : 0.29.33
pytest : 7.2.2
hypothesis : 6.70.0
sphinx : 5.3.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.11.0
pandas_datareader: None
bs4 : 4.12.0
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2023.3.0
gcsfs : None
matplotlib : 3.7.1
numba : 0.56.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 12.0.0.dev418+g6432a2382e.d20230414
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : 2023.2.0
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None

Metadata

    Labels

    Arrow (pyarrow functionality), Bug, Needs Info (clarification about behavior needed to assess issue), Upstream issue (issue related to pandas dependency)
