BUG: Interchange protocol fails with pyarrow backed types #52323

Closed
@char101

Pandas version checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

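import pandas as pd
import pyarrow as pa
import pyarrow.interchange  # import the submodule explicitly
from datetime import datetime

# Build a frame with a pyarrow-backed date32 column, then convert it
df = pd.DataFrame({'a': [datetime.now()]}, dtype='date32[pyarrow]')
pa.interchange.from_dataframe(df)
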
NotImplementedError                       Traceback (most recent call last)
Cell In [31], line 1
----> 1 pl.from_dataframe(pd.DataFrame({'a': [datetime.now()]}, dtype='date32[pyarrow]'))

File C:\python\3.11\Lib\site-packages\pyarrow\interchange\from_dataframe.py:85, in from_dataframe(df, allow_copy)
     82 if not hasattr(df, "__dataframe__"):
     83     raise ValueError("`df` does not support __dataframe__")
---> 85 return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
     86                        allow_copy=allow_copy)

File C:\python\3.11\Lib\site-packages\pyarrow\interchange\from_dataframe.py:108, in _from_dataframe(df, allow_copy)
    106 batches = []
    107 for chunk in df.get_chunks():
--> 108     batch = protocol_df_chunk_to_pyarrow(chunk, allow_copy)
    109     batches.append(batch)
    111 table = pa.Table.from_batches(batches)

File C:\python\3.11\Lib\site-packages\pyarrow\interchange\from_dataframe.py:143, in protocol_df_chunk_to_pyarrow(df, allow_copy)
    141     raise ValueError(f"Column {name} is not unique")
    142 col = df.get_column_by_name(name)
--> 143 dtype = col.dtype[0]
    144 if dtype in (
    145     DtypeKind.INT,
    146     DtypeKind.UINT,
   (...)
    149     DtypeKind.DATETIME,
    150 ):
    151     columns[name] = column_to_array(col, allow_copy)

File C:\python\3.11\Lib\site-packages\pandas\_libs\properties.pyx:36, in pandas._libs.properties.CachedProperty.__get__()

File C:\python\3.11\Lib\site-packages\pandas\core\interchange\column.py:126, in PandasColumn.dtype(self)
    124     raise NotImplementedError("Non-string object dtypes are not supported yet")
    125 else:
--> 126     return self._dtype_from_pandasdtype(dtype)

File C:\python\3.11\Lib\site-packages\pandas\core\interchange\column.py:141, in PandasColumn._dtype_from_pandasdtype(self, dtype)
    137 if kind is None:
    138     # Not a NumPy dtype. Check if it's a categorical maybe
    139     raise ValueError(f"Data type {dtype} not supported by interchange protocol")
--> 141 return kind, dtype.itemsize * 8, dtype_to_arrow_c_fmt(dtype), dtype.byteorder

File C:\python\3.11\Lib\site-packages\pandas\core\interchange\utils.py:91, in dtype_to_arrow_c_fmt(dtype)
     88     resolution = re.findall(r"\[(.*)\]", typing.cast(np.dtype, dtype).str)[0][:1]
     89     return ArrowCTypes.TIMESTAMP.format(resolution=resolution, tz="")
---> 91 raise NotImplementedError(
     92     f"Conversion of {dtype} to Arrow C format string is not implemented."
     93 )

NotImplementedError: Conversion of date32[day][pyarrow] to Arrow C format string is not implemented.
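The error comes from the step that maps a column dtype to an Arrow C Data Interface format string. In that specification, date32 is written as `tdD`, date64 as `tdm`, and timestamps as `ts<unit>:<tz>`. The following is only a sketch of such a lookup keyed on type names like the one in the message above; the function name `arrow_c_format_for` and both tables are hypothetical and are not the pandas implementation.

```python
# Hypothetical helper, NOT pandas code: map a type name such as
# "date32[day]" to its Arrow C Data Interface format string.
ARROW_C_FORMATS = {
    "bool": "b",
    "int8": "c", "uint8": "C",
    "int16": "s", "uint16": "S",
    "int32": "i", "uint32": "I",
    "int64": "l", "uint64": "L",
    "date32[day]": "tdD",  # days since the UNIX epoch, stored as int32
    "date64[ms]": "tdm",   # milliseconds since the UNIX epoch, stored as int64
}

# Timestamp unit as written in the type name -> unit letter in the format string
TIMESTAMP_UNITS = {"s": "s", "ms": "m", "us": "u", "ns": "n"}


def arrow_c_format_for(type_name: str) -> str:
    """Return the Arrow C format string for a type name, or raise."""
    if type_name in ARROW_C_FORMATS:
        return ARROW_C_FORMATS[type_name]
    # Accept "timestamp[unit]" and "timestamp[unit, tz=...]"
    if type_name.startswith("timestamp[") and type_name.endswith("]"):
        inner = type_name[len("timestamp["):-1]
        unit, _, tz = inner.partition(", tz=")
        if unit in TIMESTAMP_UNITS:
            return f"ts{TIMESTAMP_UNITS[unit]}:{tz}"
    raise NotImplementedError(
        f"Conversion of {type_name} to Arrow C format string is not implemented."
    )
```

With a table like this, `arrow_c_format_for("date32[day]")` returns `"tdD"` instead of raising, while unknown types still fail loudly.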

Issue Description

Converting a DataFrame with a date32[pyarrow] column to a pyarrow table through the interchange protocol raises NotImplementedError.

Expected Behavior

A date32[day][pyarrow] column should convert back to a pyarrow table without error.
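For context, Arrow's date32 type stores each value as a 32-bit count of days since 1970-01-01, so the data itself is a plain integer buffer. The sketch below shows that encoding with the standard library only; the helper names `to_date32_days` and `from_date32_days` are hypothetical, and dropping the time of day is a choice made for this illustration.

```python
from datetime import date, datetime, timedelta

UNIX_EPOCH = date(1970, 1, 1)


def to_date32_days(value):
    """Encode a date or datetime as days since the UNIX epoch."""
    # In this sketch a datetime simply loses its time-of-day component.
    if isinstance(value, datetime):
        value = value.date()
    return (value - UNIX_EPOCH).days


def from_date32_days(days):
    """Decode a day count back into a datetime.date."""
    return UNIX_EPOCH + timedelta(days=days)
```

A round trip such as `from_date32_days(to_date32_days(datetime.now()))` gives back today's date with the time removed.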

Installed Versions

INSTALLED VERSIONS

commit : ceef0da
python : 3.11.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Indonesia.1252

pandas : 2.1.0.dev0+368.gceef0da443
numpy : 1.24.2
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.6.3
pip : 23.0.1
Cython : 0.29.32
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.5
jinja2 : 3.1.2
IPython : 8.6.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.7
brotli : 1.0.9
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : 3.1.0
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : 1.4.45
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2022.6
qtpy : 2.3.0
pyqt5 : None

Labels

Arrow (pyarrow functionality), Bug, Interchange (Dataframe Interchange Protocol)