Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Sorry, this is not easily reproducible because the dataframe interchange protocol for pyarrow is still a work in progress, but I think the error is quite clear:
import pyarrow as pa
table = pa.table({"a": [1, 2, 3, None]})
exchange_df = table.__dataframe__()
from pandas.core.interchange.from_dataframe import from_dataframe
from_dataframe(exchange_df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/alenkafrim/repos/pyarrow-dev-9/lib/python3.9/site-packages/pandas/core/interchange/from_dataframe.py", line 53, in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy))
  File "/Users/alenkafrim/repos/pyarrow-dev-9/lib/python3.9/site-packages/pandas/core/interchange/from_dataframe.py", line 74, in _from_dataframe
    pandas_df = protocol_df_chunk_to_pandas(chunk)
  File "/Users/alenkafrim/repos/pyarrow-dev-9/lib/python3.9/site-packages/pandas/core/interchange/from_dataframe.py", line 122, in protocol_df_chunk_to_pandas
    columns[name], buf = primitive_column_to_ndarray(col)
  File "/Users/alenkafrim/repos/pyarrow-dev-9/lib/python3.9/site-packages/pandas/core/interchange/from_dataframe.py", line 160, in primitive_column_to_ndarray
    data = set_nulls(data, col, buffers["validity"])
  File "/Users/alenkafrim/repos/pyarrow-dev-9/lib/python3.9/site-packages/pandas/core/interchange/from_dataframe.py", line 504, in set_nulls
    null_pos = buffer_to_ndarray(valid_buff, valid_dtype, col.offset, col.size)
  File "/Users/alenkafrim/repos/pyarrow-dev-9/lib/python3.9/site-packages/pandas/core/interchange/from_dataframe.py", line 395, in buffer_to_ndarray
    raise NotImplementedError(f"Conversion for {dtype} is not yet supported.")
NotImplementedError: Conversion for (<DtypeKind.BOOL: 20>, 1, 'b', '=') is not yet supported.
Issue Description
I am currently working on implementing the dataframe interchange protocol for pyarrow.Table in the Apache Arrow project (apache/arrow#14613).
I am using the pandas implementation to test that the produced __dataframe__ object can be consumed correctly.
When consuming a pyarrow.Table with missing values, I get a NotImplementedError. The bitmasks that PyArrow uses to represent nulls in a given column cannot be converted.
But if I look at the code in from_dataframe.py:
pandas/pandas/core/interchange/from_dataframe.py
Lines 405 to 415 in 70121c7
I would think this is not intentional and that _NP_DTYPES should include {1: bool}:
pandas/pandas/core/interchange/from_dataframe.py
Lines 23 to 28 in 70121c7
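The error message is consistent with a lookup keyed by dtype kind and bit width that has no entry for a 1-bit boolean. The sketch below only illustrates that failure mode. `SUPPORTED_DTYPES` and `lookup_column_dtype` are hypothetical stand-ins, not the actual contents of `_NP_DTYPES` or any pandas function, and the single 8-bit boolean entry is an assumption.

```python
from enum import IntEnum


class DtypeKind(IntEnum):
    """Subset of the dtype kinds defined by the interchange protocol."""
    INT = 0
    UINT = 1
    FLOAT = 2
    BOOL = 20


# Hypothetical stand-in for a {kind: {bit width: type}} table; the real
# contents of _NP_DTYPES are not reproduced here.
SUPPORTED_DTYPES = {
    DtypeKind.BOOL: {8: bool},  # assumption: only byte-sized booleans are mapped
}


def lookup_column_dtype(dtype):
    """Return the mapped type for a protocol dtype tuple, or raise."""
    kind, bit_width, _format_str, _endianness = dtype
    column_dtype = SUPPORTED_DTYPES.get(kind, {}).get(bit_width)
    if column_dtype is None:
        raise NotImplementedError(f"Conversion for {dtype} is not yet supported.")
    return column_dtype


# A byte mask (8 bits per value) resolves; a bit mask (1 bit per value) does not.
print(lookup_column_dtype((DtypeKind.BOOL, 8, "b", "=")))
try:
    lookup_column_dtype((DtypeKind.BOOL, 1, "b", "="))
except NotImplementedError as exc:
    print(exc)
```

With a table like this, a validity buffer described as `(DtypeKind.BOOL, 1, 'b', '=')` falls through to the `NotImplementedError` branch, as in the traceback.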
Expected Behavior
The bitmask can be converted to an ndarray by the current pandas implementation of the dataframe interchange protocol, so that the code below, which works for a table without missing values, also works when the table contains them:
>>> import pyarrow as pa
>>> table = pa.table({"a": [1, 2, 3, 4]})
>>> exchange_df = table.__dataframe__()
>>> exchange_df._df
pyarrow.Table
a: int64
----
a: [[1,2,3,4]]
>>> from pandas.core.interchange.from_dataframe import from_dataframe
>>> from_dataframe(exchange_df)
   a
0  1
1  2
2  3
3  4
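To illustrate the conversion being asked for, here is a minimal sketch of turning a bit-packed validity buffer into a boolean ndarray. It assumes the Arrow layout (least-significant bit first, a set bit means the value is valid). The helper name `bitmask_to_bool_ndarray` is hypothetical and not part of pandas. NumPy has no one-bit dtype, so the sketch unpacks the bits with `np.unpackbits` rather than viewing the buffer directly.

```python
import numpy as np


def bitmask_to_bool_ndarray(bitmask: bytes, length: int, offset: int = 0) -> np.ndarray:
    """Unpack a little-endian validity bitmap into a boolean array.

    Assumes Arrow's convention: bit i of the buffer describes value i,
    counted from the least-significant bit of each byte.
    """
    packed = np.frombuffer(bitmask, dtype=np.uint8)
    bits = np.unpackbits(packed, bitorder="little")
    # Drop the leading offset bits and the padding bits after `length` values.
    return bits[offset : offset + length].astype(bool)


# Validity bitmap for [1, 2, 3, None]: the first three bits are set.
valid = bitmask_to_bool_ndarray(bytes([0b00000111]), length=4)
print(valid.tolist())  # [True, True, True, False]

# Applying the mask: invalid slots become NaN, which forces a float dtype.
data = np.array([1, 2, 3, 0], dtype=np.int64)
with_nulls = np.where(valid, data.astype(np.float64), np.nan)
```

The `offset` argument matters for sliced columns, where the first logical value does not start at bit 0 of the buffer.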
Installed Versions
INSTALLED VERSIONS
commit : 87cfe4e
python : 3.9.14.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Thu Sep 29 20:13:46 PDT 2022; root:xnu-8020.240.7~1/RELEASE_ARM64_T8101
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.5.0
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
setuptools : 65.5.1
pip : 22.3.1
Cython : 0.29.28
pytest : 7.1.3
hypothesis : 6.39.4
sphinx : 4.3.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.1.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.02.0
gcsfs : 2022.02.0
matplotlib : 3.6.2
numba : 0.56.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 11.0.0.dev117+geeca8a4e3.d20221122
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : 2022.11.0
xlrd : None
xlwt : None
zstandard : None
tzdata : None