Columns with bit/bytemask null representation should be able to return None for validity buffer when there are no missing values #90

Open
@AlenkaF

I am currently working on the implementation of the dataframe interchange protocol for PyArrow. While testing the current PyArrow implementation, which produces a __dataframe__ object, against the pandas implementation, which consumes it, I noticed that columns that use a bit/bytemask null representation but contain no missing values raise an error.

The reason for this is that Apache Arrow does not create a mask buffer when there are no missing values present. Therefore, the result of calling .get_buffers()["validity"] on a column of the PyArrow __dataframe__ object that has no missing values is None, which is currently not handled by the protocol specification. See:
https://github.com/pandas-dev/pandas/blob/5c66e65d7b9fef47ccb585ce2fd0b3ea18dc82ea/pandas/core/interchange/from_dataframe.py#L502
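To illustrate what handling this case on the consumer side could look like, here is a minimal sketch in plain Python. The helper name apply_validity and its parameters are hypothetical and are not taken from pandas or the protocol; it only assumes that a missing validity buffer (None) means that every value is valid, and that a bytemask holds one integer per element.

```python
from typing import Any, List, Optional, Sequence


def apply_validity(
    values: Sequence[Any],
    validity: Optional[Sequence[int]],
    null_marker: int = 0,
) -> List[Any]:
    """Hypothetical helper: replace masked-out entries with None.

    `validity` is a bytemask with one entry per value, or None when the
    producer did not allocate a mask because nothing is missing.
    `null_marker` is the mask value (0 or 1) that denotes a missing entry.
    """
    if validity is None:
        # No mask at all: treat every value as valid instead of failing.
        return list(values)
    return [
        None if flag == null_marker else value
        for value, flag in zip(values, validity)
    ]


print(apply_validity([1, 2, 3], None))
print(apply_validity([1, 2, 3], [1, 0, 1]))
```

The only point of the sketch is the early return for None; a real consumer would work on raw buffers rather than Python sequences.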

For now, we check for columns without missing values and, in that case, describe the column as non-nullable. But we think nullable columns with a bit/bytemask null representation should have the option to return None instead of a validity buffer.
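The workaround described above can be sketched roughly as follows. This is a simplified illustration, not the actual PyArrow code: the standalone function describe_null and its null_count parameter are assumptions made for the example, and the enum is redefined locally with the values I understand the protocol specification to use.

```python
from enum import IntEnum
from typing import Any, Tuple


class ColumnNullType(IntEnum):
    # Local copy of the null-kind values from the protocol specification.
    NON_NULLABLE = 0
    USE_NAN = 1
    USE_SENTINEL = 2
    USE_BITMASK = 3
    USE_BYTEMASK = 4


def describe_null(null_count: int) -> Tuple[ColumnNullType, Any]:
    """Hypothetical producer-side helper for an Arrow-backed column."""
    if null_count == 0:
        # Workaround: no validity buffer exists, so report the column
        # as non-nullable to keep consumers from asking for one.
        return ColumnNullType.NON_NULLABLE, None
    # Arrow validity bitmaps use a 0 bit to mark a missing value.
    return ColumnNullType.USE_BITMASK, 0


print(describe_null(0))
print(describe_null(2))
```

The downside of this approach is that the reported nullability depends on the data in the column rather than on its type, which is why returning None for the validity buffer seems like the cleaner option.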
