Tracking issue: dataframe protocol implementation #46

The bulk of the dataframe interchange protocol was implemented in gh-38. A number of TODOs remain, however, and more will likely come up once multiple implementations exist and we can actually convert one type of dataframe into another. This is the tracking issue for those TODOs and issues:

- [ ] Categorical dtypes: we should allow `null` as a category. It should not have a specified meaning; it is just another category that should (e.g.) roundtrip correctly. See the discussion in the 8 Apr meeting.
- [ ] Categorical dtypes: should they be a dtype in themselves, or should they be part of the dtype tuple? Currently the dtype is `(kind, bitwidth, format_str, endianness)`, with categorical being a value of the `kind` enum. Would it be sensible to add a 5th element to the dtype, itself another dtype 4-tuple, thereby allowing nesting?
- [x] Add a `metadata` attribute that can be used to store library-specific things. For example, Vaex should be able to store expressions for its virtual columns there. _See PR gh-43_
- [x] Add a flag to raise an exception if the export cannot be zero-copy (e.g., for pandas this can happen because of the block manager, where rows are contiguous and columns are not; add a test for that). _See PR gh-44_
- [x] Add a string dtype, with variable-length strings implemented using the same scheme as Arrow (an `offsets` buffer and a `data` buffer, see https://github.com/data-apis/dataframe-api/pull/38#discussion_r609818874). _See PR gh-45_
- [x] What should the signature of the `from_dataframe` protocol be? See https://github.com/data-apis/dataframe-api/issues/42 and the 20 May meeting.
- [x] What can be reused between implementations in different libraries, and can/should we have a reference implementation? This question still needs to be answered somewhere.
- [ ] What is the ownership model for buffers, i.e., who owns the memory? This should be spelled out clearly in the docs. An `owner` attribute may be needed. See the 4 March meeting minutes, https://github.com/data-apis/dataframe-api/issues/39, and comments on this PR.

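The two categorical items can be illustrated together with a small sketch. It assumes the proposed 5-tuple layout, where the 5th element is a nested dtype 4-tuple describing the categories, and it treats `None` as an ordinary category so that it roundtrips. `Kind`, `to_codes`, and `from_codes` are hypothetical names, the enum values are illustrative only, and the `"c"` (int8) and `"u"` (UTF-8 string) format strings follow Arrow's conventions as an assumption.

```python
from enum import Enum, auto


class Kind(Enum):
    """Hypothetical subset of the dtype `kind` enum (values are illustrative)."""
    INT = auto()
    STRING = auto()
    CATEGORICAL = auto()


# dtype 4-tuple for the categories themselves: UTF-8 strings (Arrow format "u")
CATEGORIES_DTYPE = (Kind.STRING, 8, "u", "=")

# Proposed 5-tuple: int8 codes (Arrow format "c"), native endianness, plus a
# nested dtype 4-tuple describing the categories
CODES_DTYPE = (Kind.CATEGORICAL, 8, "c", "=", CATEGORIES_DTYPE)


def to_codes(values):
    """Encode values as integer codes into a list of categories.

    None is treated as an ordinary category with no special meaning, so it
    gets its own code and survives the roundtrip unchanged.
    """
    categories = []
    index = {}
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(categories)
            categories.append(v)
        codes.append(index[v])
    return codes, categories


def from_codes(codes, categories):
    """Decode integer codes back into the original values."""
    return [categories[c] for c in codes]


values = ["a", None, "b", None, "a"]
codes, categories = to_codes(values)
print(codes)       # [0, 1, 2, 1, 0]
print(categories)  # ['a', None, 'b']
```

Because the nested element is a plain dtype 4-tuple, a consumer could in principle reuse its existing dtype parsing for the categories; whether this generalizes to other nested types is still open.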

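The Arrow-style string layout can be illustrated with a minimal sketch. It assumes plain Python strings as input and omits the validity (null) bitmap that Arrow also uses. `encode_strings` and `decode_strings` are hypothetical helper names, not part of any library.

```python
from array import array


def encode_strings(values):
    """Pack Python strings into an Arrow-style (offsets, data) buffer pair.

    offsets has len(values) + 1 entries; string i occupies the UTF-8 bytes
    data[offsets[i]:offsets[i + 1]].
    """
    # 64-bit offsets, as in Arrow's large_string; Arrow's plain string uses int32
    offsets = array("q", [0])
    data = bytearray()
    for s in values:
        data += s.encode("utf-8")
        offsets.append(len(data))
    return offsets, bytes(data)


def decode_strings(offsets, data):
    """Rebuild the Python strings from an (offsets, data) buffer pair."""
    return [
        data[offsets[i]:offsets[i + 1]].decode("utf-8")
        for i in range(len(offsets) - 1)
    ]


offsets, data = encode_strings(["foo", "", "größe"])
print(list(offsets))  # [0, 3, 3, 10]  ("größe" is 7 bytes in UTF-8)
```

An empty string simply repeats the previous offset. This is why the scheme needs no per-element length field and why the `data` buffer can be shared zero-copy.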