Restrictions on column labels #7

One of the uncontroversial points from https://github.com/pydata-apis/dataframe-api/issues/2 is that DataFrames have column labels / names. I'd like to discuss two specific points on this before merging the results into that issue.

1. What type can the column labels be? Should they be limited to just strings?
2. Do we require uniqueness of column labels?

I'm a bit unsure whether these are getting too far into the implementation side of things. Should we just take no stance on either of these?

---

My responses:

1. We should probably allow labels to be any type.

Operations like `crosstab` / `pivot` place the values of a column from the input dataframe into the column labels of the output.
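To make that concrete, here is a minimal plain-Python sketch of a pivot. It is not how pandas or any other library implements it; the `records` data and the `pivot` helper are made up for illustration. The point is only that whatever values sit in the pivoted column (integers, `None`, ...) end up as column labels.

```python
from collections import defaultdict

# Hypothetical long-format records:
# (row key, value that becomes a column label, cell value)
records = [
    ("alice", 2019, 10),
    ("alice", 2020, 12),
    ("bob", 2019, 7),
    ("bob", None, 3),  # a missing value in the pivoted column
]


def pivot(rows):
    """Spread the second field of each record into column labels."""
    table = defaultdict(dict)
    labels = []
    for row_key, label, value in rows:
        if label not in labels:
            labels.append(label)
        table[row_key][label] = value
    return labels, dict(table)


column_labels, table = pivot(records)
print(column_labels)  # [2019, 2020, None]
```

If labels were restricted to strings, the output labels here would have to be converted (and `None` given some string spelling) before the result could be represented.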

We'll need to be careful with how this interacts with the indexing API, since a label like the tuple `('my', 'label')` might introduce ambiguities (e.g. when the full list of labels is `['my', 'label', ('my', 'label')]`).
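A small sketch of the ambiguity, assuming a hypothetical `select` indexer (not any library's actual API) that accepts either a single label or a collection of labels:

```python
labels = ["my", "label", ("my", "label")]


def select(key):
    """Return both plausible readings of `key` as column positions."""
    # Reading 1: `key` is one label.
    as_single_label = [i for i, lab in enumerate(labels) if lab == key]
    # Reading 2: `key` is a collection of labels.
    as_label_list = (
        [i for k in key for i, lab in enumerate(labels) if lab == k]
        if isinstance(key, (list, tuple))
        else []
    )
    return as_single_label, as_label_list


single, multiple = select(("my", "label"))
print(single)    # [2]
print(multiple)  # [0, 1]
```

Both readings are valid for the same key, so the indexing API would need a rule (or separate entry points) to pick one.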

Is it reasonable to require each label to be hashable? Pandas requires this, to facilitate lookup in a hashtable.
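As a rough illustration of why hashability matters, here is a sketch of a label-to-position lookup backed by a `dict`. The `build_lookup` helper is hypothetical and only stands in for whatever structure an implementation would use.

```python
def build_lookup(labels):
    """Map each label to the positions where it occurs."""
    lookup = {}
    for position, label in enumerate(labels):
        try:
            hash(label)
        except TypeError:
            raise TypeError(f"column label {label!r} is not hashable") from None
        lookup.setdefault(label, []).append(position)
    return lookup


lookup = build_lookup(["a", 1, ("x", "y")])
print(lookup[("x", "y")])  # [2]

try:
    build_lookup(["a", ["x", "y"]])  # a list cannot be a dict key
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Strings, numbers and tuples of hashable values work as labels here; a list does not.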


2. We cannot require uniqueness.

Dataframes are commonly used to wrangle real-world data into shape, and real-world data is messy. If an implementation wants to ensure uniqueness (perhaps on a per-object basis) then it can offer that separately. But the API should at least allow for duplicate labels.
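For example, a CSV export with a repeated header name is easy to come by. The sketch below uses only the standard library and made-up data; it shows that keeping labels as a plain sequence preserves both columns, while forcing them through a unique-key mapping silently drops one.

```python
import csv
import io

# Hypothetical messy export with a repeated header name.
raw = "id,value,value\n1,10,20\n2,30,40\n"

reader = csv.reader(io.StringIO(raw))
header = next(reader)
rows = [row for row in reader]

# Keeping labels as a list preserves both "value" columns.
print(header)  # ['id', 'value', 'value']

# Forcing uniqueness through a dict loses data: the later column wins.
as_dict = [dict(zip(header, row)) for row in rows]
print(as_dict[0])  # {'id': '1', 'value': '20'}
```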
