
Restrictions on column labels #7

Closed
@TomAugspurger


One of the uncontroversial points from #2 is that DataFrames have column labels / names. I'd like to discuss two specific points on this before merging the results into that issue.

  1. What type can the column labels be? Should they be limited to just strings?
  2. Do we require uniqueness of column labels?

I'm a bit unsure whether these are getting too far into the implementation side of things. Should we just take no stance on either of these?


My responses:

  1. We should probably allow labels to be any type.

Operations like crosstab / pivot place the values of a column from the input dataframe into the column labels of the output.
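To illustrate why a pivot pushes arbitrary values into the column labels, here is a minimal pure-Python sketch. The `pivot` helper and the sample rows are hypothetical and only stand in for a real dataframe operation; the point is that the output labels inherit whatever type the input column held (integers here).

```python
from collections import defaultdict

def pivot(rows, index, columns, values):
    """Hypothetical helper: distinct values of `columns` become the output column labels."""
    table = defaultdict(dict)
    labels = []
    for row in rows:
        label = row[columns]
        if label not in labels:
            labels.append(label)  # keep first-seen order of the new labels
        table[row[index]][label] = row[values]
    return labels, dict(table)

rows = [
    {"city": "Oslo", "year": 2019, "sales": 10},
    {"city": "Oslo", "year": 2020, "sales": 12},
    {"city": "Rome", "year": 2019, "sales": 7},
]
labels, table = pivot(rows, index="city", columns="year", values="sales")
print(labels)  # [2019, 2020]
print(table)   # {'Oslo': {2019: 10, 2020: 12}, 'Rome': {2019: 7}}
```

If labels were restricted to strings, an operation like this would have to convert `2019` to `"2019"` or fail.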

We'll need to be careful with how this interacts with the indexing API, since a label like the tuple ('my', 'label') might introduce ambiguities (e.g. when the full list of labels is ['my', 'label', ('my', 'label')]).
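One possible way to resolve that ambiguity, sketched in plain Python: treat a list key as "several labels" and any other key, tuples included, as a single label. The `select_positions` function is hypothetical and is not a claim about how any existing library behaves.

```python
labels = ["my", "label", ("my", "label")]

def select_positions(labels, key):
    """Hypothetical rule: a list key selects several labels; any other key is one label."""
    if isinstance(key, list):
        return [labels.index(k) for k in key]
    return [labels.index(key)]

print(select_positions(labels, ["my", "label"]))  # [0, 1]
print(select_positions(labels, ("my", "label")))  # [2]
```

Under this rule the two spellings select different columns, so the indexing API would need to document the distinction explicitly.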

Is it reasonable to require each label to be hashable? Pandas requires this, to facilitate lookup in a hashtable.
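A short sketch of what a hashability requirement would mean in practice, using only built-in Python. The `is_hashable` helper and the candidate labels are made up for illustration.

```python
def is_hashable(label):
    """Return True if `label` can be used as a key in a hash table."""
    try:
        hash(label)
    except TypeError:
        return False
    return True

candidates = ["price", 2020, ("my", "label"), ["my", "label"], ("my", ["label"])]
usable = [label for label in candidates if is_hashable(label)]

# A hash-based lookup from label to column position only works for hashable labels.
lookup = {label: position for position, label in enumerate(usable)}
print(usable)                   # ['price', 2020, ('my', 'label')]
print(lookup[("my", "label")])  # 2
```

Strings, numbers, and tuples of hashable values pass; lists, and tuples that contain lists, do not.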

  2. We cannot require uniqueness.

Dataframes are commonly used to wrangle real-world data into shape, and real-world data is messy. If an implementation wants to ensure uniqueness (perhaps on a per-object basis), then it can offer that separately. But the API should at least allow for duplicate labels.
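As a rough sketch of how duplicate labels and an opt-in uniqueness check could coexist, assuming labels are hashable: map each label to a list of positions, and let a separate helper report duplicates. Both function names below are hypothetical.

```python
from collections import defaultdict

def positions_by_label(labels):
    """Map each label to every position where it occurs; duplicates are allowed."""
    positions = defaultdict(list)
    for position, label in enumerate(labels):
        positions[label].append(position)
    return dict(positions)

def duplicated_labels(labels):
    """Opt-in check: list the labels that occur more than once."""
    return [label for label, pos in positions_by_label(labels).items() if len(pos) > 1]

labels = ["id", "value", "value", "note"]
print(positions_by_label(labels)["value"])  # [1, 2]
print(duplicated_labels(labels))            # ['value']
```

Looking up "value" then yields two columns rather than one, which is the case the indexing API would have to define.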
