Variable labels as a dataframe field #11179

Closed
@cdagnino

I use both Stata and pandas. Many Stata users attach variable labels that describe columns more clearly than their names do. Running this in Stata

sysuse auto.dta
describe

gives something like this (abridged):

variable name   storage type   variable label
make            str18          Make and Model
price           int            Price
mpg             int            Mileage (mpg)

For me (and maybe for others too) it would be useful for a DataFrame to have an optional field holding a column-label dictionary: the keys would be column names (not necessarily all of them) and the values the string labels.
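Current pandas releases (1.0 and later) have an experimental DataFrame.attrs dictionary for dataset-level metadata, which can approximate the proposed field. A minimal sketch, assuming labels are kept under an attrs key named "variable_labels" (a name chosen here for illustration; pandas does not treat it specially) and that columns without a label fall back to their own name. The data values are illustrative only.

```python
import pandas as pd

# A tiny frame shaped like Stata's auto dataset (illustrative values only).
df = pd.DataFrame(
    {"make": ["AMC Concord", "AMC Pacer"], "price": [4099, 4749], "mpg": [22, 17]}
)

# DataFrame.attrs is a plain dict (experimental in pandas). The key
# "variable_labels" is a hypothetical convention, not a pandas feature.
# Only some columns are labelled, as the proposal allows.
df.attrs["variable_labels"] = {"make": "Make and Model", "mpg": "Mileage (mpg)"}


def column_label(frame: pd.DataFrame, column: str) -> str:
    """Return the stored label for `column`, or the column name if unlabelled."""
    labels = frame.attrs.get("variable_labels", {})
    return labels.get(column, column)


for col in df.columns:
    print(f"{col:<8} {column_label(df, col)}")
```

Whether attrs survives a given operation (selection, merge, concat) depends on the pandas version, so this is a convenience rather than a guarantee that labels travel with the data.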

This is what the variable_labels method of pandas.io.stata.StataReader provides (see the docs): it lets you import these labels when reading a Stata .dta file.
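For reference, reading the labels today looks roughly like the sketch below. pd.read_stata(..., iterator=True) returns a StataReader, whose variable_labels() method maps each variable name to its label. To keep the example self-contained it first writes a small labelled file with to_stata's variable_labels parameter (available in current pandas) instead of relying on auto.dta; the file name is arbitrary.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"make": ["AMC Concord"], "price": [4099], "mpg": [22]})
expected = {"make": "Make and Model", "price": "Price", "mpg": "Mileage (mpg)"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "auto_subset.dta")  # arbitrary file name
    # Write a labelled .dta file so the example does not need auto.dta.
    df.to_stata(path, write_index=False, variable_labels=expected)

    # read_stata(..., iterator=True) returns a StataReader; using it as a
    # context manager closes the file when done.
    with pd.read_stata(path, iterator=True) as reader:
        labels = reader.variable_labels()  # {variable name: label}
        data = reader.read()               # the data itself, without labels

print(labels)
```

The labels come back as a separate dictionary next to the DataFrame, which is exactly the "carry around a dictionary" situation the proposal wants to avoid.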

I know I could just carry this information around in a separate dictionary, but I think it is cleaner and less error-prone to set it and save it within the DataFrame.

Storing the labels would also allow a round trip between Stata and pandas without loss of information, since to_stata could check whether this field exists. (to_stata might already accept a variable_labels dictionary as an option, but at least I didn't see it documented.)
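The round trip described above can be approximated with two small wrapper functions. This is a sketch under stated assumptions: read_stata_with_labels, write_stata_with_labels and the attrs key "variable_labels" are hypothetical names introduced here, not pandas API; they only combine StataReader.variable_labels() on the way in with the variable_labels parameter of to_stata on the way out.

```python
import os
import tempfile

import pandas as pd

LABELS_KEY = "variable_labels"  # hypothetical attrs key, not a pandas convention


def read_stata_with_labels(path: str) -> pd.DataFrame:
    """Hypothetical helper: read a .dta file and keep its labels in attrs."""
    with pd.read_stata(path, iterator=True) as reader:
        labels = reader.variable_labels()
        frame = reader.read()
    # Unlabelled variables come back as empty strings; drop them so the
    # dict only holds columns that actually carry a label.
    frame.attrs[LABELS_KEY] = {col: text for col, text in labels.items() if text}
    return frame


def write_stata_with_labels(frame: pd.DataFrame, path: str) -> None:
    """Hypothetical helper: write a .dta file using labels stored in attrs."""
    labels = frame.attrs.get(LABELS_KEY, {})
    # Only pass labels for columns that still exist in the frame.
    labels = {col: text for col, text in labels.items() if col in frame.columns}
    frame.to_stata(path, write_index=False, variable_labels=labels)


df = pd.DataFrame({"make": ["AMC Concord"], "price": [4099], "mpg": [22]})
df.attrs[LABELS_KEY] = {"make": "Make and Model", "mpg": "Mileage (mpg)"}

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first.dta")
    second = os.path.join(tmp, "second.dta")
    write_stata_with_labels(df, first)            # pandas -> Stata
    roundtrip = read_stata_with_labels(first)     # Stata -> pandas
    write_stata_with_labels(roundtrip, second)    # and back again
    roundtrip_again = read_stata_with_labels(second)

print(roundtrip_again.attrs[LABELS_KEY])
```

Filtering out empty labels keeps the "not necessarily all columns" behaviour from the proposal, so a column that was unlabelled in pandas stays unlabelled after the round trip.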

My coding skills are quite limited, but I'd be happy to at least write test code and help out if somebody gets this started.

Labels: API Design, Needs Discussion (requires discussion from core team before further action)