Description
I use both Stata and Pandas. Many Stata users attach variable labels that describe the columns more clearly than the column names do. For example, running this in Stata
sysuse auto.dta
describe
gives something like
variable name | storage type | variable label |
---|---|---|
make | str18 | Make and Model |
price | int | Price |
mpg | int | Mileage (mpg) |
For me (maybe for others too) it would be useful to have an optional field in a DataFrame with a column label dictionary. The keys would be the columns (not necessarily all of them) and the values the string labels.
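To make the idea concrete, here is a minimal sketch of the kind of label dictionary I have in mind, kept next to a DataFrame for now. The toy data and the `describe` helper are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Toy data standing in for a few columns of Stata's auto dataset.
df = pd.DataFrame({
    "make": ["AMC Concord", "AMC Pacer"],
    "price": [4099, 4749],
    "mpg": [22, 17],
})

# The proposed label dictionary: keys are column names, values are labels.
# Not every column needs an entry ("price" is left out on purpose).
column_labels = {"make": "Make and Model", "mpg": "Mileage (mpg)"}


def describe(frame, labels):
    """Hypothetical helper: one row per column with its dtype and label."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "label": [labels.get(col, "") for col in frame.columns],
    })


summary = describe(df, column_labels)
print(summary)
```

If the dictionary lived inside the DataFrame, a helper like this would not need the second argument, and the labels could not drift out of sync with the data.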
This dictionary format is already used by the variable_labels field of pandas.io.stata.StataReader (see the docs), which lets you import these labels when reading a Stata .dta file.
I know I could just carry around a dictionary with this information, but I think it's cleaner and less error-prone to set it and save it within the DataFrame itself.
Additionally, storing this would allow a Stata/Pandas round trip without loss of information, since to_stata could check whether this field exists. (to_stata might already accept a variable_labels dictionary as an option, but I didn't see it documented.)
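For reference, recent pandas versions do document a `variable_labels` argument on `DataFrame.to_stata`. Assuming such a version is installed, the round trip can be sketched as below, with the labels still carried by hand in a separate dictionary. The toy data and the file name are made up for illustration.

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataReader

df = pd.DataFrame({"price": [4099, 4749], "mpg": [22, 17]})
labels = {"price": "Price", "mpg": "Mileage (mpg)"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "auto_subset.dta")

    # Write the data and pass the labels explicitly.
    df.to_stata(path, write_index=False, variable_labels=labels)

    # Read the data back and recover the labels from the reader.
    with StataReader(path) as reader:
        data = reader.read()
        labels_back = reader.variable_labels()

print(labels_back)
```

The labels survive the trip through the .dta file, but only because the dictionary is passed in and fetched back manually. A label field on the DataFrame would let to_stata and the reader handle this step themselves.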
My coding prowess is quite limited, but I'd be happy to at least write test code and help out if somebody gets this started.