
ENH: Support reading value labels for Stata formats 108 (Stata 6) and earlier #58154

Closed
@cmjcharlton


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

pandas currently supports reading value labels only for data files saved in format 111 (Stata 7 SE) and later. It would be useful to extend this support to all format versions that pandas can read.

Feature Description

This could be implemented by extending the function _read_value_labels in pandas/io/stata.py.

Value labels in the 108 format use the same structure as later versions, except that label names are restricted to 8 characters, plus a null terminator [1].
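To make the 108 case concrete, here is a minimal standalone sketch of a parser for that layout. It assumes the record structure documented for later formats (an int32 table length, the label name, 3 bytes of padding, then a table of int32 count, int32 text length, int32 offsets, int32 values, and null-terminated text), with the name field shortened to 9 bytes. The function name `read_value_labels_108`, the `byteorder` parameter, and the latin-1 decoding are hypothetical choices for illustration; this is not the code in pandas/io/stata.py.

```python
import io
import struct


def read_value_labels_108(buf, byteorder="<"):
    """Parse value-label records that use a 9-byte label name field.

    Assumed layout per record: int32 table length, 9-byte name,
    3 bytes padding, int32 n, int32 txtlen, n int32 offsets,
    n int32 values, txtlen bytes of null-terminated text.
    """
    labels = {}
    while True:
        header = buf.read(4)  # table length; not needed for parsing here
        if len(header) < 4:
            break  # end of file reached
        name = buf.read(9).split(b"\0", 1)[0].decode("latin-1")
        buf.read(3)  # padding after the name
        n, txtlen = struct.unpack(f"{byteorder}ii", buf.read(8))
        offsets = struct.unpack(f"{byteorder}{n}i", buf.read(4 * n))
        values = struct.unpack(f"{byteorder}{n}i", buf.read(4 * n))
        text = buf.read(txtlen)
        labels[name] = {
            val: text[off:].split(b"\0", 1)[0].decode("latin-1")
            for off, val in zip(offsets, values)
        }
    return labels


# Build one synthetic record (not taken from a real .dta file).
text_108 = b"no\0yes\0"
table_108 = (
    struct.pack("<ii", 2, len(text_108))
    + struct.pack("<2i", 0, 3)  # offsets of "no" and "yes" in the text
    + struct.pack("<2i", 0, 1)  # the codes being labelled
    + text_108
)
record_108 = (
    struct.pack("<i", len(table_108))
    + b"yesno".ljust(9, b"\0")
    + b"\0" * 3
    + table_108
)
labels_108 = read_value_labels_108(io.BytesIO(record_108))
```

If the layout assumption holds, supporting 108 would mostly be a matter of reading 9 bytes for the label name instead of the longer field used by later formats.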

Value labels prior to the 108 format used a simpler structure: each label contains a list of codes, followed by a list of 8-character strings, one for each code [2].
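A rough sketch of how the older structure might be parsed is given below. Only the "list of codes followed by a list of 8-character strings" part comes from the description above. The record header (a uint16 entry count, a 9-byte label name, and 1 byte of padding), the 16-bit width of each code, and the latin-1 decoding are assumptions that should be checked against dta_105.txt. The function name `read_value_labels_pre108` is hypothetical.

```python
import io
import struct


def read_value_labels_pre108(buf, byteorder="<"):
    """Parse old-style value labels: n codes, then n 8-byte strings.

    Assumed header per record: uint16 n, 9-byte name, 1 byte padding.
    Assumed code width: int16.
    """
    labels = {}
    while True:
        header = buf.read(2)
        if len(header) < 2:
            break  # end of file reached
        (n,) = struct.unpack(f"{byteorder}H", header)
        name = buf.read(9).split(b"\0", 1)[0].decode("latin-1")
        buf.read(1)  # assumed padding byte
        codes = struct.unpack(f"{byteorder}{n}h", buf.read(2 * n))
        # Each label text occupies a fixed 8-byte slot, null-padded.
        texts = [
            buf.read(8).split(b"\0", 1)[0].decode("latin-1")
            for _ in range(n)
        ]
        labels[name] = dict(zip(codes, texts))
    return labels


# Build one synthetic record (not taken from a real .dta file).
record_old = (
    struct.pack("<H", 2)
    + b"yesno".ljust(9, b"\0")
    + b"\0"
    + struct.pack("<2h", 0, 1)
    + b"no".ljust(8, b"\0")
    + b"yes".ljust(8, b"\0")
)
labels_old = read_value_labels_pre108(io.BytesIO(record_old))
```

Because the strings are fixed-width, no offset table is needed: the i-th string simply belongs to the i-th code.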

References:
[1] Description of the 108 .dta format, section 5.6 Value Labels (dta_108.txt)
[2] Description of the 105 .dta format, section 5.6 Value Labels (dta_105.txt)

Alternative Solutions

Currently the only way to import these labels is to open the file in other software that can read them, and then save the file in a more recent format for which pandas has value label support.

Additional Context

No response
