read_spss doesn't return the metadata #54264

Closed
@HadarEliyahu39

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

`def read_spss(
    path: str | Path,
    usecols: Sequence[str] | None = None,
    convert_categoricals: bool = True,
) -> DataFrame:
    """
    Load an SPSS file from the file path, returning a DataFrame.

    .. versionadded:: 0.25.0

    Parameters
    ----------
    path : str or Path
        File path.
    usecols : list-like, optional
        Return a subset of the columns. If None, return all columns.
    convert_categoricals : bool, default is True
        Convert categorical columns into pd.Categorical.

    Returns
    -------
    DataFrame
    """
    pyreadstat = import_optional_dependency("pyreadstat")

    if usecols is not None:
        if not is_list_like(usecols):
            raise TypeError("usecols must be list-like.")
        else:
            usecols = list(usecols)  # pyreadstat requires a list

    # The second return value (the metadata) is discarded here.
    df, _ = pyreadstat.read_sav(
        stringify_path(path), usecols=usecols, apply_value_formats=convert_categoricals
    )
    return df`

pandas has this function, which uses pyreadstat to read `.sav` files and returns a DataFrame, but it discards the metadata that `pyreadstat.read_sav` returns alongside the data. I assume some users need that metadata; otherwise, why use SPSS in the first place?
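Until `read_spss` exposes the metadata, one workaround is to call pyreadstat directly. The sketch below is only an illustration: `read_sav_with_metadata` and its `read_sav` parameter are hypothetical names, not part of pandas or pyreadstat. The `usecols` and `apply_value_formats` arguments are the same ones that `read_spss` passes in the code above.

```python
from pathlib import Path


def read_sav_with_metadata(path, usecols=None, convert_categoricals=True, read_sav=None):
    """Hypothetical helper: return (DataFrame, metadata) for an SPSS .sav file.

    `read_sav` is an illustrative hook for swapping in another reader;
    by default pyreadstat.read_sav is used.
    """
    if read_sav is None:
        # Imported lazily so this module loads even without pyreadstat installed.
        import pyreadstat

        read_sav = pyreadstat.read_sav

    if usecols is not None:
        usecols = list(usecols)  # pyreadstat requires a list

    # Keep both return values instead of discarding the metadata.
    df, metadata = read_sav(
        str(Path(path)),
        usecols=usecols,
        apply_value_formats=convert_categoricals,
    )
    return df, metadata
```

With pyreadstat installed, `df, meta = read_sav_with_metadata("survey.sav")` (the file name is a placeholder) would give access to fields such as `meta.column_names_to_labels` and `meta.variable_value_labels`.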

Feature Description

Return the metadata along with the dataframe

Alternative Solutions

`def read_spss(
    path: str | Path,
    usecols: Sequence[str] | None = None,
    convert_categoricals: bool = True,
) -> tuple[DataFrame, object]:
    """
    Load an SPSS file from the file path, returning a DataFrame and its metadata.

    .. versionadded:: 0.25.0

    Parameters
    ----------
    path : str or Path
        File path.
    usecols : list-like, optional
        Return a subset of the columns. If None, return all columns.
    convert_categoricals : bool, default is True
        Convert categorical columns into pd.Categorical.

    Returns
    -------
    tuple of (DataFrame, metadata)
        The data and the metadata object returned by pyreadstat.
    """
    pyreadstat = import_optional_dependency("pyreadstat")

    if usecols is not None:
        if not is_list_like(usecols):
            raise TypeError("usecols must be list-like.")
        else:
            usecols = list(usecols)  # pyreadstat requires a list

    # Keep the metadata instead of discarding it.
    df, metadata = pyreadstat.read_sav(
        stringify_path(path), usecols=usecols, apply_value_formats=convert_categoricals
    )
    return df, metadata`
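Returning a tuple changes the return type, so existing callers that expect a plain DataFrame would break. A possible non-breaking variant, sketched here under assumptions, is to copy selected metadata fields into `DataFrame.attrs` and keep returning only the DataFrame. `attach_metadata` and its `fields` parameter are hypothetical names. The two default field names are attributes of the pyreadstat metadata object, but which fields to keep is an arbitrary choice for illustration.

```python
def attach_metadata(df, metadata, fields=("column_names_to_labels", "variable_value_labels")):
    """Hypothetical helper: copy selected metadata fields into df.attrs.

    Works with any object that exposes a dict-like `attrs`, such as a
    pandas DataFrame. Returns the same df so the call can be chained.
    """
    for name in fields:
        # Skip fields the metadata object does not provide.
        value = getattr(metadata, name, None)
        if value is not None:
            df.attrs[name] = value
    return df
```

Inside `read_spss`, the last line could then become `return attach_metadata(df, metadata)`, leaving the return type unchanged while letting callers read, for example, `df.attrs["column_names_to_labels"]`.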

Additional Context

No response

Labels

  • Enhancement

  • IO Data: IO issues that don't fit into a more specific label

  • metadata: _metadata, .attrs