ENH: Series.struct accessor with Series.struct.field("sub-column name") for ArrowDtype #54938

Closed
@tswast

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

When I have a Series whose dtype is ArrowDtype(struct(...)), I'd like to be able to extract individual sub-fields from it.

For example, I have a pandas Series with dtype ArrowDtype(pyarrow.struct([("int_col", pyarrow.int64()), ("string_col", pyarrow.string())])). I'd like to extract just the int_col field from this Series as another Series.

Feature Description

Add a struct accessor that is available on any Series with ArrowDtype(struct(...)). This accessor provides a field() method that returns a Series containing only the specified sub-field.

series = pandas.Series(struct_array, dtype=pandas.ArrowDtype(struct_type))

int_series = series.struct.field("int_col")

Alternative Solutions

I can currently do this via pyarrow.compute.struct_field on the underlying pyarrow array:

import pandas
import pyarrow
import pyarrow.compute  # imported explicitly so pyarrow.compute.struct_field is available

struct_type = pyarrow.struct([
    ("int_col", pyarrow.int64()),
    ("string_col", pyarrow.string()),
])
struct_array = pyarrow.array([
    {"int_col": 1, "string_col": "a"},
    {"int_col": 2, "string_col": "b"},
    {"int_col": 3, "string_col": "c"},
], type=struct_type)

series = pandas.Series(struct_array, dtype=pandas.ArrowDtype(struct_type))

# Look up the position of the sub-field, then extract it from the pyarrow array.
int_col_index = struct_array.type.get_field_index("int_col")
int_col_series = pandas.Series(
    pyarrow.compute.struct_field(struct_array, [int_col_index]),
    dtype=pandas.ArrowDtype(struct_array.type[int_col_index].type))

Additional Context

This issue is particularly relevant when working with data sources that support struct fields, such as BigQuery or Parquet.

Labels: Accessors (accessor registration mechanism, not .str, .dt, .cat); Arrow (pyarrow functionality); Enhancement
