Description
I searched for a while for existing discussions of nested structures in Pandas, but I could not find anything relevant.
Is your feature request related to a problem?
Currently, there is no way to work with arbitrary nested data types in Pandas.
In Spark and PyArrow, one can have StructTypes; in NumPy, there are compound (structured) dtypes.
However, when we try something like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({"pos": [1, 2, 3], "val": ["a", "b", "c"]})
# View the records as a plain ndarray with a compound (structured) dtype
struct = df.to_records(index=False).view(type=np.ndarray, dtype=list(df.dtypes.items()))
pd.Series(struct)
Then we only get an error:
ValueError: Cannot construct a Series from an ndarray with compound dtype. Use DataFrame instead.
This would be useful for a number of use cases:
- Direct mapping between PySpark and Pandas nested columns
- Easy creation of custom data types; type checking of nested columns
- Efficient serialization with Arrow (see the PyArrow sketch after this list)
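For comparison, a minimal PyArrow sketch of the struct column that such a dtype would map to; today the round trip back to Pandas degrades the column to plain Python dicts (the column name "col" is just illustrative):

import pyarrow as pa
struct_type = pa.struct([("pos", pa.int64()), ("val", pa.string())])
arr = pa.array([{"pos": 1, "val": "a"}, {"pos": 2, "val": "b"}], type=struct_type)
table = pa.table({"col": arr})
table.to_pandas()["col"]  # object dtype holding dicts, the struct schema is lost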
Describe the solution you'd like
My wish would be to have a generic Pandas StructDType that:
- can be composed of any other Pandas DType
- allows conversion to PyArrow StructType back and forth
- can be written to Parquet (see the hypothetical sketch below)
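To make the wish concrete, here is a purely hypothetical usage sketch; pd.StructDtype does not exist, and every name below is made up for illustration:

# Hypothetical API: none of this exists in Pandas today
dtype = pd.StructDtype({"pos": "int64", "val": "string"})
s = pd.Series([(1, "a"), (2, "b"), (3, "c")], dtype=dtype)
s.dtype                                           # typed fields would allow type checking
s.to_frame("col").to_parquet("structs.parquet")   # would map to an Arrow StructType column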
A perfect example is the IntervalDtype/IntervalArray that is already implemented in Pandas (see pandas/core/arrays/interval.py, line 146 at commit 2198f51).
In my opinion, its implementation is a special case of a Struct dtype.
It also supports conversion to and from PyArrow (see 2198f51).
Therefore, by generalizing the IntervalDtype to use any number of subtypes, we would have the StructDType implementation ready.
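For illustration, with a sufficiently recent Pandas and PyArrow the interval conversion already goes through a two-field struct under the hood (this sketch assumes the Arrow extension-type support referenced above is available):

import pandas as pd
import pyarrow as pa
ia = pd.arrays.IntervalArray.from_tuples([(0, 1), (1, 2), (2, 3)])
pa.array(ia)  # an Arrow extension array whose storage is struct<left, right>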
API breaking implications
to_csv(), etc. could have difficulties storing nested data.
That is perhaps a follow-up problem to solve.
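For instance, nested values currently survive a CSV round trip only as their string repr (a sketch with a plain dict column):

import pandas as pd
df = pd.DataFrame({"col": [{"pos": 1, "val": "a"}]})
df.to_csv()
# ',col\n0,"{\'pos\': 1, \'val\': \'a\'}"\n'  (the nested value is only stored as its repr)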
Describe alternatives you've considered
One can try to construct the Series from a list of tuples.
However, this has two drawbacks (demonstrated in the sketch below):
- No type checking
- to_parquet() fails
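A minimal sketch of that alternative and where it breaks; the exact error message depends on the installed PyArrow version:

import pandas as pd
s = pd.Series([(1, "a"), (2, "b"), (3, "c")])  # object dtype: no schema, no type checking
s.to_frame("col").to_parquet("tuples.parquet")
# raises pyarrow.lib.ArrowInvalid because the mixed-type tuples cannot be converted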