str.cat should return categorical data for categorical caller #20845

Open
@h-vetinari

The str.cat method (on the .str accessor) works for both Series and Index, and returns an object of the caller's type:

import pandas as pd

s = pd.Series(['a', 'b', 'a'])
t = pd.Index(['a', 'b', 'a'])
## all of the following return the same Series
s.str.cat(s)
s.str.cat(t)
s.str.cat(s.values)
s.str.cat(list(s))
# 0    aa
# 1    bb
# 2    aa
# dtype: object

## all of the following return the same Index
t.str.cat(s)
t.str.cat(t)
t.str.cat(s.values)
t.str.cat(list(s))
# Index(['aa', 'bb', 'aa'], dtype='object')

However, a categorical caller loses its categorical dtype after str.cat, which is inconsistent, IMO:

sc = s.astype('category')
tc = pd.Index(['a', 'b', 'a'], dtype='category') # conversion does not work, see #20843
sc.str.cat(s)
# 0    aa
# 1    bb
# 2    aa
# dtype: object
## as opposed to:
sc.str.cat(s).astype('category')
# 0    aa
# 1    bb
# 2    aa
# dtype: category
# Categories (2, object): [aa, bb]
tc.str.cat(s)  # crashes, see #20842
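
To make the requested behaviour concrete, here is a minimal sketch of a workaround. The helper name cat_keep_category is hypothetical and not part of pandas; it simply wraps str.cat and re-casts the result when the caller was categorical. It only covers the Series case, since the Index case is affected by #20842.

```python
import pandas as pd


def cat_keep_category(caller, others, sep=""):
    """Hypothetical helper (not a pandas API): str.cat that keeps 'category'.

    Illustrates the behaviour proposed in this issue for a Series caller.
    """
    result = caller.str.cat(others, sep=sep)
    # Re-cast only when the caller itself was categorical; the new
    # categories are inferred from the concatenated values.
    if isinstance(caller.dtype, pd.CategoricalDtype):
        result = result.astype("category")
    return result


s = pd.Series(["a", "b", "a"])
sc = s.astype("category")

plain = cat_keep_category(s, s)      # non-categorical caller: dtype untouched
restored = cat_keep_category(sc, s)  # categorical caller: category restored
```

Note that the categories of the result are inferred from the concatenated strings, so they generally differ from the caller's original categories.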

xref #20842 #20843


Labels: Categorical, Enhancement, Needs Discussion, Strings
