BUG: CategoricalIndex.str.extractall fails #23555

Closed
@h-vetinari

Description

Found a bug in a corner case while working on expanded test coverage in #23167:

>>> import pandas as pd
>>> # the following three have the same result
>>> pd.Series(['a', 'b', 'aa'], dtype='category').str.extractall(r'(a)')
>>> pd.Series(['a', 'b', 'aa']).str.extractall(r'(a)')
>>> pd.Index(['a', 'b', 'aa']).str.extractall(r'(a)')
         0
  match
0 0      a
2 0      a
  1      a
>>> pd.Index(['a', 'b', 'aa'], dtype='category').str.extractall(r'(a)')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Miniconda3\envs\pandas-dev\lib\site-packages\pandas\core\strings.py", line 2567, in extractall
    return str_extractall(self._orig, pat, flags=flags)
  File "C:\ProgramData\Miniconda3\envs\pandas-dev\lib\site-packages\pandas\core\strings.py", line 1012, in str_extractall
    is_mi = arr.index.nlevels > 1
AttributeError: 'CategoricalIndex' object has no attribute 'index'

So extractall works on a Series, a categorical Series and an Index, but fails on a CategoricalIndex.
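On an affected pandas version, one possible workaround is to cast the CategoricalIndex to a plain object-dtype Index before calling extractall. This is our own suggestion and was not proposed in the issue. It relies only on the report's observation that a non-categorical Index works. This is a minimal sketch; the variable names are illustrative.

```python
import pandas as pd

# The failing case from the report: a CategoricalIndex of strings.
cat_idx = pd.Index(['a', 'b', 'aa'], dtype='category')

# Assumed workaround (not from the issue): convert to a plain object-dtype
# Index, which goes down the code path the report shows working for pd.Index.
plain_idx = cat_idx.astype(object)
result = plain_idx.str.extractall(r'(a)')

# One row per match, indexed by (original position, match number).
print(result)
```

On versions where the bug is fixed, calling cat_idx.str.extractall(r'(a)') directly should give the same frame, so the cast can simply be dropped.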

Metadata

Labels: Bug, Categorical, Indexing, Strings