Series StringMethods very slow #2602

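I understand the benefit of the Series.str methods, which handle NA automatically, but the implementation seems really slow. In the timings below, `s.str[:5]` takes roughly 4.5 times as long as the equivalent `Series.map` call on a one-million-element Series.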

```
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(['abcdefg', np.nan]*500000)
>>> timeit s.str[:5]
1 loops, best of 3: 2.55 s per loop
>>> timeit s.map(lambda row: row[:5], na_action='ignore')
1 loops, best of 3: 558 ms per loop
```

Looking at the code, the difference seems to be that Series.map with na_action='ignore' uses vectorized code to filter out the NA values, while Series.str uses the _na_map function, which wraps each item in the Series in a try/except (non-vectorized).
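To make the contrast concrete, here is a minimal sketch of the two strategies as I understand them. Both functions (`slice_per_item` and `slice_masked`) are hypothetical names written for illustration; they are not the actual pandas internals, and the real `_na_map` and `Series.map` code paths may differ in detail.

```python
import numpy as np
import pandas as pd


def slice_per_item(s, n):
    """Hypothetical per-element approach: try the operation on every item
    and fall back to NaN when it fails."""
    out = []
    for x in s:
        try:
            out.append(x[:n])
        except TypeError:  # a float NaN is not subscriptable
            out.append(np.nan)
    return pd.Series(out, index=s.index, dtype=object)


def slice_masked(s, n):
    """Hypothetical masked approach: compute the NA mask once, then apply
    the operation only to the valid items."""
    mask = s.notna().to_numpy()
    values = s.to_numpy(dtype=object)
    result = np.full(len(s), np.nan, dtype=object)
    result[mask] = np.array([x[:n] for x in values[mask]], dtype=object)
    return pd.Series(result, index=s.index, dtype=object)


s = pd.Series(['abcdefg', np.nan] * 5)
a = slice_per_item(s, 5)
b = slice_masked(s, 5)
```

Both functions return the same values on this input; the difference is only in whether the NA check happens once up front or once per element inside a try/except. I have not timed this sketch, so it illustrates the structure rather than the size of any speedup.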

Could _na_map be replaced with something more like Series.map(na_action='ignore')?

