Description
Code Sample, a copy-pastable example
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(["a", "b", np.nan], index=["a", "b", np.nan], dtype="category")
>>> df = pd.DataFrame().assign(
...     cat_contains=s.str.contains("a", na=False),
...     cat_startswith=s.str.startswith("a", na=False),
...     cat_endswith=s.str.endswith("a", na=False),
...     str_contains=s.astype("string").str.contains("a", na=False),
...     str_startswith=s.astype("string").str.startswith("a", na=False),
...     str_endswith=s.astype("string").str.endswith("a", na=False),
... )
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, a to nan
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   cat_contains    3 non-null      bool
 1   cat_startswith  2 non-null      object
 2   cat_endswith    2 non-null      object
 3   str_contains    3 non-null      boolean
 4   str_startswith  3 non-null      boolean
 5   str_endswith    3 non-null      boolean
dtypes: bool(1), boolean(3), object(2)
memory usage: 93.0+ bytes
>>> df
     cat_contains  cat_startswith  cat_endswith  str_contains  str_startswith  str_endswith
a            True            True          True          True            True          True
b           False           False         False         False           False         False
NaN         False             NaN           NaN         False           False         False
Problem description
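.str.startswith(..., na=False) and .str.endswith(..., na=False) should fill missing values with False when the calling Series has a categorical dtype, just as they do for a string Series. Instead, the categorical results keep NaN and come back with object dtype (see the cat_startswith and cat_endswith columns above).
This is similar to #22158, but .str.contains already handles na=False correctly here.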
Expected Output
>>> df
     cat_contains  cat_startswith  cat_endswith  str_contains  str_startswith  str_endswith
a            True            True          True          True            True          True
b           False           False         False         False           False         False
NaN         False           False         False         False           False         False
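Until this is fixed, a possible workaround is to cast the categorical Series to object dtype before calling the string methods. This is a minimal sketch, assuming that the plain object-dtype path honours na=False (the str_* columns above suggest the non-categorical paths do); it is not the fix itself, only a way to get the expected values on affected versions.

```python
import numpy as np
import pandas as pd

# Same categorical Series as in the report, including a missing value.
s = pd.Series(["a", "b", np.nan], index=["a", "b", np.nan], dtype="category")

# Workaround sketch: convert to object dtype first so that na=False is
# applied by the ordinary (non-categorical) string accessor.
starts = s.astype(object).str.startswith("a", na=False)
ends = s.astype(object).str.endswith("a", na=False)

print(starts.tolist())  # [True, False, False]
print(ends.tolist())    # [True, False, False]
```

The cast costs a conversion of the whole Series, so for large categoricals it may be cheaper to apply the method to s.cat.categories and map the result back through the codes.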
Output of pd.show_versions()
Using a conda environment created with: conda create -n pandas112 -c conda-forge pandas=1.1.2
INSTALLED VERSIONS
commit : 2a7d332
python : 3.8.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Korean_Korea.949
pandas : 1.1.2
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 49.6.0.post20200814
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None