Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
I'm not sure if this is a bug, a documentation issue, or just user error.
I found related issues #9750 and #29048, which help me get the desired output but still don't explain why the behavior doesn't match my expectation based on my reading of the docs.
Code Sample
Following #9750, here's some code to reproduce the issue.
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1, 1], [2, 1, 1], [2, 1, 1],
                   [np.nan, np.nan, np.nan]], columns=["a", "b", "c"])
df.fillna(df.mode())
The resulting DataFrame is just df itself, i.e. none of the null values are filled.
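Inspecting the fill value makes the mismatch visible. This is a minimal sketch that assumes fillna lines a DataFrame value up with df by row and column labels; under that assumption, df.mode() only has the row label 0, so there is nothing to fill row 3 (where the NaNs are) with.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1, 1], [2, 1, 1], [2, 1, 1],
                   [np.nan, np.nan, np.nan]], columns=["a", "b", "c"])

modes = df.mode()
# mode() returns a DataFrame: one row per modal value, labelled 0, 1, ...
print(modes)
print(modes.index.tolist())  # only row label 0 here

# The NaNs live in row 3, which has no counterpart in `modes`
print(df.index[df.isna().any(axis=1)].tolist())

result = df.fillna(modes)
print(result.isna().sum().sum())  # the three NaNs are still there
```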
Problem description
I understand from the issues I mentioned (and numerous Stack Overflow answers) that this is due to df.mode() returning a DataFrame rather than a Series, which is why, e.g., df.fillna(df.mean()) gives the expected result. To get the desired output, df.fillna(df.mode().iloc[0]) works in this case.
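For reference, a sketch of that iloc[0] workaround: taking the first row turns the mode frame into a Series indexed by column name, which fillna then matches per column. If a column had several tied modes, iloc[0] simply picks the first one, which I'm assuming is acceptable here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1, 1], [2, 1, 1], [2, 1, 1],
                   [np.nan, np.nan, np.nan]], columns=["a", "b", "c"])

# First modal row as a Series: its index is the column names a, b, c
first_modes = df.mode().iloc[0]

# A Series value is matched against column labels, so every NaN in a
# column is replaced by that column's mode
filled = df.fillna(first_modes)
print(filled)
```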
But I'm confused by the docs, which describe the value parameter as follows:
value: scalar, dict, Series, or DataFrame
Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values
specifying which value to use for each index (for a Series) or column (for a
DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value
cannot be a list.
From how I read this, you should be able to pass a DataFrame with the same column names and have values filled for those columns.
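My working guess (not something the docs quoted above state) is that a DataFrame value is matched on both axes: each NaN cell is filled from the cell with the same row label and column label in the value, if such a cell exists. The sketch below compares df.fillna(fill) with an explicit where/reindex_like emulation of that guess; fill is a hypothetical frame that only covers row 3 and columns a and b.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1, 1], [2, 1, 1], [2, 1, 1],
                   [np.nan, np.nan, np.nan]], columns=["a", "b", "c"])

# Hypothetical fill frame: only row label 3 and columns "a", "b"
fill = pd.DataFrame({"a": [9.0], "b": [8.0]}, index=[3])

actual = df.fillna(fill)

# Emulation: conform `fill` to df's row/column labels (missing cells -> NaN),
# then keep df's values wherever they are not NaN
emulated = df.where(df.notna(), fill.reindex_like(df))

pd.testing.assert_frame_equal(actual, emulated)
print(actual)
```

If this guess is right, a DataFrame value only helps when its index and column labels overlap with those of df, regardless of its shape.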
In case the passed-in DataFrame needed to be the same shape, I tried
df2 = pd.DataFrame(np.zeros((4, 3)))
df.fillna(df2)
But I still got df back.
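If a DataFrame value is matched by row and column labels (my assumption, not stated in the docs quoted above), the shape of df2 isn't the problem: wrapping np.zeros((4, 3)) in a DataFrame gives it the default column labels 0, 1, 2, which share nothing with a, b, c. A sketch comparing it with a copy that reuses df's labels (df2_labelled is a hypothetical name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1, 1], [2, 1, 1], [2, 1, 1],
                   [np.nan, np.nan, np.nan]], columns=["a", "b", "c"])

# Default labels: columns 0, 1, 2 and rows 0..3
df2 = pd.DataFrame(np.zeros((4, 3)))
print(df2.columns.tolist())  # [0, 1, 2], no overlap with ["a", "b", "c"]
print(df.fillna(df2).equals(df))  # nothing was filled

# Same zeros, but carrying df's row and column labels
df2_labelled = pd.DataFrame(np.zeros((4, 3)), index=df.index, columns=df.columns)
print(df.fillna(df2_labelled).loc[3].tolist())  # row 3 is now filled with 0.0
```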
I read the source, but the answer still eludes me.
Expected Output
The expected behavior is that the missing values are filled with the mode of each column.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : b5958ee
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.5
numpy : 1.19.2
pytz : 2020.5
dateutil : 2.8.1
pip : 20.3.3
setuptools : 51.0.0.post20201207
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None