BUG: Errors caused by DataFrame.all(..., skipna=False, ...) in rows without na values. #41079

Open
@carsten0202

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


I am aware of several reported issues relating to skipna=False, but I have not seen this particular problem elsewhere. In short, setting skipna=False produces wrong results even in rows that have no missing values.

Code Sample, a copy-pastable example

# Minimal code to reproduce in pandas 1.2.4:
import pandas as pd
df1 = pd.DataFrame({"A": [False,pd.NA,pd.NA], "B": [True,True,False]})
df1
       A      B
0  False   True
1   <NA>   True
2   <NA>  False

df1.all(axis=1, skipna=False)
0     True
1     True
2    False
dtype: bool

# And again with experimental BooleanDtype
df2 = pd.DataFrame({"A": [False,pd.NA,pd.NA], "B": [True,True,False]}, dtype=pd.BooleanDtype())
df2
       A      B
0  False   True
1   <NA>   True
2   <NA>  False

df2.all(axis=1, skipna=False)
0    True
1    True
2    True
dtype: bool

Problem description

With df1, row '0' is clearly wrong: it contains a False, so all() should return False. With df2, rows '0' and '2' are undeniably wrong for the same reason. The remaining results are imho also suspicious, but they at least follow the documented behavior of:

If skipna is False, then NA are treated as True, because these are not equal to zero.

...even though I believe the mathematically correct answers are those given, for example, here:
https://www.ibm.com/docs/en/spss-statistics/SaaS?topic=command-missing-values-logical-operators-if
In other words, row '1' should, imho, evaluate to 'NA'.

Expected Output

Rows '0' and '2' should clearly evaluate to 'False'.
Row '1' should maybe evaluate to 'NA'.
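
To make the two candidate expectations concrete, here is a minimal pure-Python sketch. It does not call pandas. `NA = None` is only a stand-in for pd.NA, and the helper names `kleene_and`, `all_kleene` and `all_na_as_true` are hypothetical, introduced purely for illustration. It evaluates the three rows from the example above, once under the documented "NA is treated as True" rule and once under three-valued (Kleene) logic, which is what I would consider mathematically correct.

```python
from functools import reduce

NA = None  # stand-in for pd.NA in this sketch (assumption, not pandas itself)


def kleene_and(a, b):
    """Three-valued AND: False dominates, otherwise NA propagates."""
    if a is False or b is False:
        return False
    if a is NA or b is NA:
        return NA
    return True


def all_kleene(row):
    """Row-wise all() under three-valued logic."""
    return reduce(kleene_and, row, True)


def all_na_as_true(row):
    """Row-wise all() under the documented rule: NA counts as True."""
    return all(True if value is NA else value for value in row)


# The three rows (columns A and B) from the example frames.
rows = [(False, True), (NA, True), (NA, False)]

kleene_results = [all_kleene(row) for row in rows]
documented_results = [all_na_as_true(row) for row in rows]

print(kleene_results)      # [False, None, False]
print(documented_results)  # [False, True, False]
```

Under either reading, rows '0' and '2' come out as False; the two rules only disagree on row '1'.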

INSTALLED VERSIONS

commit : 2cb9652
python : 3.9.2.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-957.12.2.el7.x86_64
Version : #1 SMP Fri Apr 19 21:09:07 UTC 2019
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : C
LANG : C
LOCALE : None.None

pandas : 1.2.4
numpy : 1.20.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 49.6.0.post20210108
Cython : None
pytest : 6.2.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : None
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
