Code Sample, a copy-pastable example if possible
Setup:
In [1]: import numpy as np
...: import pandas as pd
...: import pandas.util.testing as tm
In [2]: values1 = np.array([-0.17387645482451206, 0.3414148016424936])
...: values2 = np.array([-0.17387645482451206, 0.3414148016424937])
In [3]: df1 = pd.DataFrame({'a': values1, 'b': ['foo', 'bar']})
...: df2 = pd.DataFrame({'a': values2, 'b': ['foo', 'bar']})
By default, assert_frame_equal will not fail on a difference in precision as slight as the one above; it detects the difference only if check_exact=True is passed:
In [4]: tm.assert_frame_equal(df1, df2)
In [5]: tm.assert_frame_equal(df1, df2, check_exact=True)
---------------------------------------------------------------------------
AssertionError: DataFrame.iloc[:, 0] are different
DataFrame.iloc[:, 0] values are different (50.0 %)
[left]: [-0.173876454825, 0.341414801642]
[right]: [-0.173876454825, 0.341414801642]
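For reference, here is a minimal NumPy-only sketch of why the two calls disagree. The second elements differ only around the 16th significant digit, so a tolerance-based comparison passes while an exact one fails. Note that np.isclose is used here as a stand-in for an approximate comparison; it is an assumption for illustration, not the code path assert_frame_equal actually takes.

```python
import numpy as np

values1 = np.array([-0.17387645482451206, 0.3414148016424936])
values2 = np.array([-0.17387645482451206, 0.3414148016424937])

# Exact, element-wise comparison: only the first pair is bit-for-bit identical.
exact_match = values1 == values2

# Tolerance-based comparison (NumPy's default rtol/atol), used here only as
# an illustration of "approximately equal".
approx_match = np.isclose(values1, values2)

# Size of the largest discrepancy between the two arrays.
max_abs_diff = np.max(np.abs(values1 - values2))

print(exact_match, approx_match, max_abs_diff)
```

The truncated reprs in the error message above hide this discrepancy, which is why [left] and [right] look identical there.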
However, when extension arrays are introduced, which causes assert_frame_equal to dispatch to assert_extension_array_equal, this difference in precision is detected by default:
In [6]: tm.assert_frame_equal(df1.to_sparse(), df2.to_sparse())
---------------------------------------------------------------------------
AssertionError: numpy array are different
numpy array values are different (50.0 %)
[left]: [-0.17387645482451206, 0.3414148016424936]
[right]: [-0.17387645482451206, 0.3414148016424937]
Proposed Solution
The source code for assert_extension_array_equal shows that it does not accept any of the precision-related keyword arguments that the other assert_*_equal functions take:
Lines 1192 to 1198 in e413c49
I'd like to add check_exact, check_less_precise, and check_dtype parameters to assert_extension_array_equal with the same defaults as the other assert_*_equal functions.
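To make the proposal concrete, here is a rough sketch of the intended semantics. It is not the pandas implementation: assert_values_equal is a hypothetical helper that works on plain NumPy arrays, and mapping check_less_precise to a relative tolerance of 5 (or 3) decimal digits is my approximation of how the other assert_*_equal functions behave.

```python
import numpy as np


def assert_values_equal(left, right, check_dtype=True, check_exact=False,
                        check_less_precise=False):
    """Hypothetical stand-in for the proposed keyword arguments."""
    left, right = np.asarray(left), np.asarray(right)

    if left.shape != right.shape:
        raise AssertionError("shapes differ: {} != {}".format(left.shape, right.shape))
    if check_dtype and left.dtype != right.dtype:
        raise AssertionError("dtypes differ: {} != {}".format(left.dtype, right.dtype))

    if check_exact:
        # Bit-for-bit comparison, i.e. what the extension array path does today.
        if not np.array_equal(left, right):
            raise AssertionError("values are different (exact comparison)")
        return

    # Assumed mapping: roughly 5 decimal digits by default, 3 if less precise.
    decimals = 3 if check_less_precise else 5
    rtol = 0.5 * 10.0 ** -decimals
    if not np.allclose(left, right, rtol=rtol, atol=0.0):
        raise AssertionError("values are different (rtol={})".format(rtol))


values1 = np.array([-0.17387645482451206, 0.3414148016424936])
values2 = np.array([-0.17387645482451206, 0.3414148016424937])

# Passes with the proposed default (check_exact=False).
assert_values_equal(values1, values2)
```

With these defaults, the example from the top of this issue passes unless check_exact=True is requested explicitly, which matches what assert_frame_equal does for regular columns.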
Note that this would resolve #23605, which is the source of my example.
cc @TomAugspurger: Thoughts on this? Is there a reason we'd want check_exact-style checking by default?