BUG: numexpr 2.8.5 changed integer overflow handling, failing a test #54546

Open
@rebecca-palmer

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

The test suite, specifically TestFrameFlexArithmetic.test_floordiv_axis0_numexpr_path[python-pow]

This does integer pow() where most of the inputs are multiples of 100 (e.g. 20100**100), so the mathematically correct result is a multiple of 2**100. That is 0 mod 2**64, and plain pandas (without numexpr) returns 0, but the array in this test is large enough that pandas uses numexpr by default.
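The arithmetic behind that claim can be checked with Python's arbitrary-precision integers. This is only a sketch of the modular reasoning, not of what pandas or numexpr do internally; `wrap_int64` is a hypothetical helper written for this illustration.

```python
def wrap_int64(value: int) -> int:
    """Reduce an exact Python int to the value a wrapping signed 64-bit
    integer would hold (two's complement). Hypothetical helper, not a
    pandas or numexpr function."""
    value &= (1 << 64) - 1          # keep the low 64 bits
    if value >= (1 << 63):          # reinterpret the top bit as the sign
        value -= 1 << 64
    return value


exact = 20100 ** 100                # exact result, far larger than 2**64

# 20100 is a multiple of 4, so 20100**100 is a multiple of 2**200,
# and therefore also of 2**100 and of 2**64.
assert exact % 2**100 == 0

wrapped = wrap_int64(exact)
print(wrapped)                      # 0, the value the test expects

# The value reported with numexpr 2.8.5 is the smallest int64 instead.
int64_min = -2**63
print(int64_min)                    # -9223372036854775808
```

So a wrapping int64 power gives 0 for these inputs, while the failing run shows -2**63 in the same positions.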

Issue Description

With numexpr 2.8.5 it instead returns -2**63, and hence the test fails.

=================================== FAILURES ===================================
483s _____ TestFrameFlexArithmetic.test_floordiv_axis0_numexpr_path[python-pow] _____
483s
483s self = <pandas.tests.frame.test_arithmetic.TestFrameFlexArithmetic object at 0x7fa71aebc1d0>
483s opname = 'pow'
483s
483s @pytest.mark.skipif(not NUMEXPR_INSTALLED, reason="numexpr not installed")
483s @pytest.mark.parametrize("opname", ["floordiv", "pow"])
483s def test_floordiv_axis0_numexpr_path(self, opname):
483s # case that goes through numexpr and has to fall back to masked_arith_op
483s op = getattr(operator, opname)
483s
483s arr = np.arange(_MIN_ELEMENTS + 100).reshape(_MIN_ELEMENTS // 100 + 1, -1) * 100
483s df = DataFrame(arr)
483s df["C"] = 1.0
483s
483s ser = df[0]
483s result = getattr(df, opname)(ser, axis=0)
483s
483s expected = DataFrame({col: op(df[col], ser) for col in df.columns})
483s > tm.assert_frame_equal(result, expected)
483s
483s /usr/lib/python3/dist-packages/pandas/tests/frame/test_arithmetic.py:510:
483s _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
483s
483s > ???
483s
483s pandas/_libs/testing.pyx:52:
483s _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
483s
483s > ???
483s E AssertionError: DataFrame.iloc[:, 0] (column name="0") are different
483s E
483s E DataFrame.iloc[:, 0] (column name="0") values are different (99.99 %)
483s E [index]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]
483s E [left]: [1, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, 
-9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808, ...]
483s E [right]: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]
483s

Expected Behavior

The test should pass.

I don't know whether pandas has documented integer overflow behaviour, but if it does it should follow it.

Installed Versions

Happens with numexpr 2.8.5 and not with 2.8.4. (This is not #54449, though I do also see that bug - that's an explicit exception, this is a changed answer.)

Seen in both pandas 1.5.3 and pandas 2.0.3.


Labels: Bug; Compat (pandas objects compatibility with NumPy or Python functions); expressions (pd.eval, query)