Fix integral truediv and floordiv for pyarrow types with large divisor and avoid floating points for floordiv #56677
@@ -127,12 +127,34 @@ def floordiv_compat(
left: pa.ChunkedArray | pa.Array | pa.Scalar,
right: pa.ChunkedArray | pa.Array | pa.Scalar,
) -> pa.ChunkedArray:
# Ensure int // int -> int mirroring Python/Numpy behavior
# as pc.floor(pc.divide_checked(int, int)) -> float
converted_left = cast_for_truediv(left, right)
result = pc.floor(pc.divide(converted_left, right))
if pa.types.is_integer(left.type) and pa.types.is_integer(right.type):
divided = pc.divide(left, right)
if pa.types.is_integer(divided.type):
# GH 56676: avoid storing intermediate calculation in floating point type.
has_remainder = pc.not_equal(pc.multiply(divided, right), left)
Review comment: IMO floordiv would be a useful compute kernel in Arrow, especially since Python and NumPy both support it. I opened apache/arrow#39386; if added, it would make this code much simpler (and probably perform better).
result = pc.if_else(
# Pass a typed arrow scalar rather than stdlib int,
# which is always inferred as int64, to prevent overflow
# in case of large uint64 values.
pc.and_(
pc.less(
pc.bit_wise_xor(left, right), pa.scalar(0, type=divided.type)
),
has_remainder,
),
# GH 55561: floordiv should round towards negative infinity.
# pc.divide for integral types rounds towards 0.
# Avoid using subtract_checked which would incorrectly raise
# for -9223372036854775808 // 1, because if integer overflow
# occurs, then has_remainder should be false, and overflowed
# value is discarded.
pc.subtract(divided, pa.scalar(1, type=divided.type)),
divided,
)
# Ensure compatibility with older versions of pandas where
# int8 // int64 returned int8 rather than int64.
Review comment: Is this something that can be changed? If so, the cast to left.type can be removed; result.type is guaranteed to already be integral.

Reply: I actually don't think this old behavior is desirable, for example:
With this cast, this operation fails with an overflow error, because 128 can't fit in an int8. In NumPy, it looks like this operation promotes to the common type of int64:
So I think this cast should be removed.

Reply: Looks like there are comments in the test about ArrowExtensionArray not promoting, so maybe this is the intended behavior? Restored the cast.
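The overflow discussed in this thread can be reproduced without pyarrow. The snippet below is a minimal sketch, not the example from the original comment (which is not preserved here). It assumes the case in question was `-128 // -1`, the only int8 dividend whose floor quotient is 128. `fits_int8` is a hypothetical helper, and `struct.pack("b", ...)` merely stands in for the narrowing cast back to `left.type`; it is not what pandas or pyarrow do internally.

```python
import struct

left, right = -128, -1     # left fits in int8; right is treated as an int64 value
quotient = left // right   # 128: mathematically correct, but above the int8 maximum of 127


def fits_int8(value: int) -> bool:
    """Hypothetical helper: True if value survives a narrowing cast to int8."""
    try:
        struct.pack("b", value)  # "b" is a signed 8-bit integer
        return True
    except struct.error:
        return False


print(quotient, fits_int8(quotient))  # the quotient itself is out of range for int8
print(left, fits_int8(left))          # the dividend is in range
```

Promoting the result to the wider common type keeps 128 representable, whereas casting back to `left.type` preserves the older dtype at the cost of raising on this edge case.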
result = result.cast(left.type)
else:
result = pc.floor(divided)
return result

ARROW_ARITHMETIC_FUNCS = {
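The integer branch of `floordiv_compat` can be mirrored in plain Python to check its logic without pyarrow. This is a sketch under stated assumptions, not the pandas implementation: `trunc_div`, `floordiv_from_trunc` and `floordiv_via_float` are hypothetical names. `trunc_div` stands in for the round-toward-zero behaviour that the diff comments attribute to `pc.divide` on integers. Python integers are unbounded, so the overflow cases mentioned in the comments are not modelled. `floordiv_via_float` imitates the earlier floor-of-a-float approach to show the kind of precision loss GH 56676 refers to.

```python
import math


def trunc_div(a: int, b: int) -> int:
    """Integer division rounding toward zero (stand-in for integer pc.divide)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def floordiv_from_trunc(a: int, b: int) -> int:
    """Floor division built from truncating division plus a correction step."""
    divided = trunc_div(a, b)
    has_remainder = divided * b != a
    # The XOR is negative exactly when one operand is negative and the other is not.
    signs_differ = (a ^ b) < 0
    return divided - 1 if (has_remainder and signs_differ) else divided


def floordiv_via_float(a: int, b: int) -> int:
    """Earlier approach: true-divide as a float, floor, convert back to int."""
    return int(math.floor(a / b))


# The correction step agrees with Python's own // across mixed signs.
for a in range(-7, 8):
    for b in (-3, -2, -1, 1, 2, 3):
        assert floordiv_from_trunc(a, b) == a // b

big = 2**53 + 1  # smallest positive integer that a float64 cannot represent exactly
print(floordiv_from_trunc(big, 1))  # 9007199254740993
print(floordiv_via_float(big, 1))   # 9007199254740992
```

Subtracting 1 only when the operands have opposite signs and the division left a remainder converts rounding toward zero into rounding toward negative infinity, while staying in integer arithmetic throughout.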