BUG: pd.factorize should not upconvert unique values unnecessarily #41132
Changes from 3 commits
```diff
@@ -143,11 +143,14 @@ def _ensure_data(values: ArrayLike) -> tuple[np.ndarray, DtypeObj]:
         # until our algos support uint8 directly (see TODO)
         return np.asarray(values).astype("uint64"), np.dtype("bool")
     elif is_signed_integer_dtype(values):
-        return ensure_int64(values), np.dtype("int64")
+        dtype = getattr(values, "dtype", np.dtype("int64"))
+        return ensure_int64(values), dtype
     elif is_unsigned_integer_dtype(values):
-        return ensure_uint64(values), np.dtype("uint64")
+        dtype = getattr(values, "dtype", np.dtype("uint64"))
+        return ensure_uint64(values), dtype
     elif is_float_dtype(values):
-        return ensure_float64(values), np.dtype("float64")
+        dtype = getattr(values, "dtype", np.dtype("float64"))
+        return ensure_float64(values), dtype
```
Review thread on this hunk:

> I think this code was written before we had non-64-bit hashtables. I expect a lot of this casting can now be avoided.

> Probably, yeah, and if we can avoid them, it'll speed some things up nicely. But that's for a different PR? I have a feeling there's a lot of detail hidden in that change.

> Fair enough.
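The reviewer's point is that once hash tables exist for narrower dtypes, the cast to 64 bits buys nothing. As a minimal sketch of that idea, assuming a plain Python `dict` as a stand-in for the non-64-bit hashtables mentioned above (`factorize_no_upcast` is a hypothetical helper, not a pandas API), factorizing in the input's own dtype keeps the uniques in that dtype with no cast-back step:

```python
import numpy as np


def factorize_no_upcast(values: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical sketch: factorize without casting to a 64-bit dtype.

    A Python dict stands in for a dtype-specific hash table, so values are
    hashed in their own dtype and the uniques keep that dtype.
    NaN handling (pandas uses code -1 for missing values) is omitted.
    """
    table: dict = {}
    codes = np.empty(len(values), dtype=np.intp)
    for i, v in enumerate(values):
        # setdefault assigns the next code the first time a value appears.
        codes[i] = table.setdefault(v, len(table))
    # dicts preserve insertion order, i.e. first-appearance order.
    uniques = np.fromiter(table.keys(), dtype=values.dtype, count=len(table))
    return codes, uniques


codes, uniques = factorize_no_upcast(np.array([3, 1, 3, 2], dtype=np.int8))
# codes -> [0, 1, 0, 2]; uniques -> [3, 1, 2] with dtype int8
```

The loop is Python-level and slow; it only illustrates that nothing in the algorithm itself requires the 64-bit round trip.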
```diff
 
     elif is_complex_dtype(values):
         # ignore the fact that we are casting to float
```
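The pattern in the diff (cast to a 64-bit buffer for the hashing algorithms, but remember the caller's dtype so the uniques can be cast back) can be sketched with NumPy alone. `ensure_data_sketch` and `factorize_sketch` are hypothetical helpers that only mimic the shape of `_ensure_data`; they are not the pandas internals, and `np.unique` is swapped in for pandas' hashtable-based factorization.

```python
import numpy as np


def ensure_data_sketch(values):
    """Hypothetical stand-in for _ensure_data.

    Returns a 64-bit working array plus the original dtype, falling back to
    the 64-bit dtype when `values` has no `.dtype` (like the getattr default
    in the diff).
    """
    arr = np.asarray(values)
    if np.issubdtype(arr.dtype, np.signedinteger):
        default = np.dtype("int64")
    elif np.issubdtype(arr.dtype, np.unsignedinteger):
        default = np.dtype("uint64")
    elif np.issubdtype(arr.dtype, np.floating):
        default = np.dtype("float64")
    else:
        raise TypeError(f"unsupported dtype: {arr.dtype}")
    original = getattr(values, "dtype", default)
    return arr.astype(default, copy=False), original


def factorize_sketch(values):
    """Factorize in first-appearance order, then restore the original dtype."""
    work, original_dtype = ensure_data_sketch(values)
    # np.unique sorts; return_index lets us recover first-appearance order.
    uniq_sorted, first_idx, inverse = np.unique(
        work, return_index=True, return_inverse=True
    )
    order = np.argsort(first_idx)  # sorted uniques, reordered by appearance
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))  # sorted position -> appearance rank
    codes = rank[inverse].astype(np.intp)
    # The fix in spirit: cast uniques back instead of leaving them 64-bit.
    uniques = uniq_sorted[order].astype(original_dtype, copy=False)
    return codes, uniques


codes, uniques = factorize_sketch(np.array([3, 1, 3, 2], dtype=np.int32))
# codes -> [0, 1, 0, 2]; uniques -> [3, 1, 2] with dtype int32
```

Casting back after hashing keeps the result faithful to the input dtype at the cost of one extra copy of the uniques; NaN handling is omitted here.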