-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG/TST: Fix infer_dtype for Period array-likes and general ExtensionArrays #37367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
a3b7ad7
6aa4475
d6f6c0a
50d7425
ed36be2
ce64b64
0e15a9a
688104f
f5ea183
4c539e3
0504ad1
88dcd81
b53f46e
100b83b
fbb713d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1050,6 +1050,7 @@ _TYPE_MAP = { | |
"M": "datetime64", | ||
"timedelta64[ns]": "timedelta64", | ||
"m": "timedelta64", | ||
"period": "period", | ||
"interval": "interval", | ||
} | ||
|
||
|
@@ -1203,9 +1204,14 @@ cdef object _try_infer_map(object dtype): | |
object val | ||
str attr | ||
for attr in ["name", "kind", "base"]: | ||
val = getattr(dtype, attr) | ||
val = getattr(dtype, attr, None) | ||
if val in _TYPE_MAP: | ||
return _TYPE_MAP[val] | ||
# also check base name for parametrized dtypes (eg period[D]) | ||
if isinstance(val, str): | ||
val = val.split("[")[0] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we explicity add the Period[D] types to the type_map (there aren't that many) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Going from the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if we were to get that specific, at some point it makes more sense to just return a dtype object There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. right here why can't we try to convert to an EA type using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because we already have a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we want to import those in lib.pyx ? (might be fine, eg there are already imports from tslibs.period, but at the moment only If that's OK, I am fine with changing it. I don't necessarily find it less "hacky" than the current solution, but I just want some solution that is acceptable for all There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we not already import all of the scalar EA types? +1 on using the existing machinery There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
100% agree on both points
not ideal, but i dont think it will harm anything ATM There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
exactly you are missing the entire point of the dtype abstraction you avoid parsing strings in the first place i will be blocking this util / unless a good soln is done There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
lib.pyx doesn't know anything about EAs. It only imports helper functions like
different than what? |
||
if val in _TYPE_MAP: | ||
jreback marked this conversation as resolved.
Show resolved
Hide resolved
|
||
return _TYPE_MAP[val] | ||
return None | ||
|
||
|
||
|
@@ -1324,12 +1330,12 @@ def infer_dtype(value: object, skipna: bool = True) -> str: | |
# e.g. categoricals | ||
dtype = value.dtype | ||
if not isinstance(dtype, np.dtype): | ||
value = _try_infer_map(value.dtype) | ||
if value is not None: | ||
return value | ||
inferred = _try_infer_map(value.dtype) | ||
if inferred is not None: | ||
return inferred | ||
|
||
# its ndarray-like but we can't handle | ||
raise ValueError(f"cannot infer type for {type(value)}") | ||
values = np.asarray(value) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we have tests that get here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The base extension test and the decimal test that I added in this PR get here (those were raising errors before) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note I am not fully sure about this change. Before it raised an error, now it will typically return "mixed" for random python objects. Both don't seem very useful (or in other words, both are an indication that nothing could be inferred). But I found it a bit inconsistent to raise in this case, while if you would pass a list of the same objects, we would actually infer There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we are already doing this below. This does a full conversion of the object, I think this will simply kill performance in some cases making this routine very unpredictable. would rather not make this change here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. random python objects will be marked as 'mixed' in any event w/o the performance panelty below. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if you are going to remove the exception then you can remove this L1337,1338 entirely (as np.asarray is called on L1341) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I removed the duplicate |
||
|
||
# Unwrap Series/Index | ||
values = np.asarray(value) | ||
|
Uh oh!
There was an error while loading. Please reload this page.