Skip to content

DEPR: Remove NumericIndex from pandas/_testing/ #51098

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented Jan 31, 2023

Removes NumericIndex from pandas._testing.

A decision has to be made how assert_index_equal should interpret equivalence between an index and a RangeIndex. IMO two indexes should be equivalent if they both return true for one of the two tests below:

  1. isinstance(index, RangeIndex) is True, or
  2. (type(index) is Index and index.dtype == "int64") is True

The first is easy to agree on, but notice that the second means that an Index subclass with int64 dtype will not be equivalent to a RangeIndex. My thinking is that if the user subclasses Index, then subclass instances shouldn't be equivalent to RangeIndex, even if it has a int64 dtype.

Also, int32/int16/int8 indexes won't be equivalent to RangeIndex in this implementation.

@jbrockmendel
Copy link
Member

(type(index) is Index and index.dtype == "int64") is True

isn't there a keyword for this?

@mroeschke mroeschke added Deprecate Functionality to remove in pandas Index Related to the Index class or subclasses labels Feb 1, 2023
@mroeschke
Copy link
Member

mroeschke commented Feb 1, 2023

IMO I think int32/int16/int8 indexes should probably be considered equal to RangeIndex if the values appear equivalent, but probably better discussed & maybe addressed in a follow up

@topper-123
Copy link
Contributor Author

(type(index) is Index and index.dtype == "int64") is True

isn't there a keyword for this?

I don't think so. There is is_int64_dtype, but that only checks for it being array-like. Also (in my proposal), we check type(index) (not isinstance(index, Index)) and I don't think that check exists.

@jbrockmendel
Copy link
Member

In assert_index_equal we have 'exact' for allowing an Int64Index to be close enough to a RangeIndex. That can't be repurposed?

@topper-123
Copy link
Contributor Author

IMO I think int32/int16/int8 indexes should probably be considered equal to RangeIndex if the values appear equivalent, but probably better discussed & maybe addressed in a follow up

ATM we have:

>>> x1 = pd.Index([1, 2, 3], dtype="int8")
>>> x2 = pd.Index([1, 2, 3], dtype="int32")
>>> tm.assert_index_equal(x1, x2)
AssertionError: Index are different

Attribute "dtype" are different
[left]:  int32
[right]: int8

If your suggestion should be implemented, the above assertion should also be able to pass IMO, do we want that? Then what about Series with int8 and int32, should we have a mechanism where they can pass also?

@topper-123
Copy link
Contributor Author

In assert_index_equal we have 'exact' for allowing an Int64Index to be close enough to a RangeIndex. That can't be repurposed?

I'm not sure I understand you here, could you expand? This is about the functionality of assert_index_equal and the exact parameter, as I see it.

@jbrockmendel
Copy link
Member

without a keyword, assert_foo_equal should require dtypes to match

@topper-123
Copy link
Contributor Author

topper-123 commented Feb 1, 2023

A new keyword or an added option to exact? That should be for another PR IMO.

As an aside, I would like the default value for exact to be True, not "equiv". I think most of the time when testing, you want as precise testing by default as possible, and then you could relax the test as needed , by changing parameters.

@jbrockmendel
Copy link
Member

A new keyword or an added option to exact? That should be for another PR IMO.

Definitely not a new keyword. I was mostly just observing that the "exact" keyword was intended to allow case 2 from the OP to be equal-enough for assert_index_equal.

As an aside, I would like the default value for exact to be True, not "equiv". I think most of the time when testing, you want as precise testing by default as possible, and then you could relax the test as needed , by changing parameters.

+1

@mroeschke
Copy link
Member

Yeah I would expect

# Passes
tm.assert_index_equal(Index([1, 2, 3]], dtype=np.int64), RangeIndex(1, 4), exact="equiv")

# Passes in the future
tm.assert_index_equal(Index([1, 2, 3]], dtype=np.int16), RangeIndex(1, 4), exact="equiv")

# Raises
tm.assert_index_equal(Index([1, 2, 3]], dtype=np.int16), Index([1, 2, 3]], dtype=np.int32), exact="equiv")

@jbrockmendel
Copy link
Member

I (and I think topper) disagree with mroeschke on the Index[int16]-vs-RangeIndex case. I think we should be maximally-strict and have that raise.

(if down the line the default for 'exact' is changed from "equiv" to True, then id be open to allowing that to not-raise)

@topper-123
Copy link
Contributor Author

Yeah, I agree with @jbrockmendel on this.

1 similar comment
@topper-123
Copy link
Contributor Author

Yeah, I agree with @jbrockmendel on this.

@mroeschke
Copy link
Member

Okay that's fair; my opinion isn't super strong (FWIW I'm not super fond of having exact="equiv" in the first place)

@jbrockmendel
Copy link
Member

so are we good to merge?

@mroeschke mroeschke added this to the 2.0 milestone Feb 2, 2023
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mroeschke mroeschke merged commit 1acdf4b into pandas-dev:main Feb 2, 2023
@mroeschke
Copy link
Member

Thanks @topper-123

@topper-123 topper-123 deleted the remove_NumericIndex_from_pandas/_testing branch February 2, 2023 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Index Related to the Index class or subclasses
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants