-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
[26660] compare int with double by subtracting during describe percen… #26768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
92fc8ca
1064c12
f97bf76
2d8d6ac
9fd212e
5a73741
1d5d48a
293da55
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1246,7 +1246,7 @@ def format_percentiles(percentiles): | |
raise ValueError("percentiles should all be in the interval [0,1]") | ||
|
||
percentiles = 100 * percentiles | ||
int_idx = (percentiles.astype(int) == percentiles) | ||
int_idx = (abs(percentiles.astype(int) - percentiles) < 1e-4) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's the reasoning for using 1e-4? May make more sense just to leverage np.allclose There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good insight - it makes code more readable. Instead of np.allclose, I've opted out for np.isclose, which returns array of booleans. Reasoning for that is that further down in the function that mask is used when determining how to format integers (where mask is True) and decimal numbers (where mask is False). Another approach would be that if at least one percentile is decimal, all percentiles should be represented as decimals. This is currently not the case, for instance if percentiles to describe are: [0, 0.5, 0.334] the method will return ["0%", "50%", "33.4%"]. This approach would yield: ["0.0%", "50.0%", "33.4%"]. I wanted to change as little as possible, so I returned mask of booleans from this expression, but if it fits your business logic to change to second approach, I can modify the code to that behavior also. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice. This is fine as is - really just wanted to circumvent hard coding thresholds like before. Thanks for reviewing in detail! |
||
|
||
if np.all(int_idx): | ||
out = percentiles.astype(int).astype(str) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2740,6 +2740,14 @@ def test_format_percentiles(): | |
fmt.format_percentiles([0.1, 0.5, 'a']) | ||
|
||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you add an additional test that uses .describe, like you did in the OP: #26660 (comment), add in pandas/tests/frame/test_analytics There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've added the test you mentioned. Please comment if it is not sufficient. |
||
def test_format_percentiles_integer_idx(): | ||
# Issue #26660 | ||
result = fmt.format_percentiles(np.linspace(0, 1, 10 + 1)) | ||
gfyoung marked this conversation as resolved.
Show resolved
Hide resolved
|
||
expected = ['0%', '10%', '20%', '30%', '40%', '50%', | ||
'60%', '70%', '80%', '90%', '100%'] | ||
assert result == expected | ||
|
||
|
||
def test_repr_html_ipython_config(ip): | ||
code = textwrap.dedent("""\ | ||
import pandas as pd | ||
|
Uh oh!
There was an error while loading. Please reload this page.