Skip to content

[26660] compare int with double by subtracting during describe percen… #26768

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.25.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@ Other Enhancements
- :func:`merge_asof` now gives a more clear error message when merge keys are categoricals that are not equal (:issue:`26136`)
- :meth:`pandas.core.window.Rolling` supports exponential (or Poisson) window type (:issue:`21303`)
- :class:`DatetimeIndex` and :class:`TimedeltaIndex` now have a `mean` method (:issue:`24757`)
-
- :meth:`DataFrame.describe` now formats integer percentiles without decimal point (:issue:`26660`)

.. _whatsnew_0250.api_breaking:

Expand Down
2 changes: 1 addition & 1 deletion pandas/io/formats/format.py
Original file line number Diff line number Diff line change
Expand Up @@ -1246,7 +1246,7 @@ def format_percentiles(percentiles):
raise ValueError("percentiles should all be in the interval [0,1]")

percentiles = 100 * percentiles
int_idx = (percentiles.astype(int) == percentiles)
int_idx = (abs(percentiles.astype(int) - percentiles) < 1e-4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning for using 1e-4? May make more sense just to leverage np.allclose

Copy link
Contributor Author

@jkovacevic jkovacevic Jun 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good insight - it makes code more readable. Instead of np.allclose, I've opted out for np.isclose, which returns array of booleans. Reasoning for that is that further down in the function that mask is used when determining how to format integers (where mask is True) and decimal numbers (where mask is False).

Another approach would be that if at least one percentile is decimal, all percentiles should be represented as decimals. This is currently not the case, for instance if percentiles to describe are: [0, 0.5, 0.334] the method will return ["0%", "50%", "33.4%"]. This approach would yield: ["0.0%", "50.0%", "33.4%"].

I wanted to change as little as possible, so I returned mask of booleans from this expression, but if it fits your business logic to change to second approach, I can modify the code to that behavior also.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. This is fine as is - really just wanted to circumvent hard coding thresholds like before. Thanks for reviewing in detail!


if np.all(int_idx):
out = percentiles.astype(int).astype(str)
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/io/formats/test_format.py
Original file line number Diff line number Diff line change
Expand Up @@ -2740,6 +2740,14 @@ def test_format_percentiles():
fmt.format_percentiles([0.1, 0.5, 'a'])


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an additional test that uses .describe, like you did in the OP: #26660 (comment), add in pandas/tests/frame/test_analytics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the test you mentioned. Please comment if it is not sufficient.

def test_format_percentiles_integer_idx():
# Issue #26660
result = fmt.format_percentiles(np.linspace(0, 1, 10 + 1))
expected = ['0%', '10%', '20%', '30%', '40%', '50%',
'60%', '70%', '80%', '90%', '100%']
assert result == expected


def test_repr_html_ipython_config(ip):
code = textwrap.dedent("""\
import pandas as pd
Expand Down