Object dtype for empty describe #27184

Merged 3 commits on Jul 3, 2019

TomAugspurger
Contributor

Closes #27183

TomAugspurger added this to the 0.25.0 milestone Jul 2, 2019
TomAugspurger added the "Dtype Conversions" label (Unexpected or buggy dtype conversions) Jul 2, 2019
@@ -595,7 +606,8 @@ def test_describe_empty_categorical_column(self):
         df = pd.DataFrame({"empty_col": Categorical([])})
         result = df.describe()
         expected = DataFrame({'empty_col': [0, 0, None, None]},
-                             index=['count', 'unique', 'top', 'freq'])
+                             index=['count', 'unique', 'top', 'freq'],
+                             dtype='object')
Contributor Author

This was added in 0.25.0. Returning object dtype isn't an (additional) API change from 0.24
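
A minimal sketch of the behavior this test pins down, assuming pandas 0.25.0 or later is installed. It mirrors the `empty_col` setup from the test above; the printed layout may differ between versions.

```python
import pandas as pd

# A frame whose only column is an empty categorical, as in the test above.
df = pd.DataFrame({"empty_col": pd.Categorical([])})
result = df.describe()

# The rows are the categorical summary statistics. The column mixes
# integer counts with missing values, so it is stored as object dtype.
print(result)
print(result["empty_col"].dtype)
```

Checking the dtype explicitly, rather than only the values, is what guards against the column being silently upcast to float.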

Contributor

Should these be np.nan (and not None)? I am not sure we actually check this in the comparator.

Contributor Author

It wasn't clear to me whether I should use None or NaN. Arguments for NaN instead of None:

  1. It's our most consistent missing value marker.
  2. We often translate None to NaN implicitly.

but I don't know what's best.
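
The implicit translation mentioned in point 2 can be illustrated with a short sketch. The variable names are only for illustration, and the example assumes a reasonably recent pandas and NumPy.

```python
import numpy as np
import pandas as pd

# When a float dtype is inferred, None is converted to NaN.
inferred = pd.Series([1.0, None])
print(inferred.dtype)
print(inferred.iloc[1])

# With an explicit object dtype, None is stored unchanged, which is why
# the choice between None and NaN matters for an object-dtype result.
explicit = pd.Series([None], dtype=object)
print(explicit.iloc[0] is None)
```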

Contributor

Right, I think we should use np.nan; however, I don't remember whether we actually test this in assert_series_equal. Maybe the check_exact arg does this, but I don't remember.

Contributor Author

nice catch

In [4]: a = pd.Series([None], dtype=object)

In [5]: b = pd.Series([np.nan], dtype=object)

In [6]: pd.util.testing.assert_series_equal(a, b)
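
The session above shows no error, which suggests that the comparison accepted None and np.nan as equal at the time; other pandas versions may behave differently. Where a test needs to tell the two markers apart, a strict check can be sketched as follows. `same_missing_kind` is a hypothetical helper written for this illustration, not part of pandas.

```python
import math

import pandas as pd


def same_missing_kind(x, y):
    """Hypothetical helper: True only if x and y are the same kind of missing value."""
    if x is None or y is None:
        return x is None and y is None
    x_nan = isinstance(x, float) and math.isnan(x)
    y_nan = isinstance(y, float) and math.isnan(y)
    return x_nan and y_nan


a = pd.Series([None], dtype=object)
b = pd.Series([float("nan")], dtype=object)

# pd.isna reports both markers as missing...
print(pd.isna(a.iloc[0]), pd.isna(b.iloc[0]))
# ...while the strict helper distinguishes None from NaN.
print(same_missing_kind(a.iloc[0], b.iloc[0]))
```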

# https://github.com/pandas-dev/pandas/issues/27183
s = pd.Series([None, None], dtype=object)
result = s.describe()
expected = pd.Series([0, 0, np.nan, np.nan], dtype=object,
Contributor

actually you use np.nan here

jreback merged commit 212df86 into pandas-dev:master Jul 3, 2019

jreback commented Jul 3, 2019

thanks @TomAugspurger

Successfully merging this pull request may close these issues.

DataFrame.describe for all-empty should be object dtype
2 participants