Taking mean on dataframe with uint16 gives wrong number #7976

Closed
@kyleco

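I see very strange behavior when taking the mean of a DataFrame that has a column with uint16 dtype: the mean of column y is wrong when y is the only column, and becomes correct once an int16 column x is added. Why does the mean of column y change after including column x?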

Python 2.7.7 |Anaconda 2.0.1 (x86_64)| (default, Jun  2 2014, 12:48:16)
Type "copyright", "credits" or "license" for more information.

IPython 2.1.0 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import random

In [4]:

In [4]: np.__version__
Out[4]: '1.8.1'

In [5]: pd.__version__
Out[5]: '0.14.0'

In [6]:

In [6]: y = np.array([random.randint(1900,2000) for x in range(0,2000)])

In [7]: y.mean()
Out[7]: 1950.0115000000001

In [8]: y.astype(np.uint16).mean()
Out[8]: 1950.0115000000001

In [9]:

In [9]: d1 = pd.DataFrame()

In [10]: d1['y'] = y.astype(np.uint16)

In [11]: d1.mean()
Out[11]:
y    16.6995
dtype: float64

In [12]:

In [12]: d1['x'] = y.astype(np.int16)

In [13]: d1.mean()
Out[13]:
y    1950.0115
x    1950.0115
dtype: float64
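
The numbers above are consistent with the column sum being accumulated in uint16 and wrapping modulo 2**16 before the division by the row count. That is an inference from the arithmetic, not something confirmed in this report. A minimal NumPy-only sketch, using hypothetical seeded data in place of the `random.randint` values and forcing a uint16 accumulator to imitate the suspected behavior:

```python
import numpy as np

# Hypothetical stand-in for the data in the report: 2000 integers in [1900, 2000].
rng = np.random.RandomState(0)
y = rng.randint(1900, 2001, size=2000)
n = len(y)

# Reference sum, accumulated in a wide integer type.
true_sum = int(y.sum(dtype=np.int64))

# Force a uint16 accumulator; integer overflow in array reductions wraps silently.
wrapped_sum = int(y.astype(np.uint16).sum(dtype=np.uint16))

# The wrapped sum equals the true sum modulo 2**16.
assert wrapped_sum == true_sum % 2**16

print("mean with wide accumulator:  ", true_sum / n)
print("mean with uint16 accumulator:", wrapped_sum / n)

# The same arithmetic applied to the figures in the report:
# a mean of 1950.0115 over 2000 rows implies a sum of 3900023.
reported_sum = 3900023
reported_wrapped_mean = (reported_sum % 2**16) / 2000
print("reported sum wrapped, then averaged:", reported_wrapped_mean)
```

The last value matches the 16.6995 shown in Out[11]. A plausible reading of Out[13], stated here as an assumption, is that adding the int16 column makes the reduction run on a wider common dtype, so the sum no longer overflows.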

Labels: Numeric Operations (Arithmetic, Comparison, and Logical operations)