Description
I'm seeing very strange behavior when taking the mean of a DataFrame that contains a uint16 column. Why does the mean of column y change after adding column x?
Python 2.7.7 |Anaconda 2.0.1 (x86_64)| (default, Jun 2 2014, 12:48:16)
Type "copyright", "credits" or "license" for more information.
IPython 2.1.0 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: import random
In [4]: np.__version__
Out[4]: '1.8.1'
In [5]: pd.__version__
Out[5]: '0.14.0'
In [6]: y = np.array([random.randint(1900,2000) for x in range(0,2000)])
In [7]: y.mean()
Out[7]: 1950.0115000000001
In [8]: y.astype(np.uint16).mean()
Out[8]: 1950.0115000000001
In [9]: d1 = pd.DataFrame()
In [10]: d1['y'] = y.astype(np.uint16)
In [11]: d1.mean()
Out[11]:
y 16.6995
dtype: float64
In [12]: d1['x'] = y.astype(np.int16)
In [13]: d1.mean()
Out[13]:
y 1950.0115
x 1950.0115
dtype: float64
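
For what it's worth, 16.6995 looks exactly like an integer overflow in the reduction: the true sum of y is 1950.0115 * 2000 = 3,900,023, and 3,900,023 % 2**16 = 33,399, which divided by 2000 gives 16.6995. So the single-column frame apparently accumulates the sum in the column's own uint16 dtype and wraps, while with the int16 column present the reduction presumably goes through a wider common dtype (int32 can hold both), so the sum no longer overflows. That is just my reading of the numbers, not a confirmed diagnosis. A minimal sketch (the constant-valued array below is hypothetical, only there to make the wrap obvious):

import numpy as np

# The reported mean of y implies its true sum: 1950.0115 * 2000 = 3900023.
true_sum = 3900023

# If that sum is accumulated in uint16, it wraps modulo 2**16 ...
wrapped = true_sum % 2**16       # 33399
print(wrapped / 2000.0)          # 16.6995 -- exactly the Out[11] value

# numpy shows the same wrap when forced to reduce in the original dtype:
y16 = np.full(2000, 1950, dtype=np.uint16)
print(y16.sum(dtype=np.uint16))  # 33376, i.e. (2000 * 1950) % 2**16
print(y16.mean())                # 1950.0 -- np.mean upcasts before summing

If that reading is right, casting up front, e.g. d1['y'].astype('int64').mean(), should give the expected 1950.0115 regardless of how the frame is laid out.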