Closed
Description
NaT is included in the result of the groupby method first(), while NaN is skipped. I expect first() to skip both NaN and NaT and return the first value for which pandas.isnull is False.
Demonstration of the inconsistency (note that both NaT and NaN in the data frame are produced by np.nan; the only difference is that the d_t column contains date values):
import numpy as np
import pandas as pd
from datetime import datetime as dt
testFrame = pd.DataFrame({'IX': ['A', 'A'], 'num': [np.nan, 100], 'd_t': [np.nan, dt.now()]})
Resulting data frame:
  IX                     d_t  num
0  A                     NaT  NaN
1  A 2015-07-15 22:47:10.635  100
Grouping this data frame on the IX column and calling first() produces the following result, which shows the inconsistency between the d_t and num columns:
testFrame.groupby('IX').first()
Resulting data frame:
    d_t  num
IX
A   NaT  100
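For comparison, here is a minimal sketch of the behaviour I am expecting, written as an explicit aggregation. The helper name first_non_null is hypothetical (it is not part of pandas), and this is only an illustration of the intended semantics, not how first() is implemented internally. The timestamp is fixed rather than taken from dt.now() so the example is reproducible.

```python
import numpy as np
import pandas as pd


def first_non_null(series):
    """Hypothetical helper: first element where pd.isnull is False, else NaN."""
    non_null = series[~pd.isnull(series)]
    return non_null.iloc[0] if len(non_null) else np.nan


frame = pd.DataFrame({
    'IX': ['A', 'A'],
    'num': [np.nan, 100],
    # First entry becomes NaT, second is a fixed timestamp.
    'd_t': pd.to_datetime([None, '2015-07-15 22:47:10.635']),
})

# Apply the helper to every non-key column within each IX group.
expected = frame.groupby('IX').agg(first_non_null)
print(expected)
```

With this aggregation, group A should report the timestamp from row 1 for d_t and 100 for num, which is what I would expect first() to return as well.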
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.16.2
nose: 1.3.4
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.6.7
lxml: 3.4.2
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 0.9.9
pymysql: None
psycopg2: None