Inconsistency, NaT included in result of groupby method first but not NaN #10590

Closed
@larvian

Description

NaT is included in the result of the groupby method first, while NaN is not. I expect first to skip both NaN and NaT and return the first value for which pandas.isnull is False.
Demonstration of the inconsistency (note that both the NaT and the NaN in the data frame are produced by np.nan; the difference is that the d_t column contains datetime values):

import numpy as np
import pandas as pd
from datetime import datetime as dt

testFrame = pd.DataFrame({'IX': ['A', 'A'], 'num': [np.nan, 100], 'd_t': [np.nan, dt.now()]})

Resulting data frame:

  IX                     d_t  num
0  A                     NaT  NaN
1  A 2015-07-15 22:47:10.635  100

Grouping this data frame on the IX column and calling the first method gives the following result, which shows the inconsistency between the d_t and num columns:

testFrame.groupby('IX').first()

Resulting data frame:

        d_t  num
IX              
A       NaT  100
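
For comparison, the behaviour I expect can be sketched with an explicit per-column helper. This is only an illustration, not a proposed fix: first_valid is a hypothetical name, the timestamp is a fixed stand-in for dt.now(), and the d_t column is built with pd.to_datetime so that the NaT is explicit.

```python
import numpy as np
import pandas as pd


def first_valid(series):
    """Return the first entry for which pd.isnull is False (hypothetical helper)."""
    valid = series.dropna()
    # Fall back to NaN when the group holds only missing values.
    return valid.iloc[0] if len(valid) else np.nan


frame = pd.DataFrame({
    'IX': ['A', 'A'],
    'num': [np.nan, 100],
    # A fixed timestamp stands in for dt.now() so the result is reproducible.
    'd_t': pd.to_datetime([None, '2015-07-15 22:47:10']),
})

# Apply the helper to each value column separately, one result per group.
grouped = frame.groupby('IX')
result = pd.DataFrame({col: grouped[col].apply(first_valid) for col in ['d_t', 'num']})
print(result)
```

With this helper both columns skip the missing entry in row 0, which is what I would expect first to do for d_t as well as num.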

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.2
nose: 1.3.4
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.6.7
lxml: 3.4.2
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 0.9.9
pymysql: None
psycopg2: None

Labels: Bug, Datetime, Groupby, Missing-data