Pandas attempts to convert some strings to timestamps when grouping by a timestamp and aggregating?

I am working through logs of web requests, and when I want to find the most common, say, user agent string for a (disguised) user, I run something like the following:

```
from pandas import Series, DataFrame, Timestamp

# Five rows for the same day and userId; only row 3 has a different userAgent.
tdf = DataFrame({
    'day': {0: Timestamp('2015-02-24 00:00:00'), 1: Timestamp('2015-02-24 00:00:00'),
            2: Timestamp('2015-02-24 00:00:00'), 3: Timestamp('2015-02-24 00:00:00'),
            4: Timestamp('2015-02-24 00:00:00')},
    'userAgent': {0: 'some UA string', 1: 'some UA string', 2: 'some UA string',
                  3: 'another UA string', 4: 'some UA string'},
    'userId': {0: '17661101', 1: '17661101', 2: '17661101', 3: '17661101', 4: '17661101'},
})

def most_common_values(df):
    # value_counts() sorts by frequency, so index[0] is the most common value per column.
    # items() stands in for iteritems(), which was removed in pandas 2.0.
    return Series({c: s.value_counts().index[0] for c, s in df.items()})

tdf.groupby('day').apply(most_common_values)
```

Note that in this (admittedly unusual) example, the lines are nearly all identical. I'm not sure whether that is necessary to recreate the issue. I'm also obscuring the exact purpose of this code, but it reproduces the bug: the 'userId' comes back as a Timestamp, not a string. The conversion happens after the function most_common_values returns, since the function itself returns the userId as a string, not as a Timestamp. If we change the value of the userId to an int:

```
tdf['userId'] = tdf.userId.astype(int)
```

or if the value of the associated integer is small enough:

```
tdf['userId'] = '15320104'
```

then the results are what we'd expect (the most common value is returned with its original type).
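As a sanity check on the claim that the conversion happens only after most_common_values returns, the function can be called directly on the whole frame, bypassing groupby. This is a minimal sketch that rebuilds an equivalent frame with list literals and uses items() so that it also runs on recent pandas versions; the name `direct` is only for illustration.

```python
from pandas import DataFrame, Series, Timestamp

# Equivalent to the frame above: one day, one userId, two distinct user agents.
tdf = DataFrame({
    'day': [Timestamp('2015-02-24')] * 5,
    'userAgent': ['some UA string'] * 3 + ['another UA string', 'some UA string'],
    'userId': ['17661101'] * 5,
})

def most_common_values(df):
    # Most frequent value per column; items() is assumed here in place of iteritems().
    return Series({c: s.value_counts().index[0] for c, s in df.items()})

# Call the function without groupby to inspect what it returns on its own.
direct = most_common_values(tdf)
print(type(direct['userId']))
```

If the userId is a str here but a Timestamp after `tdf.groupby('day').apply(most_common_values)`, the coercion must be introduced when groupby combines the per-group results.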

I imagine that for some reason something like a dateutil parser is being called on strings by default, but that probably shouldn't be happening...
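One way to probe that guess is to feed both userId strings to dateutil directly and compare the results against the lower bound of the default nanosecond Timestamp range. This is only a sketch of a possible explanation, not something confirmed by the behavior above: it assumes that a date-like string is converted only when the parsed date is representable as a Timestamp, and the names `lower_bound` and `representable` are hypothetical.

```python
from dateutil import parser
from pandas import Timestamp

# Lower bound of the nanosecond-resolution Timestamp range, as a plain datetime.
# warn=False suppresses the warning about discarded nanoseconds.
lower_bound = Timestamp.min.to_pydatetime(warn=False)

for user_id in ['17661101', '15320104']:
    # dateutil reads an 8-digit string as YYYYMMDD.
    parsed = parser.parse(user_id)
    # Assumption: only dates at or after the lower bound could become a Timestamp.
    representable = parsed >= lower_bound
    print(user_id, parsed.date(), representable)
```

Under this assumption, '17661101' would be read as a date inside the Timestamp range while '15320104' would fall before it, which would be consistent with only the first string being converted.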


Pandas attempts to convert some strings to timestamps when grouping by a timestamp and aggregating? #10078
