Possible bug when parsing from string to datetime when slicing #13929

Open
@tomchor

Description

This comes from a question on Stack Overflow (SO).

Please consider the following example:

import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.random((72000,3)), columns=list('uvw'),
                    index=pd.date_range('2013-11-08 10:00:00', periods=72000, freq='50L'))
data.loc['2013-11-08 10:15:00.000':'2013-11-08 10:17:00.000']

This outputs:

                                u         v         w
2013-11-08 10:15:00.000  0.569030  0.850393  0.600106
2013-11-08 10:15:00.050  0.679713  0.933720  0.041018
2013-11-08 10:15:00.100  0.503491  0.142397  0.841705
2013-11-08 10:15:00.150  0.171248  0.545567  0.247094
2013-11-08 10:15:00.200  0.149745  0.149588  0.935516
2013-11-08 10:15:00.250  0.039780  0.097837  0.087254
...                           ...       ...       ...
2013-11-08 10:17:00.700  0.001165  0.020971  0.197322
2013-11-08 10:17:00.750  0.003923  0.722930  0.312988
2013-11-08 10:17:00.800  0.941241  0.600529  0.479640
2013-11-08 10:17:00.850  0.272536  0.738084  0.486551
2013-11-08 10:17:00.900  0.060388  0.606207  0.359640
2013-11-08 10:17:00.950  0.464268  0.965543  0.699740

[2420 rows x 3 columns]

This surprised me: I expected the last row to be 2013-11-08 10:17:00.000, since that is the endpoint I specified, but the slice runs on to 2013-11-08 10:17:00.950, 19 rows further. When I instead pass the endpoint as datetime(2013,11,8,10,17,0,0) (with datetime imported from the standard datetime module), which should be identical, it works as I would expect:

In [13]: data.loc['2013-11-08 10:15:00.000':datetime(2013,11,8,10,17,0,0)]
Out[13]: 
                                u         v         w
2013-11-08 10:15:00.000  0.569030  0.850393  0.600106
2013-11-08 10:15:00.050  0.679713  0.933720  0.041018
2013-11-08 10:15:00.100  0.503491  0.142397  0.841705
2013-11-08 10:15:00.150  0.171248  0.545567  0.247094
2013-11-08 10:15:00.200  0.149745  0.149588  0.935516
2013-11-08 10:15:00.250  0.039780  0.097837  0.087254
...                           ...       ...       ...
2013-11-08 10:16:59.750  0.652168  0.606795  0.901583
2013-11-08 10:16:59.800  0.868184  0.249873  0.517637
2013-11-08 10:16:59.850  0.917543  0.303403  0.980257
2013-11-08 10:16:59.900  0.118191  0.032437  0.580734
2013-11-08 10:16:59.950  0.093644  0.017865  0.080326
2013-11-08 10:17:00.000  0.770234  0.310025  0.065127

[2401 rows x 3 columns]

I'm filing this at the suggestion of an SO user, because it looks like a bug.
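As a possible workaround in the meantime, here is a minimal sketch that passes both endpoints as pd.Timestamp objects, by analogy with the datetime endpoint above. The names start, end, exact and from_strings are only illustrative, and freq='50ms' is used in place of '50L' on the assumption that the two aliases are equivalent and that '50ms' is accepted by more pandas versions. The string-based slice is included only for comparison; its row count may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the example above; '50ms' stands in for '50L'
# (assumed equivalent: one row every 50 milliseconds).
index = pd.date_range('2013-11-08 10:00:00', periods=72000, freq='50ms')
data = pd.DataFrame(np.random.random((72000, 3)), columns=list('uvw'), index=index)

# Explicit Timestamp endpoints: the slice is inclusive of both exact instants.
start = pd.Timestamp('2013-11-08 10:15:00.000')
end = pd.Timestamp('2013-11-08 10:17:00.000')
exact = data.loc[start:end]

# String endpoints, as in the original report, for comparison.
from_strings = data.loc['2013-11-08 10:15:00.000':'2013-11-08 10:17:00.000']

print(len(exact), exact.index[-1])
print(len(from_strings), from_strings.index[-1])
```

With Timestamp endpoints the slice ends at 10:17:00.000 and has 2401 rows, matching the datetime result shown above.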

For completeness, here is the output of pd.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-21-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.7.0
Cython: None
numpy: 1.11.0
scipy: 0.17.0
statsmodels: None
IPython: 2.4.1
sphinx: 1.4.5
patsy: None
dateutil: 2.4.2
pytz: 2014.10
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None

Metadata

    Labels

    Bug; Datetime (Datetime data dtype); Indexing (Related to indexing on series/frames, not to indexes themselves)