Skip to content

Partial datetime indexing of Multiindex by year only #14049

Closed
@jesrael

Description

@jesrael

I have DataFrame:

import pandas as pd
import numpy as np

rng = pd.date_range('01-01-1988',periods=130000,freq='H')
long_df = pd.DataFrame(np.random.randn(130000,4),index = rng, columns=['bar','baz','foo','zoo'])
dfmi = long_df.stack().sort_index()

print (dfmi)
1988-01-01 00:00:00  bar   -1.129213
                     baz    1.405388
                     foo    0.482324
                     zoo   -2.315226
1988-01-01 01:00:00  bar   -0.533171
                     baz    1.906526
                     foo   -0.745095
                     zoo    0.351300
1988-01-01 02:00:00  bar   -0.162411
                     baz   -0.218296
                     foo    0.327074
                     zoo   -0.508463
1988-01-01 03:00:00  bar   -0.676780
                     baz   -1.137270
                     foo    0.389457
                     zoo   -0.517306
1988-01-01 04:00:00  bar    1.368110
                     baz    1.106048
                     foo    1.342662
                     zoo   -0.215235
1988-01-01 05:00:00  bar   -1.284003
                     baz    0.216028
                     foo    0.889421
                     zoo   -0.965881
1988-01-01 06:00:00  bar    1.522085
                     baz    0.436332
                     foo    0.114714
                     zoo    0.342456
1988-01-01 07:00:00  bar   -0.176803
                     baz   -0.365471

2002-10-30 08:00:00  foo   -1.196314
                     zoo    1.245414
2002-10-30 09:00:00  bar    1.271792
                     baz   -0.688911
                     foo   -0.852729
                     zoo    0.869926
2002-10-30 10:00:00  bar   -0.487386
                     baz   -0.161470
                     foo    0.593564
                     zoo   -0.440397
2002-10-30 11:00:00  bar    0.856882
                     baz   -0.297341
                     foo    2.106393
                     zoo    0.186903
2002-10-30 12:00:00  bar    1.551650
                     baz    1.152780
                     foo    0.726544
                     zoo   -0.334287
2002-10-30 13:00:00  bar    0.483035
                     baz    1.573808
                     foo   -0.734948
                     zoo    0.005149
2002-10-30 14:00:00  bar    0.077451
                     baz    0.783963
                     foo   -0.111543
                     zoo   -0.637871
2002-10-30 15:00:00  bar    0.383890
                     baz   -0.368546
                     foo    0.503162
                     zoo    0.324679
dtype: float64

print (dfmi.index.get_level_values(0))
DatetimeIndex(['1988-01-01 00:00:00', '1988-01-01 00:00:00',
               '1988-01-01 00:00:00', '1988-01-01 00:00:00',
               '1988-01-01 01:00:00', '1988-01-01 01:00:00',
               '1988-01-01 01:00:00', '1988-01-01 01:00:00',
               '1988-01-01 02:00:00', '1988-01-01 02:00:00',
               ...
               '2002-10-30 13:00:00', '2002-10-30 13:00:00',
               '2002-10-30 14:00:00', '2002-10-30 14:00:00',
               '2002-10-30 14:00:00', '2002-10-30 14:00:00',
               '2002-10-30 15:00:00', '2002-10-30 15:00:00',
               '2002-10-30 15:00:00', '2002-10-30 15:00:00'],
              dtype='datetime64[ns]', length=520000, freq='H')

If select by year, month and day, it works nice:

print (dfmi.loc['2001-01-01'])

2001-01-01 00:00:00  bar   -0.349633
                     baz   -1.945182
                     foo    0.422883
                     zoo   -1.183061
2001-01-01 01:00:00  bar   -0.014788
                     baz   -0.646235
                     foo   -1.449877
                     zoo    0.422516
2001-01-01 02:00:00  bar    0.197731
                     baz   -0.307682
                     foo   -1.293725
                     zoo    0.283953
2001-01-01 03:00:00  bar    0.725005
                     baz    1.182862
                     foo   -0.349092
                     zoo    1.354510
2001-01-01 04:00:00  bar   -0.516522
                     baz    0.582963
                     foo   -0.577522
                     zoo    0.710188
2001-01-01 05:00:00  bar   -0.453547
                     baz   -1.054144
                     foo    0.911428
                     zoo   -0.417231
2001-01-01 06:00:00  bar   -1.117797
                     baz   -0.315343
                     foo    1.441117
                     zoo    1.135960
2001-01-01 07:00:00  bar   -0.607946
                     baz    1.210047
                              ...   
2001-01-01 16:00:00  foo   -1.435443
                     zoo    0.045783
2001-01-01 17:00:00  bar   -0.590686
                     baz    1.115333
                     foo   -0.558769
                     zoo   -0.522037
2001-01-01 18:00:00  bar   -0.862442
                     baz    0.917343
                     foo    0.759948
                     zoo   -0.004601
2001-01-01 19:00:00  bar   -1.901064
                     baz    0.281404
                     foo   -0.704072
                     zoo   -0.104176
2001-01-01 20:00:00  bar   -0.703224
                     baz   -0.170848
                     foo    0.547919
                     zoo    1.199431
2001-01-01 21:00:00  bar    1.198095
                     baz   -0.416019
                     foo    1.551871
                     zoo   -0.472543
2001-01-01 22:00:00  bar   -0.239550
                     baz   -0.401000
                     foo    0.959729
                     zoo    0.783011
2001-01-01 23:00:00  bar    0.415624
                     baz    0.093273
                     foo    0.712913
                     zoo   -1.026016
dtype: float64

But if select by year or year and month, it return all DataFrame, there is not slicing.

print (dfmi.loc['2001-01'])

print (dfmi.loc['2001'])

Is it bug or not implemented yet? Thank you.

print (pd.show_versions())
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: sk_SK

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.1.2
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: 0.2.1
None

Metadata

Metadata

Assignees

No one assigned

    Labels

    DatetimeDatetime data dtypeDuplicate ReportDuplicate issue or pull requestIndexingRelated to indexing on series/frames, not to indexes themselvesMultiIndex

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions