Description
Hello, I personally feel that selecting elements in a Series is a bit of a mess :-)
The general idea is that .iloc and .loc behave consistently, being respectively position-based and index(label)-based, but are a bit slower than .ix and direct [] indexing, whose behaviour is not always consistent.
But I found these methods a bit inconsistent too, including in what they return when the labels are not found or the requested positions are out of range in the looked-up Series.
I compiled the following tables, which summarise the behaviour of these 4 lookup methods depending on (a) whether the Series being looked up has an integer or a string index (for the moment I do not consider date indexes), (b) whether the requested data is a single element, a slice or a list (yes, the behaviour changes!) and (c) whether the index is found in the data or not.
The tables were produced with pandas 0.17.1, NumPy 1.10.4 and Python 3.4.3. In each cell, POS or LAB says whether the key was interpreted as a position or as a label, followed by the result; a returned Series is written as a {label: value} dict.
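To rebuild tables like these on another pandas version, one option is a small harness that runs each lookup and records either its result or the name of the exception it raises. `probe` and `cases` below are hypothetical helpers, not part of pandas. The sketch only uses `.loc` and `.iloc`, because `.ix` was deprecated in pandas 0.20 and removed in 1.0; the printed outcomes are whatever the installed version produces, not necessarily those of 0.17.1.

```python
import numpy as np
import pandas as pd


def probe(lookup):
    """Run a zero-argument lookup and describe its outcome.

    Returns the exception class name if the lookup raises, "empty" for an
    empty Series, a {label: value} dict for a non-empty Series, or the
    scalar itself (converted to a plain Python value).
    """
    try:
        result = lookup()
    except Exception as exc:  # record which exception the lookup raised
        return type(exc).__name__
    if isinstance(result, pd.Series):
        if result.empty:
            return "empty"
        # tolist() yields native Python values, which print cleanly
        return dict(zip(result.index.tolist(), result.tolist()))
    if isinstance(result, np.generic):
        return result.item()
    return result


s = pd.Series(np.arange(100, 105), index=np.arange(10, 15))

# Hypothetical subset of the Case 1 table, one lambda per cell.
cases = {
    "s.loc[0]": lambda: s.loc[0],
    "s.loc[13]": lambda: s.loc[13],
    "s.iloc[0]": lambda: s.iloc[0],
    "s.iloc[13]": lambda: s.iloc[13],
    "s.loc[10:12]": lambda: s.loc[10:12],
    "s.iloc[10:12]": lambda: s.iloc[10:12],
}
for expr, lookup in cases.items():
    print(f"{expr:15} -> {probe(lookup)}")
```

Adding a row to `cases` is enough to extend the table, and running the same script under different pandas versions makes behaviour changes easy to spot.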
Case 1: Series with integer index

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 105), index=np.arange(10, 15))
s
# 10    100
# 11    101
# 12    102
# 13    103
# 14    104
```
| Single element | Slice | List |
| --- | --- | --- |
| `s[0]` -> LAB -> KeyError | `s[0:2]` -> POS -> {10:100, 11:101} | `s[[1,3]]` -> LAB -> {1:NaN, 3:NaN} |
| `s[13]` -> LAB -> 103 | `s[10:12]` -> POS -> empty Series | `s[[12,14]]` -> LAB -> {12:102, 14:104} |
| `s.ix[0]` -> LAB -> KeyError | `s.ix[0:2]` -> LAB -> empty Series | `s.ix[[1,3]]` -> LAB -> {1:NaN, 3:NaN} |
| `s.ix[13]` -> LAB -> 103 | `s.ix[10:12]` -> LAB -> {10:100, 11:101, 12:102} | `s.ix[[12,14]]` -> LAB -> {12:102, 14:104} |
| `s.iloc[0]` -> POS -> 100 | `s.iloc[0:2]` -> POS -> {10:100, 11:101} | `s.iloc[[1,3]]` -> POS -> {11:101, 13:103} |
| `s.iloc[13]` -> POS -> IndexError | `s.iloc[10:12]` -> POS -> empty Series | `s.iloc[[12,14]]` -> POS -> IndexError |
| `s.loc[0]` -> LAB -> KeyError | `s.loc[0:2]` -> LAB -> empty Series | `s.loc[[1,3]]` -> LAB -> KeyError |
| `s.loc[13]` -> LAB -> 103 | `s.loc[10:12]` -> LAB -> {10:100, 11:101, 12:102} | `s.loc[[12,14]]` -> LAB -> {12:102, 14:104} |
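With an integer index the same number can mean a position or a label, which is where `[]` and `.ix` become hard to predict. A minimal sketch of the explicit alternatives, assuming a recent pandas with only `.loc` and `.iloc`; it also shows the different slice endpoints, and one possible way (via `Index.get_loc`) to get a half-open range from labels.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 105), index=np.arange(10, 15))

# Position 0 and label 10 refer to the same element.
first_by_position = s.iloc[0]
first_by_label = s.loc[10]

# .loc never falls back to positions: label 0 does not exist.
try:
    s.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

# Positional slices exclude the stop position, label slices include the stop label.
pos_slice = s.iloc[0:2]    # positions 0 and 1, i.e. labels 10 and 11
lab_slice = s.loc[10:12]   # labels 10, 11 and 12

# A half-open label range: convert both labels to positions, then slice by position.
start, stop = s.index.get_loc(10), s.index.get_loc(12)
half_open = s.iloc[start:stop]  # labels 10 and 11

print(first_by_position, first_by_label, label_zero_found)
print(pos_slice.index.tolist(), lab_slice.index.tolist(), half_open.index.tolist())
```

The `get_loc` conversion assumes a unique index; with duplicate labels `get_loc` can return a slice or a boolean mask instead of an integer.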
Case 2: Series with string index
```python
s = pd.Series(np.arange(100, 105), index=['a', 'b', 'c', 'd', 'e'])
s
# a    100
# b    101
# c    102
# d    103
# e    104
```
| Single element | Slice | List |
| --- | --- | --- |
| `s[0]` -> POS -> 100 | `s[0:2]` -> POS -> {'a':100, 'b':101} | `s[[0,2]]` -> POS -> {'a':100, 'c':102} |
| `s[10]` -> LAB, POS -> KeyError, IndexError | `s[10:12]` -> POS -> empty Series | `s[[10,12]]` -> POS -> IndexError |
| `s['a']` -> LAB -> 100 | `s['a':'c']` -> LAB -> {'a':100, 'b':101, 'c':102} | `s[['a','c']]` -> LAB -> {'a':100, 'c':102} |
| `s['g']` -> POS, LAB -> TypeError, KeyError | `s['f':'h']` -> LAB -> empty Series | `s[['f','h']]` -> LAB -> {'f':NaN, 'h':NaN} |
| `s.ix[0]` -> POS -> 100 | `s.ix[0:2]` -> POS -> {'a':100, 'b':101} | `s.ix[[0,2]]` -> POS -> {'a':100, 'c':102} |
| `s.ix[10]` -> POS -> IndexError | `s.ix[10:12]` -> POS -> empty Series | `s.ix[[10,12]]` -> POS -> IndexError |
| `s.ix['a']` -> LAB -> 100 | `s.ix['a':'c']` -> LAB -> {'a':100, 'b':101, 'c':102} | `s.ix[['a','c']]` -> LAB -> {'a':100, 'c':102} |
| `s.ix['g']` -> POS, LAB -> TypeError, KeyError | `s.ix['f':'h']` -> LAB -> empty Series | `s.ix[['f','h']]` -> LAB -> {'f':NaN, 'h':NaN} |
| `s.iloc[0]` -> POS -> 100 | `s.iloc[0:2]` -> POS -> {'a':100, 'b':101} | `s.iloc[[0,2]]` -> POS -> {'a':100, 'c':102} |
| `s.iloc[10]` -> POS -> IndexError | `s.iloc[10:12]` -> POS -> empty Series | `s.iloc[[10,12]]` -> POS -> IndexError |
| `s.iloc['a']` -> LAB -> TypeError | `s.iloc['a':'c']` -> POS -> ValueError | `s.iloc[['a','c']]` -> POS -> TypeError |
| `s.iloc['g']` -> LAB -> TypeError | `s.iloc['f':'h']` -> POS -> ValueError | `s.iloc[['f','h']]` -> POS -> TypeError |
| `s.loc[0]` -> LAB -> KeyError | `s.loc[0:2]` -> LAB -> TypeError | `s.loc[[0,2]]` -> LAB -> KeyError |
| `s.loc[10]` -> LAB -> KeyError | `s.loc[10:12]` -> LAB -> TypeError | `s.loc[[10,12]]` -> LAB -> KeyError |
| `s.loc['a']` -> LAB -> 100 | `s.loc['a':'c']` -> LAB -> {'a':100, 'b':101, 'c':102} | `s.loc[['a','c']]` -> LAB -> {'a':100, 'c':102} |
| `s.loc['g']` -> LAB -> KeyError | `s.loc['f':'h']` -> LAB -> empty Series | `s.loc[['f','h']]` -> LAB -> KeyError |
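Several of the string-index cells can be rechecked with a short script. A minimal sketch, assuming a recent pandas (where `.ix` no longer exists) and a hypothetical helper `error_name`; exception types may differ from the 0.17.1 results in the table, which is exactly what such a check would reveal.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 105), index=['a', 'b', 'c', 'd', 'e'])


def error_name(lookup):
    """Return the exception class name raised by lookup(), or None if it succeeds."""
    try:
        lookup()
    except Exception as exc:
        return type(exc).__name__
    return None


# A string key given to the positional accessor.
iloc_with_label = error_name(lambda: s.iloc['a'])
# Integer slice bounds given to the label accessor of a string index.
loc_with_int_slice = error_name(lambda: s.loc[0:2])
# A single label, and a list of labels, absent from the index.
loc_missing_label = error_name(lambda: s.loc['g'])
loc_missing_list = error_name(lambda: s.loc[['f', 'h']])
# Label slice bounds past the last label: no error, just an empty Series.
beyond_end = s.loc['f':'h']

print(iloc_with_label, loc_with_int_slice, loc_missing_label, loc_missing_list)
print(len(beyond_end))
```

The last two lookups reproduce the asymmetry pointed out below: the same missing labels raise as a list but give an empty result as a slice.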
As you can see, there are several inconsistencies, some of them even with .iloc and .loc:

- Elements that are not found, or positions that are out of range, are handled in three different ways: an exception is raised, an empty Series is returned, or a Series with the requested keys mapped to NaN values is returned. For example, s.loc['f':'h'] returns an empty Series, whereas s.loc[['f','h']] raises a KeyError. There should be a single way to handle missing elements, possibly with an optional parameter saying what to do when missing elements are encountered.
- When slicing, the end element is excluded if the lookup is by position, but included if the lookup is by label!
- .ix is redundant. There should be .iloc[] and .loc[] for a guaranteed query by position and by label respectively, plus a faster way with more complicated (but still well documented) logic for when performance is a priority. s[] is just quicker to type than s.ix[], so for me the latter is redundant.
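The first point suggests a single policy for missing elements, chosen through an optional parameter. A minimal sketch of that idea for lists of labels, built on `.loc` and `Series.reindex`; `select_labels` and its `missing` argument (`"raise"`, `"nan"` or `"drop"`) are hypothetical names, not pandas API, and slices are not covered.

```python
import numpy as np
import pandas as pd


def select_labels(s, labels, missing="raise"):
    """Hypothetical label lookup with an explicit missing-label policy.

    missing="raise": raise KeyError listing the absent labels.
    missing="nan":   keep every requested label, absent ones as NaN.
    missing="drop":  silently skip absent labels.
    """
    labels = list(labels)
    absent = [label for label in labels if label not in s.index]
    if missing == "raise":
        if absent:
            raise KeyError(f"labels not found: {absent}")
        return s.loc[labels]
    if missing == "nan":
        return s.reindex(labels)  # absent labels are filled with NaN
    if missing == "drop":
        return s.loc[[label for label in labels if label in s.index]]
    raise ValueError(f"unknown missing policy: {missing!r}")


s = pd.Series(np.arange(100, 105), index=['a', 'b', 'c', 'd', 'e'])
print(select_labels(s, ['a', 'f'], missing="nan"))
print(select_labels(s, ['a', 'f'], missing="drop"))
```

Raising by default mirrors what `.loc` did in the tables for a list in which no label is found, while `reindex` already provides the NaN-filled result that `s[...]` and `.ix` returned. `reindex` requires a unique index, so this sketch assumes one.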