BUG: Index and MultiIndex `KeyError` cases and discussion

Since `KeyError` was introduced for missing keys in an `Index`, quite a few use cases have surfaced across different issues. I will try to link those issues as I come across them.

My view is that raising a `KeyError` for an `Index` is fine, but a `MultiIndex` should be treated differently: you cannot always raise a `KeyError` for a single key in a `MultiIndex` slice, since a `MultiIndex` **cannot** always be reindexed.

## Index

```
indexes = [
    pd.Index(['a','b','c','e','d'], name='Unique Non-Monotonic'),
    pd.Index(['a','b','c','e','d'], name='Unique Monotonic').sort_values(),
    pd.Index(['a','b','b','e','d'], name='Non-Unique Non-Monotonic'),
    pd.Index(['a','b','b','e','d'], name='Non-Unique Monotonic').sort_values(),
]
```
![Screen Shot 2021-02-12 at 13 23 16](https://user-images.githubusercontent.com/24256554/107767626-89f28a00-6d35-11eb-9ce0-ac9705902d23.png)


<details>
  <summary>Code generator</summary>

  ```
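  import numpy as np
  import pandas as pd

  ix = pd.IndexSlice  # presumably the `ix` used in the s.loc[ix[...]] commands below
  ret = None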
  def do(command):
      try:
          exec(f'global ret; ret={command}', globals())
      except KeyError:
          return 'KeyError'
      else:
          if isinstance(ret, (np.int64)):
              return 'int64'
          elif isinstance(ret, (pd.Series)):
              return 'Series'
          elif isinstance(ret, (pd.DataFrame)):
              return 'DataFrame'
          return 'OtherType'
  
  cases = [
  "'a'",           # single valid key
  "'!'",           # single invalid key
  "['a']",         # single valid key as pseudo multiple valid keys
  "['!']",         # single invalid key as pseudo multiple valid keys
  "['a','e']",     # multiple valid keys
  "['a','!']",     # at least one invalid key
  "'a':'e'",       # valid key slice
  "'a':'!'",       # at least one invalid slice key
  "'!':",          # at least one invalid slice key
  "'b'",           # single valid non-unique key
  "['b']",         # single valid non-unique key as pseudo multiple keys
  "'b':'d'",       # slice with non-unique key
  ]
  
  base = [
      [f's.loc[{case}]',        # use regular s.loc[]
       f's.loc[ix[{case}]]']    # and with index slice as comparison s.loc[ix[{}]]
      for case in cases
  ]
  commands = [
      command for sublist in base for command in sublist
  ]
  
  indexes = [
      pd.Index(['a','b','c','e','d'], name='Unique Non-Monotonic'),
      pd.Index(['a','b','c','e','d'], name='Unique Monotonic').sort_values(),
      pd.Index(['a','b','b','e','d'], name='Non-Unique Non-Monotonic'),
      pd.Index(['a','b','b','e','d'], name='Non-Unique Monotonic').sort_values(),
  ]
  
  results = pd.DataFrame('', index=commands, columns=['Uq Non-Mono', 'Uq Mono', 'Non-Uq Non-Mono', 'Non-Uq Mono'])
  for j, index in enumerate(indexes):
      s = pd.Series([1,2,3,4,5], index=index)
      for i, command in enumerate(commands):
          results.iloc[i, j] = do(command)
  ```

</details>

This behaviour seems pretty consistent. The only possible inconsistency is the case highlighted in red, and a minor niggle for dynamic code is that the return type differs when the index is non-unique.
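To make the return-type niggle concrete, here is a minimal sketch using the same labels as the indexes above. The variable names are mine, chosen for illustration. A scalar key returns a scalar on a unique index but a `Series` on a non-unique one, whereas a list key returns a `Series` in both cases.

```python
import numpy as np
import pandas as pd

unique = pd.Series([1, 2, 3, 4, 5], index=pd.Index(['a', 'b', 'c', 'e', 'd']))
non_unique = pd.Series([1, 2, 3, 4, 5], index=pd.Index(['a', 'b', 'b', 'e', 'd']))

# Scalar key: the return type depends on whether the label is duplicated.
scalar_result = unique.loc['b']        # a single numpy integer
series_result = non_unique.loc['b']    # a Series holding every match for 'b'

# List key: a Series in both cases, which is easier to handle in dynamic code.
unique_as_list = unique.loc[['b']]
non_unique_as_list = non_unique.loc[['b']]
```

Wrapping the key in a list is one way to get a stable return type, at the cost of a `KeyError` if any listed label is missing.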

Obviously, the solution for any case where you need to index by pre-defined levels that may have been filtered out is to `reindex` with your pre-defined keys. And this is quite easy to do in RAM.
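As a rough sketch of that workaround for a flat `Index` (assuming pandas 1.0 or later, where a list containing a missing label raises; the names `wanted`, `filled` and `present` are mine):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=pd.Index(['a', 'b', 'c', 'e', 'd']))
wanted = ['a', 'e', '!']   # '!' is not in the index

# .loc with a list containing a missing label raises KeyError.
try:
    s.loc[wanted]
    raised = False
except KeyError:
    raised = True

# Option 1: reindex with the pre-defined keys; the missing label becomes NaN.
filled = s.reindex(wanted)

# Option 2: keep only the labels that are actually present.
present = s[s.index.isin(wanted)]
```

Note that `reindex` itself requires a unique index, so this only covers the unique cases in the table.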

## MultiIndex

```
indexes = [
    pd.MultiIndex.from_tuples([('b','x'), ('b', 'y'), ('b', 'z'), ('a','x'), ('a', 'z')]),
    pd.MultiIndex.from_tuples([('b','x'), ('b', 'y'), ('b', 'z'), ('a','x'), ('a', 'z')]).sortlevel()[0],
    pd.MultiIndex.from_tuples([('b','x'), ('b', 'y'), ('b', 'x'), ('a','x'), ('a', 'z')]),
    pd.MultiIndex.from_tuples([('b','x'), ('b', 'y'), ('b', 'x'), ('a','x'), ('a', 'z')]).sortlevel()[0],
]
```

MultiIndexing is different. You **cannot** always reindex, for one of two reasons:

- The number of possible combinations of the index level values can exceed RAM, and building them is computationally slow.
- If you add a value or set of values to a MultiIndex level, the process is ambiguous, and expanding all combinations leads to the problem above.

For example, consider the MultiIndex levels: (a,b), (x,y,z). There are a maximum of 6 index tuples, but in practice one works with indexes containing far fewer than the maximum number of combinations *(since the combinations scale exponentially with the number of levels)*. Your MultiIndex might thus be [(a,x), (a,z), (b,x), (b,y)].
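A small sketch of that gap between the observed tuples and the full product of the levels. The sizes in the last two lines are made-up numbers purely to show the growth, not measurements from any real dataset.

```python
import pandas as pd

# The sparse index from the example above: 4 of the 6 possible tuples.
observed = pd.MultiIndex.from_tuples([('a', 'x'), ('a', 'z'), ('b', 'x'), ('b', 'y')])

# Reindexing to "everything" means materialising the full product of the levels.
full = pd.MultiIndex.from_product(observed.levels)
missing = full.difference(observed)   # the tuples a full reindex would have to add

n_observed, n_full = len(observed), len(full)

# Illustrative only: with 6 levels of 50 values each, the full product has
# 50 ** 6 entries, far more than would normally fit in memory.
n_levels, n_values = 6, 50
max_combinations = n_values ** n_levels
```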

I think **you need** to be able to index MultiIndexes with keys that are missing. As a rule, I would suggest that indexers given as an iterable do not raise a `KeyError`. Here is a summary of some observations of current behaviour:

```
[a, y] : KeyError
[a, [y]] : KeyError but should return empty (a in level0)
[[a], y] : KeyError but should return empty (y in level1)
[[a], [y]] : KeyError but should return empty 
[a, !] : KeyError 
[a, [!]] : returns empty
[[a], !] : KeyError (maybe OK since ! not in level1)
[[!], x] : returns empty (x in level1)
[[!], [!]] : returns empty
[!, !] : KeyError
```

![multiindex_slice](https://user-images.githubusercontent.com/24256554/107767091-c07bd500-6d34-11eb-891d-d2c5e4903499.png)
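To illustrate the behaviour I am suggesting for iterable keys, here is a minimal sketch of a workaround that never raises. `select_or_empty` is a hypothetical helper of mine, not a pandas API, and it sidesteps `.loc` label lookup entirely by building a boolean mask per level.

```python
import numpy as np
import pandas as pd

def select_or_empty(obj, keys_by_level):
    """Hypothetical helper: keep rows whose value on each given level is in
    the supplied keys; missing keys simply match nothing."""
    mask = np.ones(len(obj), dtype=bool)
    for level, keys in keys_by_level.items():
        mask &= obj.index.get_level_values(level).isin(keys)
    return obj.loc[mask]

index = pd.MultiIndex.from_tuples(
    [('b', 'x'), ('b', 'y'), ('b', 'z'), ('a', 'x'), ('a', 'z')]
)
s = pd.Series([1, 2, 3, 4, 5], index=index)

empty_combo = select_or_empty(s, {0: ['a'], 1: ['y']})     # both keys exist, the pair does not
valid_combo = select_or_empty(s, {0: ['a'], 1: ['x', 'z']})
invalid_key = select_or_empty(s, {0: ['!']})               # key absent from level 0
```

This gives the "returns empty" outcome for every list-like row in the summary, but it scans the whole index rather than using the index lookup, so it is only a stopgap.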

<details>
  <summary>Code generator</summary>

  ```
  cases_level0 = [
    "'a'",         # single valid key on level0
    "'!'",         # single invalid key on level0
    "['a']",       # single valid key on level0 as pseudo multiple valid keys
    "['!']",       # single invalid key on level0 as pseudo multiple valid keys
    "['a', 'b']",  # multiple valid key on level0
    "['a', '!']",  # at least one invalid key on level0
    "'a':'b'",     # valid level0 index slice
    "'a':'!'",     # invalid level0 index slice
    "'!':",        # fully invalid level0 index slice
]

comments_level0 = [
'0: valid single, ',
'0: invalid single, ',
'0: valid single as multiple, ',
'0: invalid single as multiple, ',
'0: multiple valid, ',
'0: one invalid in multiple, ',
'0: valid slice, ',
'0: semi-invalid slice, ',
'0: invalid slice, ',
]

base = [
    [f's.loc[{case}]', f's.loc[ix[{case}, :]]']  for case in cases_level0
]
commands = [
    command for sublist in base for command in sublist
]

indexes = [
    pd.MultiIndex.from_tuples([('b','x'), ('b', 'y'), ('b', 'z'), ('a','x'), ('a', 'z')]),
    pd.MultiIndex.from_tuples([('b','x'), ('b', 'y'), ('b', 'z'), ('a','x'), ('a', 'z')]).sortlevel()[0],
    pd.MultiIndex.from_tuples([('b','x'), ('b', 'y'), ('b', 'x'), ('a','x'), ('a', 'z')]),
    pd.MultiIndex.from_tuples([('b','x'), ('b', 'y'), ('b', 'x'), ('a','x'), ('a', 'z')]).sortlevel()[0],
]

results = pd.DataFrame('', index=commands, columns=['Uq Non-Mono', 'Uq Mono', 'Non-Uq Non-Mono', 'Non-Uq Mono'])
for j, index in enumerate(indexes):
    s = pd.Series([1,2,3,4,5], index=index)
    for i, command in enumerate(commands):
        results.iloc[i,j] = do(command)

base = [
    [com, com]  for com in comments_level0
]
comments = [
    com for sublist in base for com in sublist
]
        
results['Comment'] = comments    
results.style

cases_level1 = [
    "'x'",         # single valid key on level1
    "'y'",         # single sometimes-valid key on level1
    "'!'",         # single invalid key on level1
    "['x']",       # single valid key on level1 as pseudo multiple valid keys
    "['y']",       # single sometimes-valid key on level1 as pseudo multiple valid keys
    "['!']",       # single invalid key on level1 as pseudo multiple valid keys
    "['x', 'y']",  # multiple sometimes-valid key on level1
    "['x', '!']",  # at least one invalid key on level1
    "'x':'y'",     # sometimes-valid level1 index slice
    "'x':'!'",     # invalid level1 index slice
    "'!':",        # fully invalid level1 index slice
]

comments_level1 = [
 '1: valid single',
 '1: semi-valid single',
 '1: invalid single',
 '1: valid single as multiple',
 '1: semi-valid single as multiple',
 '1: invalid single as multiple',
 '1: multiple semi-valid',
 '1: one invalid in multiple',   
 '1: semi-valid slice',
 '1: semi-invalid slice',
 '1: invalid slice',  
]

from itertools import product
multi_cases = list(product(cases_level0, cases_level1))
multi_comments = list(product(comments_level0, comments_level1))

base = [
    [f's.loc[{case[0]}, {case[1]}]', f's.loc[ix[{case[0]}, {case[1]}]]']  for case in multi_cases
]
commands = [
    command for sublist in base for command in sublist
]

results = pd.DataFrame('', index=commands, columns=['Uq Non-Mono', 'Uq Mono', 'Non-Uq Non-Mono', 'Non-Uq Mono'])
for j, index in enumerate(indexes):
    s = pd.Series([1,2,3,4,5], index=index)
    for i, command in enumerate(commands):
        results.iloc[i,j] = do(command)

base = [
    [com, com]  for com in multi_comments
]
comments = [
    com for sublist in base for com in sublist
]        

results['comment'] = comments
results.style\
       .applymap(lambda v: 'background-color:red;', subset=ix[["s.loc['a', ['!']]", "s.loc['a', ['y']]", "s.loc[['!'], ['!']]", "s.loc[['a'], ['y']]"], :])\
       .applymap(lambda v: 'background-color:LemonChiffon;', subset=ix[["s.loc['a', 'x':'y']"], :])

  ```

</details>

