@@ -1461,16 +1461,32 @@ Looking up values by index/column labels
 
 Sometimes you want to extract a set of values given a sequence of row labels
 and column labels, this can be achieved by ``pandas.factorize`` and NumPy indexing.
-For instance:
+
+For heterogeneous column types, we subset columns to avoid unnecessary numpy conversions:
+
+.. ipython:: python
+
+    def pd_lookup_het(df, row_labels, col_labels):
+        rows = df.index.get_indexer(row_labels)
+        cols = df.columns.get_indexer(col_labels)
+        sub = df.take(np.unique(cols), axis=1)
+        sub = sub.take(np.unique(rows), axis=0)
+        rows = sub.index.get_indexer(row_labels)
+        values = sub.melt()["value"]
+        cols = sub.columns.get_indexer(col_labels)
+        flat_index = rows + cols * len(sub)
+        result = values[flat_index]
+        return result
+
+For homogeneous column types, it is fastest to skip column subsetting and go directly to numpy:
 
 .. ipython:: python
 
-    df = pd.DataFrame({'col': ["A", "A", "B", "B"],
-                       'A': [80, 23, np.nan, 22],
-                       'B': [80, 55, 76, 67]})
-    df
-    idx, cols = pd.factorize(df['col'])
-    df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
+    def pd_lookup_hom(df, row_labels, col_labels):
+        rows = df.index.get_indexer(row_labels)
+        cols = df.columns.get_indexer(col_labels)
+        result = df.to_numpy()[rows, cols]
+        return result
 
 Formerly this could be achieved with the dedicated ``DataFrame.lookup`` method
 which was deprecated in version 1.2.0 and removed in version 2.0.0.
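
The hunk defines the two helpers but no longer shows them being called. A minimal, self-contained sketch of how they might be exercised follows. It reuses the sample frame from the removed example and compares both helpers against the removed ``pandas.factorize`` one-liner. The helper bodies are copied from the hunk (with ``result`` returned directly). Casting the numeric columns to ``float`` for the homogeneous case, and passing the labels as plain lists, are assumptions made here for illustration and are not part of the change itself.

```python
import numpy as np
import pandas as pd


def pd_lookup_het(df, row_labels, col_labels):
    # Subset to the referenced rows/columns, then index into the melted values.
    rows = df.index.get_indexer(row_labels)
    cols = df.columns.get_indexer(col_labels)
    sub = df.take(np.unique(cols), axis=1)
    sub = sub.take(np.unique(rows), axis=0)
    rows = sub.index.get_indexer(row_labels)
    values = sub.melt()["value"]  # columns stacked end to end
    cols = sub.columns.get_indexer(col_labels)
    flat_index = rows + cols * len(sub)  # position within the stacked column
    return values[flat_index]


def pd_lookup_hom(df, row_labels, col_labels):
    # Convert the whole frame once and use NumPy integer-array indexing.
    rows = df.index.get_indexer(row_labels)
    cols = df.columns.get_indexer(col_labels)
    return df.to_numpy()[rows, cols]


# Sample data from the removed example in the hunk.
df = pd.DataFrame({"col": ["A", "A", "B", "B"],
                   "A": [80, 23, np.nan, 22],
                   "B": [80, 55, 76, 67]})

row_labels = df.index.tolist()
col_labels = df["col"].tolist()

# Heterogeneous frame: "col" holds strings, "A"/"B" hold numbers.
het = pd_lookup_het(df, row_labels, col_labels).to_numpy()

# Homogeneous frame (assumption for this sketch: numeric columns cast to float).
hom = pd_lookup_hom(df[["A", "B"]].astype(float), row_labels, col_labels)

# The removed factorize-based approach, kept here only as a reference result.
idx, cols = pd.factorize(df["col"])
old = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]

print(het, hom, old)
```

On this frame each row picks the value from the column named in ``col``, so all three approaches should agree. Note that ``pd_lookup_het`` returns a ``Series`` indexed by the flat positions, while ``pd_lookup_hom`` returns a NumPy array.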