Description
Hey, is this something you'd like to add to the library?
Rationale
Bookkeeping which examples have been labeled is no fun, error-prone, and verbose. I propose a "Manager" class that will do it for us, so that labeling looks like this:
User Experience
for index in range(10):
    query_index, query_instance = learner.query(manager.unlabeld)
    ix = query_index[0]
    print(manager.remaining_sources[ix])
    label = (ix, int(input()))
    manager.add_labels(label)
In a notebook it works like this:
Idea
The idea is that the manager maintains masks for the labeled and unlabeled indices, and calculates offsets when adding new labels. The end user shouldn't have to care about this stuff: they can get the unlabeled examples from manager.unlabeled, which is the feature array restricted by the unlabeled mask (strictly speaking NumPy returns a copy for a boolean mask, not a true view).
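The offset calculation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the toy `features` array and the helper name `to_absolute` are hypothetical and not part of the proposed class. It shows how an index returned for the unlabeled subset maps back to a position in the full array.

```python
import numpy as np

# Toy data (hypothetical): six examples, indices 1 and 4 already labeled.
features = np.arange(12).reshape(6, 2)
labeled_mask = np.zeros(len(features), dtype=bool)
labeled_mask[[1, 4]] = True
unlabeled_mask = ~labeled_mask

# What a query strategy would see: only the unlabeled rows.
unlabeled = features[unlabeled_mask]

# Absolute positions of the unlabeled rows in the full array.
unlabeled_indices = np.flatnonzero(unlabeled_mask)


def to_absolute(relative_index: int) -> int:
    """Map an index into `unlabeled` back to an index into `features`."""
    return int(unlabeled_indices[relative_index])


# The row picked from the subset is the same row in the full array.
absolute = to_absolute(2)
assert np.array_equal(unlabeled[2], features[absolute])
```

Looking up the position in the list of unlabeled indices avoids any per-label arithmetic, and it stays correct when several labels are added in one batch as long as the lookup is done before the masks are updated.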
This is some mock code; I haven't tested it properly yet:
from typing import Any, List, Optional, Tuple

import numpy as np

Label = Tuple[int, Any]
LabelList = List[Label]
Sources = List[Any]


class ALManager:
    def __init__(self, features: np.ndarray, labels: Optional[LabelList] = None, sources: Optional[Sources] = None):
        '''
        :param features: An array of the features that will be used for AL.
        :param labels: Any preexisting labels. Each label is a tuple (idx, label), where idx indexes into features.
        :param sources: A list of the original data
        '''
        self.features = features
        # Copy so that adding labels later never mutates the caller's list
        self.labels = list(labels) if labels is not None else []
        self.labeled_mask = np.zeros(self.features.shape[0], dtype=bool)
        self.unlabeled_mask = np.ones(self.features.shape[0], dtype=bool)
        self._update_masks(self.labels)
        # "is not None" rather than truthiness, so an array of sources also works
        self.sources = np.array(sources if sources is not None else [])

    def _update_masks(self, labels: LabelList):
        # Label indices here are absolute positions in self.features
        for label in labels:
            self.labeled_mask[label[0]] = True
            self.unlabeled_mask[label[0]] = False

    def _offset_new_labels(self, labels: LabelList) -> LabelList:
        # Incoming indices are relative to the current unlabeled subset (what the
        # query saw), so look up their absolute positions in self.features.
        # With no labels yet this lookup is the identity, so no special case is needed.
        unlabeled_indices = self.unlabeled_mask.nonzero()[0]
        return [(int(unlabeled_indices[ix]), value) for ix, value in labels]

    def add_labels(self, labels: LabelList):
        if isinstance(labels, tuple):  # if this is a single example
            labels = [labels]
        elif not isinstance(labels, list):
            raise TypeError("Malformed input. Please add either a tuple (ix,label) or a list [(ix,label),..]")
        # Convert every index before touching the masks, so that all labels in
        # one batch are interpreted against the same unlabeled subset
        labels = self._offset_new_labels(labels)
        self._update_masks(labels)
        self.labels += labels

    @property
    def unlabeled(self):
        '''
        :return: The features restricted to those that aren't labeled (a copy, since boolean-mask indexing does not return a numpy view)
        '''
        return self.features[self.unlabeled_mask]

    # Alias kept for the spelling used in the usage example
    unlabeld = unlabeled

    @property
    def labeled(self):
        '''
        :return: The features restricted to those that are labeled (also a copy)
        '''
        return self.features[self.labeled_mask]

    @property
    def remaining_sources(self):
        '''
        :return: The original inputs, masked so that only unlabeled ones are returned
        '''
        return self.sources[self.unlabeled_mask]
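One caveat about the properties above: they index with a boolean mask, and in NumPy that produces a copy rather than a view. A quick check, using a hypothetical toy array, makes the difference from basic slicing visible:

```python
import numpy as np

features = np.zeros((4, 2))
mask = np.array([True, False, True, False])

subset = features[mask]   # boolean (advanced) indexing: new array
window = features[0:2]    # basic slicing: shares memory with features

subset_shares = np.shares_memory(subset, features)
window_shares = np.shares_memory(window, features)

# Writing to the masked result leaves the original array untouched.
subset[0, 0] = 1.0
original_value = features[0, 0]
```

In practice this means edits made to the unlabeled or labeled arrays returned by the manager would not propagate back to the stored features, which is probably the safer behavior for a bookkeeping class anyway.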