
Add a Data Manager Class #80

Open
@talolard


Hey, is this something you'd like to add to the library?

Rationale

Bookkeeping which examples have been labeled is no fun, error prone, and verbose. I propose a "Manager" class that does it for us, so that labeling looks like this:

User Experience

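for index in range(10):
    # Ask the learner which unlabeled example to label next
    query_index, query_instance = learner.query(manager.unlabeld)
    ix = query_index[0]
    # Show the original input and read its label from the user
    print(manager.remaining_sources[ix])
    label = (ix,int(input()))
    manager.add_labels(label)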

In a notebook it works like this:
[screenshot: example]

Idea

The idea is that the manager maintains masks for the labeled and unlabeled indices and calculates offsets when adding new labels. The end user shouldn't care about this stuff; they can get the unlabeled examples from manager.unlabeld, which returns the features restricted to the unlabeled rows (a copy rather than a view, since NumPy boolean-mask indexing copies).
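
The offset bookkeeping can be sketched with plain NumPy. This is a minimal illustration, assuming that query indices are positions within the unlabeled subset; the helper name `to_absolute_index` and the toy data are hypothetical and not part of the proposal.

```python
import numpy as np

def to_absolute_index(unlabeled_mask: np.ndarray, relative_index: int) -> int:
    """Map a position within the unlabeled subset to a row of the full array.

    Hypothetical helper, for illustration only.
    """
    # flatnonzero lists the absolute positions of the True entries, in order
    return int(np.flatnonzero(unlabeled_mask)[relative_index])

features = np.arange(12).reshape(6, 2)   # six toy examples
unlabeled_mask = np.ones(6, dtype=bool)

# Pretend rows 1 and 3 were labeled earlier
unlabeled_mask[[1, 3]] = False

# The unlabeled subset now holds rows 0, 2, 4, 5, so position 2 is row 4
absolute = to_absolute_index(unlabeled_mask, 2)
same_row = np.array_equal(features[unlabeled_mask][2], features[absolute])

# Recording the new label flips the masks at the absolute index
unlabeled_mask[absolute] = False
labeled_mask = ~unlabeled_mask
```

Looking up the position in the list of still-unlabeled rows avoids any arithmetic on the relative index, which is where off-by-one mistakes tend to creep in.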

This is some mock code; I haven't tested it properly yet.

from typing import List,Tuple,Any,Union,Optional
import numpy as np
Label = Tuple[int, Any]
LabelList = List[Label]
Sources = List[Any]
class ALManager():
    def __init__(self, features:np.ndarray, labels:Optional[LabelList]=None, sources:Optional[Sources]=None):
        '''

        :param features: An array of the features that will be used for AL.
        :param labels: Any preexisting labels. Each label is a tuple (idx, label), where idx is a row index into features.
        :param sources: A list of the original data
        '''
        self.features = features
        # Copy so that add_labels does not mutate the caller's list
        self.labels = list(labels) if labels is not None else []
        self.labeled_mask = np.zeros(self.features.shape[0],dtype=bool)
        self.unlabeled_mask = np.ones(self.features.shape[0],dtype=bool)
        self._update_masks(self.labels)
        # Explicit None check: truth-testing a numpy array raises ValueError
        self.sources = np.array(sources if sources is not None else [])
    def _update_masks(self,labels:Union[LabelList,Label]):

        for label in labels:
            self.labeled_mask[label[0]] = True
            self.unlabeled_mask[label[0]] = False
    def _offset_new_labes(self,labels:LabelList):
        # Query indices are positions within self.unlabeld, so map each one
        # back to its row in self.features before the masks are updated.
        unlabeledIndices = np.flatnonzero(self.unlabeled_mask)
        correctLabels: LabelList = []

        for label in labels:
            newLabel :Label = (int(unlabeledIndices[label[0]]),label[1])
            correctLabels.append(newLabel)
        return correctLabels
    def add_labels(self, labels:Union[LabelList,Label]):
        if isinstance(labels,tuple): # if this is a single example
            labels = [labels]
        elif not isinstance(labels,list):
            raise TypeError("Malformed input. Please add either a tuple (ix,label) or a list [(ix,label),..]")
        # Convert indices relative to the unlabeled subset into absolute row indices
        labels = self._offset_new_labes(labels)
        self._update_masks(labels)
        self.labels.extend(labels)

    @property
    def unlabeld(self):
        '''

        :return: The features restricted to those that aren't labeled. Boolean-mask indexing returns a copy, not a numpy view.
        '''
        return self.features[self.unlabeled_mask]

    @property
    def labeled(self):
        '''

        :return: The features restricted to those that are labeled. Boolean-mask indexing returns a copy, not a numpy view.
        '''
        return self.features[self.labeled_mask]

    @property
    def remaining_sources(self):
        '''

        :return: The original inputs, masked so that only unlabeled ones are returned
        '''
        return self.sources[self.unlabeled_mask]
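
One detail worth checking about the properties above: NumPy boolean-mask indexing produces a copy rather than a view, so writes to the returned array do not reach `features`. A small check, using toy data that is not from the proposal:

```python
import numpy as np

features = np.zeros((4, 2))
mask = np.array([True, False, True, False])

subset = features[mask]    # boolean-mask (advanced) indexing
window = features[0:2]     # basic slicing, for comparison

masked_shares = np.shares_memory(features, subset)
sliced_shares = np.shares_memory(features, window)

subset[0, 0] = 99.0        # modifies only the copy
unchanged = bool(features[0, 0] == 0.0)
```

For the labeling loop this is harmless, since the learner only reads the unlabeled rows, but it does mean each property access allocates a new array.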

