[Feature Proposal] Diverse Mini Batch Active Learning #119

Open
@damienlancry

Hello, I noticed there is a strong focus on uncertainty-based and information-density-based sampling techniques, which is very nice. But in batch-mode active learning, where several data points are sent to the oracle at the same time, it is often desirable for those points to be diverse, to avoid redundancy and maximise the improvement of the model. Several techniques have been designed for this; one of the most recent, and also one of the simplest, is Diverse Mini Batch Active Learning.

TLDR: compute uncertainty with a chosen metric (e.g. margin, entropy, ...), then prefilter the n_instances * beta most uncertain data points (beta is a prefiltering parameter, typically 10, 50 or 100). Then perform k-means clustering on those prefiltered points with n_instances clusters and select the points closest to the centroids.
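
To make the TLDR concrete, here is a minimal sketch of those three steps using NumPy and scikit-learn. It is not the implementation offered for the PR: the function names `margin_uncertainty` and `diverse_mini_batch`, their signatures, and the choice of plain (unweighted) k-means are assumptions made for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans


def margin_uncertainty(proba):
    """Margin uncertainty: 1 - (best - second-best class probability)."""
    ordered = np.sort(proba, axis=1)
    return 1.0 - (ordered[:, -1] - ordered[:, -2])


def diverse_mini_batch(X_pool, proba, n_instances, beta=10, random_state=0):
    """Hypothetical helper: indices into X_pool of a diverse, uncertain batch."""
    uncertainty = margin_uncertainty(proba)

    # Step 1: prefilter the n_instances * beta most uncertain points.
    n_candidates = min(len(X_pool), n_instances * beta)
    candidates = np.argsort(-uncertainty)[:n_candidates]

    # Step 2: cluster the prefiltered points into n_instances clusters.
    # Assumption: plain, unweighted k-means on the raw features.
    km = KMeans(n_clusters=n_instances, n_init=10, random_state=random_state)
    dist = km.fit_transform(X_pool[candidates])  # distance to every centroid

    # Step 3: in each cluster, keep the member closest to its centroid.
    chosen = []
    for j in range(n_instances):
        members = np.flatnonzero(km.labels_ == j)
        if members.size == 0:  # degenerate case, e.g. many duplicate points
            continue
        chosen.append(candidates[members[np.argmin(dist[members, j])]])
    return np.array(chosen)
```

Picking the closest point among each cluster's own members (rather than the closest point overall to each centroid) guarantees that no data point is selected twice.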

It is quite simple to implement and gives good results. I already have an implementation ready if you are interested in a PR.
