
[BUG] NCL - a class should be cleaned if its number of samples is >= 0.5 * minority samples, not if it is >= 0.5 * data.shape[0] #764

Closed
@solegalli

Description


Describe the bug

Neighbourhood cleaning rule procedure:

  1. Split data $T$ into the class of interest $C$ (minority) and the rest of the data $O$.
  2. Identify noisy data $A_1$ in $O$ with the edited nearest neighbor rule.
  3. For each class $C_i$ in $O$ (that is, for each observation in the majority class(es)):
    if ($x \in C_i$ in 3-nearest neighbors of misclassified $y \in C$)
    and ($|C_i| \geq 0.5 \cdot |C|$) then $A_2 = \{x\} \cup A_2$
  4. Reduced data $S = T - (A_1 \cup A_2)$
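The four steps can be sketched in plain Python as follows. This is a minimal illustration, not the library's actual implementation, and it rests on assumptions not stated in the pseudocode: Euclidean distance, k = 3 for both the ENN step and the neighbour search, "misclassified" meaning the majority label of the 3 nearest neighbours differs from the example's own label, and ties broken by `Counter.most_common` order. The function names (`knn_indices`, `is_misclassified`, `neighbourhood_cleaning_rule`) are hypothetical.

```python
import math
from collections import Counter


def knn_indices(X, i, k):
    """Indices of the k nearest neighbours of X[i] (Euclidean), excluding X[i] itself."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in dists[:k]]


def is_misclassified(X, y, i, k=3):
    """True if the majority label of the k nearest neighbours differs from y[i].

    Ties are broken by Counter.most_common order (an assumption of this sketch).
    """
    votes = Counter(y[j] for j in knn_indices(X, i, k))
    predicted, _ = votes.most_common(1)[0]
    return predicted != y[i]


def neighbourhood_cleaning_rule(X, y, minority, k=3, threshold=0.5):
    """Hypothetical NCL sketch: return the indices kept in S = T - (A1 ∪ A2)."""
    counts = Counter(y)
    others = [i for i in range(len(y)) if y[i] != minority]  # the rest of the data, O

    # Step 2: A1 = noisy examples in O, found with the edited nearest neighbour rule.
    a1 = {i for i in others if is_misclassified(X, y, i, k)}

    # Step 3: only classes Ci in O with |Ci| >= threshold * |C| may contribute to A2,
    # where C is the minority class (not the whole dataset T).
    eligible = {c for c, n in counts.items()
                if c != minority and n >= threshold * counts[minority]}
    a2 = set()
    for i in range(len(y)):
        if y[i] == minority and is_misclassified(X, y, i, k):
            # Add the neighbours x of the misclassified y that belong to an eligible Ci.
            a2.update(j for j in knn_indices(X, i, k) if y[j] in eligible)

    # Step 4: S = T - (A1 ∪ A2)
    removed = a1 | a2
    return [i for i in range(len(y)) if i not in removed]


# Toy 1-D data: label 1 is the minority class C.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (2.5,), (11.5,)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0]
kept = neighbourhood_cleaning_rule(X, y, minority=1)
print(kept)  # [0, 4, 5, 6, 7]
```

In this toy data, index 8 (a majority point among minority points) lands in $A_1$, and the neighbours of the misclassified minority point at index 7 (indices 1, 2, 3) land in $A_2$.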

The above is copied from the pseudocode in the article, where C is the minority class (the class of interest).

A further quote from the article:

"To avoid excessive reduction of small classes, only examples from classes larger or equal to 0.5 * |C| are considered while forming A2."

The article states earlier that C is the minority class, and it refers to the entire dataset as T. The threshold for forming A2 should therefore be 0.5 * |C|, not 0.5 * |T| (i.e. 0.5 * data.shape[0]).
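To make the difference concrete, here is a small sketch comparing the two thresholds. The helper `eligible_classes` and its `use_total` flag are hypothetical, written only for this illustration; `use_total=True` mimics the comparison against the full dataset size that the issue title describes, and the class counts are made up.

```python
from collections import Counter


def eligible_classes(y, minority, threshold=0.5, use_total=False):
    """Classes in O whose examples may be added to A2.

    use_total=False: the article's rule, |Ci| >= threshold * |C| (C = minority class).
    use_total=True:  comparison against threshold * |T| (i.e. data.shape[0]),
                     shown only for contrast.
    """
    counts = Counter(y)
    reference = len(y) if use_total else counts[minority]
    return sorted(c for c, n in counts.items()
                  if c != minority and n >= threshold * reference)


# Made-up 3-class example: class 0 has 60, class 1 has 30, class 2 (minority) has 10 samples.
y = [0] * 60 + [1] * 30 + [2] * 10
print(eligible_classes(y, minority=2))                  # [0, 1]
print(eligible_classes(y, minority=2, use_total=True))  # [0]
```

With the article's rule, class 1 (30 samples, three times the minority) can be cleaned; comparing against half of all 100 samples would wrongly shield it from cleaning.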
