Description
General Introduction and Relevance
Currently, many imbalanced-learn samplers require that an estimator explicitly inherit from scikit-learn Mixin classes in order to determine the nature of an estimator object; this check is often performed by the utility function check_neighbors_object().
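For reference, here is a paraphrased sketch of the current inheritance-based check (the real implementation lives in imbalanced-learn's utilities and may differ in details; the KNeighborsMixin import path also varies across scikit-learn versions):

```python
from numbers import Integral

from sklearn.base import clone
from sklearn.neighbors import NearestNeighbors
from sklearn.neighbors._base import KNeighborsMixin


def check_neighbors_object(nn_name, nn_object, additional_neighbor=0):
    # An int is promoted to a scikit-learn NearestNeighbors estimator.
    if isinstance(nn_object, Integral):
        return NearestNeighbors(n_neighbors=nn_object + additional_neighbor)
    # Anything else must inherit from KNeighborsMixin, which rules out
    # API-compatible estimators from libraries that do not subclass sklearn.
    if isinstance(nn_object, KNeighborsMixin):
        return clone(nn_object)
    raise ValueError(f"{nn_name} must be an int or a KNeighborsMixin subclass.")
```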
This scikit-learn inheritance-based programming model provides consistency, but it also limits flexibility. Switching these checks from inheritance to duck typing would preserve that consistency while also allowing users to pair imbalanced-learn with other scikit-learn API-compliant libraries that cannot directly subclass scikit-learn, such as cuML. Using cuML with imbalanced-learn can be significantly faster, as shown in the examples below, which achieve roughly 180x and 680x speedups with the example configuration.
Additional Context
A key limitation is that users cannot cleanly integrate estimators from libraries that do not directly depend on scikit-learn but that honor the same API contract.
A duck-typing-based programming model would make imbalanced-learn more flexible; duck typing in the PyData ecosystem is becoming increasingly powerful as more libraries standardize on NumPy- and scikit-learn-like API contracts rather than inheritance (Dask, xarray, Pangeo, CuPy, RAPIDS, and more).
The TPOT community recently adopted duck typing to allow GPU-accelerated cuML estimators and has seen great results: https://github.com/EpistasisLab/tpot/blob/master/tutorials/Higgs_Boson.ipynb
Proposal
In check_neighbors_object(), convert the Mixin class check into a check for the key attributes that determine whether an object is a certain type of estimator:
Instead of checking isinstance(obj, KNeighborsMixin), check for the key methods that make an object KNeighborsMixin-like (see the sketch after this list):
- kneighbors()
- kneighbors_graph()
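A minimal sketch of what the duck-typed check might look like, assuming the existing check_neighbors_object(nn_name, nn_object, additional_neighbor=0) signature is kept (the error message here is illustrative):

```python
from numbers import Integral

from sklearn.base import clone
from sklearn.neighbors import NearestNeighbors


def check_neighbors_object(nn_name, nn_object, additional_neighbor=0):
    # Ints are still promoted to a scikit-learn NearestNeighbors estimator.
    if isinstance(nn_object, Integral):
        return NearestNeighbors(n_neighbors=nn_object + additional_neighbor)
    # Duck-typing check: accept any object exposing the KNeighborsMixin
    # interface, e.g. cuml.neighbors.NearestNeighbors.
    if callable(getattr(nn_object, "kneighbors", None)) and callable(
        getattr(nn_object, "kneighbors_graph", None)
    ):
        return clone(nn_object)
    raise ValueError(
        f"{nn_name} must be an int or an object with 'kneighbors' and "
        f"'kneighbors_graph' methods; got {type(nn_object)} instead."
    )
```

Since clone() only relies on get_params()/set_params(), it already works with estimators that follow the scikit-learn API without subclassing it.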
If there is interest, I'd be happy to open a pull request for further discussion.
Potential Impact
Adopting duck typing for these checks would immediately allow using cuML (a GPU-accelerated machine learning library with a scikit-learn-like API) with imbalanced-learn. It would also open the door to other current and future libraries, without requiring any knowledge of or involvement from imbalanced-learn.
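As a hypothetical usage sketch (assuming the duck-typed check above is in place), a cuML nearest-neighbors estimator could be passed straight into SMOTE:

```python
from cuml.neighbors import NearestNeighbors  # GPU-accelerated, sklearn-like API
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100_000, weights=[0.95, 0.05], random_state=0)

# n_neighbors = k_neighbors + 1, since estimator objects are used as-is and
# each sample counts as its own nearest neighbor.
nn = NearestNeighbors(n_neighbors=6)
X_res, y_res = SMOTE(k_neighbors=nn).fit_resample(X, y)
```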
As a concrete example of the benefit, using cuML estimators on the GPU instead of scikit-learn can provide a large speedup, and faster estimators can make a significant difference in total runtime. The benchmark graphs below provide a couple of small examples (a reproducing gist is linked at the bottom) corresponding to 180x and 680x speedups using cuML on a GPU versus the default scikit-learn on CPU cores.
A number of samplers would be affected by this proposed change, allowing the integration of estimators that do not directly subclass scikit-learn. These include the ADASYN, SMOTE, BorderlineSMOTE, SMOTENC, EditedNearestNeighbours, and RepeatedEditedNearestNeighbours samplers.
Samplers such as CondensedNearestNeighbour, KMeansSMOTE, SMOTEN, SMOTEENN, and others incorporate additional steps in their _validate_estimator() methods and would require further modifications to seamlessly support estimators that do not directly subclass scikit-learn.
Hardware Specs for the Loose Benchmark:
Intel Xeon E5-2698 (2.2 GHz, 16 cores) and an NVIDIA V100 32 GB GPU
Benchmarking Code:
https://gist.github.com/NV-jpt/276be3fe57b0ca384dbdabeba4a7e643