Optimal Kmeans documentation #96

Merged
merged 5 commits into from
Oct 25, 2024
Binary file added assets/images/studyclust14.png
14 changes: 12 additions & 2 deletions tutorials/10_Group_analysis/component_clustering_tools.md
@@ -286,7 +286,7 @@ You may call the [pop_clust.m](http://sccn.ucsd.edu/eeglab/locatefile.php?file=p

![](/assets/images/studyclust5.png)

Several algorithms are available: *kmeans*, *neural network*, *affinity*, and *Optimal Kmeans* clustering.

*Kmeans* requires the MATLAB Statistics Toolbox, while *neural network* clustering uses a function from the MATLAB Neural Network Toolbox. A version of *kmeans* that does not require the MATLAB Statistics Toolbox is also available. *Affinity* clustering does not require any toolbox. We recommend starting with *affinity* clustering, which does not require specifying the number of clusters, and then trying the *kmeans* algorithm if the results are not satisfactory.

@@ -303,10 +303,20 @@ defined as components further than a specified number of standard
deviations (3, by default) from any of the cluster centroids. To turn
on this option, click the upper checkbox on the left. Identified
outlier components will be placed into a designated *Outliers* cluster
(Cluster 2).
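
The outlier rule above can be illustrated with a minimal Python sketch. It is not EEGLAB's implementation, and the exact statistic is an assumption: here the "standard deviation" is taken as the root-mean-square distance of all components to their nearest centroid, and a component is flagged when its distance to the nearest (and therefore to every) centroid exceeds `n_std` times that value. The names `find_outliers` and `nearest_centroid_distance` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_centroid_distance(p: Point, centroids: Sequence[Point]) -> float:
    """Euclidean distance from p to the closest centroid."""
    return min(math.dist(p, c) for c in centroids)


def find_outliers(points: Sequence[Point], centroids: Sequence[Point],
                  n_std: float = 3.0) -> List[int]:
    """Indices of points farther than n_std 'standard deviations' from every centroid."""
    dists = [nearest_centroid_distance(p, centroids) for p in points]
    # Assumption: spread = RMS distance of all points to their nearest centroid
    sigma = math.sqrt(sum(d * d for d in dists) / len(dists))
    return [i for i, d in enumerate(dists) if d > n_std * sigma]


# Twenty points at distance 1 from the origin, plus one far-away point
ring = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)] * 5
points = ring + [(20.0, 0.0)]
print(find_outliers(points, [(0.0, 0.0)]))  # [20]
```

In EEGLAB the flagged components would then be moved to the *Outliers* cluster rather than kept in their nearest cluster.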

Press *Ok*. The cluster editing interface detailed in one of the following sections will automatically pop up.

Optimal Kmeans clustering
-----------------
We have recently added the **Optimal Kmeans** algorithm to the `pop_clust` function. It finds the optimal number of clusters for your data within a range you specify. It requires the [MATLAB Statistics and Machine Learning Toolbox](https://www.mathworks.com/products/statistics.html).

To use it, select **Optimal Kmeans** from the **Clustering algorithm** dropdown menu, then enter the range of cluster numbers to test (in the screenshot below, the minimum is set to 10 and the maximum to 30). The algorithm clusters the data once for each number of clusters in the range and selects the number that maximizes the **silhouette** score. The **silhouette** score measures how similar a component is to its own cluster compared with the other clusters; it ranges from -1 to 1, and higher values indicate tighter, better-separated clusters. Read more about the **silhouette** score in the [MATLAB documentation](https://www.mathworks.com/help/stats/clustering.evaluation.silhouetteevaluation.html).
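
The selection loop can be sketched in Python as below. This is a minimal, self-contained illustration, not the code behind `pop_clust`: it uses plain Lloyd's k-means with a deterministic farthest-first initialization (MATLAB's `kmeans` defaults to k-means++ seeding instead) and the standard silhouette definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from component i to the other members of its cluster and b(i) is the smallest mean distance to the members of any other cluster. All function names are hypothetical, and scoring singleton clusters as s = 0 is an assumed convention that MATLAB may not share.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_first(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: Sequence[Point], k: int, n_iter: int = 100) -> List[int]:
    """Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (ties -> lowest index)
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def mean_silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Average silhouette value over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for one cluster; treat as worst
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of p's own cluster
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # assumed convention: s = 0 for singleton clusters
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance from p to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def optimal_k(points: Sequence[Point], k_min: int, k_max: int) -> Tuple[int, Dict[int, float]]:
    """Cluster once per k in [k_min, k_max]; return the k with the highest mean silhouette."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores


# Three well-separated square "blobs" of four points each
blob = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
points = [(x + dx, y + dy) for dx, dy in [(0, 0), (10, 0), (0, 10)] for x, y in blob]
best, scores = optimal_k(points, 2, 5)
print(best)  # 3
```

Because k-means depends on its starting centroids, running it from several starts per value of k, as MATLAB's `kmeans` can do with its `Replicates` option, makes the comparison across k less sensitive to a single poor initialization.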

**Recommended number of clusters:** Following the rationale for the estimated number of clusters given above, we recommend setting the lower bound of the cluster range to half the average number of components per subject and the upper bound to 1.5 times that average. For example, with 20 components per subject, set the lower bound to 10 and the upper bound to 30. If the returned number of clusters equals the lower or upper bound, consider widening the range. We also strongly recommend using the option to separate outliers.
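
The range rule above is simple arithmetic; the short Python sketch below makes it explicit. The function names are hypothetical, and two details are assumptions not stated in the text: the bounds are rounded outward to integers, and the lower bound is kept at 2 or more because k-means needs at least two clusters to compute a silhouette.

```python
import math
from typing import Sequence, Tuple


def recommended_k_range(components_per_subject: Sequence[int]) -> Tuple[int, int]:
    """Lower bound = 0.5 x mean components per subject; upper bound = 1.5 x."""
    mean_n = sum(components_per_subject) / len(components_per_subject)
    # Assumption: round outward to integers and never go below 2 clusters
    return max(2, math.floor(0.5 * mean_n)), math.ceil(1.5 * mean_n)


def at_boundary(k: int, k_range: Tuple[int, int]) -> bool:
    """True if the selected k sits on either bound, suggesting a wider range."""
    return k in k_range


print(recommended_k_range([20, 20, 20]))  # (10, 30)
```

If `at_boundary` returns `True` for the number of clusters returned by **Optimal Kmeans**, the optimum may lie outside the tested range.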

![](/assets/images/studyclust14.png)

Other clustering methods
-----------------
The main method to cluster components in EEGLAB is the *PCA clustering method* described in this tutorial. Other methods are the *Measure Projection method* and the *Scalp Correlation method* available in the EEGLAB plugins described below.