Spectral Clustering

  • Intro
  • Overview
  • Training a Model
  • Applying a Transformation
  • Conclusions

Information

Primary software used Jupyter Notebook
Course Spectral Clustering
Primary subject AI & ML
Secondary subject Machine Learning
Level Intermediate
Last updated November 11, 2024

Spectral Clustering

Spectral clustering is a technique that uses the eigenvalues of a similarity matrix derived from the data to perform dimensionality reduction before applying a standard clustering algorithm like K-Means. It is particularly effective for identifying clusters in non-convex or complex-shaped datasets by exploiting the connectivity structure of the data in a graph-based representation.

For this tutorial you need to have Python, Jupyter Notebook, and some common libraries, including scikit-learn, installed. Please see the following tutorial for more information.

Download the Jupyter notebook here to follow along with the tutorial.

Download SpectralClustering_PYscript_01
application/zip (ZIP, 189 KB)

Overview

Spectral Clustering uses the eigenvalues of the similarity matrix of the data to reduce its dimensionality and then cluster it in a low-dimensional space. It relates to graph theory and the concept of identifying groups of nodes based on the edges connecting them. Spectral Clustering is useful when the structure of a cluster is highly non-convex, or when a measure of the center and spread of a cluster does not describe a complete cluster. This happens, for example, when clusters are nested circles on a 2D plane.
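In outline, the algorithm builds a similarity (affinity) matrix over the data points, forms a graph Laplacian from it, embeds each point using the eigenvectors belonging to the smallest eigenvalues, and finally runs a standard algorithm such as K-Means on that embedding. Below is a minimal sketch of these steps, assuming an RBF similarity and the symmetric normalized Laplacian; scikit-learn's SpectralClustering wraps this whole pipeline (with different solver details) in a single class.

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel

X_demo, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# 1. Similarity (affinity) matrix from an RBF kernel (gamma=15 is illustrative)
W = rbf_kernel(X_demo, gamma=15)

# 2. Symmetric normalized Laplacian: L = I - D^(-1/2) W D^(-1/2)
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(len(X_demo)) - D_inv_sqrt @ W @ D_inv_sqrt

# 3. Embed the points using the eigenvectors of the 2 smallest eigenvalues
eigvals, eigvecs = eigh(L)   # eigenvalues returned in ascending order
embedding = eigvecs[:, :2]

# 4. Cluster the embedded points with K-Means
labels_demo = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)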

Training a Model

Dataset

For this notebook, we will use a generated dataset in the shape of two interleaving half-moons. We can generate and visualize the dataset below.

import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

# Generate the dataset (fixing random_state makes the notebook reproducible)
X, y = make_moons(n_samples=1000, noise=0.05, random_state=0)

# Plot the dataset
plt.scatter(
   X[:, 0], X[:, 1],
   c='white', marker='o',
   edgecolor='black', s=50
)
plt.show()
A plot of the data set

K-Means

We saw in the previous notebook that K-Means works well for data sets with a spherical cluster structure, like the example data set used there. Now, let's take a look at a moon-shaped dataset. K-Means cannot successfully cluster data in these configurations (even though we can visually see two clusters) because it assigns each point to the nearest centroid, which implicitly assumes compact, roughly spherical clusters. Below we will see what happens when we try to apply K-Means.

from sklearn.cluster import KMeans

clusters = 2

km = KMeans(
    n_clusters=clusters, init='random',
    n_init=16, max_iter=300, 
    tol=1e-04, random_state=0
)
y_km = km.fit_predict(X)
# plot the clusters
colors = ['lightgreen', 'orange', 'lightblue', 'lightcoral', 'gold', 'plum', 'midnightblue', 'indigo', 'sandybrown']
markers = ['s', 'H', 'v', 'p', '^', 'o', 'X', 'd', 'P']

for i in range(clusters):
    plt.scatter(
        X[y_km == i, 0], X[y_km == i, 1],
        s=50, c=colors[i],
        marker=markers[i], edgecolor='black',
        label='cluster ' + str(i)
    )

# plot the centroids
plt.scatter(
    km.cluster_centers_[:, 0], km.cluster_centers_[:, 1],
    s=250, marker='*',
    c='red', edgecolor='black',
    label='centroids'
)
plt.legend(scatterpoints=1)
plt.grid()
plt.show()
A plot showing how the data set is clustered using K-Means clustering.

Since we know that K-Means does not work here, we need another clustering option. This is where Spectral Clustering comes in. For this example, we will use the same two-moons dataset we just tried to cluster above.

Train a clustering model using the SpectralClustering class and visualize the result on the half-moon dataset.

from sklearn.cluster import SpectralClustering

# Build the clustering model with a nearest-neighbors affinity
spectral_model_nn_2 = SpectralClustering(n_clusters=2, affinity='nearest_neighbors')

# Train the model and store the predicted cluster labels
labels_nn_2 = spectral_model_nn_2.fit_predict(X)

plt.scatter(X[:, 0], X[:, 1],
           c=labels_nn_2, edgecolor='white', cmap=plt.cm.winter)
plt.grid()
plt.show()
A plot showing the clusters when using spectral clustering.
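The affinity parameter controls how the similarity graph is built. As a point of comparison (a sketch, not part of the original notebook), we could also try the default RBF affinity; its result on this dataset depends strongly on the kernel width gamma, which is one reason the nearest-neighbors graph is used above.

# Sketch: the same clustering with an RBF affinity; gamma=15 is an
# illustrative value, not a recommendation
spectral_model_rbf = SpectralClustering(n_clusters=2, affinity='rbf', gamma=15)
labels_rbf = spectral_model_rbf.fit_predict(X)

plt.scatter(X[:, 0], X[:, 1],
           c=labels_rbf, edgecolor='white', cmap=plt.cm.winter)
plt.grid()
plt.show()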

Applying a Transformation

Dataset

Other methods can also be used to change the dimensionality of the data. First, let's take a look at a dataset that consists of an inner circle and an outer ring. Spectral Clustering is able to cluster this data.

import matplotlib.pyplot as plt
from sklearn.datasets import make_circles

#Generate data set
X_1, y_1 = make_circles(n_samples=(300, 100), shuffle=True, noise=0.05, factor=0.1, random_state=3)

# Plot data set
plt.scatter(
   X_1[:, 0], X_1[:, 1],
   c='black', marker='o',
   edgecolor='white', s=50
)
plt.grid()
plt.show()
A plot of the data set

Clustering with Spectral Clustering

First, let's see the clusters that Spectral Clustering finds.

# Building the clustering model 
spectral_model_nn = SpectralClustering(n_clusters = 2, affinity ='nearest_neighbors') 
  
# Training the model and Storing the predicted cluster labels 
labels_nn = spectral_model_nn.fit_predict(X_1)

plt.scatter(X_1[:, 0], X_1[:, 1],  
           c = labels_nn, edgecolor='white', cmap =plt.cm.winter)
plt.grid()
plt.show()
A plot showing the two clusters of data when using spectral clustering

Clustering by Transforming the Data

Alternatively, we can apply a transformation to the dataset ourselves to reduce its dimensionality. For the dataset above, we can compute each point's distance to the center, (0, 0) in this case. This transforms the dataset to a one-dimensional space in which the data becomes linearly separable. We can then apply K-Means to get the clusters.

import numpy as np

# Distance of every point to (0, 0)
dist = np.sqrt(X_1[:, 0] ** 2 + X_1[:, 1] ** 2)
dist_coord = np.column_stack((dist, np.zeros_like(dist)))

# Plot the transformed (one-dimensional) data
plt.scatter(dist_coord[:, 0], dist_coord[:, 1],
           c='black', edgecolor='white')
plt.grid()
plt.show()
A plot showing the transformed data: plotting the distance to the center
from sklearn.cluster import KMeans

#K-Means with Visualization
km = KMeans(
    n_clusters=2, init='random',
    n_init=10, max_iter=300, 
    tol=1e-04, random_state=0
    )

y_km = km.fit_predict(dist_coord)


plt.scatter(dist_coord[y_km == 0, 0], dist_coord[y_km == 0, 1],
           label='cluster 1', edgecolor='white')
plt.scatter(dist_coord[y_km == 1, 0], dist_coord[y_km == 1, 1],
           label='cluster 2', edgecolor='white')
plt.legend(scatterpoints=1)
plt.grid()
plt.show()
The two clusters of the transformed data
# Apply the labels determined by K-Means back to the original data set
plt.scatter(X_1[y_km == 0, 0], X_1[y_km == 0, 1],
           label='cluster 1', edgecolor='white')
plt.scatter(X_1[y_km == 1, 0], X_1[y_km == 1, 1],
           label='cluster 2', edgecolor='white')
plt.legend(scatterpoints=1)
plt.grid()
plt.show()
Assigning the cluster number back to the original data points. The assignment is based on the transformed data clusters.
TASK: Choose two features from the wine dataset which are appropriate for clustering and apply spectral clustering on the data based on those two features.

Don’t forget to normalize the input data using a pipeline like we have done in previous notebooks. Check the SVM notebook for an example.
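A minimal starting point is sketched below. It assumes scikit-learn's built-in load_wine dataset and two illustrative features, 'flavanoids' and 'color_intensity' (an assumption; inspect the data and choose your own), with standardization wrapped in a Pipeline as in previous notebooks.

from sklearn.cluster import SpectralClustering
from sklearn.datasets import load_wine
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

wine = load_wine(as_frame=True)
X_wine = wine.data[['flavanoids', 'color_intensity']]  # hypothetical feature choice

# Normalize the inputs, then cluster; Pipeline delegates fit_predict
# to its final step
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('spectral', SpectralClustering(n_clusters=3, affinity='nearest_neighbors')),
])
labels_wine = pipe.fit_predict(X_wine)

plt.scatter(X_wine.iloc[:, 0], X_wine.iloc[:, 1],
            c=labels_wine, edgecolor='white', cmap=plt.cm.winter)
plt.xlabel('flavanoids')
plt.ylabel('color_intensity')
plt.show()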

Conclusions

  • Spectral Clustering uses the eigenvalues of the similarity matrix of data to reduce the data’s dimensionality and then clusters it in fewer dimensions 
  • It draws inspiration from graph theory and the concept of identifying groups of nodes based on the edges connecting them
  • Spectral Clustering is useful when the structure of a cluster is highly non-convex or when a measure of the center and spread of a cluster does not describe a complete cluster
  • It is possible to use a similar process without implementing spectral clustering by transforming data to a lower dimension and then applying K-Means or Gaussian Mixtures on the linearly separable data