Cluster Analysis
One major area of data mining is cluster analysis.
Given a set of data points, each described by a set of attributes, and a similarity measure
defined between them, the task is to discover clusters such that data points within one cluster are more similar to one another,
while data points in different clusters are less similar to one another.
[image taken from: Pang-Ning Tan et al.]
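To make the definition above concrete, here is a minimal sketch that compares the average distance between points inside the same cluster with the average distance between points in different clusters. The toy points, the labels, and the helper name `mean_pairwise_distances` are assumptions made for illustration, and Euclidean distance stands in for the (unspecified) similarity measure.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8

# Hypothetical toy data: two visually separated groups in 2-D.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
labels = [0, 0, 0, 1, 1, 1]  # assumed cluster assignment


def mean_pairwise_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        # Pairs sharing a label count as "within", all others as "between".
        (within if labels[i] == labels[j] else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)


within, between = mean_pairwise_distances(points, labels)
print(f"within: {within:.2f}, between: {between:.2f}")
```

For a good clustering in the sense described above, the within-cluster value should come out clearly smaller than the between-cluster value.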
The clustering problem is ill-posed because
several different notions of a cluster coexist (based on density, size and shape, hierarchy, etc.).
Broadly, two approaches to clustering exist: hierarchical (nested) and
partitional (unnested) clustering. In partitional clustering, the points are divided into
non-overlapping clusters, so each point belongs to exactly one cluster, whereas in hierarchical clustering the clusters
are nested, so a point belongs to a cluster and to every larger cluster that contains it. Clusterings can also be non-exclusive (overlapping), where a point may belong to several clusters at once: e.g. a person at a university can be enrolled as a student and also be an
employee.
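The difference between a nested and an unnested result can be sketched in a few lines. The example below uses single-linkage agglomerative clustering on 1-D values purely as an illustration; the choice of algorithm, the function name `single_linkage`, and the data are assumptions, not a description of the algorithms implemented for this page. Each level of the returned hierarchy is itself a partition, and every cluster is contained in a cluster of the next level.

```python
def single_linkage(points):
    """Agglomerative clustering on 1-D values.

    Returns a list of levels, from n singleton clusters down to one
    cluster; each level is a list of frozensets of point indices.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    levels = [list(clusters)]
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(abs(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        levels.append(list(clusters))
    return levels


points = [1.0, 1.5, 5.0, 5.4, 9.0]  # hypothetical data
levels = single_linkage(points)
for level in levels:
    print(sorted(sorted(c) for c in level))
```

Picking a single level, for instance the one with three clusters, gives a partitional clustering; keeping all levels gives the hierarchical one.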
There are a number of other properties and distinctions that define
a set of clusters (or at least attempt to give a clear description of what counts as a
cluster and how clusters are told apart). One example of such a property is density:
each cluster has a considerably higher density of points than the region outside the cluster (see figure).
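One simple way to make the density idea measurable is to count, for every point, how many other points lie within a fixed radius. The sketch below does this in the style of density-based methods such as DBSCAN; the radius `eps`, the threshold of 3 neighbours, the function name `neighbour_counts`, and the data are all assumptions chosen for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def neighbour_counts(points, eps):
    """For each point, count the other points within distance eps."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= eps)
        for i, p in enumerate(points)
    ]


# Hypothetical data: one tight group of four points plus two isolated points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.2),
          (5.0, 5.0), (8.0, 1.0)]

counts = neighbour_counts(points, eps=0.5)
# Assumed threshold: a point is "dense" if it has at least 3 neighbours.
dense = [p for p, c in zip(points, counts) if c >= 3]
print(counts)
print(dense)
```

Points in the tight group have many close neighbours, while the isolated points have none, which is the contrast a density-based notion of a cluster relies on.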
I implemented a few clustering algorithms that try to discover the true clusters in the provided data sets.
Because each algorithm makes different assumptions about what a cluster is, the results can turn out
quite different from one another.