FIT5145: Workshop Week 9

Workshop 9 - Data clustering

  • Activity 9.1: Data clustering methods

In this activity, we introduce methods to evaluate the quality of clustering, including the silhouette and elbow methods. We also explore K-means and hierarchical clustering models.
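
To illustrate the bottom-up idea behind hierarchical clustering, here is a minimal sketch of agglomerative clustering in plain Python. It makes two assumptions. First, it uses single linkage, where the distance between two clusters is the distance between their closest pair of points. Second, it simply stops once k clusters remain rather than building a full dendrogram. The function names and toy points are hypothetical. In practice a library routine would normally be used, with a choice of linkage such as complete, average or Ward.

```python
import math
from itertools import combinations

def single_linkage(a, b):
    """Hypothetical helper: cluster distance = distance of the closest pair of points."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, k):
    """Hypothetical helper: repeatedly merge the two closest clusters until k remain."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    while len(clusters) > k:
        # find the pair (i, j), i < j, with the smallest linkage distance
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]                  # j > i, so index i is unaffected
    return clusters

# Made-up 2-D points for illustration
points = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.3, 4.8), (9.0, 0.0)]
for cluster in agglomerative(points, 3):
    print(sorted(cluster))
```

Replacing `min` with `max` inside `single_linkage` would give complete linkage. This naive version recomputes every pairwise cluster distance on each merge, so it only suits small examples.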

  • Activity 9.2: Validation and evaluation

In this activity, we explore how to measure and address model uncertainty. We introduce cross-validation and bootstrapping methods.

Unsupervised learning

  • Not all data analysis can be driven by theory
  • We still need to make sense of such data
  • Unsupervised learning: find patterns in data without predefined labels

One common way is cluster analysis

  • Group similar data points together
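
As a concrete example of grouping similar points, here is a minimal sketch of K-means using Lloyd's algorithm in plain Python. The algorithm alternates between two steps. It assigns each point to its nearest centroid, then moves each centroid to the mean of its assigned points. It also returns the within-cluster sum of squares (WCSS), which is the quantity the elbow method plots against k. Random initialisation from the data points is an assumption, and the function names and toy data are hypothetical.

```python
import math
import random

def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))

def kmeans(points, k, iters=100, seed=0):
    """Hypothetical helper: Lloyd's algorithm returning (labels, centroids, WCSS)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k randomly chosen data points
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]  # assignment step
        new_centroids = []
        for c in range(k):  # update step: move each centroid to its cluster mean
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid as is
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    # Within-cluster sum of squares: the quantity plotted against k in the elbow method
    wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, wcss

# Made-up data with two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
for k in range(1, 5):
    labels, _, wcss = kmeans(points, k)
    print(k, labels, round(wcss, 2))
```

WCSS always falls as k grows, so you cannot just pick the smallest value. On data like this you would expect a sharp drop from k = 1 to k = 2 and only small gains afterwards. That bend is the "elbow".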

But...

Uncertainty

  • Model assumptions
  • Parameter estimation
  • Measurement errors

Ways to address uncertainty

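  • Cross-validation: repeatedly splitting the data to test how well a model generalises to unseen data
  • Bootstrapping: resampling with replacement to simulate drawing many samples
  • Bayesian methods: incorporating prior knowledge about the quantities being estimated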
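
To make the bootstrapping idea concrete, here is a minimal sketch of a percentile bootstrap confidence interval for a sample mean, in plain Python. Each replicate resamples the data with replacement and recomputes the statistic. The spread of those replicates shows the uncertainty in the estimate. The function name, the default of 2000 replicates and the measurements are hypothetical choices made for illustration.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Hypothetical helper: percentile bootstrap confidence interval for `stat`."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate draws n observations WITH replacement and recomputes the statistic,
    # simulating what we might see if we could collect a fresh sample.
    replicates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lower = replicates[int(n_boot * alpha / 2)]
    upper = replicates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

# Hypothetical measurements, made up for illustration
data = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3, 4.4]
print("sample mean:", statistics.mean(data))
print("95% bootstrap CI:", bootstrap_ci(data))
```

Passing `stat=statistics.median` gives an interval for the median instead. The resampling loop itself does not change, which is much of the method's appeal.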

Today's activity

Clustering methods [~40 mins]

  • Quality of clustering
    • Silhouette method
    • Elbow method
  • K-means clustering
  • Hierarchical clustering
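
For the silhouette method, here is a minimal plain-Python sketch of the mean silhouette coefficient for a given labelling. For each point i, let a be its mean distance to the rest of its own cluster. Let b be its mean distance to the nearest other cluster. Then s(i) = (b - a) / max(a, b), and the overall score is the mean of s(i). Values near 1 indicate tight, well-separated clusters. The elbow method, by contrast, plots the within-cluster sum of squares against k. Scoring singleton clusters as 0 is an assumed convention, and the function name and data are hypothetical.

```python
import math
from statistics import mean

def silhouette_score(points, labels):
    """Hypothetical helper: mean silhouette coefficient (needs at least two clusters)."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # assumed convention: a point alone in its cluster scores 0
            continue
        a = mean(same)  # cohesion: mean distance to the other members of p's cluster
        # separation: mean distance to the members of the nearest other cluster
        b = min(mean(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
                for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)

# Made-up data with two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
good = [0, 0, 0, 1, 1, 1]  # matches the two visible groups
bad = [0, 1, 0, 1, 0, 1]   # mixes the groups together
print(silhouette_score(points, good), silhouette_score(points, bad))
```

To choose k, compute this score for the clusterings produced with several values of k and prefer the k with the highest mean silhouette.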

Validation and evaluation [~30 mins]

  • Cross-validation
  • Bootstrapping
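
Cross-validation can be sketched as k-fold cross-validation in plain Python. The data is shuffled once and split into k disjoint folds. The model is then fitted k times, each time holding out one fold for scoring. Averaging the held-out errors estimates performance on unseen data. The straight-line least-squares model is a stand-in chosen only for illustration. The function names and the noisy data around y = 2x + 1 are hypothetical.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Hypothetical stand-in model: ordinary least squares for y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def k_fold_cv(xs, ys, k=5, seed=0):
    """Hypothetical helper: estimate out-of-sample mean squared error by k-fold CV."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)       # shuffle once so the folds are random
    folds = [idx[f::k] for f in range(k)]  # k disjoint folds of near-equal size
    fold_mse = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])  # fit on k-1 folds
        fold_mse.append(mean((ys[i] - (a + b * xs[i])) ** 2 for i in test))  # score held-out fold
    return mean(fold_mse), fold_mse

# Hypothetical noisy data around y = 2x + 1
rng = random.Random(1)
xs = [float(x) for x in range(20)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
cv_mse, per_fold = k_fold_cv(xs, ys, k=5)
print("mean CV MSE:", round(cv_mse, 3))
print("per-fold MSE:", [round(m, 3) for m in per_fold])
```

The spread of the per-fold errors gives a rough sense of how much the performance estimate itself varies. That variation is one face of the model uncertainty discussed in this activity.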

In Activity 9.1, we introduce methods to evaluate the quality of clustering, including the silhouette and elbow methods, and explore K-means and hierarchical clustering models. It would be helpful if you could discuss these concepts and their benefits. In Activity 9.2, we explore how to measure and address model uncertainty, introducing cross-validation and bootstrapping methods. It would be great if you could discuss model uncertainty and the related methods. Assignment 2 is due this Friday, but I think many students might request a short (two-day) extension. I'll share the grading instructions with you after that.