In this assignment, you will design, implement, and critically compare two different clustering analyses for a single dataset.
1. Create an assignment folder with your assignment .Rmd file in the root and the following subdirectories:
2. Find a clustering dataset with 10-100 columns (attributes) in the UCI Machine Learning Repository. Download the dataset into the subdirectory of your assignment folder. It’s easiest to use this
3. Preprocess the data into a tidy dataset (a data frame or tibble). This can include transforming variables (e.g., feet to meters), giving each variable the correct measurement level (character, factor, ordered factor, numeric), and selecting only the columns you need. Save the tidy dataset as an .rds in the
4. Choose two different clustering methods. This can be any method of your choice, even combinations of methods like PCA + K-means. Describe these methods and why you chose them for this dataset.
5. Apply these methods to your dataset. Make sure to apply the knowledge you obtained in the clustering weeks.
6. Decide and describe how you will compare these methods in your dataset, and then implement this comparison.
7. Write a short conclusion where you critically compare the relative strengths and weaknesses of the methods you chose.
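As a rough illustration of steps 4 to 6, the sketch below runs two methods on the same data and compares them. Everything specific in it is an assumption made for the example, not a requirement of the assignment: the built-in `mtcars` data frame stands in for your own tidy dataset, the two methods are PCA + K-means and Ward hierarchical clustering, the number of clusters `k = 3` is arbitrary, and the comparison uses average silhouette width (from the `cluster` package, assumed to be installed) plus a cross-tabulation of the cluster labels.

```r
# Minimal sketch: mtcars is only a stand-in for your own tidy dataset,
# which you would normally load with readRDS().
library(cluster)  # provides silhouette(); assumed to be installed

set.seed(42)  # k-means uses random starts, so fix the seed for reproducibility

# Standardise the variables so that none dominates the Euclidean distance
x <- scale(mtcars)
d <- dist(x)
k <- 3  # illustrative choice only; justify your own choice in the report

# Method 1: K-means on the first two principal components
pca    <- prcomp(x)
scores <- pca$x[, 1:2]
km     <- kmeans(scores, centers = k, nstart = 25)

# Method 2: agglomerative hierarchical clustering with Ward linkage
hc     <- hclust(d, method = "ward.D2")
hc_lab <- cutree(hc, k = k)

# Comparison 1: average silhouette width, both evaluated on the same distances
avg_sil <- function(labels, d) mean(silhouette(labels, d)[, "sil_width"])
sil_km  <- avg_sil(km$cluster, d)
sil_hc  <- avg_sil(hc_lab, d)

# Comparison 2: cross-tabulate the labels to see where the methods disagree
agreement <- table(kmeans = km$cluster, hclust = hc_lab)

print(c(kmeans = sil_km, hclust = sil_hc))
print(agreement)
```

Evaluating both solutions on the same distance matrix keeps the silhouette values comparable, even though K-means was fitted on the reduced PCA scores. Other comparison criteria may suit your dataset better; whichever you pick, describe and motivate it in the report.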
You will pass the assignment if the hand-in format is correct (see below) and if the following elements are in your report
A zipped folder with:
.rds file in a
.Rmd file with your answers and clean, commented code chunks
.Rmd without error upon unzipping!