INFOMDA2

Materials for the Applied Data Science profile course INFOMDA2: Battling the curse of dimensionality.

Course Description

The ever-growing influx of data allows us to develop, interpret and apply an increasing set of learning techniques. However, with this increase in data comes a challenge: how to make sense of the data and identify the components that really matter in our modeling efforts. This course gives a detailed and modern overview of statistical learning with a specific focus on high-dimensional data.

In this course we emphasize the tools that are useful in solving and interpreting modern-day analysis problems. Many of these tools are essential building blocks that are often encountered in statistical learning. We also consider the state of the art in handling machine learning problems. We will not only discuss the theoretical underpinnings of supervised learning, but also focus on the skills and experience needed to rapidly apply these techniques to new problems.

During this course, participants will actively learn how to apply the main statistical methods in data analysis and how to use machine learning algorithms and visualization techniques, especially on high-dimensional data problems. The course has a strongly practical focus: rather than dwelling on the mathematics and background of the techniques discussed, you will gain hands-on experience in using them on real data and interpreting the results.

Prerequisites

The course INFOMDA1 (or equivalent) serves as a sufficient entry requirement for this course. For information about the contents of the INFOMDA1 course, refer to its course website.

Course Objectives

At the end of this course, students are able to apply and interpret the theories, principles, methods and techniques related to contemporary data science and understand and explain different approaches to data analysis:

Required Readings

Freely available sections from the following books:

Required Software

In this course, we will exclusively use R & RStudio for data analysis. First, install the latest version of R for your system (see https://cran.r-project.org/). Then, install the latest (desktop open source) version of the RStudio integrated development environment (link).

We will make extensive use of the tidyverse suite of packages, which can be installed from within R using the command install.packages("tidyverse").
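As a minimal sketch of the installation step above, the snippet below installs a package only when it is missing, so re-running a script does not trigger a fresh download each time. The helper name `ensure_package` and its `install` argument are hypothetical and not part of the course materials; only `install.packages()` and `requireNamespace()` are standard R functions.

```r
# Hypothetical helper (an assumption, not from the course materials):
# install a package only if it is not yet available, then report
# whether its namespace can be loaded.
ensure_package <- function(pkg, install = TRUE) {
  if (!requireNamespace(pkg, quietly = TRUE) && install) {
    # Needs network access and a configured CRAN mirror.
    install.packages(pkg)
  }
  # TRUE if the package is now available, FALSE otherwise.
  requireNamespace(pkg, quietly = TRUE)
}

# Possible use at the start of a session:
# if (ensure_package("tidyverse")) library(tidyverse)
```

Calling `install.packages("tidyverse")` directly, as described above, works just as well; the wrapper is only a convenience for scripts that are run repeatedly.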

Course Policy

Weekly course flow

Grading policy

Class Schedule

Key dates and deadlines

| Day | Date | Time | Location | Description |
|---|---|---|---|---|
| Wednesday | 17-11-2021 | 13:15 - 15:00 | BBG 161 | Lecture 1 |
| Friday | 19-11-2021 | 13:15 - 15:00 | BBG 201 | Q&A 1 |
| Wednesday | 24-11-2021 | 13:15 - 15:00 | BBG 161 | Lecture 2 |
| Friday | 26-11-2021 | 13:15 - 15:00 | BBG 201 | Q&A 2 |
| Wednesday | 01-12-2021 | 13:15 - 15:00 | BBG 161 | Lecture 3 |
| Friday | 03-12-2021 | 13:15 | | Deadline assignment 1 |
| Friday | 03-12-2021 | 13:15 - 15:00 | BBG 201 | Q&A 3 |
| Wednesday | 08-12-2021 | 13:15 - 15:00 | BBG 161 | Lecture 4 |
| Friday | 10-12-2021 | 13:15 - 15:00 | BBG 201 | Q&A 4 |
| Wednesday | 15-12-2021 | 13:15 - 15:00 | BBG 161 | Lecture 5 |
| Friday | 17-12-2021 | 13:15 - 15:00 | BBG 201 | Q&A 5 |
| Wednesday | 22-12-2021 | 13:15 - 15:00 | BBG 161 | Lecture 6 |
| Friday | 24-12-2021 | 13:15 - 15:00 | BBG 201 | Q&A 6 |
| Break | | | | |
| Wednesday | 12-01-2022 | 13:15 - 15:00 | BBG 161 | Lecture 7 |
| Friday | 14-01-2022 | 13:15 - 15:00 | BBG 201 | Q&A 7 |
| Wednesday | 19-01-2022 | 13:15 - 15:00 | BBG 161 | Lecture 8 |
| Friday | 21-01-2022 | 13:15 | | Deadline assignment 2 |
| Friday | 21-01-2022 | 13:15 - 15:00 | BBG 201 | Q&A 8 |
| Wednesday | 26-01-2022 | 13:15 - 15:00 | BBG 161 | Lecture 9 |
| Friday | 28-01-2022 | 13:15 - 15:00 | BBG 201 | Q&A 9 |
| Friday | 04-02-2022 | 08:30 - 11:30 | Megaron | Exam |
| Friday | 04-03-2022 | TBD | | Resit |

Lecture 1: Introduction & betting on sparsity with the LASSO

17-11-2021 | 13:15 - 15:00

Required reading

Optional reading

Refresh your memory:

Lab session preparation

Assignments 1-5 of the first practical.

Lecture 2: Dimension reduction 1

24-11-2021 | 13:15 - 15:00

Required reading

Lab session preparation

Take-home exercises of the practical.

Lecture 3: Dimension reduction 2

01-12-2021 | 13:15 - 15:00

Required reading

Lab session preparation

Take-home exercises of the practical.

Assignment 1

Partial least squares (link). Hand in on Blackboard before practical 3 (03-12-2021 | 13:15).

Lecture 4: Deep learning

08-12-2021 | 13:15 - 15:00

Required reading

Lab session preparation

Take-home exercises of the practical.

Lecture 5: Clustering

15-12-2021 | 13:15 - 15:00

Required reading

Optional reading

Lab session preparation

TBD

Lecture 6: Model-based clustering

22-12-2021 | 13:15 - 15:00

Required reading

Optional reading

Lab session preparation

TBD

Winter break

Lecture 7: Time series

12-01-2022 | 13:15 - 15:00

Required reading

Lab session preparation

TBD

Lecture 8: Text mining 1

19-01-2022 | 13:15 - 15:00

Required reading

Lab session preparation

TBD

Assignment 2

TBD. Hand in on Blackboard before practical 8 (21-01-2022 | 13:15).

Lecture 9: Text mining 2

26-01-2022 | 13:15 - 15:00

Required reading

Lab session preparation

TBD

Exam

04-02-2022 | 08:30 - 11:30

Resit

Target date: 04-03-2022, to be confirmed.