Materials for Applied Data Science profile course INFOMDA2 *Battling the curse of dimensionality*.

**1. Download the corn data and store it in your assignment folder.**

**2. Pick a property (Moisture, Oil, Starch, or Protein) to predict.**

**3. Split your data into a training (80%) and test (20%) set.**
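A random 80/20 split could look roughly like the sketch below. It uses a small simulated data frame as a stand-in, so `dat`, `n`, and the column names are hypothetical and not the layout of the corn data.

```r
# Minimal sketch of an 80/20 train/test split on simulated stand-in data.
# `dat`, `n`, `id`, and `y` are hypothetical; replace them with the corn data.
set.seed(123)                       # make the random split reproducible

n   <- 80                           # hypothetical number of samples
dat <- data.frame(id = seq_len(n), y = rnorm(n))

# Draw 80% of the row indices for training; the remaining rows form the test set
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train     <- dat[train_idx, ]
test      <- dat[-train_idx, ]

dim(train)
dim(test)
```

Setting the seed before sampling helps keep the split identical every time the document is recompiled.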

**4. Use the function `plsr` from the package `pls` to estimate a
partial least squares model, predicting the property using the NIR
spectroscopy measurements in the training data.** Make sure that the
features are on the same scale. Use leave-one-out cross-validation
(built into `plsr`) to estimate out-of-sample performance.

**5. Find out which component best predicts the property you chose.
Explain how you did this.**
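The model fit described in step 4 might be set up as in the sketch below. The data are simulated, and the names `X`, `y`, and `train`, as well as the choice `ncomp = 10`, are assumptions for illustration only. `scale = TRUE` standardises the features and `validation = "LOO"` requests the built-in leave-one-out cross-validation.

```r
# Minimal sketch: PLS regression with scaling and leave-one-out CV.
# Simulated data; X, y, train, and ncomp = 10 are hypothetical stand-ins.
library(pls)

set.seed(1)
n <- 60
p <- 20
X <- matrix(rnorm(n * p), n, p)                    # stand-in for the NIR spectra
y <- X[, 1] - 0.5 * X[, 2] + rnorm(n, sd = 0.3)    # stand-in for the chosen property

# Store the predictor matrix as a single matrix column, as plsr expects
train <- data.frame(y = y, X = I(X))

fit <- plsr(y ~ X, ncomp = 10, data = train,
            scale = TRUE,          # put all features on the same scale
            validation = "LOO")    # leave-one-out cross-validation

summary(fit)   # cross-validated RMSEP and explained variance per model size
RMSEP(fit)     # cross-validated error for 0, 1, ..., 10 components
```

The cross-validated RMSEP is reported per cumulative number of components, which is one possible starting point for reasoning about the contribution of each component.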

**6. Create a plot with the wavelength on the x-axis and the strength
of the loading for this component on the y-axis. Explain which
wavelengths are most important for predicting the property you are
interested in.**
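A loading plot of this kind could be drawn along the following lines. The data are simulated, the wavelength grid `wavelength` is a hypothetical stand-in for the real measurement grid, and `comp <- 1` is an arbitrary choice rather than the answer to step 5.

```r
# Minimal sketch: plot the loadings of one PLS component against wavelength.
# Simulated data; the wavelength grid and the chosen component are hypothetical.
library(pls)

set.seed(4)
n <- 60
p <- 50
wavelength <- seq(1100, by = 2, length.out = p)   # hypothetical grid in nm
X <- matrix(rnorm(n * p), n, p)
y <- X[, 10] + rnorm(n, sd = 0.3)
train <- data.frame(y = y, X = I(X))

fit <- plsr(y ~ X, ncomp = 5, data = train, scale = TRUE)

comp   <- 1                          # hypothetical component of interest
load_k <- loadings(fit)[, comp]      # one loading value per wavelength

plot(wavelength, load_k, type = "l",
     xlab = "Wavelength (nm)",
     ylab = paste("Loading, component", comp))
abline(h = 0, lty = 2)               # reference line at zero
```

Wavelengths whose loadings lie far from zero, in either direction, contribute most to the component.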

**7. Pick the number of components included in the model based on the
“one standard deviation” rule (`selectNcomp()`). Create predictions for
the test set using the resulting model.**
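The component selection and test-set prediction might look like the sketch below. It assumes that the rule referred to here corresponds to `method = "onesigma"` in `selectNcomp()`. The data are again simulated, and all object names are hypothetical.

```r
# Minimal sketch: choose the number of components with the one-sigma rule,
# then predict on held-out data. Simulated data; all names are hypothetical.
library(pls)

set.seed(2)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 0.5 * X[, 2] + rnorm(n, sd = 0.3)
dat <- data.frame(y = y, X = I(X))

train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train     <- dat[train_idx, ]
test      <- dat[-train_idx, ]

fit <- plsr(y ~ X, ncomp = 10, data = train,
            scale = TRUE, validation = "LOO")

# Smallest model whose CV error is within one standard error of the best one
k <- selectNcomp(fit, method = "onesigma", plot = FALSE)

# Predictions for the test set using k components
# (this sketch assumes k >= 1; selectNcomp can return 0 on weak signals)
pred     <- drop(predict(fit, newdata = test, ncomp = k))
rmse_pls <- sqrt(mean((test$y - pred)^2))

k
rmse_pls
```

Setting `plot = TRUE` in `selectNcomp()` additionally shows the cross-validation curve with the selected model marked.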

**8. Compare your PLS predictions to a LASSO linear regression model
where lambda is selected based on cross-validation with the one standard
deviation rule (using `cv.glmnet`).**
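A LASSO fit for this comparison could be sketched as follows. In `glmnet`, the lambda chosen by this rule is stored as `lambda.1se`. The data are simulated, and the object names are hypothetical.

```r
# Minimal sketch: LASSO with lambda chosen by the one-standard-error rule.
# Simulated data; X_train, X_test, y_train, y_test are hypothetical names.
library(glmnet)

set.seed(3)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 0.5 * X[, 2] + rnorm(n, sd = 0.3)

train_idx <- sample(seq_len(n), size = floor(0.8 * n))
X_train <- X[train_idx, ]
y_train <- y[train_idx]
X_test  <- X[-train_idx, ]
y_test  <- y[-train_idx]

# alpha = 1 gives the LASSO penalty; features are standardised by default
cv_fit <- cv.glmnet(x = X_train, y = y_train, alpha = 1)

# Predict with the largest lambda within one standard error of the minimum
pred_lasso <- drop(predict(cv_fit, newx = X_test, s = "lambda.1se"))
rmse_lasso <- sqrt(mean((y_test - pred_lasso)^2))

cv_fit$lambda.1se
rmse_lasso
```

Computing the same error measure on the same test set for both models keeps the comparison with the PLS predictions fair.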

A zipped folder with:

- The required data in a `data/` subfolder
- A `.Rmd` file with your answers and clean, commented code chunks
- A compiled `.html` or `.pdf` file from this `.Rmd`
- The folder should be *portable*, i.e., the teacher should be able to
  recompile the `.Rmd` without error upon unzipping!