Exercise 1
- Execute the code above. Based on the results, rank the models from “most underfit” to “most overfit”.
- Re-run the code above with 100 folds and a different seed. Does your conclusion change?
- Generate four confusion matrices, one for each of the four models fit in Part 1.
- Which is the best model? Write 2 paragraphs justifying your decision. You must mention (a) the overall accuracy of each model; and (b) whether some errors are better or worse than others, and you must use the terms specificity and sensitivity. For (b) think carefully… misclassified email is a pain in the butt for users!
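As a reminder of how the terms in (a) and (b) relate to a confusion matrix, here is a minimal Python sketch. It is not the course code: the labels "spam" and "ham", the helper names, and the eight example predictions are all made up for illustration, and "spam" is assumed to be the positive class.

```python
def confusion_counts(actual, predicted, positive="spam"):
    """Count the four cells of a binary confusion matrix.

    `positive` is the label treated as the positive class (assumed "spam").
    """
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # spam correctly flagged
            else:
                fp += 1  # legitimate email wrongly flagged as spam
        else:
            if a == positive:
                fn += 1  # spam that slipped through
            else:
                tn += 1  # legitimate email correctly kept
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def summarize(counts):
    """Accuracy, sensitivity and specificity from the four counts.

    Assumes both classes appear in the data, so no denominator is zero.
    """
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of actual positives caught
        "specificity": tn / (tn + fp),  # share of actual negatives kept
    }


# Made-up labels, purely illustrative.
actual = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
predicted = ["spam", "spam", "ham", "ham", "ham", "ham", "spam", "ham"]

counts = confusion_counts(actual, predicted)
metrics = summarize(counts)
print(counts)
print(metrics)
```

Two models with the same accuracy can differ in sensitivity and specificity, which is why (b) asks which kind of error costs the user more.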
Exercise 2
- Use the bank data and create a train / test split.
- Run any logistic regression you like with 10-fold cross-validation in order to predict the yes/no variable (y).
- Discuss the interpretation of the coefficients in your model. That is, you must write at least one sentence for each of the coefficients which describes how it is related to the response. You may use transformations of variables if you like. FAKE EXAMPLE: age has a positive coefficient, which means that older individuals are more likely to have y = yes.
- Create a confusion matrix of your preferred model, evaluated against your test data.
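The data-handling steps in this exercise (hold out a test set, then run 10-fold cross-validation on the training rows only) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the row count of 1000 and the 20% test fraction are placeholders rather than properties of the bank data, and no model is actually fit here.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of them as the test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold_splits(indices, k=10):
    """Yield (training, validation) index lists for k-fold cross-validation.

    Each index lands in exactly one validation fold.
    """
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Placeholder size: 1000 rows, 20% held out for the final confusion matrix.
train_idx, test_idx = train_test_split_indices(n_rows=1000, test_fraction=0.2, seed=42)

# Cross-validation uses the training rows only; the test rows stay untouched.
splits = list(k_fold_splits(train_idx, k=10))
for fold_number, (training, validation) in enumerate(splits, start=1):
    print(fold_number, len(training), len(validation))
```

Keeping the test rows out of the cross-validation loop means the final confusion matrix is computed on data that played no part in choosing the model.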
