Your task is to build the best possible model for predicting whether or not a consumer

Introduction

For this week's take-home lab, you will work on the same data set used in the Week 4/5 Take-Home Labs. You will solve the same problem studied in this week's in-class lab, but on a much larger and more interesting dataset. The file UCI_Credit_Card.csv contains 30,000 consumer records with 24 variables each. A detailed description of the fields is available at the following website:

https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

The data do not fully match the UCI description, so some values must be recoded:

Marital status: the UCI description lists the levels 1 = married, 2 = single, 3 = others. The data also contain the level 0, which you should treat as unknown.

Education: the UCI description lists the levels 1 = graduate school, 2 = university, 3 = high school, 4 = others. The data contain levels 1 to 6, so treat 5 and 6 as unknown.

X6-X11 (repayment status): the measurement scale is -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above. Many records have the value -2, which is also unknown.

Treat every unknown value as NA.
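The recoding rules above can be sketched in a few lines of Python. This is only an illustration, not a required implementation: the column names MARRIAGE, EDUCATION, and PAY_0, PAY_2, ..., PAY_6 are assumptions about the file's header (check them against your copy), None stands in for NA, and the two example records are made up.

```python
# Column names are ASSUMPTIONS about the CSV header; adjust them to your file.
MARRIAGE_COL = "MARRIAGE"
EDUCATION_COL = "EDUCATION"
REPAYMENT_COLS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]

# Codes the assignment text says to treat as unknown, per column.
UNKNOWN_CODES = {MARRIAGE_COL: {0}, EDUCATION_COL: {5, 6}}
UNKNOWN_CODES.update({col: {-2} for col in REPAYMENT_COLS})


def recode_unknowns(row):
    """Return a copy of `row` with unknown codes replaced by None (NA)."""
    cleaned = dict(row)
    for col, codes in UNKNOWN_CODES.items():
        if col in cleaned and cleaned[col] in codes:
            cleaned[col] = None
    return cleaned


# Two made-up records, for illustration only (not taken from the dataset).
example_rows = [
    {"MARRIAGE": 0, "EDUCATION": 6, "PAY_0": -2, "PAY_2": 2},
    {"MARRIAGE": 1, "EDUCATION": 2, "PAY_0": -1, "PAY_2": 0},
]
cleaned_rows = [recode_unknowns(r) for r in example_rows]
print(cleaned_rows[0])
print(cleaned_rows[1])
```

The sketch compares integer codes, so values read from a CSV file as text would need to be converted to integers first. How the resulting NA values are then handled (dropping records, imputing, or treating NA as its own level) is a modelling choice the assignment leaves open.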

Your task is to build the best possible model for predicting whether or not a consumer will default on their credit card payment for the next month (the last column in the dataset).

Assignment

Perform the following tasks:

Conduct a training/test split of the data, holding out 20% of the records as a test dataset.

Fit the best random forest (RF) model you can to the data to predict consumer default (consider feature selection, etc.).

Plot ROC curves for the logistic regression, SVM, KNN, CART, and RF models, and compare their performance.

Compute the AUC for the logistic regression, SVM, KNN, CART, and RF models, and compare their performance.

Provide a summary and discussion of your work in written form (.docx or .pdf) that includes the following:

o Q1 Summarize the model/feature selection process you used to fit your RF model.

o Q2 Provide a summary of the fitted RF model (i.e. model summary).

o Q3 Provide a performance evaluation of the fitted RF model using a confusion matrix.

o Q4 How well do you think the RF model fitted to this dataset works?

o Q5 Using ROC curves and AUC, which of the logistic regression, SVM, KNN, CART, and RF models performs best on this dataset overall?
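The evaluation steps in the task list (20% hold-out, confusion matrix, ROC curve, AUC) can be sketched without any modelling library. The example below does not fit a model: the labels and scores are made up, the function names are hypothetical, and the 0.5 threshold is an arbitrary choice. In practice, the scores would be the predicted default probabilities of each model on the held-out test set.

```python
import random


def holdout_split(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and return (train_indices, test_indices)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives; 1 = default, 0 = no default."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["TP" if truth == 1 else "FP"] += 1
        else:
            counts["FN" if truth == 1 else "TN"] += 1
    return counts


def roc_points(y_true, scores):
    """Return (FPR, TPR) points as the threshold drops past each distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last record in a group of tied scores.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# 30,000 records with a 20% hold-out, as in the assignment.
train_idx, test_idx = holdout_split(30000)
print(len(train_idx), len(test_idx))

# Made-up labels and scores for eight test records, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

cm = confusion_matrix(y_true, y_pred)
roc = roc_points(y_true, scores)
print(cm)
print(auc(roc))
```

Running roc_points and auc once per model on the same test set gives the curves and numbers needed to compare the five models. The confusion matrix depends on the chosen threshold, whereas the AUC summarizes performance across all thresholds.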


Hint
Statistics: Logistic regression is a statistical model that, in its basic form, uses a logistic function to model a binary dependent variable. In regression analysis, logistic regression is the estimation of the parameters of a logistic model, a form of binary regression. It is the go-to method for binary classification problems....
