For each optimised model, produce a confusion matrix and report the following

Part 3 – “Real world” testing

a) Load new test data from the “real world” EmailSamples50000.csv.

b) For each of your models (with the optimised parameters which you have identified in part 2), run your classifier on the EmailSamples50000.csv test data.

c) For each optimised model, produce a confusion matrix and report the following:

i. Sensitivity (the detection rate for actual malware samples)

ii. Specificity (the detection rate for actual non-malware samples)

iii. Overall Accuracy

d) A brief statement which includes a final recommendation on which model to use and why you chose that model over the others.
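The three measures in step c) can all be read off a confusion matrix. The following is a minimal sketch in base R, not a prescribed solution: the helper name `confusion_metrics`, the label values "Yes" (malware) and "No" (non-malware), and the toy vectors are assumptions made for illustration. In the actual task, `actual` would come from the label column of EmailSamples50000.csv (its name is not given here) and `predicted` from running each optimised model on that data.

```r
# Toy labels for illustration only; these are NOT taken from EmailSamples50000.csv.
# The label values "Yes" (malware) and "No" (non-malware) are assumptions.
actual <- factor(c("Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "Yes", "No"),
                 levels = c("No", "Yes"))
predicted <- factor(c("Yes", "No", "Yes", "No", "No", "Yes", "No", "No", "Yes", "No"),
                    levels = c("No", "Yes"))

# Hypothetical helper: builds the confusion matrix and derives the three measures.
# Both inputs are expected to be factors sharing the same two levels.
confusion_metrics <- function(actual, predicted, positive = "Yes") {
  cm <- table(Actual = actual, Predicted = predicted)
  negative <- setdiff(levels(actual), positive)

  tp <- cm[positive, positive]  # malware correctly flagged
  fn <- cm[positive, negative]  # malware missed
  tn <- cm[negative, negative]  # non-malware correctly passed
  fp <- cm[negative, positive]  # non-malware wrongly flagged

  list(
    matrix      = cm,
    sensitivity = tp / (tp + fn),        # detection rate for actual malware
    specificity = tn / (tn + fp),        # detection rate for actual non-malware
    accuracy    = (tp + tn) / sum(cm)    # share of all samples classified correctly
  )
}

result <- confusion_metrics(actual, predicted)
print(result$matrix)
print(round(unlist(result[c("sensitivity", "specificity", "accuracy")]), 3))
```

Calling the same helper once per model, on both the held-out test set and the "real world" data, gives directly comparable figures for the final recommendation in d).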

What to Report

You must do all of your work in R.

1. Submit a single report containing:

a. A brief description of your three selected supervised learning algorithms.

b. For each algorithm:

i. The optimised parameters for the algorithm.

ii. A confusion matrix on the test set of the MalwareSamples.csv data showing the accuracy of the algorithm with the optimised parameters.

iii. A confusion matrix showing the accuracy of the algorithm for the ‘real world’ EmailSamples50000.csv data.

iv. A short description of the accuracy, sensitivity and specificity of the optimised algorithm when applied to the ‘real world’ data.

c. A short paragraph explaining your chosen algorithm and parameters and why you chose them over the alternatives. Write this in language appropriate for an educated software developer without a background in math.

Note: At the end you will present your findings for the 3 algorithms, showing 2 confusion matrices for each (1 for the MalwareSamples dataset and 1 for the EmailSamples dataset). You will also present a description of the accuracy, sensitivity and specificity for each of the 3 algorithms.

2. If you use any external references in your analysis or discussion, you must cite your sources.

emailsamples

malwaresamples

Hint

"A confusion matrix is a table that describes how a classification model performs on a set of test data whose true values are known. It gives a visual summary of the algorithm's performance and is useful because it allows a direct comparison of predicted and actual values."...
