Assessment 4 Detail
For this assessment, you are required to use the Weka software and a text editor such as WordPad or Notepad++ on Windows, or TextEdit on Mac.
You can download Weka from https://www.cs.waikato.ac.nz/ml/weka/downloading.html.
Task 1: Create and explore the Weka data file of type ARFF
Download the text file dataset.csv from the subject site (Canvas) and open it in a text editor such as WordPad or Notepad++ on Windows, or TextEdit on Mac. The file contains a sample of real-life customer data. It is only partly formatted as a Weka ARFF file and contains some formatting errors. Your task is to find and fix these errors so that the file is a valid ARFF file. Save the valid file as dataset.arff.
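As a rough aid for spotting formatting errors, the sketch below checks the basic layout of an ARFF file: an @relation line, one @attribute line per column, an @data line, and data rows with one value per declared attribute. It is a minimal check under stated assumptions, not a replacement for loading the file in Weka. The function name check_arff and the sample attributes (id, region, mortgage) are hypothetical and are not taken from the real dataset. Sparse ARFF rows and attribute names containing spaces are not handled.

```python
import csv


def check_arff(text):
    """Return a list of problems found in a simple (non-sparse) ARFF text."""
    problems = []
    attributes = []
    has_relation = False
    in_data = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if not in_data:
            if lower.startswith("@relation"):
                has_relation = True
            elif lower.startswith("@attribute"):
                parts = line.split(None, 2)  # keyword, name, type
                if len(parts) < 3:
                    problems.append(f"line {number}: attribute needs a name and a type")
                else:
                    attributes.append(parts[1])
            elif lower.startswith("@data"):
                in_data = True
            else:
                problems.append(f"line {number}: unexpected header line")
        else:
            # Data rows are comma-separated; values may be quoted with '.
            fields = next(csv.reader([line], quotechar="'", skipinitialspace=True))
            if len(fields) != len(attributes):
                problems.append(
                    f"line {number}: expected {len(attributes)} values, found {len(fields)}"
                )
    if not has_relation:
        problems.append("missing @relation line")
    if not in_data:
        problems.append("missing @data line")
    return problems


# Hypothetical sample: the last row is missing a value.
sample = """@relation customers
@attribute id numeric
@attribute region {INNER_CITY,TOWN}
@attribute mortgage {YES,NO}
@data
1,INNER_CITY,YES
2,TOWN
"""

print(check_arff(sample))
```

An empty list means the checks passed. Weka itself remains the final judge of whether the file is valid, since it also verifies that each value matches its attribute type.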
Explore the dataset.arff dataset using Weka Explorer and answer the following questions.
Make sure to include screenshots of the visualisations to support your answers.
1. Take a screenshot of your corrected ARFF file.
2. Which attribute in the dataset do you think is useless, providing no useful information for prediction?
3. How many attributes does the dataset have?
4. How many instances does the dataset have?
5. What is the class attribute in the dataset.arff dataset?
6. What proportion of customers have a mortgage and live in Inner City?
7. What proportion of customers have a mortgage and do not live in Inner City?
8. What proportion of customers have a mortgage, and their income is between $1000 and $10000?
9. How many customers are married and have no mortgage?
10. How many customers do not own a car and have a mortgage?
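The proportion questions above can be cross-checked outside Weka with a small counting script. The sketch below is illustrative only: the rows, the attribute names (region, income, mortgage) and their values are invented for the example and are not the real dataset. It also assumes that each proportion is taken over all customers; if a question is read as a proportion within a subgroup, filter the rows first.

```python
def proportion(rows, predicate):
    """Fraction of rows for which predicate(row) is true."""
    if not rows:
        raise ValueError("no rows to count")
    return sum(1 for row in rows if predicate(row)) / len(rows)


# Hypothetical rows; the real attribute names and values may differ.
customers = [
    {"region": "INNER_CITY", "income": 8500.0, "mortgage": "YES"},
    {"region": "TOWN", "income": 12000.0, "mortgage": "YES"},
    {"region": "INNER_CITY", "income": 4000.0, "mortgage": "NO"},
    {"region": "RURAL", "income": 9500.0, "mortgage": "NO"},
]

# Customers with a mortgage who live in the inner city.
inner_city_mortgage = proportion(
    customers,
    lambda c: c["mortgage"] == "YES" and c["region"] == "INNER_CITY",
)

# Customers with a mortgage whose income is between 1000 and 10000 inclusive.
mid_income_mortgage = proportion(
    customers,
    lambda c: c["mortgage"] == "YES" and 1000 <= c["income"] <= 10000,
)

print(inner_city_mortgage)  # 0.25
print(mid_income_mortgage)  # 0.25
```

The same counts can be read from the Weka Explorer by selecting an attribute in the Preprocess tab and colouring the histogram by the class attribute.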
Task 2: Practical Analysis
Use the dataset from Task 1 to perform the data mining tasks for Task 2, and compare the performance of the following classification algorithms on this dataset:
• Naive Bayes
• HoeffdingTree
• SVM (or SMO)
• J48
Write a summary report that compares the performance of these algorithms. Comment on each algorithm's performance and accuracy using the performance metrics shown in the classifier output, such as the confusion matrix. In your report, state whether there is a difference in performance between these algorithms and which algorithm performs best.
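To make the metrics easier to compare across classifiers, the sketch below derives accuracy, precision and recall from a confusion matrix laid out as in the Weka classifier output, where each row is an actual class and each column is the class it was classified as. The function name summarise and the numbers in the example matrix are invented for illustration and are not results from the assessment dataset.

```python
def summarise(matrix):
    """Return (accuracy, [(precision, recall), ...]) for a square confusion matrix.

    matrix[i][j] is the number of instances of actual class i
    that were classified as class j.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[i][i] for i in range(size))
    per_class = []
    for i in range(size):
        predicted = sum(matrix[r][i] for r in range(size))  # column total
        actual = sum(matrix[i])                             # row total
        precision = matrix[i][i] / predicted if predicted else 0.0
        recall = matrix[i][i] / actual if actual else 0.0
        per_class.append((precision, recall))
    return correct / total, per_class


# Hypothetical two-class matrix (rows: actual, columns: classified as).
example = [
    [40, 10],
    [5, 45],
]

accuracy, per_class = summarise(example)
print(f"accuracy = {accuracy:.3f}")
for index, (precision, recall) in enumerate(per_class):
    print(f"class {index}: precision = {precision:.3f}, recall = {recall:.3f}")
```

Running the same calculation on the matrix reported for each of the four classifiers gives a consistent table of figures for the report, and it should agree with the "Detailed Accuracy By Class" section of the Weka output.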
Make sure to include the necessary tables, graphs, screenshots etc., to make your report understandable to the person who reads it.