The main goal is to try to predict as accurately as possible if a patient would die

Part 2: Classification with Ensemble Methods

This part uses the file hepatitis.arff, located in the directory /KDrive/SEH/SCSIT/Students/Courses/COSC2111/DataMining/data/arff/UCI/

The main goal is to predict as accurately as possible whether a patient would die. In other words, the task is more concerned with correctly predicting the class label of those instances associated with the label “die”. You are expected to use ensemble methods (e.g., Bagging, Random Forest, AdaBoost, Voting and Stacking) for this task.

1. Load the data set, and run ZeroR (i.e., a very basic classifier) and J48 (a typical decision tree) to establish two baselines against which to compare your ensemble methods.

2. Discuss whether or not ensemble methods (such as Bagging) would be suitable for handling this specific data set on hepatitis.

3. In Weka, run Bagging (found under the meta classifiers). Run the Bagging classifier for different numbers of iterations, for example 5, 10, 100, 200, 500 and 1000, and build a table of results. What do you observe? Provide your explanation.

4. Repeat the above experiment with the AdaBoostM1 and RandomForest classifiers, and build your table of results. What do you observe in the results across the different ensemble methods? Provide your explanation.

5. In Weka, run the “Vote” ensemble method and include several different machine learning (ML) models in the ensemble, such as OneR, J48, Naïve Bayes, a neural network, etc. The idea is to use as diverse a range of ML models as possible. Run “Vote” with this composition of ML algorithms, and again with the Vote default setting (which uses only ZeroR). Include these results for comparison in a table, and analyse whether or not there are differences between the two. In addition, is the “Vote” method (with a diverse composition of ML algorithms) better than the previously used ensemble methods? Provide your explanation.

6. Considering that the task is more concerned with predicting as accurately as possible whether a patient would die, and that we wish to minimize the chance of misdiagnosing anyone who would die, what would be the most appropriate (and meaningful) performance measure to use here? You need to provide a sufficiently detailed explanation to demonstrate your understanding.

7. Compared with Part 1 (i.e., using neural networks), what differences in the results can you observe? Discuss the issue of using just classification error (or accuracy) as a performance measure.

8. From the above experimental runs and result analysis, explain whether (or not) ensemble methods should be considered as effective data mining methods.
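The concern raised in tasks 6 and 7, that accuracy alone can hide poor performance on the “die” class, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the labels are hypothetical (they are not the hepatitis data), the helper names are invented, and the "model" is a hand-written list of predictions rather than Weka output. It compares a ZeroR-style majority baseline with a second set of predictions using accuracy and per-class recall.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive):
    """Return (tp, fn, fp, tn), treating `positive` as the class of interest."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def accuracy(actual, predicted):
    """Fraction of instances whose predicted label matches the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def recall(actual, predicted, positive):
    """Fraction of actual `positive` instances that were predicted as `positive`."""
    tp, fn, _, _ = confusion_counts(actual, predicted, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical, imbalanced labels (NOT the real hepatitis data): 2 "die", 8 "live".
actual = ["die"] * 2 + ["live"] * 8

# ZeroR-style baseline: always predict the majority class.
majority = Counter(actual).most_common(1)[0][0]
zero_r = [majority] * len(actual)

# Hypothetical predictions that catch both "die" cases but raise two false alarms.
model = ["die"] * 4 + ["live"] * 6

zero_r_acc = accuracy(actual, zero_r)
zero_r_recall = recall(actual, zero_r, "die")
model_acc = accuracy(actual, model)
model_recall = recall(actual, model, "die")

print("ZeroR-style: accuracy", zero_r_acc, "recall(die)", zero_r_recall)
print("Model:       accuracy", model_acc, "recall(die)", model_recall)
```

In this toy case the two sets of predictions have the same accuracy, yet only one of them identifies any “die” instance, which is the kind of difference a per-class measure exposes.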

Hint

Bagging, stacking, and boosting are the three primary classes of ensemble learning techniques, and it is critical to understand each one well and to take it into account in any predictive modeling project. The concept behind ensemble classification is to learn a group of classifiers, or an ensemble of classifiers, and then combine their predictions for the classification of examples that have ...
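As a rough illustration of learning a group of classifiers and combining their predictions, the following is a minimal sketch of bagging with majority voting. It is not Weka's implementation: the one-feature threshold rule (`train_stump`), the function names, the seed, and the toy data are all assumptions chosen to keep the example self-contained.

```python
import random
from collections import Counter

def train_stump(rows):
    """Fit a one-feature threshold rule on (x, label) pairs (a stand-in base learner)."""
    labels = sorted({y for _, y in rows})
    best = None
    for threshold in sorted({x for x, _ in rows}):
        for low in labels:
            for high in labels:
                # Count training errors for "predict low if x <= threshold, else high".
                errors = sum((low if x <= threshold else high) != y for x, y in rows)
                if best is None or errors < best[0]:
                    best = (errors, threshold, low, high)
    _, threshold, low, high = best
    return lambda x: low if x <= threshold else high

def bagging(rows, n_models, seed=0):
    """Train n_models base learners, each on its own bootstrap sample of rows."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(rows) instances with replacement.
        sample = [rng.choice(rows) for _ in rows]
        models.append(train_stump(sample))
    return models

def majority_vote(models, x):
    """Combine the base learners' predictions for x by taking the most common label."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: a single numeric feature and a class label.
rows = [(1, "live"), (2, "live"), (3, "live"), (4, "live"),
        (7, "die"), (8, "die"), (9, "die")]

ensemble = bagging(rows, n_models=25)
predictions = [majority_vote(ensemble, x) for x, _ in rows]
print(predictions)
```

Each base learner sees a slightly different resampled data set, so individual rules differ; the vote averages out some of that variation. Boosting and stacking combine models differently and are not shown here.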
