Q3
The following shows a history of PhD students with their numbers of published papers, their ages and their majors. We also indicate whether they become professors or not after their PhD graduation in the last column.
Note that the first column “No.” is only for us to refer to the record number.
(a) We want to train a C4.5 decision tree classifier to predict whether a PhD student will become a professor or not. We define the value of attribute Become_Professor to be the label of a record.
(i) Please find a C4.5 decision tree according to the above example. In the decision tree, whenever we process (1) a node in which at least 80% of the records have the same label or (2) a node containing at most 2 records, we stop splitting this node.
(ii) Consider a young PhD student majoring in computer science who published many papers. Please estimate the probability that this PhD student will become a professor.
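The two stopping conditions in Part (a)(i) and the gain ratio that C4.5 uses to choose a split can be sketched as follows. This is a minimal illustration only, not a solution to the question: the function names (`entropy`, `gain_ratio`, `should_stop`), the dictionary-based record layout, and the toy records in the usage note are all assumptions introduced here and are not taken from the table.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (base 2) of a list of categorical values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())


def gain_ratio(records, attribute, label_key):
    """Gain ratio of splitting `records` (a list of dicts) on `attribute`."""
    labels = [r[label_key] for r in records]
    # Group the labels by the value each record takes on the attribute.
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[label_key])
    total = len(records)
    # Weighted entropy of the child nodes after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute values themselves.
    split_info = entropy([r[attribute] for r in records])
    return info_gain / split_info if split_info > 0 else 0.0


def should_stop(labels, purity=0.8, min_records=2):
    """Stopping rule from Part (a)(i): stop when the node has at most
    `min_records` records or when at least `purity` of them share a label."""
    if len(labels) <= min_records:
        return True
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels) >= purity
```

As a usage example on hypothetical records (again, not the records in the table), `gain_ratio([{"Major": "CS", "Become_Professor": "Yes"}, {"Major": "CS", "Become_Professor": "Yes"}, {"Major": "Math", "Become_Professor": "No"}, {"Major": "Math", "Become_Professor": "No"}], "Major", "Become_Professor")` evaluates to 1.0, because the split separates the two labels perfectly and the split information is 1 bit. Likewise, `should_stop(["Yes", "Yes", "Yes", "Yes", "No"])` is true because 4 of the 5 records (80%) share a label.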
(b) Let X be the set of attributes involved in the decision tree found in Part (a). Person A said that we need to consider only the attributes in X to determine whether a student will become a professor. Person B said that we should also consider attributes outside X (in addition to the attributes in X) to determine whether a student will become a professor.
(i) Please give a possible reason why Person A said this.
(ii) Please give a possible reason why Person B said this.
(iii) Which person (Person A or Person B) is more unreasonable in general?
(c) What is the difference between the C4.5 decision tree and the ID3 decision tree? Why is there a difference?