Exercise 2 - Frequent Itemsets
For this exercise, read Sections 6.4 through 6.4.3 of Mining of Massive Datasets (3rd edition).
1. Implement the simple, randomized algorithm given in Section 6.4.1.
2. Implement the algorithm of Savasere, Omiecinski, and Navathe (SON algorithm) given in Section 6.4.3.
3. Compare the two algorithms on the datasets T10I4D100K, T40I10D100K, chess, connect, mushroom, pumsb, pumsb_star provided at http://fimi.ua.ac.be/data/ and report the outcomes.
4. Experiment with different sample sizes in the simple randomized algorithm, such as 1%, 2%, 5%, and 10% of the baskets, and compare your results (including the result produced by the SON algorithm).
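
As a starting point, the two algorithms from items 1 and 2 might be sketched as follows in Python. This is a minimal sketch under stated assumptions, not a reference solution. All function names are hypothetical, as are the parameters `slack` (a factor for lowering the sample threshold), `seed`, `n_chunks`, and `max_size`. Items are assumed to be integers, the support threshold is assumed to be an absolute basket count, and all baskets are assumed to fit in one Python list. The in-memory miner used on the sample and on each chunk is a plain level-wise (Apriori-style) search, chosen for brevity; any exact in-memory algorithm could take its place.

```python
import math
import random
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, min_count, max_size=None):
    """Exact in-memory, level-wise (Apriori-style) miner.

    Returns {frozenset(itemset): count} for itemsets with count >= min_count.
    max_size is a hypothetical cap on itemset size (None means no cap).
    """
    baskets = [frozenset(b) for b in baskets]
    counts = Counter(frozenset([i]) for b in baskets for i in b)
    level = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(level)
    k = 2
    while level and (max_size is None or k <= max_size):
        frequent_items = {i for s in level for i in s}
        counts = Counter()
        for b in baskets:
            kept = sorted(b & frequent_items)
            for cand in combinations(kept, k):
                # Monotonicity: every (k-1)-subset must already be frequent.
                if all(frozenset(sub) in level
                       for sub in combinations(cand, k - 1)):
                    counts[frozenset(cand)] += 1
        level = {s: c for s, c in counts.items() if c >= min_count}
        result.update(level)
        k += 1
    return result


def scaled_threshold(support, fraction, slack=1.0):
    """Support threshold for a fraction of the data, rounded up to a count."""
    # The small epsilon guards against floating-point noise just above an integer.
    return max(1, math.ceil(slack * fraction * support - 1e-9))


def sample_frequent_itemsets(baskets, support, p, slack=0.9, seed=0,
                             max_size=None):
    """Simple randomized algorithm: mine a random sample of the baskets.

    Each basket is kept with probability p and the threshold is scaled to
    slack * p * support. The result is a set of itemsets and may contain
    false positives and miss truly frequent itemsets.
    """
    rng = random.Random(seed)
    sample = [b for b in baskets if rng.random() < p]
    threshold = scaled_threshold(support, p, slack)
    return set(frequent_itemsets(sample, threshold, max_size))


def son_frequent_itemsets(baskets, support, n_chunks, max_size=None):
    """SON-style two-pass algorithm over n_chunks contiguous chunks."""
    baskets = [frozenset(b) for b in baskets]
    total = len(baskets)
    size = max(1, math.ceil(total / n_chunks))
    # Pass 1: itemsets frequent in at least one chunk become candidates.
    candidates = set()
    for start in range(0, total, size):
        chunk = baskets[start:start + size]
        threshold = scaled_threshold(support, len(chunk) / total)
        candidates |= set(frequent_itemsets(chunk, threshold, max_size))
    # Pass 2: count every candidate over the full data set.
    counts = Counter()
    for b in baskets:
        for cand in candidates:
            if cand <= b:
                counts[cand] += 1
    return {s: c for s, c in counts.items() if c >= support}
```

For the comparison in items 3 and 4, one option is to treat the SON output as the exact answer and count, for each sample fraction `p`, how many itemsets returned by `sample_frequent_itemsets` are not in it (false positives) and how many of its itemsets are absent from the sample result (false negatives). The nested loop in the second pass is the simplest possible counting scheme and is likely to be a bottleneck on the denser datasets.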
Your approach should be as efficient as possible in terms of runtime and memory requirements. Report any challenges you encountered while implementing the algorithms and running the experiments.
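
To back the runtime and memory discussion with numbers, a small measurement harness along the following lines may help. This is only a sketch using the Python standard library. The helper names `read_baskets` and `measure` are hypothetical, and the reader assumes that each line of a dataset file holds one basket as whitespace-separated integer item ids; check this against the files you download.

```python
import time
import tracemalloc


def read_baskets(path):
    """Yield one basket per non-empty line of a text file.

    Assumes whitespace-separated integer item ids on each line.
    """
    with open(path) as f:
        for line in f:
            items = line.split()
            if items:
                yield frozenset(map(int, items))


def measure(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds, peak_bytes).

    peak_bytes is the peak size of Python allocations traced by
    tracemalloc during the call.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak
```

Tracing allocations slows the measured call noticeably, so it may be preferable to record timings in a separate run without `tracemalloc`.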