
# Bagging and Boosting [4 P]

Use Gilad Mishne's applet (footnote 3) to see the advantages of ensemble methods and to compare bagging and boosting on different datasets.

a)
Create a dataset with 100 red and 800 blue points following the distribution shown in Figure 4 a), making sure the classes do not overlap. Apply the classifiers Decision Stump, C4.5, Bagging (with Decision Stump) and AdaBoost (with Decision Stump) to the dataset. Visualize their predictions and explain the results. Why do the boosting and bagging hypotheses differ? (Hint: With Generate Report you can examine the individual weak learners.)
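To make the mechanical difference between the two ensemble schemes concrete, here is a minimal sketch in plain Python. It is not the applet's implementation: the 1-D toy data, the function names and the number of rounds are all assumptions chosen for illustration, and the boosting loop follows the standard discrete AdaBoost update. Bagging trains every stump on a bootstrap resample and lets the stumps vote with equal weight, whereas AdaBoost reweights the training points after every round and weights each stump by its accuracy.

```python
import math
import random

def fit_stump(xs, ys, weights):
    """Return (threshold, polarity, weighted_error) of the best 1-D stump.

    The stump predicts `polarity` for x > threshold and `-polarity` otherwise.
    """
    values = sorted(set(xs))
    # Candidate thresholds: one below all points, then midpoints of neighbours.
    thresholds = [values[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    best = None
    for thr in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (polarity if x > thr else -polarity) != y)
            if best is None or err < best[2]:
                best = (thr, polarity, err)
    return best

def stump_predict(stump, x):
    thr, polarity, _ = stump
    return polarity if x > thr else -polarity

def bagging(xs, ys, rounds, rng):
    """Each stump sees a bootstrap resample; the vote is unweighted."""
    n = len(xs)
    stumps = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx], [1.0 / n] * n))
    return lambda x: 1 if sum(stump_predict(s, x) for s in stumps) >= 0 else -1

def adaboost(xs, ys, rounds):
    """Each stump sees all points, reweighted towards earlier mistakes."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(xs, ys, weights)
        err = min(max(stump[2], 1e-12), 1.0 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, stump))
        # Misclassified points gain weight, correctly classified points lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return lambda x: 1 if sum(a * stump_predict(s, x) for a, s in ensemble) >= 0 else -1

# Toy data (an assumption, not Figure 4 a)): class +1 sits between two -1 regions,
# so no single threshold separates the classes.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [-1, -1, -1, 1, 1, 1, -1, -1, -1]

single = fit_stump(xs, ys, [1.0 / len(xs)] * len(xs))
stump_mistakes = sum(1 for x, y in zip(xs, ys) if stump_predict(single, x) != y)
boosted = adaboost(xs, ys, rounds=3)
bagged = bagging(xs, ys, rounds=25, rng=random.Random(0))
```

On this toy set the reweighting lets three boosted stumps combine into an interval-shaped decision rule that a single stump cannot express. The bagged vote depends on the random resamples, so its output is best inspected per stump, much as Generate Report allows in the applet.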

b)
Create a large dataset with approximately 2000 points per class following the distribution shown in Figure 4 b) (avoid class overlap if possible). Examine the test error and its variance for different classifiers as the training set size varies. Compare C4.5, Bagging with C4.5 and AdaBoost with C4.5 using 5%, 10%, 50% and 90% of the full set for training. Run enough experiments to obtain reliable estimates of the average test error and of the variance of the error, and interpret the results. Which classifier would you recommend with little training data, and which with a lot? (Hint: The Split Now button creates a new random split of your points into training and test sets.)
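The bookkeeping for this part (repeat a random split, record the test error, then summarize) can be sketched as follows. This is only an illustration under stated assumptions: `repeated_split_errors` and `majority_baseline` are hypothetical names, the baseline learner is a stand-in for the classifiers trained in the applet, and the toy data merely exercises the loop. With the applet itself, each call to the learner corresponds to pressing Split Now, retraining, and noting the reported test error.

```python
import random
import statistics

def repeated_split_errors(points, labels, train_fraction, repeats, train_and_test, seed=0):
    """Return the test errors observed over `repeats` random train/test splits."""
    rng = random.Random(seed)
    n_train = max(1, round(train_fraction * len(points)))
    errors = []
    for _ in range(repeats):
        order = list(range(len(points)))
        rng.shuffle(order)  # a fresh random split on every repetition
        train_idx, test_idx = order[:n_train], order[n_train:]
        errors.append(train_and_test(
            [points[i] for i in train_idx], [labels[i] for i in train_idx],
            [points[i] for i in test_idx], [labels[i] for i in test_idx],
        ))
    return errors

def majority_baseline(train_x, train_y, test_x, test_y):
    """Hypothetical stand-in learner: always predict the most common training label."""
    guess = max(set(train_y), key=train_y.count)
    return sum(1 for y in test_y if y != guess) / len(test_y)

# Toy data (an assumption): 200 points, one quarter of them labelled +1.
points = list(range(200))
labels = [1 if p % 4 == 0 else -1 for p in points]

summary = {}
for fraction in (0.05, 0.10, 0.50, 0.90):
    errs = repeated_split_errors(points, labels, fraction, repeats=30,
                                 train_and_test=majority_baseline)
    # Sample mean and sample variance of the test error for this training size.
    summary[fraction] = (statistics.mean(errs), statistics.variance(errs))
```

Note that `statistics.variance` is the sample variance (divisor n - 1), which is the appropriate estimate when the repetitions are treated as a sample of all possible splits.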

c)
Create a dataset with 2000 red and 2000 blue points from two overlapping normal distributions, similar to Figure 4 c). Use 50% of the data for training and compare C4.5, Bagging with C4.5 and AdaBoost with C4.5 on this dataset. Compute the average error over different training/test splits and take screenshots of the decision boundaries. Explain why bagging performs better than boosting on this dataset.
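For orientation, a dataset of this kind can also be generated offline. The sketch below assumes two isotropic Gaussians with unit standard deviation centred at (-1, 0) and (1, 0); these parameters are illustrative choices, not values taken from Figure 4 c). It also computes the error of the best possible decision rule for these assumed distributions, which gives a floor that no classifier can beat on overlapping classes.

```python
import math
import random

rng = random.Random(0)
N_PER_CLASS = 2000
# Assumed class centres and spread; Figure 4 c) is not reproduced here.
RED_MEAN, BLUE_MEAN, SIGMA = (-1.0, 0.0), (1.0, 0.0), 1.0

def sample(mean, n):
    """Draw n points from an isotropic 2-D normal distribution around `mean`."""
    return [(rng.gauss(mean[0], SIGMA), rng.gauss(mean[1], SIGMA)) for _ in range(n)]

red = sample(RED_MEAN, N_PER_CLASS)
blue = sample(BLUE_MEAN, N_PER_CLASS)

# For two equally likely isotropic Gaussians with equal spread, the optimal
# boundary is the perpendicular bisector of the means (here the line x = 0),
# and its error is Phi(-d / (2 * sigma)), with Phi the standard normal CDF.
distance = math.dist(RED_MEAN, BLUE_MEAN)
z = -distance / (2.0 * SIGMA)
bayes_error = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Error of that optimal rule on the generated sample.
mistakes = sum(1 for x, _ in red if x > 0) + sum(1 for x, _ in blue if x <= 0)
empirical_error = mistakes / (2 * N_PER_CLASS)
```

Under these assumptions the floor is roughly 16%, so points on the wrong side of the optimal boundary act like label noise for any learner that tries to fit every training point.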

d)
Create your own datasets on which bagging or boosting produces interesting effects. Compare the performance of the ensemble methods and the weak learners on your datasets and visualize the decision surfaces.

Haeusler Stefan 2010-01-26