Points: 12.5; Issued: 2007/05/08; Deadline: 2007/06/12; Tutor:
Ilir Ademi; Info hour: 2007/06/08, 15:15-16:15, HSi11;
Inspection of graded work (Einsichtnahme): 2007/06/22, 15:15-16:15, HSi11; Download:
This homework assignment asks you to compare the performance of
three learning algorithms on different data sets using the WEKA
machine learning toolkit. Choose two data sets from the archive
contained in the file datasets.zip. Compare the following three
learning algorithms:
- Decision trees
- Instance-based learning
- Support vector machines
Use the pruned version of the decision trees (set unpruned
to false) and at least 3 different kernels for the support
vector machines.
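As a sketch, the three classifiers can also be run from the WEKA command line (the weka.jar location and dataset name are assumptions here, and option names may differ slightly between WEKA versions):

```shell
# Decision tree (J48, pruned by default), 10-fold cross-validation
java -cp weka.jar weka.classifiers.trees.J48 -t some-dataset.arff -x 10

# Instance-based learning (IBk, k-nearest neighbours; -K sets k)
java -cp weka.jar weka.classifiers.lazy.IBk -t some-dataset.arff -x 10 -K 3

# Support vector machine (SMO) with three different kernels
java -cp weka.jar weka.classifiers.functions.SMO -t some-dataset.arff -x 10 \
  -K "weka.classifiers.functions.supportVector.PolyKernel -E 1"
java -cp weka.jar weka.classifiers.functions.SMO -t some-dataset.arff -x 10 \
  -K "weka.classifiers.functions.supportVector.PolyKernel -E 2"
java -cp weka.jar weka.classifiers.functions.SMO -t some-dataset.arff -x 10 \
  -K "weka.classifiers.functions.supportVector.RBFKernel -G 0.01"
```

The same classifiers and options can be selected through the WEKA Explorer GUI if you prefer.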
Use the breast-cancer.arff dataset from the datasets.zip file. Apply the J48 algorithm for
decision trees with various settings to the dataset. As evaluation
method, use 10-fold cross-validation.
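For example, one way to run J48 on breast-cancer.arff with 10-fold cross-validation from the command line (the weka.jar path is an assumption):

```shell
# J48 with default settings; -t gives the training file,
# -x 10 selects 10-fold cross-validation
java -cp weka.jar weka.classifiers.trees.J48 -t breast-cancer.arff -x 10

# Vary the settings, e.g. the pruning confidence (-C) and the
# minimum number of instances per leaf (-M, i.e. minNumObj)
java -cp weka.jar weka.classifiers.trees.J48 -t breast-cancer.arff -x 10 -C 0.1 -M 5
```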
- Choose an appropriate method to compare the algorithms.
- Explain what you did so that the results can be reproduced by
others.
- Present your results clearly, in a structured and legible form.
- State for each data set which learning algorithm you would
recommend and explain why. Note: consider not only the error on the
test set, but also criteria such as the time needed for learning,
the interpretability of the hypothesis, etc.
- Apply the J48 algorithm with different values (1 to 9; you can
use a step size of 2) of the parameter minNumObj (which specifies
the minimum number of examples contained in a leaf). Use the
unpruned version of the decision trees (set the parameter unpruned
to true).
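The sweep over minNumObj can be scripted, for instance as follows (a sketch; the weka.jar location and the output file name are assumptions, and -M/-U are J48's command-line equivalents of minNumObj and unpruned):

```shell
for M in 1 3 5 7 9; do
  echo "=== minNumObj = $M ==="
  # -U: unpruned tree, -M: minimum number of instances per leaf,
  # -x 10: 10-fold cross-validation; WEKA's output reports the error
  # on the training data, the cross-validation error and the tree size
  java -cp weka.jar weka.classifiers.trees.J48 \
    -t breast-cancer.arff -x 10 -U -M "$M"
done | tee sweep-unpruned.txt
```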
- Create a plot which shows the error on the training set and the
cross-validation error for the different values of
minNumObj. Also create a plot which shows the size of the
tree as a function of minNumObj. Interpret your results.
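Once the numbers are collected into a whitespace-separated file (here a hypothetical errors.dat with the columns minNumObj, training error, cross-validation error), the error plot can be produced, for example, with gnuplot:

```shell
gnuplot <<'EOF'
set terminal png
set output "errors.png"
set xlabel "minNumObj"
set ylabel "error"
plot "errors.dat" using 1:2 with linespoints title "training error", \
     "errors.dat" using 1:3 with linespoints title "cross-validation error"
EOF
```

The tree-size plot can be made the same way with a file holding minNumObj and tree size as its two columns.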
- How do the results change when using pruned decision trees (set
unpruned to false)? Interpret the result and compare the sizes of the
pruned and unpruned trees.
- See the links section on the CI homepage for further
information and tutorials about WEKA.