Improving Performance by Feature Selection [4+1* P]

Use the 24-dimensional dataset squash-stored.arff from the WEKA agricultural datasets to investigate the effect of feature selection on classification performance. First classify the unmodified dataset with DecisionStump, J48, AdaBoostM1 (all with default parameters) and IBk (with $k=3$). Record the classification error of each classifier under leave-one-out cross-validation (i.e., number of folds = number of examples).
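The leave-one-out protocol described above can be sketched in plain Python. The following is an illustrative stand-in, not WEKA code: the tiny inline dataset is made up, and the 3-nearest-neighbour classifier mimics what IBk with $k=3$ does.

```python
# Leave-one-out cross-validation with a 3-nearest-neighbour classifier.
# Pure-Python sketch of the evaluation protocol; the toy dataset below is
# invented for illustration (the exercise uses squash-stored.arff in WEKA).
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(train, key=lambda xy: math.dist(xy[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loo_error(data, k=3):
    """Leave-one-out: train on all but one example, test on the held-out one."""
    errors = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if knn_predict(train, x, k) != y:
            errors += 1
    return errors / len(data)

# Hypothetical 2-D data with two well-separated classes.
data = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
        ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b")]
print(loo_error(data))  # fraction of misclassified held-out examples
```

With $n$ examples this trains $n$ classifiers, which is why leave-one-out is feasible here only because the squash dataset is small.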
Now use the Select Attributes tool of WEKA to reduce the dimensionality of the dataset. Use a single-variable evaluator (OneRAttributeEval) together with the Ranker search method to select the 5 best attributes. Remove all other attributes and repeat the classification experiment. Analyze the results and explain why some classifiers are affected more strongly by feature selection than others.
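The idea behind OneR-style single-attribute evaluation can be sketched as follows: score each attribute by the training accuracy of a one-level rule in which each attribute value predicts its majority class, then rank attributes by that score. This is a simplified illustration with made-up nominal data; WEKA's actual OneRAttributeEval additionally discretizes numeric attributes and can use internal cross-validation.

```python
# OneR-style attribute ranking: one attribute -> one rule -> one accuracy.
# Simplified sketch with invented nominal data.
from collections import Counter, defaultdict

def oner_score(rows, labels, attr):
    """Accuracy of the one-rule built on a single attribute."""
    by_value = defaultdict(Counter)
    for row, label in zip(rows, labels):
        by_value[row[attr]][label] += 1
    # Each attribute value predicts its majority class.
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(rows)

# Hypothetical dataset: attribute 0 is informative, attribute 1 is noise.
rows = [("sunny", "x"), ("sunny", "y"), ("rainy", "x"),
        ("rainy", "y"), ("sunny", "x"), ("rainy", "y")]
labels = ["play", "play", "stay", "stay", "play", "stay"]

ranking = sorted(range(2), key=lambda a: oner_score(rows, labels, a),
                 reverse=True)
print(ranking)  # attribute indices ordered best-first, like Ranker's output
```

Selecting the 5 best attributes then simply means keeping the first five indices of such a ranking and discarding the rest.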
a*)
How do the algorithms perform if you remove even more features?

Hint: After running the attribute selection algorithm you can right-click on the last entry in the Result list to visualize the reduced dataset. You can then save the reduced dataset under a new filename. You can also use the WEKA Experimenter to run multiple classifiers on multiple datasets.
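For part a*), the experiment amounts to re-running the classifiers on datasets reduced to fewer and fewer top-ranked attributes. A rough stand-alone sketch of that sweep, with an invented toy dataset and a deliberately simple single-variable score (gap between per-class means) in place of OneRAttributeEval:

```python
# Sweep over the number of retained attributes: rank attributes by a simple
# single-variable score, then measure leave-one-out 3-NN error on the top-n
# attributes. Toy data and scoring are illustrative stand-ins for WEKA.
from collections import Counter
import math

# Toy data: attributes 0 and 1 separate the classes, attributes 2-4 are
# class-independent noise patterns.
noise = [0.9, -0.3, 0.5, -0.8, 0.1]
data = []
for i in range(10):
    label = "a" if i < 5 else "b"
    base = 0.0 if label == "a" else 1.0
    x = [base + 0.01 * i, base - 0.01 * i,
         noise[i % 5], noise[(i + 2) % 5], noise[(i + 3) % 5]]
    data.append((x, label))

def score(attr):
    """Gap between the two per-class means of a single attribute."""
    mean = lambda lab: sum(x[attr] for x, y in data if y == lab) / 5
    return abs(mean("a") - mean("b"))

def loo_error(attrs):
    """Leave-one-out 3-NN error using only the attributes in `attrs`."""
    errs = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        proj = lambda v: [v[a] for a in attrs]
        near = sorted(rest, key=lambda xy: math.dist(proj(xy[0]), proj(x)))[:3]
        if Counter(lab for _, lab in near).most_common(1)[0][0] != y:
            errs += 1
    return errs / len(data)

ranked = sorted(range(5), key=score, reverse=True)
for n in (5, 3, 1):
    print(n, "attributes -> LOO error", loo_error(ranked[:n]))
```

In WEKA itself the same sweep is done by saving several reduced datasets (as described in the hint) and comparing them in the Experimenter.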


Pfeiffer Michael 2006-01-18