next up previous
Next: Improving Performance by Feature Up: MLA_Exercises_160106 Previous: EM Algorithm for Mixtures

Unrelated Features [3 P]

In this exercise you will explore how classifiers perform when they are given additional features that consist of pure noise. Start with the dataset diabetes.arff, which you already used in exercise 1, and create two new datasets from it: one with one meaningless feature added and one with two. To do this, apply the preprocessing filter Add to insert new nominal features with three possible values each (important: set the attributeIndex to 9 and 10, respectively, otherwise WEKA will treat your new feature as the class label). Then apply the AddNoise filter to the new features (set percent to 100 and useMissing to true, and use a different random seed for each attribute) so that they contain purely random values. Train the same classifiers as in exercise 1 on both datasets, again using a 66% percentage split of the training set. Plot how the performance of the classifiers decreases when you have 0, 1 or 2 unrelated features (you can reuse your results from exercise 1 for the first case). Explain why this happens. Which classifier is most robust to useless features?
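Outside the WEKA GUI, the combined effect of Add followed by AddNoise with percent=100 can be sketched in a few lines. The following is a hedged stdlib-Python illustration, not the WEKA API; the helper name add_noise_feature and the toy data are invented for this sketch:

```python
import random

def add_noise_feature(rows, labels=("a", "b", "c"), seed=1):
    """Append one meaningless nominal feature to every instance.

    Mirrors WEKA's Add filter (a new nominal attribute with three
    possible values) followed by AddNoise with percent=100: every
    value of the new attribute is drawn uniformly at random, so it
    carries no information about the class.
    """
    rng = random.Random(seed)
    return [row + [rng.choice(labels)] for row in rows]

# toy dataset: two numeric features per instance (class labels kept separately)
data = [[1.0, 2.0], [0.5, 1.5], [2.2, 0.3]]
noisy = add_noise_feature(data, seed=42)
```

Using a fixed seed keeps the augmented dataset reproducible, which parallels the exercise's advice to use different (but fixed) random seeds for the two noise attributes.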

Hint: You can also use the WEKA Experimenter to run multiple classifiers on multiple datasets.
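Why unrelated features hurt can also be seen without WEKA. Below is a minimal stdlib-Python sketch (a hypothetical toy dataset, not the diabetes data) in which a 1-nearest-neighbour classifier is perfect when given one informative feature, but high-variance noise dimensions dilute the distance computation once they are appended:

```python
import math
import random

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, xi in enumerate(X):
        best_label, best_dist = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = math.dist(xi, xj)
            if d < best_dist:
                best_label, best_dist = y[j], d
        correct += (best_label == y[i])
    return correct / len(X)

rng = random.Random(0)
# one informative feature: class 0 lies in [1, 2], class 1 in [-2, -1]
X = [[rng.uniform(1, 2)] for _ in range(20)] + \
    [[rng.uniform(-2, -1)] for _ in range(20)]
y = [0] * 20 + [1] * 20

acc_clean = loo_1nn_accuracy(X, y)  # perfect by construction

# append two pure-noise features whose scale swamps the signal
X_noisy = [x + [rng.uniform(-5, 5), rng.uniform(-5, 5)] for x in X]
acc_noisy = loo_1nn_accuracy(X_noisy, y)  # noise now dominates the distances
```

The noise dimensions enter the Euclidean distance with the same weight as the informative feature, so nearest neighbours are increasingly determined by chance; this is one reason distance-based classifiers tend to be among the least robust to useless features.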

Pfeiffer Michael 2006-01-18