Download the latest UCI package of datasets for WEKA at http://prdownloads.sourceforge.net/weka/uci-20050214.tar.gz. For this project you will work with the multi-feature digit dataset, which is stored in the files mfeat-factors.arff, mfeat-fourier.arff, mfeat-karhunen.arff, mfeat-morphological.arff, mfeat-pixel.arff and mfeat-zernike.arff in the folder nominal/. The dataset provides six different feature sets for recognizing handwritten digits (0-9). There are 200 patterns per class (2,000 patterns in total) and 649 features available for each pattern, distributed over the six files. See the description at the top of each .arff file for more information.
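Because the 649 features are spread over six files, one possible first step outside WEKA is to read each file and join the feature columns pattern by pattern. The following is a minimal sketch in plain Python, not a full ARFF reader. The function names `parse_arff` and `merge_feature_sets` are hypothetical, and the code assumes dense rows, no quoted commas, the class label as the last attribute of every file, and the same pattern order in all files. Check these assumptions against the headers of the actual .arff files before relying on it.

```python
def parse_arff(text):
    """Parse a dense ARFF string into (attribute_names, rows).

    Simplified on purpose: no sparse rows, no quoted values with commas.
    """
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    return names, rows


def merge_feature_sets(parsed, prefixes):
    """Concatenate the feature columns of several (names, rows) pairs.

    Assumes the class label is the last column of every file and that
    row i describes the same pattern in every file.
    """
    reference_rows = parsed[0][1]
    n_rows = len(reference_rows)
    merged_names = []
    merged_rows = [[] for _ in range(n_rows)]
    for (names, rows), prefix in zip(parsed, prefixes):
        if len(rows) != n_rows:
            raise ValueError("files have different numbers of patterns")
        # Prefix attribute names so columns from different files stay distinct.
        merged_names += [f"{prefix}_{n}" for n in names[:-1]]
        for i, row in enumerate(rows):
            if row[-1] != reference_rows[i][-1]:
                raise ValueError(f"class mismatch at row {i}")
            merged_rows[i] += [float(v) for v in row[:-1]]
    labels = [row[-1] for row in reference_rows]
    return merged_names, merged_rows, labels
```

In use, each file would be read with `open(path).read()`, passed through `parse_arff`, and the six results handed to `merge_feature_sets` together with six short prefixes such as `"fac"` or `"fou"`. The class-mismatch check is a cheap guard against files whose rows are not aligned.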
Combining the features stored in the different files is a promising way to achieve good classification performance. The large number of features, however, makes the problem difficult for most classifiers. Apply feature selection methods to reduce the dimensionality of the problem and achieve the best possible performance. Compare several classifiers, with different parameter settings, on datasets of varying dimensionality to find out which algorithm is best suited to this problem. Describe exactly what analysis and preprocessing you performed, and report your results clearly.
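To make the idea of feature selection followed by classifier comparison concrete, here is a small sketch of one possible filter approach: rank features by a Fisher-style score (between-class variance divided by within-class variance), keep the top k, and measure the accuracy of a 1-nearest-neighbour classifier on held-out patterns. This is only an illustration in plain Python, not the method prescribed by the assignment and not WEKA's own attribute selection. The function names and the choice of score and classifier are assumptions made for the example.

```python
def fisher_scores(X, y):
    """Score each feature by between-class over within-class variance."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            mean_c = sum(values) / len(values)
            between += len(values) * (mean_c - overall_mean) ** 2
            within += sum((v - mean_c) ** 2 for v in values)
        # Small constant avoids division by zero for constant-within-class features.
        scores.append(between / (within + 1e-12))
    return scores


def select_top_k(X, scores, k):
    """Return the indices of the k best-scoring features and the reduced data."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


def one_nn_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    correct = 0
    for x, label in zip(X_test, y_test):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in X_train]
        nearest = min(range(len(dists)), key=dists.__getitem__)
        correct += y_train[nearest] == label
    return correct / len(y_test)
```

When using a sketch like this, compute the scores on the training patterns only and apply the selected indices to the test patterns; scoring on all data lets information about the test set leak into the selection and inflates the reported accuracy. Repeating the evaluation for several values of k gives the comparison across dimensionalities that the assignment asks for.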