Implement an algorithm for learning a naive Bayes classifier and apply it to a spam email data set. You are required to use MATLAB for this assignment. The spam dataset is available for download on the course homepage^{1}.

- a) [1 P] Write a function called `nbayes_learn.m` that takes a training dataset for a binary classification task with binary attributes and returns the posterior Beta distributions of all model parameters of a naive Bayes classifier (each specified by two variables per model parameter), given a prior Beta distribution for each of the model parameters (likewise specified by two variables per model parameter).
- b) [1 P] Write a function called `nbayes_predict.m` that takes a set of test data vectors and returns the most likely class label prediction for each input vector, based on the posterior parameter distributions obtained in a).
- c) [2 P] Use both functions to conduct the following experiment. For this assignment you will be working with a data set that was created a few years ago at the Hewlett Packard Research Labs as a testbed for spam email classification algorithms.

- Verify the naive Bayes assumption for all pairs of input attributes.
- Train a naive Bayes model on the first 2500 samples (using Laplace uniform prior distributions) and report the classification error of the trained model on a test data set consisting of the remaining examples that were not used for training.
- Repeat the previous step, now training on the first {10, 50, 100, 200, ... , 500} samples, and again testing on the same test data as before (samples 2501 through 4601). Report the classification error on the test dataset as a function of the number of training examples. Hand in a plot of this function.
- Comment on how accurate the classifier would be if it randomly guessed a class label or always picked the most common label in the training data. Compare these performance values to the results obtained for the naive Bayes model.

- d) [2* P] Train a feedforward neural network with one sigmoidal output unit and no hidden units with backpropagation (use the algorithm `traingdx` and initialize the network with small but nonzero weights) on the first {10, 50, 100, 200, ... , 500} samples and test on the same test data as in c) (samples 2501 through 4601). Report the classification error on the test dataset as a function of the number of training examples and compare the results to those obtained for the naive Bayes classifier. Hand in a plot of this function.
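
As a reminder of the computations behind parts a) and b), the update and the decision rule can be sketched as follows. The notation here is an assumption made for illustration and is not taken from the assignment: $y \in \{0,1\}$ is the class label, $x_j \in \{0,1\}$ is the $j$-th attribute, $\theta_{j|c} = P(x_j = 1 \mid y = c)$ and $\pi = P(y = 1)$ are the model parameters, $N_c$ is the number of training examples with label $c$, $n_{j|c}$ is the number of those with $x_j = 1$, and $(a, b)$ are the two variables of a Beta distribution.

$$
\begin{aligned}
\theta_{j|c} \sim \mathrm{Beta}(a_{j|c},\, b_{j|c})
\;&\Longrightarrow\;
\theta_{j|c} \mid \text{data} \sim \mathrm{Beta}\bigl(a_{j|c} + n_{j|c},\; b_{j|c} + N_c - n_{j|c}\bigr), \\
\pi \sim \mathrm{Beta}(a_{\pi},\, b_{\pi})
\;&\Longrightarrow\;
\pi \mid \text{data} \sim \mathrm{Beta}\bigl(a_{\pi} + N_1,\; b_{\pi} + N_0\bigr), \\
\hat{\theta}_{j|c} &= \frac{a_{j|c} + n_{j|c}}{a_{j|c} + b_{j|c} + N_c},
\qquad
\hat{\pi} = \frac{a_{\pi} + N_1}{a_{\pi} + b_{\pi} + N_0 + N_1}, \\
\hat{y}(x) &= \arg\max_{c \in \{0,1\}} \Bigl[ \log \hat{\pi}_c + \sum_{j} \bigl( x_j \log \hat{\theta}_{j|c} + (1 - x_j) \log (1 - \hat{\theta}_{j|c}) \bigr) \Bigr],
\end{aligned}
$$

with $\hat{\pi}_1 = \hat{\pi}$ and $\hat{\pi}_0 = 1 - \hat{\pi}$. Under these assumptions, the uniform (Laplace) prior corresponds to $a = b = 1$, so that $\hat{\theta}_{j|c} = (n_{j|c} + 1)/(N_c + 2)$. The estimates are the posterior means, which for a Beta posterior over a Bernoulli parameter coincide with the posterior predictive probabilities. Summing logarithms instead of multiplying probabilities is a design choice that avoids numerical underflow when there are many attributes.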