COMP221: Programming Assignment 2
Due Nov 26, 2007 11:59pm
Consider the German Credit Data set in the UCI Machine Learning
Repository. A description of the data is given here. In this data set, the
last attribute indicates whether or not a customer is granted a loan. This
attribute will be taken as the class attribute.
Project Details:
- Build the training and testing data sets. Split the data so that the first
700 records are the training data and the last 300 records are the testing data.
- On the new training data set, use Weka to build three classifiers: (1) a
decision tree, (2) Naive Bayes, and (3) KNN with K=3. For the decision
tree classifier, set the minimum number of instances per leaf node to 3 (m=3).
- Build a graph to compare the performance of the three classifiers: the
x-axis is the number of training records used, in increments of 100, and the
y-axis is the accuracy of the corresponding method.
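
The data preparation in the steps above can be sketched as follows. This is only a minimal illustration in Python, not part of the required Weka workflow. The function names (split_records, training_prefixes, accuracy) are hypothetical. The sketch assumes that the data file holds 1000 records, that "number of training data used" means the first 100, 200, ..., 700 training records, and that accuracy is the fraction of test records classified correctly.

```python
def split_records(records, n_train=700):
    """Return (training, testing): the first n_train records and the rest."""
    return records[:n_train], records[n_train:]


def training_prefixes(training, step=100):
    """Yield (size, subset) pairs for growing prefixes of the training data.

    Assumption: each point on the x-axis uses the first `size` training records.
    """
    for size in range(step, len(training) + 1, step):
        yield size, training[:size]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true class labels."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    # Placeholder integers stand in for the rows of the real data file.
    records = list(range(1000))
    training, testing = split_records(records)
    sizes = [size for size, _ in training_prefixes(training)]
    print(len(training), len(testing), sizes)
```

Each training subset would then be used to train the three classifiers in Weka, with every resulting model evaluated on the same 300 testing records so that the points on the graph are comparable.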
Give a brief one-page description of your work: what conclusions do you draw,
and why?