COMP221: Programming Assignment 2

Due Nov 26, 2007 11:59pm

Consider the German Credit Database in the UCI Machine Learning Repository. A description of the data is given here. In this data set, the last attribute indicates whether a customer is granted a loan or not. This will be taken as the class attribute.
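If you want to inspect the data outside Weka, the following is a minimal sketch. It assumes the records come from the whitespace-separated UCI file `german.data`, in which each line holds 20 attributes followed by the class (1 = good, 2 = bad). The sketch takes the last field as the class attribute and keeps the records in file order, which the 700/300 split below depends on. The helper names `parse_record` and `load_records` are hypothetical.

```python
def parse_record(line):
    """Split one whitespace-separated record into (attributes, class label).

    The last field is the class attribute (1 = good, 2 = bad in the UCI file);
    purely numeric fields are converted to int, the others stay symbolic codes.
    """
    fields = line.split()
    attrs = tuple(int(f) if f.isdigit() else f for f in fields[:-1])
    return attrs, fields[-1]


def load_records(path):
    """Read every non-blank line of the data file, keeping the original record order."""
    with open(path) as fh:
        return [parse_record(line) for line in fh if line.strip()]
```

For example, `load_records("german.data")` should return 1000 `(attributes, label)` pairs if the file is the full UCI version.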

Project Details:

  1. Build the training and testing data sets. Split the data so that the first 700 records are the training data and the last 300 records are the testing data.
  2. On the new training data set, use Weka to build three classifiers: (1) a decision tree, (2) Naive Bayes, and (3) KNN with K = 3. For the decision tree classifier, set the minimum number of instances per leaf node to 3 (m = 3).
  3. Build a graph comparing the performance of the three classifiers: the x-axis is the number of training records used, in steps of 100 (100, 200, ..., 700), and the y-axis is the accuracy of the corresponding method on the testing data.
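The experimental protocol in steps 1 to 3 can be sketched in plain Python without Weka. This is a minimal sketch under stated assumptions. Records are `(attributes, label)` pairs in file order. Each point on the curve retrains on the first n training records (n = 100, 200, ..., 700), which is one reading of "the number of training data used". Accuracy is measured on the 300 testing records. Only the 3-NN classifier is written out, as a from-scratch stand-in for Weka's KNN (IBk). It scales numeric attributes by their training range and counts a symbolic mismatch as 1, similar in spirit to IBk's default distance. The decision tree and Naive Bayes models would plug in through the same `fit` argument. All function names are hypothetical.

```python
from collections import Counter


def split_train_test(records, n_train=700):
    """First n_train records are training data, the remaining ones are testing data."""
    return records[:n_train], records[n_train:]


def make_knn(train, k=3):
    """Fit a k-NN classifier (k=3 by default) on (attributes, label) pairs; return predict()."""
    n_attrs = len(train[0][0])
    ranges = []  # numeric range per attribute, or None for symbolic attributes
    for i in range(n_attrs):
        column = [attrs[i] for attrs, _ in train]
        if all(isinstance(v, (int, float)) for v in column):
            ranges.append((max(column) - min(column)) or 1.0)
        else:
            ranges.append(None)

    def dist(a, b):
        # Squared distance: range-scaled difference for numbers, 0/1 mismatch for symbols.
        total = 0.0
        for x, y, r in zip(a, b, ranges):
            if r is None:
                total += 0.0 if x == y else 1.0
            else:
                total += ((x - y) / r) ** 2
        return total

    def predict(attrs):
        # Majority vote among the k training records closest to attrs.
        neighbours = sorted(train, key=lambda rec: dist(attrs, rec[0]))[:k]
        votes = Counter(label for _, label in neighbours)
        return votes.most_common(1)[0][0]

    return predict


def accuracy(predict, test):
    """Fraction of test records whose label is predicted correctly."""
    correct = sum(1 for attrs, label in test if predict(attrs) == label)
    return correct / len(test)


def learning_curve(records, fit, n_train=700, step=100):
    """Return (n, test accuracy) for n = step, 2*step, ..., n_train training records."""
    train, test = split_train_test(records, n_train)
    curve = []
    for n in range(step, n_train + 1, step):
        predict = fit(train[:n])  # retrain on the first n training records only
        curve.append((n, accuracy(predict, test)))
    return curve
```

For example, `learning_curve(records, make_knn)` returns seven `(n, accuracy)` pairs, one per x-axis position. Repeating the call with the other two classifiers gives the three curves for the graph.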

Give a brief one-page description of your work: what conclusions do you draw, and why?