PhD Thesis Proposal Defence

"Scaling Up Support Vector Machines"

by

Mr. Wai Hung Tsang

Abstract:

Kernel-based methods have been shown to be very competitive on various machine learning tasks, such as classification, regression, clustering, ranking, and principal component analysis. A well-known example of a kernel method is the support vector machine (SVM), which is firmly grounded in statistical learning theory and also achieves superior generalization performance in practice. However, standard SVM training has O(m^3) time and O(m^2) space complexities, where m is the training set size. It is thus computationally infeasible on very large data sets.

In this proposal, observing that practical SVM implementations only approximate the optimal solution by an iterative strategy, I scale up kernel methods by exploiting this "approximateness". I first show that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry. Then, by adopting an efficient approximate MEB algorithm based on the idea of core-sets, I obtain provably approximately optimal solutions. The proposed Core Vector Machine (CVM) algorithm can be used with nonlinear kernels, and has a time complexity that is linear in m and a space complexity that is independent of m. Experiments on large-scale real-world data sets demonstrate that the CVM is as accurate as existing SVM implementations, but is much faster and can handle much larger data sets than existing scale-up methods. For example, CVM with the Gaussian kernel produces superior results on the KDDCUP-99 intrusion detection data set, which has about five million training patterns, in only 1.4 seconds on a 3.2GHz Pentium 4 PC.

Date: Wednesday, 14 March 2007
Time: 2:00p.m. - 4:00p.m.
Venue: Room 3311 (lifts 17-18)

Committee Members:
Dr. James Kwok (Supervisor)
Dr. Sunil Arya (Chairperson)
Dr. Brian Mak
Dr. Dit Yan Yeung

**** ALL are Welcome ****
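
For readers unfamiliar with the core-set idea mentioned in the abstract, the following is a minimal sketch of a generic (1+eps)-approximate MEB routine in the style of Badoiu and Clarkson, written in Python with NumPy. It is illustrative only and operates on raw Euclidean vectors; the actual CVM works in the kernel-induced feature space and maintains an explicit core-set, so the function name, signature, and setup here are assumptions, not the thesis's implementation.

    import numpy as np

    def approx_meb(points, eps=0.1):
        """Sketch of a Badoiu-Clarkson style (1+eps)-approximate MEB.

        points : (m, d) array of input vectors
        eps    : approximation parameter
        Returns an approximate (center, radius) of the minimum
        enclosing ball of the points.
        """
        c = points[0].astype(float)       # start from an arbitrary point
        T = int(np.ceil(1.0 / eps**2))    # O(1/eps^2) iterations suffice
        for k in range(1, T + 1):
            # The farthest point from the current center joins the core-set.
            dists = np.linalg.norm(points - c, axis=1)
            p = points[np.argmax(dists)]
            # Move the center a shrinking step toward the farthest point.
            c = c + (p - c) / (k + 1)
        radius = np.linalg.norm(points - c, axis=1).max()
        return c, radius

    # Example usage (synthetic data):
    X = np.random.randn(100000, 10)
    center, r = approx_meb(X, eps=0.05)

Note how this mirrors the complexity claims in the abstract: each iteration costs time linear in m (one farthest-point scan), while the number of iterations depends only on eps, not on m. In the kernelized setting, only the core-set itself needs to be stored, which is how a space complexity independent of m becomes possible.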