MPhil Thesis Defence

"Mining User Preference using Spy Voting for Search Engine Personalization"

By

Mr. Lin Deng

Abstract

The World Wide Web (the Web) is serving an increasingly large and diversified user community. The diversity of user interests makes it difficult for a general Web search engine to meet the needs of an individual user. This thesis addresses the problem of Web search engine personalization. The main objectives of studying personalization are to understand a user's preference and to return search results that satisfy that preference. We present a new approach that mines users' preferences on the search results from clickthrough data and adapts the search engine's ranking function to improve search quality.

Existing preference mining algorithms are typically based on strong assumptions about how users scan the search results. Thus, the preferences derived are often incorrect. In this thesis, we develop a new preference mining technique called SpyNB, which is based on a more reasonable assumption: the search results clicked by a user reflect the user's preference, but no conclusion is drawn about the results that the user did not click. As such, SpyNB remains valid even if the user does not follow any order in reading the search results or has not clicked on all relevant results.

We develop a spying process to infer the negative examples by first treating the result items clicked by the users as sure positive examples and those not clicked by the users as unlabelled data. We then plant the sure positive examples (the spies) into the unlabelled set of result items and apply Naive Bayes classification to generate the reliable negative examples (thus the name "SpyNB"). These positive and negative examples allow us to discover highly accurate user preferences. Finally, we employ a ranking SVM to build a metasearch engine optimizer. The optimizer gradually adapts our metasearch engine according to the user's preference.

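The spying process described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`train_nb`, `spy_nb`), the representation of each result item as a list of words, the use of one spy at a time, and the majority-style voting rule (`vote_threshold`) are all assumptions made for illustration. Each clicked item takes a turn as the spy; an unclicked item that scores below the spy receives a vote, and items with enough votes are reported as reliable negatives.

```python
import math
from collections import Counter

def train_nb(pos_docs, neg_docs, vocab):
    """Multinomial Naive Bayes with Laplace smoothing.

    Returns a function giving the log-odds that a document is positive.
    """
    def word_log_probs(docs):
        counts = Counter(w for d in docs for w in d)
        total = sum(counts.values())
        return {w: math.log((counts[w] + 1) / (total + len(vocab)))
                for w in vocab}

    n = len(pos_docs) + len(neg_docs)
    prior_pos = math.log((len(pos_docs) + 1) / (n + 2))
    prior_neg = math.log((len(neg_docs) + 1) / (n + 2))
    lp_pos = word_log_probs(pos_docs)
    lp_neg = word_log_probs(neg_docs)

    def log_odds(doc):
        score = prior_pos - prior_neg
        for w in doc:
            if w in vocab:
                score += lp_pos[w] - lp_neg[w]
        return score

    return log_odds

def spy_nb(clicked, unclicked, vote_threshold=0.5):
    """Return indices of unclicked items judged to be reliable negatives.

    Assumption: an item is a reliable negative when more than
    vote_threshold * len(clicked) spies score above it.
    """
    vocab = {w for d in clicked + unclicked for w in d}
    votes = [0] * len(unclicked)
    for i, spy in enumerate(clicked):
        remaining = clicked[:i] + clicked[i + 1:]
        # Plant the spy among the unlabelled items, which are
        # provisionally treated as the negative class.
        log_odds = train_nb(remaining, unclicked + [spy], vocab)
        spy_score = log_odds(spy)
        for j, doc in enumerate(unclicked):
            # An item that looks less positive than a known positive
            # (the spy) receives one vote as a negative.
            if log_odds(doc) < spy_score:
                votes[j] += 1
    needed = vote_threshold * len(clicked)
    return [j for j, v in enumerate(votes) if v > needed]

# Hypothetical clickthrough data: word lists for clicked and unclicked results.
clicked = [["python", "tutorial"], ["python", "guide"], ["python", "docs"]]
unclicked = [["snake", "zoo"], ["python", "guide"], ["reptile", "pets"]]
print(spy_nb(clicked, unclicked))  # [0, 2]
```

In this toy data the unclicked item `["python", "guide"]` resembles the clicked items, so it is not labelled negative; this mirrors the point that an unclicked result is not necessarily an irrelevant one.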
To verify the effectiveness of SpyNB for preference mining, we conduct both offline and online experiments. Our extensive offline experiments demonstrate that SpyNB discovers much more accurate preferences than the existing algorithms. Moreover, the adaptive ranking function derived from SpyNB improves retrieval quality by 20% compared to the case without learning. The interactive online experiments further confirm that SpyNB and our personalization approach are effective in practice. We also show that the efficiency of SpyNB is comparable to that of the existing simple preference mining algorithms.

Date:   Wednesday, 18 January 2006
Time:   2:30 p.m. - 4:30 p.m.
Venue:  Room 4480
        Lifts 25-26

Committee Members:  Prof. Dik-Lun Lee (Supervisor)
                    Dr. Wilfred Ng (Supervisor)
                    Dr. Nevin Zhang (Chairperson)
                    Dr. Lei Chen

**** ALL are Welcome ****