The Hong Kong University of Science and Technology
Department of Computer Science

PhD Thesis Defence

"On the Use of Web Data Quality Dimensions for Finding High Quality Web Pages"

By

Mr. Chun Chung Pun

Abstract

While users can readily find information from the immense store of knowledge in the Web with the help of a search engine, they often complain about the quality of the results returned. The quality of a web page is not necessarily limited to its relevance, as judged by most search engines, but can also depend on other features of the page. This thesis first examines what factors constitute a high quality web page; hence a framework of web data quality dimensions is proposed. From this framework, it is found that current search engines only consider a very small number of web data quality dimensions.

We then propose a general methodology for evaluating web data quality metrics derived from the web data quality dimensions. The methodology has been applied to two web data quality dimensions (that is, appropriateness and cohesiveness) as well as the combination of these two dimensions. Metrics were developed to measure each dimension from a web page. They have been verified to measure users' expectations.

The web data quality dimension, appropriateness, measures how well the results returned from search engines satisfy the web genre needs of a user. It is based on the linguistic and visual complexity of a web page. The web data quality dimension, cohesiveness, is a measure of how closely the concepts within a web page are related to each other. A distance metric is defined to measure how close two concepts are in an ontology, and the cohesiveness of a web page is calculated as the total of the distances between all the concepts within it. In addition, a technique to combine different quality metrics and to incorporate a user's preference for each web data quality dimension is proposed.
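As an illustration only, the cohesiveness calculation and the preference-based combination described above might be sketched as follows. The abstract does not specify the ontology, the distance metric, or the combination rule, so everything here is an assumption: a hypothetical toy tree ontology (`PARENT`), an edge-count distance through the lowest common ancestor, a sum over all concept pairs, and a preference-weighted average of per-dimension scores.

```python
from itertools import combinations

# Hypothetical toy ontology, stored as child -> parent (a tree rooted at "entity").
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def path_to_root(concept):
    """Return the concepts from `concept` up to the root, inclusive."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def concept_distance(a, b):
    """Assumed metric: edge count between a and b via their lowest common ancestor."""
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a))}
    for j, node in enumerate(path_to_root(b)):
        if node in steps_from_a:
            return steps_from_a[node] + j
    raise ValueError("concepts do not share a root")


def total_distance(concepts):
    """Sum of pairwise distances over the distinct concepts found in a page.

    Under this sketch, a lower total suggests a more cohesive page.
    """
    distinct = sorted(set(concepts))
    return sum(concept_distance(a, b) for a, b in combinations(distinct, 2))


def combined_score(scores, preferences):
    """Assumed combination rule: preference-weighted average of dimension scores."""
    total_weight = sum(preferences[d] for d in scores)
    return sum(scores[d] * preferences[d] for d in scores) / total_weight
```

In this sketch, a page mentioning only "dog" and "cat" has a smaller total distance than one that also mentions "car", and a user who weights cohesiveness more heavily pulls the combined score toward the cohesiveness score.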
With these metrics, users can more easily find high quality web pages (i.e., not just relevant web pages, but also web pages matching other desired web data quality dimensions). A user evaluation has been conducted to show the conformance of the metrics to the corresponding web data quality dimensions (i.e., appropriateness, cohesiveness and combined dimensions). It also revealed that the judgments on these dimensions from users are fairly consistent.

Date:     Thursday, 19 January 2006
Time:     10:00 a.m. - 12:00 noon
Venue:    Room 4480
          Lifts 25-26

Chairman: Prof. Songnian Chen (ECON)

Committee Members:
          Prof. Frederick Lochovsky (Supervisor)
          Prof. Wilfred Ng
          Prof. Vincent Shen
          Prof. Richard So (IELM)
          Prof. Carolyn Watters (Comp. Sci., Dalhousie Univ.)

**** ALL are Welcome ****