Large-scale Data Mining and its Applications to Information Retrieval

Speaker:	Professor Edward Chang
		Google Research China

Title:		"Large-scale Data Mining and its Applications to
		Information Retrieval"

Date:		Monday, 2 November 2009

Time:		10:30am - 11:30am

Venue:		Room 2404 (via lifts 17/18), HKUST

Abstract:

Confucius was a great teacher in ancient China. His theories and principles
were spread effectively throughout China by his disciples.  Confucius is
also the code name of Google's Knowledge Search product, which my team
built at Google's Beijing lab.  In this talk, I present Knowledge Search's
key "disciples": data management subroutines that, among other tasks,
generate labels for questions, match existing answers to a question,
evaluate the quality of answers, rank users based on their contributions,
and distill high-quality answers for search engines to index.  I also
present the scalable algorithms we have developed to make these disciples
effective on huge datasets, our efforts to make these algorithms run even
faster on thousands of machines, and some open research problems.


******************
Biography:

Edward Chang has headed Google Research in China since March 2006.  He
joined the Department of Electrical & Computer Engineering at the
University of California, Santa Barbara, in 1999 after receiving his PhD
from Stanford University. Ed received tenure in 2003 and was promoted to
full professor of Electrical Engineering in 2006. His recent research
focuses on distributed data mining and its applications to rich-media data
management and social-network collaborative filtering. His research group
(which consists of members
from Google, UC, MIT, Tsinghua, PKU, and Zheda) recently parallelized SVMs
(NIPS 07), PLSA (KDD 08), Association Mining (ACM RS 08), Spectral
Clustering (ECML 08), and LDA (WWW 09) to run on thousands of machines for
mining large-scale datasets (see the MMDS/CIVR keynote slides for details).
Ed has served on ACM (SIGMOD, KDD, MM, CIKM), VLDB, IEEE, WWW, and SIAM
conference program committees, and co-chaired several conferences,
including MMM, ACM MM, ICDE, and WWW. Ed is a recipient of the IBM Faculty
Partnership Award and the NSF CAREER Award.