Scalable, Low-Latency Data Analytics and its Applications

----------------------------------------------------------------
       *** A Talk in Memory of Professor Hongjun LU ***
----------------------------------------------------------------
Speaker:        Dr. Yanlei DIAO
                Department of Computer Science
                University of Massachusetts Amherst

Title:          "Scalable, Low-Latency Data Analytics and its
                Applications"

Date:           Wednesday, 18 January 2012

Time:           11:00 am - 12:00 noon

Venue:          Room 2463 (via lifts 25/26), HKUST

Abstract:

An integral part of many data-intensive applications is the need to
collect and analyze enormous data sets, such as click streams, search
logs, and sensor streams, to derive answers and insights with low
latencies. Concurrently, new programming models and architectures have
been developed for large-scale cluster computing, exemplified by recent
MapReduce systems. However, these systems are designed for batch
processing and require the data set to be fully loaded into the cluster
before analytical queries can run, which delays query answers.

In this talk, I present the design of a scalable, low-latency analytics
platform, called Scalla, that fundamentally transforms the existing
cluster computing paradigm into an incremental parallel processing
paradigm, which provides the combined benefits of massive parallelism,
incremental answers, and I/O efficiency. Our technical contributions
include replacing an existing popular mechanism for partitioned
parallelism with a purely hash-based mechanism and using dynamic frequency
analysis to offer in-memory processing for most of the data. I will also
examine two application scenarios: click stream analysis, which has been
used in our evaluation, and genomic data analysis, which is part of a
large initiative on cloud services for massive-scale genomic data
processing and deep analysis.
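
As a rough illustration of the hash-based, incremental idea described
above, the sketch below counts clicks per URL as records arrive and can
report partial answers at any point, without first loading the whole data
set. It is not the Scalla implementation: the names partition_of and
IncrementalCounter, the use of CRC32 as the hash function, and the sample
click data are all assumptions made for this example.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a deterministic hash (CRC32).

    CRC32 is an assumption for this sketch; it is used instead of the
    built-in hash() because it gives the same result on every run.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class IncrementalCounter:
    """Hypothetical hash-partitioned counter that supports early answers."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        # One in-memory hash table of running counts per partition.
        self.tables = [Counter() for _ in range(num_partitions)]

    def ingest(self, key: str) -> None:
        # Route the record by hash and update its running count in place.
        self.tables[partition_of(key, self.num_partitions)][key] += 1

    def snapshot(self) -> dict:
        # Each key lives in exactly one partition, so merging is a union.
        merged = {}
        for table in self.tables:
            merged.update(table)
        return merged


# Made-up click stream, processed one record at a time.
clicks = ["/home", "/cart", "/home", "/search", "/home"]
counter = IncrementalCounter(num_partitions=4)
for url in clicks[:3]:
    counter.ingest(url)
partial = counter.snapshot()   # answer available after only 3 records
for url in clicks[3:]:
    counter.ingest(url)
final = counter.snapshot()     # answer after the full stream
```

Because every key always hashes to the same partition, each partition can
update its counts independently and no sorting or merging of runs is
needed before an answer is produced.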


**********************
Biography:

Yanlei Diao is an Associate Professor of Computer Science at the
University of Massachusetts Amherst. Her research interests are in
information architectures and data management systems, with a focus on
large-scale data analysis, data streams, uncertain data management, and
flash memory databases. She received her PhD in Computer Science from the
University of California, Berkeley in 2005, her M.S. in Computer Science
from the Hong Kong University of Science and Technology in 2000, and her
B.S. in Computer Science from Fudan University in 1998.

Yanlei Diao was a recipient of the NSF CAREER Award and the IBM Scalable
Innovation Faculty Award, and was a finalist for the Microsoft Research New
Faculty Fellowship. She spoke at the Distinguished Faculty Lecture Series
at the University of Texas at Austin. Her PhD dissertation "Query
Processing for Large-Scale XML Message Brokering" won the 2006 ACM-SIGMOD
Dissertation Award Honorable Mention. She is an associate editor of PVLDB
2013 and has served on the organizing committees of SIGMOD, CIDR, DMSN,
the New Researcher Symposium, and the New England Database Summit. She has
served on program committees of numerous international conferences and
workshops.