The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Effective Topic Detection over Social Media"
Mr. Konstantinos GIANNAKOPOULOS
Nowadays, Social Networks (SNs) like Facebook and Twitter are very popular.
Thousands of users post tweets every day. In this dissertation, we address
three common issues in processing tweets. Firstly, we select the most
significant messages from a corpus of tweets, so that we can clear noise from
our dataset and extract information only from important messages. Secondly, we
propose a topic detection model that incorporates time and location. Thirdly,
we propose a novel tweet recommendation framework that is simple and stable.
Concerning filtering of tweets, we propose a method for classifying tweet
messages into two classes: informative and non-informative. We consider a
message informative if it contains information of public interest, such as
trends, events and news. Non-informative tweets are personal messages that do
not interest the public, such as conversations between friends, feelings and
descriptions of mood. The motivation of our work is to keep informative tweets
that contain essential information and to filter out useless tweets. Real
applications that can benefit from our work are trend/topic detection
applications, recommendation systems and applications that make predictions
based on user messages on social media.
Processing tweet messages is challenging because they are short, unstructured
and often have no clear topic. We propose a weighted variation of the binary
multinomial naive Bayes model to identify informative messages. We train our
classifier and evaluate the results using 5-fold and 10-fold cross-validation.
We compare the results with the original binary multinomial naive Bayes model.
We use two independent datasets of tweet messages crawled from the web. We
evaluate and present our results using the following metrics: accuracy, recall,
specificity, and the F-measure with its variations (F2 score and F0.5 score).
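The metrics listed above can all be derived from the four counts of a binary
confusion matrix. The following is a minimal sketch, not code from the thesis:
it assumes that "informative" is treated as the positive class, and the names
Confusion and f_beta, as well as the counts in the usage lines, are purely
illustrative and not results from the dissertation.

```python
from dataclasses import dataclass


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts; 'informative' is assumed to be the positive class."""
    tp: int  # informative tweets predicted informative
    fp: int  # non-informative tweets predicted informative
    fn: int  # informative tweets predicted non-informative
    tn: int  # non-informative tweets predicted non-informative

    def accuracy(self) -> float:
        return _ratio(self.tp + self.tn, self.tp + self.fp + self.fn + self.tn)

    def recall(self) -> float:
        # Fraction of informative tweets that the classifier keeps.
        return _ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of non-informative tweets that the classifier filters out.
        return _ratio(self.tn, self.tn + self.fp)

    def precision(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    def f_beta(self, beta: float = 1.0) -> float:
        # beta > 1 weights recall more heavily (F2);
        # beta < 1 weights precision more heavily (F0.5).
        p, r = self.precision(), self.recall()
        b2 = beta * beta
        return _ratio((1.0 + b2) * p * r, b2 * p + r)


if __name__ == "__main__":
    # Illustrative counts only.
    c = Confusion(tp=40, fp=10, fn=20, tn=30)
    print(f"accuracy={c.accuracy():.3f} recall={c.recall():.3f} "
          f"specificity={c.specificity():.3f}")
    print(f"F1={c.f_beta(1.0):.3f} F2={c.f_beta(2.0):.3f} "
          f"F0.5={c.f_beta(0.5):.3f}")
```

In a filtering setting, the F2 score favours a classifier that rarely discards
informative tweets, while the F0.5 score favours one that rarely lets
non-informative tweets through.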
Concerning topic detection, existing solutions overlook the time and location
factors, which are important and useful. Moreover, social media are frequently
updated, so a detection model should handle dynamic updates. We introduce a
topic model for topic detection that combines time and location. Our model
supports incremental estimation of the topic model parameters and adapts the
window length according to the correlation of consecutive windows and their
density. We have conducted
extensive experiments to verify the effectiveness and efficiency of our
proposed Incremental Adaptive Time Location (IncrAdapTL) model.
Concerning tweet recommendation, Twitter users post messages according to their
interests and read the tweets of their friends. However, reading tweets on
relevant topics from more users may help them broaden their perspective on
their interests. Topics combined with time and location are more useful. For
instance, someone may work downtown at a finance corporation during the day and
live with family in another district at night. During working hours, this user
is interested in reading tweets relevant to finance or related to downtown, but
not tweets related to entertainment. After work, this user is interested in
tweets related to family or entertainment, and maybe not in tweets relevant to
nightlife.
Our proposed tweet recommendation model consists of three parts: Firstly, we
model users’ preferences by using their previously posted tweets, location and
time. Secondly, we model tweet documents by proposing topic-enhanced document
vectors. Thirdly, we train our model and suggest tweets to users. Our
approach offers time-efficient update handling without re-training our model,
and tackles the sparsity problem of (user, tweet) pairs. We evaluate our model
on approximately 1 million real tweets from Hong Kong, and we show that its
performance is stable.
Date: Thursday, 13 December 2018
Time: 4:00pm - 6:00pm
Venue: Room 2131C
Chairman: Prof. Chi-Ying Tsui (ECE)
Committee Members: Prof. Lei Chen (Supervisor)
                   Prof. Bo Li
                   Prof. Ke Yi
                   Prof. Ping Gao (CBE)
                   Prof. Yunjun Gao (Zhejiang University)
**** ALL are Welcome ****