Effective Topic Detection over Social Media

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Effective Topic Detection over Social Media"

By

Mr. Konstantinos GIANNAKOPOULOS


Abstract

Nowadays, social networks (SNs) like Facebook and Twitter are very popular. 
Thousands of users post tweets every day. In this dissertation, we address 
three common problems in processing tweets. First, we filter a corpus of 
tweets to retain its most significant messages, so that we can remove noise 
from the dataset and extract information only from important messages. Second, 
we propose a topic detection model that incorporates time and location. Third, 
we propose a novel tweet recommendation framework that is simple and stable.

For tweet filtering, we propose a method that classifies tweets into two 
classes: informative and non-informative. We consider a message informative 
if it contains information of interest to the public, such as trends, events 
and news. Non-informative tweets are personal messages that do not interest 
the public, such as conversations between friends, feelings and descriptions 
of mood. Our motivation is to keep informative tweets that contain essential 
information and to filter out the rest. Real applications that can benefit 
from our work include trend/topic detection applications, recommendation 
systems and applications that make predictions based on user messages on 
social media.

The challenge in processing tweets is that they are short, unstructured 
messages with unclear topics. We propose a weighted variation of the binary 
multinomial naive Bayes model to identify informative messages. We train our 
classifier and evaluate it using 5-fold and 10-fold cross-validation, and we 
compare the results with the original binary multinomial naive Bayes model. 
We use two independent datasets of tweets crawled from the web. We evaluate 
and present our results using the following metrics: accuracy, recall, 
specificity, and the F-measure with its variations (F2 score and F0.5 score).
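
As an illustration of the evaluation metrics listed above, the following is a 
minimal sketch that derives them from confusion-matrix counts using their 
standard definitions. The function names and the example counts are 
assumptions made for illustration; they are not taken from the thesis, and 
the sketch does not reproduce its weighted classifier.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, recall, specificity and F-beta scores from counts.

    "Positive" is assumed here to mean the informative class.
    """
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)

    def f_beta(beta):
        # F-beta treats recall as beta times as important as precision.
        b2 = beta * beta
        return safe_div((1 + b2) * precision * recall, b2 * precision + recall)

    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": f_beta(1.0),
        "f2": f_beta(2.0),
        "f0.5": f_beta(0.5),
    }


# Made-up counts, for illustration only.
scores = classification_metrics(tp=80, fp=20, tn=70, fn=30)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

The F2 score favours recall (fewer informative tweets missed), while the F0.5 
score favours precision (fewer non-informative tweets kept), which is one 
reason to report both alongside the balanced F-measure.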

For topic detection, existing solutions overlook the time and location 
factors, which are important and useful. Moreover, social media are updated 
frequently, so a detection model should handle dynamic updates. We introduce 
a topic model for topic detection that combines time and location. Our model 
estimates its parameters incrementally and adapts the window length according 
to the correlation between consecutive windows and their density. We have 
conducted extensive experiments to verify the effectiveness and efficiency of 
our proposed Incremental Adaptive Time Location (IncrAdapTL) model.
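
The idea of adapting the window length to correlation and density can be 
sketched as follows. This is a hypothetical rule, not the IncrAdapTL 
procedure: the abstract does not specify how correlation or density is 
measured, so the sketch assumes cosine similarity between term-count vectors 
as the correlation, tweets per hour as the density, and made-up thresholds 
and function names.

```python
import math
from collections import Counter


def term_counts(window):
    """Count tokens over a window, given as a list of tokenised tweets."""
    return Counter(token for tweet in window for token in tweet)


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def next_window_length(prev_window, curr_window, curr_hours,
                       high_corr=0.8, low_corr=0.3, dense_per_hour=1000.0,
                       min_hours=1.0, max_hours=24.0):
    """Propose the next window length in hours (all thresholds are assumed)."""
    corr = cosine_similarity(term_counts(prev_window), term_counts(curr_window))
    density = len(curr_window) / curr_hours  # tweets per hour
    if corr <= low_corr or density >= dense_per_hour:
        # Content is shifting or traffic is heavy: use a shorter window.
        proposed = curr_hours / 2
    elif corr >= high_corr:
        # Content is stable and traffic is light: use a longer window.
        proposed = curr_hours * 2
    else:
        proposed = curr_hours
    # Keep the result within the allowed range.
    return min(max_hours, max(min_hours, proposed))


storm = [["rain", "storm"], ["storm", "flood"]]
election = [["election", "vote"]]
print(next_window_length(storm, storm, 2.0))     # stable content
print(next_window_length(storm, election, 2.0))  # content has shifted
```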

For tweet recommendation, Twitter users post messages according to their 
interests and read the tweets of their friends. However, reading tweets on 
relevant topics from more users may help them broaden their perspective on 
their interests. Topics combined with time and location are more useful. For 
instance, consider someone who works downtown at a finance corporation during 
the day and lives with family in another district at night. During working 
hours, this user is interested in reading tweets relevant to finance or 
related to downtown, but not tweets related to entertainment. After work, 
this user is interested in tweets related to family or entertainment, and 
perhaps not in tweets related to nightlife.

Our proposed tweet recommendation model consists of three parts. First, we 
model users’ preferences using their previously posted tweets, location and 
time. Second, we model tweet documents by proposing topic-enhanced document 
vectors. Third, we train our model and suggest tweets to users. Our approach 
handles updates time-efficiently without re-training the model, and tackles 
the sparsity problem of (user, tweet) pairs. We evaluate our model on 
approximately 1 million real tweets from Hong Kong, and we show that its 
performance is stable.
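
To make the role of time and location in recommendation concrete, the 
following is a minimal sketch of a plain content-based ranker. It is not the 
thesis model, which is trained and uses topic-enhanced document vectors. 
Here each tweet is assumed to be already represented by a small vector of 
topic weights, a user profile is assumed to be the mean vector of the user's 
past tweets in a given (time slot, district) context, and candidates are 
ranked by cosine similarity. All names and numbers are hypothetical.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_profiles(history):
    """Build one profile per context from (time_slot, district, vector) rows."""
    grouped = defaultdict(list)
    for time_slot, district, vector in history:
        grouped[(time_slot, district)].append(vector)
    return {context: mean_vector(vs) for context, vs in grouped.items()}


def recommend(profiles, context, candidates, k=2):
    """Return the ids of the k candidates closest to the context profile."""
    profile = profiles.get(context)
    if profile is None:
        return []  # no history for this context
    ranked = sorted(candidates, key=lambda c: cosine(profile, c[1]), reverse=True)
    return [tweet_id for tweet_id, _ in ranked[:k]]


# Assumed vector layout: [finance, family, entertainment] topic weights.
history = [
    ("day", "downtown", [0.9, 0.05, 0.05]),
    ("day", "downtown", [0.8, 0.1, 0.1]),
    ("night", "home", [0.1, 0.6, 0.3]),
]
candidates = [
    ("t_finance", [1.0, 0.0, 0.0]),
    ("t_family", [0.0, 1.0, 0.0]),
    ("t_entertainment", [0.0, 0.0, 1.0]),
]
profiles = build_profiles(history)
print(recommend(profiles, ("day", "downtown"), candidates, k=1))
print(recommend(profiles, ("night", "home"), candidates, k=2))
```

Keeping a separate profile per context mirrors the working-hours versus 
after-work example above: the same user receives different suggestions 
depending on when and where they are.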


Date:			Thursday, 13 December 2018

Time:			4:00pm - 6:00pm

Venue:			Room 2131C
 			Lift 19

Chairman:		Prof. Chi-Ying Tsui (ECE)

Committee Members:	Prof. Lei Chen (Supervisor)
 			Prof. Bo Li
 			Prof. Ke Yi
 			Prof. Ping Gao (CBE)
 			Prof. Yunjun Gao (Zhejiang University)


**** ALL are Welcome ****