Monolingual and crosslingual plagiarism detection

===================================================================
                Joint Seminar
===================================================================
The Hong Kong University of Science & Technology
Human Language Technology Center
Department of Computer Science and Engineering
-------------------------------------------------------------------

Speaker:	Professor Paolo Rosso
		Technical University of Valencia, Spain

Title:		"Monolingual and crosslingual plagiarism detection"

Date:		Thursday, 28 July 2011

Time:		4:00pm - 5:00pm

Venue:		Rm2578 (Annex, via lift 29/30), HKUST


Abstract:

Due to the amount of information available on the WWW and its ease of access,
cases of plagiarism have increased in recent years. One countermeasure to this
phenomenon has been the development of plagiarism detection tools.
Unfortunately, state-of-the-art plagiarism detection systems cannot easily
detect plagiarism in cases of high-level paraphrasing or translation. Detecting
translated plagiarism is still in its infancy, and only a few crosslingual
plagiarism detection approaches have been investigated so far.
The similarity between two texts written in different languages could be
estimated on the basis of a comparable data set such as Wikipedia
(cross-language explicit semantic analysis) or through a statistical machine
translation approach (cross-language alignment-based similarity analysis), in
order to determine the likelihood that two text fragments are valid
translations of each other. In this talk an overview of plagiarism detection
techniques will be given, with special emphasis on crosslingual plagiarism
detection. These techniques could potentially be adapted for English-Chinese
plagiarism detection.
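The two crosslingual similarity measures mentioned above can be illustrated with
a toy sketch. It is not the speaker's implementation and is deliberately
simplified: the "concept articles" stand in for interlanguage-linked Wikipedia
pages, raw word counts replace the TF-IDF weighting a real system would likely
use, the translation probabilities stand in for a dictionary learned by a
statistical alignment model, and the length-ratio parameters `mu` and `sigma`
are made-up values. All function names and data are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Lower-cased word tokens; \w also matches accented letters in Python 3.
    return re.findall(r"\w+", text.lower())


def cosine(u, v):
    # Cosine similarity of two sparse vectors stored as dicts (or Counters).
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def esa_vector(text, concept_articles):
    # Dimension i = similarity of `text` to the article describing concept i,
    # written in the same language as `text`.
    words = Counter(tokens(text))
    return {i: cosine(words, Counter(tokens(article)))
            for i, article in enumerate(concept_articles)}


def cl_esa_similarity(text_a, articles_a, text_b, articles_b):
    # Simplified CL-ESA sketch: articles_a[i] and articles_b[i] must describe
    # the same concept in the two languages, so both texts are mapped into
    # one shared concept space and compared there.
    return cosine(esa_vector(text_a, articles_a), esa_vector(text_b, articles_b))


def cl_asa_similarity(text_a, text_b, trans_prob, mu, sigma):
    # Simplified CL-ASA-style sketch (hypothetical parameters).
    # Length model: Gaussian score of the character-length ratio, penalising
    # pairs whose lengths are implausible for a translation.
    if not text_a:
        return 0.0
    ratio = len(text_b) / len(text_a)
    length_factor = math.exp(-0.5 * ((ratio - mu) / sigma) ** 2)
    # Translation model: sum of p(x, y) over every word pair (x in a, y in b).
    translation_score = sum(trans_prob.get((x, y), 0.0)
                            for x in tokens(text_a) for y in tokens(text_b))
    return length_factor * translation_score


# Hypothetical toy data (English / Spanish); not from any real corpus.
articles_en = ["plagiarism copy text author",
               "translation language translator dictionary",
               "football ball goal player"]
articles_es = ["plagio copia texto autor",
               "traducción idioma traductor diccionario",
               "fútbol balón gol jugador"]
trans_prob = {("author", "autor"): 0.9, ("copy", "copia"): 0.8,
              ("text", "texto"): 0.9}

source = "author copy text"
for candidate in ["autor copia texto", "jugador balón gol"]:
    # mu=1.0 and sigma=0.2 are made-up; in practice they would be estimated
    # from parallel text for the language pair.
    print(candidate,
          round(cl_esa_similarity(source, articles_en, candidate, articles_es), 3),
          round(cl_asa_similarity(source, candidate, trans_prob, mu=1.0, sigma=0.2), 3))
```

Both scores single out the actual translation. In the CL-ASA-style score, the
length factor additionally lowers the score of pairs whose length ratio departs
from `mu`, while CL-ESA needs no dictionary at all, only aligned concepts.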


*****************************
Biography:

Paolo Rosso (http://www.dsic.upv.es/~prosso/) received the Ph.D. degree
in computer science from Trinity College Dublin, University of
Ireland, in 1999. He is currently an associate professor with the Technical
University of Valencia, Spain, where he leads the Natural Language Engineering
Lab of the ELiRF research group. He is co-author of over 200 papers published
in international conferences and journals.
His main research interests include topics related to natural language processing
and information retrieval: plagiarism detection, opinion mining and irony
detection, toponym disambiguation, and text categorisation, among others. He
has actively participated in 17 national and international research projects
(as PI in 6 of them). He has co-organised tracks at CLEF on Question Answering
on Speech Transcripts and on Plagiarism Detection (sponsored by Yahoo! Research):
http://pan.webis.de/