The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "MACHINE RECOGNITION OF MUSIC EMOTION AND THE CORRELATION WITH MUSICAL 
TIMBRE"

By

Mr. Bin WU


Abstract

Music is one of the primary triggers of emotion. Listeners perceive strong 
emotions in music, and composers can create emotion-driven music. Researchers 
have given increasing attention to this area because of its many applications, 
such as emotion-based music search and automatic soundtrack matching. These 
applications have motivated research on the correlation between music features 
such as timbre and emotion perception. Machine recognition methods for music 
emotion have also been developed to automatically analyze affective musical 
content so that it can be indexed and retrieved on a large scale based on 
emotion.

In this research, our goal is to enable machines to automatically recognize 
music emotion. We therefore focus on two major topics: 1) understanding the 
correlation between music emotion and timbre, and 2) designing algorithms for 
automatic music emotion recognition.

To understand the correlation between music emotion and timbre, we designed 
listening tests comparing sounds from eight wind and bowed string instruments. 
We wanted to know whether some sounds were consistently perceived as happier 
or sadder in pairwise comparisons, and which spectral features were most 
important aside from spectral centroid. To that end, we conducted listening 
tests with normal sounds, centroid-equalized sounds, and static sounds. Our 
results showed strong emotional predispositions for each instrument, and 
suggested that the even/odd harmonic ratio is perhaps the most salient timbral 
feature after attack time and brightness.
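
To make these features concrete, here is a minimal Python sketch (not the 
thesis's analysis code) of how spectral centroid and the even/odd harmonic 
ratio can be computed from a harmonic amplitude spectrum; the normalization 
choices and the example spectrum are illustrative assumptions.

    # Minimal sketch of the two timbral features discussed above; the
    # normalization choices here are illustrative assumptions.
    import numpy as np

    def spectral_centroid(amps):
        """Amplitude-weighted mean harmonic number (a brightness measure)."""
        harmonics = np.arange(1, len(amps) + 1)  # harmonic numbers 1, 2, ...
        return float(np.sum(harmonics * amps) / np.sum(amps))

    def even_odd_ratio(amps):
        """Ratio of even-harmonic energy to odd-harmonic energy."""
        even = amps[1::2]  # amplitudes of harmonics 2, 4, 6, ...
        odd = amps[0::2]   # amplitudes of harmonics 1, 3, 5, ...
        return float(np.sum(even ** 2) / np.sum(odd ** 2))

    # Example: a clarinet-like spectrum with weak even harmonics.
    amps = np.array([1.0, 0.05, 0.6, 0.04, 0.35, 0.03, 0.2, 0.02])
    print(spectral_centroid(amps))  # ~2.9: brighter as energy shifts upward
    print(even_odd_ratio(amps))     # much less than 1 for clarinet-like tones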

To design algorithms for automatic music emotion recognition, we investigated 
the properties of music emotion. We found that the major problem in automatic 
music emotion recognition is a lack of data, because 1) music emotion is 
genre-specific, so labeled data for each music category is sparse; and 2) 
music emotion is time-varying, and few time-varying labels exist for music 
emotion. In this research, we have therefore exploited unlabeled and social 
tagging data to alleviate problem 1). For problem 2), we have proposed to 
exploit time-synced comment data with a novel temporal and personalized topic 
model, and to exploit lyrics with a novel hierarchical Bayesian model.
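
As a hedged illustration of one way to exploit unlabeled data for problem 1) 
(a generic stand-in, not the specific models proposed in the thesis), 
graph-based label propagation can spread sparse emotion labels to unlabeled 
clips that are nearby in feature space; the features and labels below are 
synthetic.

    # Illustrative stand-in for alleviating label sparsity with unlabeled
    # data; features and labels here are synthetic, not real music data.
    import numpy as np
    from sklearn.semi_supervised import LabelSpreading

    rng = np.random.default_rng(0)

    # Stand-in timbral feature vectors for 200 clips.
    X = rng.normal(size=(200, 8))

    # Sparse emotion labels: 0 = "sad", 1 = "happy", -1 = unlabeled.
    y = np.full(200, -1)
    y[:10] = rng.integers(0, 2, size=10)  # only 10 labeled clips

    # Propagate labels through a k-NN similarity graph over feature space.
    model = LabelSpreading(kernel="knn", n_neighbors=7)
    model.fit(X, y)
    print(model.transduction_[:20])  # inferred labels for the first 20 clips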


Date:  			Thursday, 10 September 2015

Time:			2:00pm - 4:00pm

Venue:			Room 2304
 			Lifts 17/18

Chairman:		Prof. Min Yan (MATH)

Committee Members:	Prof. Andrew Horner (Supervisor)
 			Prof. Xiaojuan Ma
 			Prof. Qiang Yang
 			Prof. Richard So (IELM)
 			Prof. Kin-Hong Wong (CSE, CUHK)
 			Prof. Yi-Hsuan Yang (Academia Sinica)


**** ALL are Welcome ****