Synthesizing Images and Videos from Large-scale Datasets

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Synthesizing Images and Videos from Large-scale Datasets"


Miss Mingming HE


The availability of large-scale visual data is increasingly inspiring 
sophisticated algorithms to process, understand and augment these 
resources. In particular, with the rapid advancement of the latest 
data-driven techniques, researchers have demonstrated exciting progress 
on a wide range of visual synthesis applications, drawing closer to the 
day when high-quality visual creation techniques are accessible to 
non-expert users. However, due to the lack of specific domain knowledge, 
the variety of target subjects, and the complexity of human perception, 
many visual synthesis problems remain challenging. In this dissertation, 
we focus on algorithms for synthesizing both image color effects and 
video motion behaviors, to help create context-consistent and 
photorealistic visual content by leveraging the presence of large-scale 
visual data.

First, we propose an image algorithm to transfer photo color style from 
one image to another based on semantically meaningful dense 
correspondence. To achieve accurate color transfer results that respect 
the semantic relationship between image content, our algorithm leverages 
the features learned by a deep neural network to build the dense 
correspondence. Meanwhile, it optimizes local linear color models to 
enforce both local and global consistency. Semantic matching and color 
models are jointly optimized in a coarse-to-fine manner. We further 
extend this approach from "one-to-one" to "one-to-many" color transfer, 
boosting matching reliability by introducing more reference candidates.
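As a rough illustration of the local linear color model, the sketch below fits matched ≈ a·src + b in every local window and smooths the coefficients, guided-filter style, so that neighboring pixels share consistent models. This is a toy single-channel version with illustrative names; the actual method builds the matched image from semantic dense correspondence and optimizes jointly in a coarse-to-fine manner.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_linear_transfer(src, matched, win=5, eps=1e-6):
    """Toy sketch of a local linear color model (illustrative, not the
    thesis implementation): in each win x win window, fit
    matched ~= a * src + b by least squares, then average the
    coefficients across overlapping windows before applying them."""
    mean_s = uniform_filter(src, win)
    mean_m = uniform_filter(matched, win)
    var_s = uniform_filter(src * src, win) - mean_s * mean_s
    cov_sm = uniform_filter(src * matched, win) - mean_s * mean_m
    a = cov_sm / (var_s + eps)   # local linear slope per window
    b = mean_m - a * mean_s      # local linear offset per window
    # Smoothing a and b enforces consistency between adjacent models.
    return uniform_filter(a, win) * src + uniform_filter(b, win)
```

Averaging the coefficients of overlapping windows is what yields the local consistency mentioned above; the dissertation additionally enforces global consistency within the joint optimization.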

However, for exemplar-based color synthesis applications such as color 
transfer and colorization, it remains challenging to handle image pairs 
with unrelated content, for which the above "one-to-many" method is not 
a practical solution. We therefore take advantage of deep neural networks 
to better predict consistent chrominance across the whole image, including 
those mismatching elements, to achieve robust single-reference image 
colorization. Specifically, rather than using handcrafted rules as in 
traditional exemplar-based methods, we design an end-to-end colorization 
network that learns how to select, propagate, and predict colors from a 
large-scale dataset. This network generalizes well even when using 
reference images that are unrelated to the input grayscale image.
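To illustrate the "select" step only, the toy sketch below copies, for each input pixel, the chrominance of the reference pixel with the most similar deep feature, and keeps that similarity as a confidence score; low confidence marks the mismatching regions where colors must instead be propagated or predicted. All function and variable names here are illustrative assumptions, not the thesis's actual network.

```python
import numpy as np

def warp_colors(feat_in, feat_ref, chroma_ref):
    """Toy similarity-based color selection (illustrative sketch):
    feat_in/feat_ref are assumed precomputed deep feature maps (H, W, C);
    chroma_ref holds the reference chrominance (H, W, 2)."""
    fi = feat_in.reshape(-1, feat_in.shape[-1])
    fr = feat_ref.reshape(-1, feat_ref.shape[-1])
    fi = fi / (np.linalg.norm(fi, axis=1, keepdims=True) + 1e-8)
    fr = fr / (np.linalg.norm(fr, axis=1, keepdims=True) + 1e-8)
    sim = fi @ fr.T                  # cosine similarity, all pixel pairs
    idx = sim.argmax(axis=1)         # best reference match per input pixel
    conf = sim.max(axis=1)           # low confidence -> predict, not copy
    warped = chroma_ref.reshape(-1, chroma_ref.shape[-1])[idx]
    out_shape = feat_in.shape[:2] + (chroma_ref.shape[-1],)
    return warped.reshape(out_shape), conf.reshape(feat_in.shape[:2])
```

In the dissertation this selection is learned end-to-end together with propagation and prediction, rather than applied as a fixed handcrafted rule.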

Finally, besides synthesizing static images, we also explore video 
synthesis techniques by processing large-scale captures and manipulating 
their dynamism. We present an approach to create wide-angle, 
high-resolution looping panoramic videos. Starting with hundreds of 
registered videos acquired on a robotic mount, we formulate a 
combinatorial optimization to determine for each output pixel the source 
video and looping parameters that jointly maximize spatiotemporal 
consistency. Optimizing over such a large volume of video data is 
challenging. We 
accelerate the optimization by reducing the set of source labels using a 
graph-coloring scheme, parallelizing the computation and implementing it 
out-of-core. These techniques are combined to create gigapixel-sized 
looping panoramas.
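A minimal single-pixel sketch of the looping parameters being optimized: each output pixel is assigned a start frame s and period p, chosen here purely by how well the loop closes on itself. This is an illustrative simplification; the actual formulation jointly maximizes spatiotemporal consistency across all pixels and source videos, using the acceleration schemes described above.

```python
import numpy as np

def best_loop(values, min_period=2):
    """Toy per-pixel loop search (illustrative sketch): given one
    pixel's temporal values, pick the start frame s and period p whose
    loop-closure cost |V[s] - V[s+p]| is smallest. The thesis instead
    solves a combinatorial optimization coupling all pixels."""
    T = len(values)
    best = (0, min_period, np.inf)
    for p in range(min_period, T):
        for s in range(T - p):
            cost = abs(values[s] - values[s + p])
            if cost < best[2]:
                best = (s, p, cost)
    return best  # (start frame, period, loop-closure cost)
```

Restricting each pixel to a small set of candidate (s, p) labels is what makes schemes like the graph-coloring label reduction effective at scale.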

Date:			Thursday, 1 November 2018

Time:			3:00pm - 5:00pm

Venue:			Room 3494
 			Lifts 25/26

Chairman:		Prof. Bing-Yi Jing (MATH)

Committee Members:	Prof. Pedro Sander (Supervisor)
 			Prof. Huamin Qu
 			Prof. Chiew-Lan Tai
 			Prof. Ajay Joneja (ISD)
 			Prof. Tien-Tsin Wong (CUHK)

**** ALL are Welcome ****