Efficient Transactional Database Storage on Flash Solid State Drives

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Efficient Transactional Database Storage on Flash Solid State 
Drives"

By

Mr. Jun YANG


Abstract

Flash solid state drives (SSDs), or flash disks, are a type of persistent 
storage device with the potential to replace magnetic disks. They outperform 
magnetic disks in access speed, bandwidth, shock resistance, and power 
efficiency. As their capacity increases and prices decrease, flash disks are 
being considered for the storage of database systems. Because of the 
differences between flash SSDs and magnetic disks, traditional data management 
techniques designed for magnetic disks need to be re-examined for flash disks. 
In particular, the flash memory used in flash disks has an asymmetry between 
read and write speeds: reads, whether random or sequential, are much faster 
than writes.

This thesis studies the performance of transactional workloads on flash disks 
and designs efficient storage schemes for them. Specifically, we begin with a 
performance study of the TPC-C workload on flash SSDs. Overall, the flash SSDs 
outperform the magnetic disk by up to an order of magnitude. Moreover, the I/O 
performance of the SSDs is dominated by random writes, whereas that of the 
magnetic disk is dominated by random reads. Additionally, both minimising 
logging and adopting MVCC (Multi-Version Concurrency Control) instead of 2PL 
(Two-Phase Locking) help improve performance on flash SSDs.

Observing the dominance of random writes in flash SSDs under TPC-C workloads, 
we propose a new database storage layout, called Partitioned Logging (PTL). In 
PTL, we replace data writes with logging to eliminate random page writes, and 
put data and logs into separate blocks. Moreover, we group data blocks into 
partitions so that updates on each partition are appended as log entries to one 
log block. In this way, the partition size can be tuned to balance read and 
write performance according to the hardware and workload characteristics. The 
results show that PTL considerably improves performance over both the 
traditional storage layout and a leading flash-based database storage scheme.

Finally, to solve the redundant I/O problem and eliminate the merge operations 
that are essential in all other log-structured approaches, we propose FlashTKV, 
which adopts a purely sequential storage format in which all data and 
transactional information are stored as log records. Furthermore, we support 
MVCC efficiently on this sequential storage. Our results show that FlashTKV 
improves transaction throughput by 70% over two well-known key-value (KV) 
stores under TPC-C workloads on flash SSDs.
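The following minimal Python sketch illustrates the idea of keeping everything, both data and transactional information, as records in a single append-only log while still serving MVCC snapshot reads. It assumes a simplified design: write records and commit records are appended to one sequence, commit timestamps increase monotonically, and an in-memory index maps each key to its committed versions. All names (SequentialMVCCStore, Txn, snapshot_ts) are hypothetical. Write-write conflict detection, reading a transaction's own uncommitted writes, and garbage collection are deliberately omitted. This is not FlashTKV's actual design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Txn:
    txn_id: int
    snapshot_ts: int  # the transaction sees versions committed at or before this time


class SequentialMVCCStore:
    """Toy KV store whose only storage is one append-only log of records."""

    def __init__(self) -> None:
        self.log: list[tuple] = []                             # write and commit records
        self.versions: dict[str, list[tuple[int, int]]] = {}   # key -> [(commit_ts, log offset)]
        self.last_commit_ts = 0
        self._next_txn_id = 1
        self._pending: dict[int, list[tuple[str, int]]] = {}   # txn_id -> [(key, log offset)]

    def begin(self) -> Txn:
        # Snapshot = latest committed state at the moment the transaction starts.
        txn = Txn(self._next_txn_id, self.last_commit_ts)
        self._next_txn_id += 1
        self._pending[txn.txn_id] = []
        return txn

    def write(self, txn: Txn, key: str, value: str) -> None:
        # Data is appended sequentially; nothing is ever updated in place.
        self.log.append(("write", txn.txn_id, key, value))
        self._pending[txn.txn_id].append((key, len(self.log) - 1))

    def commit(self, txn: Txn) -> int:
        # Transactional information is also just another log record.
        self.last_commit_ts += 1
        ts = self.last_commit_ts
        self.log.append(("commit", txn.txn_id, ts))
        # Publish the new versions; timestamps grow, so each list stays sorted.
        for key, offset in self._pending.pop(txn.txn_id):
            self.versions.setdefault(key, []).append((ts, offset))
        return ts

    def read(self, txn: Txn, key: str) -> Optional[str]:
        # Return the newest version committed no later than the snapshot.
        for ts, offset in reversed(self.versions.get(key, [])):
            if ts <= txn.snapshot_ts:
                return self.log[offset][3]
        return None
```

A reader that begins after the first commit keeps seeing the old value of a key even after a later transaction commits a new one, because versions are selected by commit timestamp rather than overwritten.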


Date:			Wednesday, 19 June 2013

Time:			1:00pm – 3:00pm

Venue:			Room 3501
 			Lifts 25/26

Chairman:		Prof. Ellick Wong (MGMT)

Committee Members:	Prof. Qiong Luo (Supervisor)
 			Prof. Frederick Lochovsky
 			Prof. Ke Yi
 			Prof. Zhiyong Fan (ECE)
 			Prof. Jeffrey Yu (Sys. Engg. & Engg. Mgmt., CUHK)


**** ALL are Welcome ****