General Information

I am an Assistant Professor in the Department of Computer Science and Engineering of The Hong Kong University of Science and Technology (HKUST). I specialize in Computer Graphics, with emphasis on geometry processing, real-time rendering, and graphics hardware.

I am a citizen of Niterói, Rio de Janeiro, Brazil, and lived in Brasília for most of my early life. I went to college at Stony Brook University (1994-1998), and received my Masters and PhD degrees from Harvard University (1998-2003). Prior to joining HKUST, I was a senior member of the Application Research Group of ATI Research (2003-2006).

Contact

Email:
psander 'the at symbol' cse.ust.hk

Mailing address:
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
Clear Water Bay, Kowloon, Hong Kong
(no zip code)

Office:
Room 3525
(Academic Building, lifts 25-26)

University Information

Campus map/directions, VR tour, Academic calendar (dates), Google map, MTR
Computer Science postgraduate program information
Computer Science undergraduate program information
Are you an undergraduate interested in spending an exchange semester in Hong Kong?

Students Advised

Current

Past
FP - Following position

Service

Conferences

University

Publications

Refereed Academic Papers

We present a scheme for view-dependent level-of-detail control that is implemented entirely on programmable graphics hardware. Our scheme selectively refines and coarsens an arbitrary triangle mesh at the granularity of individual vertices, to create meshes that are highly adapted to dynamic view parameters. Such fine-grain control has previously been demonstrated using sequential CPU algorithms. However, these algorithms involve pointer-based structures with intricate dependencies that cannot be handled efficiently within the restricted framework of GPU parallelism. We show that by introducing new data structures and dependency rules, one can realize fine-grain progressive mesh updates as a sequence of parallel streaming passes over the mesh elements. A major design challenge is that the GPU processes stream elements in isolation. The mesh update algorithm has time complexity proportional to the selectively refined mesh, and moreover can be amortized across several frames. The static data structure is remarkably compact, requiring only 57% more memory than an indexed triangle list. We demonstrate real-time exploration of complex models with normals and textures.
Processing of mesh edges lies at the core of many advanced realtime rendering techniques, ranging from shadow and silhouette computations, to motion blur and fur rendering. We present a scheme for efficient traversal of mesh edges that builds on the adjacency primitives and programmable geometry shaders introduced in recent graphics hardware. Our scheme aims to minimize the number of primitives while maximizing SIMD parallelism. These objectives reduce to a set of discrete optimization problems on the dual graph of the mesh, and we develop practical solutions to these graph problems. In addition, we extend two existing vertex cache optimization algorithms to produce cache-efficient traversal orderings for adjacency primitives. We demonstrate significant runtime speedups for several practical real-time rendering algorithms.
We present a framework and supporting algorithms to automate the use of data reprojection as a general tool for optimizing procedural shaders. Although the general strategy of caching and reusing expensive intermediate shading calculations across consecutive frames has previously been shown to provide an effective trade-off between speed and accuracy, the critical choices of what to reuse and at what rate to refresh cached entries have been left to a designer. The fact that these decisions require a deep understanding of a procedure’ssemantic structure makes it challenging to select optimal candidates among possibly hundreds of alternatives. Our automated approach relies on parametric models of the way possible caching decisions affect the shader’s performance and visual fidelity. These models are trained using a sample rendering session and drive an interactive profiler in which the user can explore the error/performance trade-offs associated with incorporating temporal reprojection. We evaluate the proposed models and selection algorithm with a prototype system used to optimize several complex shaders and compare our approach to current alternatives.
This paper introduces a framebuffer level of detail algorithm for controlling the pixel workload in an interactive rendering application. Our basic strategy is to evaluate the shading in a low resolution buffer and, in a second rendering pass, resample this buffer at the desired screen resolution. The size of the lower resolution buffer provides a trade-off between rendering time and the level of detail in the final shading. In order to reduce approximation error we use a feature-preserving reconstruction technique that more faithfully approximates the shading near depth and normal discontinuities. We also demonstrate how intermediate components of the shading can be selectively resized to provide finer-grained control over resource allocation. Finally, we introduce a simple control mechanism that continuously adjusts the amount of resizing necessary to maintain a target framerate. These techniques do not require any preprocessing, are straightforward to implement on modern GPUs, and are shown to provide significant performance gains for several pixel-bound scenes.
Several recently proposed techniques based on the principle of data reprojection allow reusing shading information generated in one frame to accelerate the calculation of the shading in the following frame. This strategy can significantly reduce the average rendering cost for many important real-time effects at an acceptable level of approximation error. This paper analyzes the overhead associated with incorporating temporal data reprojection on modern GPUs. Based on this analysis, we propose an alternative algorithm to those previously described in the literature and measure its efficiency for multiple scenes and hardware platforms.
We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). The most recent GPU features include support for writing to random memory locations, efficient inter-processor communication, and a programming model for general-purpose computing. Taking advantage of these new features, we design a set of data-parallel primitives such as split and sort, and use these primitives to implement indexed or non-indexed nested-loop, sort-merge and hash joins. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. We have implemented our algorithms on a PC with an NVIDIA G80 GPU and an Intel quad-core CPU. Our GPU-based join algorithms are able to achieve a performance improvement of 2-7X over their optimized CPU-based counterparts.
We present novel algorithms that optimize the order in which triangles are rendered, to improve post-transform vertex cache efficiency as well as for view-independent overdraw reduction. The resulting triangle orders perform on par with previous methods, but are orders magnitude faster to compute. The improvements in processing speed allow us to perform the optimization right after a model is loaded, when more information on the host hardware is available. This allows our vertex cache optimization to often outperform other methods. In fact, our algorithms can even be executed interactively, allowing for re-optimization in case of changes to geometry or topology, which happen often in CAD/CAM applications. We believe that most real-time rendering applications will immediately benefit from these new results.
Evaluating pixel shaders consumes a growing share of the computational budget for real-time applications. However, the significant temporal coherence in visible surface regions, lighting conditions, and camera location allows reusing computationally-intensive shading calculations between frames to achieve significant performance improvements at little degradation in visual quality. This paper investigates a caching scheme based on reverse reprojection which allows pixel shaders to store and reuse calculations performed at visible surface points. We provide guidelines to help programmers select appropriate values to cache and present several policies for keeping cached entries up-to-date. Our results confirm this approach offers substantial performance gains for many common real-time effects, including precomputed global lighting effects, stereoscopic rendering, motion blur, depth of field, and shadow mapping.
Recently, graphics processing units, or GPUs, have become a viable alternative as commodity, parallel hardware for general-purpose computing, due to their massive data-parallelism, high memory bandwidth, and improved general-purpose programming interface. In this paper, we explore the use of GPU on the grid file, a traditional multidimensional access method. Considering the hardware characteristics of GPUs, we design a massively multi-threaded GPU-based grid file for static, memory-resident multidimensional point data. Moreover, we propose a hierarchical grid file variant to handle data skews efficiently. Our implementations on the NVIDIA G80 GTX graphics card are able to achieve two to eight times? higher performance than their CPU counterparts on a single PC.
This paper introduces a new real-time shading model that uses spherical cap intersections to approximate a surface?s incident lighting from dynamic area light sources. Our method uses precomputed visibility information for static meshes to compute illumination with approximate highfrequency shadows in a single rendering pass. Because this technique relies on precomputed visibility data, the mesh is assumed to be static at render time. Due to its high efficiency and low memory footprint this method is highly suitable for games.

We describe an automatic preprocessing algorithm that reorders triangles in a mesh so as to enable the graphics hardware to efficiently cull vertex and pixel processing at rendering time. Our method starts by dividing the mesh into planar clusters which are subsequently sorted into a view-independent order which greatly reduces overdraw. The result is an increase in the opportunities for early Z-culling, reducing pixel processing time. The clusters are then optimized for mesh locality. This produces high rates of vertex cache hits, reducing vertex processing time. We have found that our method brings the overdraw rates of a wide range of models close to that of front-to-back order, while preserving state of the art vertex cache performance. This results in higher frame rates for pixel-bound applications with no penalty to vertex-bound applications.
We introduce a view-dependent level of detail rendering system designed with modern GPU architectures in mind. Our approach keeps the data in static buffers and geomorphs between different LODs using per-vertex weights for seamless transition. Our method is the first out-of-core system to support texture mapping, including a mechanism for texture LOD. This approach completely avoids LOD pops and boundary cracks while gracefully adapting to a specified framerate or level of detail. Our method is suitable for all classes of GPUs that provide basic vertex shader programmability, and is applicable for both out-of-core or instanced geometry. The contributions of our work include a preprocessing and rendering system for view-dependent LOD rendering by geomorphing static buffers using per-vertex weights, a vertex buffer tree to minimize the number of API draw calls when rendering coarse-level geometry, and automatic methods for efficient, transparent LOD control.
We propose a metric for surface parameterization specialized to its signal that can be used to create more efficient, high-quality texture maps. Derived from Taylor expansion of signal error, our metric predicts the signal approximation error - the difference between the original surface signal and its reconstruction from the sampled texture. Unlike previous methods, our metric assumes piecewise-linear reconstruction, and thus makes a good approximation to bilinear reconstruction employed in graphics hardware. We achieve significant savings in texture area for a desired signal accuracy compared to the signal-specialized parameterization metric proposed by Sander et al. in the 2002 Eurographics Workshop on Rendering.
We present the “Geometry Video,” a new data structure to encode animated meshes. Being able to encode animated meshes in a generic source-independent format allows people to share experiences. Changing the viewpoint allows more interaction than the fixed view supported by 2D video. Geometry videos are based on the “Geometry Image” mesh representation introduced by Gu et al. Our novel data structure provides a way to treat an animated mesh as a video sequence (i.e., 3D image) and is well suited for network streaming. This representation also offers the possibility of applying and adapting existing mature video processing and compression techniques (such as MPEG encoding) to animated meshes. This paper describes an algorithm to generate geometry videos from animated meshes. The main insight of this paper, is that Geometry Videos re-sample and re-organize the geometry information, in such a way, that it becomes very compressible. They provide a unified and intuitive method for level-of-detail control, both in terms of mesh resolution (by scaling the two spatial dimensions) and of frame rate (by scaling the temporal dimension). Geometry Videos have a very uniform and regular structure. Their resource and computational requirements can be calculated exactly, hence making them also suitable for applications requiring level of service guarantees.
We introduce multi-chart geometry images, a new representation for arbitrary surfaces. It is created by resam-pling a surface onto a regular 2D grid. Whereas the original scheme of Gu et al. maps the entire surface onto a single square, we use an atlas construction to map the surface piecewise onto charts of arbitrary shape. We dem-onstrate that this added flexibility reduces parametrization distortion and thus provides greater geometric fidelity, particularly for shapes with long extremities, high genus, or disconnected components. Traditional atlas construc-tions suffer from discontinuous reconstruction across chart boundaries, which in our context create unacceptable surface cracks. Our solution is a novel zippering algorithm that creates a watertight surface. In addition, we pre-sent a new atlas chartification scheme based on clustering optimization.
Complex meshes tend to have intricate, detailed silhouettes. This paper proposes two algorithms for extracting a simpler, approximate silhouette from a high-resolution model. Our methods preserve the important features of the silhouette by using the silhouette of a coarser, simplified mesh as a guide. Our simple silhouettes have significantly fewer edges than the original silhouette, while still preserving its appearance.
To reduce memory requirements for texture mapping a model, we build a surface parametrization specialized to its signal (such as color or normal). Intuitively, we want to allocate more texture samples in regions with greater signal detail. Our approach is to minimize signal approximation error — the difference between the original surface signal and its reconstruction from the sampled texture. Specifically, our signal-stretch pa-rametrization metric is derived from a Taylor expansion of signal error. For fast evaluation, this metric is pre-integrated over the surface as a metric tensor. We minimize this nonlinear metric using a novel coarse-to-fine hierarchical solver, further accelerated with a fine-to-coarse propagation of the integrated metric tensor. Use of metric tensors permits anisotropic squashing of the parametrization along directions of low signal gra-dient. Texture area can often be reduced by a factor of 4 for a desired signal accuracy compared to non-specialized parametrizations.
Given an arbitrary mesh, we present a method to construct a progressive mesh (PM) such that all meshes in the PM sequence share a common texture parametrization. Our method considers two important goals simultaneously. It minimizes texture stretch (small texture distances mapped onto large surface distances) to balance sampling rates over all locations and directions on the surface. It also minimizes texture deviation (“slippage” error based on parametric correspondence) to obtain accurate textured mesh approximations. The method begins by partitioning the mesh into charts using planarity and compactness heuristics. It creates a stretch-minimizing parametrization within each chart, and resizes the charts based on the resulting stretch. Next, it simplifies the mesh while respecting the chart boundaries. The parametrization is re-optimized to reduce both stretch and devia-tion over the whole PM sequence. Finally, the charts are packed into a texture atlas. We demonstrate using such atlases to sample color and normal maps over several models.
Aliasing is an important problem when rendering triangle meshes. Efficient antialiasing techniques such as mipmapping greatly improve the filtering of textures defined over a mesh. A major component of the remaining aliasing occurs along discontinuity edges such as silhouettes, creases, and material boundaries. Framebuffer supersampling is a simple remedy, but 2?2 super-sampling leaves behind significant temporal artifacts, while greater supersampling demands even more fill-rate and memory. We present an alternative that focuses effort on discontinuity edges by overdrawing such edges as antialiased lines. Although the idea is simple, several subtleties arise. Visible silhouette edges must be detected efficiently. Discontinuity edges need consistent orientations. They must be blended as they approach the silhouette to avoid popping. Unfortunately, edge blending results in blurriness. Our technique balances these two competing objectives of temporal smoothness and spatial sharpness. Finally, the best results are obtained when discontinuity edges are sorted by depth. Our approach proves surprisingly effective at reducing temporal artifacts commonly referred to as "crawling jaggies", with little added cost.
Approximating detailed models with coarse, texture-mapped meshes results in polygonal silhouettes. To eliminate this artifact, we introduce silhouette clipping, a framework for efficiently clip-ping the rendering of coarse geometry to the exact silhouette of the original model. The coarse mesh is obtained using progressive hulls, a novel representation with the nesting property required for proper clipping. We describe an improved technique for construct-ing texture and normal maps over this coarse mesh. Given a per-spective view, silhouettes are efficiently extracted from the original mesh using a precomputed search tree. Within the tree, hierarchical culling is achieved using pairs of anchored cones. The extracted silhouette edges are used to set the hardware stencil buffer and al-pha buffer, which in turn clip and antialias the rendered coarse ge-ometry. Results demonstrate that silhouette clipping can produce renderings of similar quality to high-resolution meshes in less ren-dering time.

Conference Demos, Posters and Sketches

GPUQP: Query Co-Processing using Graphics Processors
R. Fang, B. He, M. Lu, K. Yang, N. K. Govindaraju, Q. Luo, P. V. Sander.
SIGMOD 2007 demo paper.

Real-Time Reprojection Cache
D. Nehab, P. V. Sander, J. Isidoro.
SIGGRAPH 2006 sketch.
Long version available in GH 2007 paper.

Compressing and Managing Large Datasets for The Real-Time Parthenon Demo
P. V. Sander, J. Isidoro
I3D 2006 poster.

Early-Z Culling for Efficient Fluid Flow Simulation
P. V. Sander, N. Tatarchuk, J. L. Mitchell
I3D 2005 poster.
Long version available in ShaderX5 book.

Real-Time Skin Rendering on Graphics Hardware
P. V. Sander, D. Gosselin, J. L. Mitchell.
SIGGRAPH 2004 sketch.
Long version available in ShaderX3 book.

Book Chapters

Early-Z Culling for Efficient GPU-Based Fluid Simulation
P. V. Sander, N. Tatarchuk, J. L. Mitchell.
ShaderX5: Advanced Rendering Techniques, Charles River Media, 2006.

Methods for Real-Time Skin Rendering
D. Gosselin, P. V. Sander, J. L. Mitchell.
ShaderX3: Advanced Rendering With DirectX And OpenGL, Charles River Media, 2004.

Drawing a Crowd
D. Gosselin, P. V. Sander, J. L. Mitchell.
ShaderX3: Advanced Rendering With DirectX And OpenGL, Charles River Media, 2004.

Thesis

Sampling-Efficient Mesh Parametrization
P. V. Sander.
Ph.D. Dissertation. Harvard University, May 2003.

Technical Demos

These demos were developed when I worked at ATI's Application Research Group. They include several rendering effects, some of which based on the techniques from my papers. The demos are copyright of AMD (2004-2006).