B.S. Stony Brook, S.M. Harvard, Ph.D. Harvard
Associate Professor, CSE, HKUST
Research area: Computer Graphics (VISGRAPH group)
A 7-gigapixel panorama of HKUST. Hong Kong, June 2010.

General Information

I am a native of Niterói, Rio de Janeiro, Brazil, and lived in Brasília for most of my early life. I went to college at Stony Brook University (1994-1998), and received my master's and Ph.D. degrees from Harvard University (1998-2003). Prior to joining HKUST, I worked for the Application Research Group of ATI Research (2003-2006).

Contact

Email:
psander 'the at symbol' cse.ust.hk

Mailing address:
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
Clear Water Bay, Kowloon, Hong Kong
(no zip code)

Office:
Academic Building, Room 3525
(closest lifts: 25-26)

University Information

Campus map/directions
Google map
MTR map (subway)
Weather

Campus virtual tour

Academic calendar - Dates

CSE postgraduate program information
CSE undergraduate program information
CSE B.Sc. double-major program information
Undergraduate exchange program

Peeking from the right arm of Christ the Redeemer.
Rio de Janeiro, Brazil, June 2010.

My son Luca Sander at six weeks of age.
Hong Kong, March 2011.

Students Advised

Current

Past
FP - Following position

Service

Conferences/Journals

University

Publications

Refereed Academic Papers

We present a novel approach for real-time rendering of a static 3D model front-to-back or back-to-front relative to any viewpoint outside its bounding volume. The approach renders depth-sorted triangles using a single draw call. At run-time, we replace the traditional sorting strategy of existing algorithms with a faster triangle selection strategy. The selection process operates on an extended sequence of triangles annotated by test planes, created by our off-line preprocessing stage. Based on these test planes, a simple run-time procedure uses the given viewpoint to select a subsequence of triangles for rasterization. Selected subsequences are statically presorted by depth and contain each input triangle exactly once. Our method runs on legacy hardware and renders depth-sorted static models significantly faster than previous approaches. We conclude by demonstrating the real-time rendering of order-independent transparency effects.
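A minimal sketch of the run-time selection, assuming a hypothetical tree layout in which internal nodes carry a test plane and leaves carry presorted triangle index lists (the paper's actual annotated-sequence encoding differs):

import numpy as np

def select_subsequence(node, eye):
    # Leaf: a presorted list of triangle indices, valid for every
    # viewpoint that reaches this leaf through the plane tests above it.
    if "triangles" in node:
        return node["triangles"]
    n, d = node["plane"]                    # plane defined by n.x + d = 0
    if np.dot(n, eye) + d >= 0.0:           # which side is the eye on?
        return select_subsequence(node["front"], eye)
    return select_subsequence(node["back"], eye)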
...
In order to achieve high-quality rendering at a lower cost, one can exploit temporal coherence (TC). The underlying observation is that a higher resolution and frame rate do not necessarily imply a much higher workload, but a larger amount of redundancy and a higher potential for amortizing rendering over several frames. In this STAR, we will investigate methods that make use of this principle and provide practical and theoretical advice on how to exploit temporal coherence for performance optimization. These methods not only allow us to incorporate more computationally intensive shading effects into many existing applications, but also offer exciting opportunities for extending high-end graphics applications to lower-spec consumer-level hardware.
...
We implemented a GPU-powered parallel k-centers algorithm to perform clustering on the conformations of molecular dynamics (MD) simulations. The algorithm is up to two orders of magnitude faster than the CPU implementation. We tested our algorithm on four protein MD simulation datasets ranging from the small Alanine Dipeptide to a 370-residue Maltose Binding Protein (MBP). It is capable of grouping 250,000 conformations of the MBP into 4000 clusters within 40 seconds. To achieve this, we effectively parallelized the code on the GPU and utilize the triangle inequality of metric spaces. Furthermore, the algorithm's running time is linear with respect to the number of cluster centers. In addition, we found the triangle inequality to be less effective in higher dimensions and provide a mathematical rationale. Finally, using Alanine Dipeptide as an example, we show a strong correlation between cluster populations resulting from the k-centers algorithm and the underlying density.
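A small CPU sketch of the greedy k-centers loop with the triangle-inequality pruning described above (the paper parallelizes the distance updates on the GPU; all names here are illustrative):

import numpy as np

def k_centers(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(X)))]
    d = np.linalg.norm(X - X[centers[0]], axis=1)   # distance to nearest center
    assign = np.zeros(len(X), dtype=np.intp)
    for j in range(1, k):
        c = int(np.argmax(d))                       # farthest-point heuristic
        centers.append(c)
        cc = np.linalg.norm(X[centers[:-1]] - X[c], axis=1)  # center-to-center
        # Triangle inequality: if d(new, center(x)) >= 2 d(x, center(x)),
        # the new center cannot be closer, so skip the distance computation.
        cand = np.flatnonzero(2.0 * d >= cc[assign])
        dn = np.linalg.norm(X[cand] - X[c], axis=1)
        closer = dn < d[cand]
        d[cand[closer]] = dn[closer]
        assign[cand[closer]] = j
    return centers, assign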
We introduce a method for increasing the framerate of real-time rendering applications. Whereas many existing temporal upsampling strategies only reuse information from previous frames, our bidirectional technique reconstructs intermediate frames from a pair of consecutive rendered frames. This significantly improves the accuracy and efficiency of data reuse since very few pixels are simultaneously occluded in both frames. We present two versions of this basic algorithm. The first is appropriate for fill-bound scenes as it limits the number of expensive shading calculations, but involves rasterization of scene geometry at each intermediate frame. The second version, our more significant contribution, reduces both shading and geometry computations by performing reprojection using only image-based buffers. It warps and combines the adjacent rendered frames using an efficient iterative search on their stored scene depth and flow. Bidirectional reprojection introduces a small amount of lag. We perform a user study to investigate this lag, and find that its effect is minor. We demonstrate substantial performance improvements (3-4x) for a variety of applications, including vertex-bound and fill-bound scenes, multi-pass effects, and motion blur.
We present a method for restoring antialiased edges that are damaged by certain types of nonlinear image filters. This problem arises with many common operations such as intensity thresholding, tone mapping, gamma correction, histogram equalization, bilateral filters, unsharp masking, and certain non-photorealistic filters. We present a simple algorithm that selectively adjusts the local gradients in affected regions of the filtered image so that they are consistent with those in the original image. Our algorithm is highly parallel and is therefore easily implemented on a GPU. Our prototype system can process up to 500 megapixels per second and we present results for a number of different image filters.
...
Blue noise sampling is widely employed for a variety of imaging, geometry, and rendering applications. However, existing research so far has focused mainly on isotropic sampling, and challenges remain for the anisotropic scenario both in sample generation and quality verification. We present anisotropic blue noise sampling to address these issues. On the generation side, we extend dart throwing and relaxation, the two classical methods for isotropic blue noise sampling, for the anisotropic setting, while ensuring both high-quality results and efficient computation. On the verification side, although Fourier spectrum analysis has been one of the most powerful and widely adopted tools, so far it has been applied only to uniform isotropic samples. We introduce approaches based on warping and sphere sampling that allow us to extend Fourier spectrum analysis for adaptive and/or anisotropic samples; thus, we can detect problems in alternative anisotropic sampling techniques that were not yet found via prior verification. We present several applications of our technique, including stippling, visualization, surface texturing, and object distribution.
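On the generation side, the dart-throwing extension can be sketched as rejection sampling under a per-point metric tensor (a brute-force illustration with an assumed metric interface, not the paper's accelerated implementation):

import numpy as np

def anisotropic_dart_throwing(metric, r, n_trials=20000, seed=0):
    # Dart throwing in [0,1]^2: reject a candidate if its anisotropic
    # (Mahalanobis) distance to any accepted sample is below r.
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_trials):
        p = rng.random(2)
        M = metric(p)                       # 2x2 SPD metric tensor at p
        if all((p - q) @ M @ (p - q) >= r * r for q in samples):
            samples.append(p)
    return np.array(samples)

# example: a metric stretched 4x along the x axis everywhere
pts = anisotropic_dart_throwing(lambda p: np.diag([16.0, 1.0]), r=0.05)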
We introduce a real-time system that converts images, video, or 3D animation sequences to artistic renderings in various painterly styles. The algorithm, which is entirely executed on the GPU, can efficiently process 512^2 resolution frames containing 60,000 individual strokes at over 30 fps. In order to exploit the parallel nature of GPUs, our algorithm determines the placement of strokes entirely from local pixel neighborhood information. The strokes are rendered as point sprites with textures. Temporal coherence is achieved by treating the brush strokes as particles and moving them based on optical flow. Our system renders high quality results while allowing the user interactive control over many stylistic parameters such as stroke size, texture and density.
We present a real-time rendering scheme that reuses shading samples from earlier time frames to achieve practical antialiasing of procedural shaders. Using a reprojection strategy, we maintain several sets of shading estimates at subpixel precision, and incrementally update these such that for most pixels only one new shaded sample is evaluated per frame. The key difficulty is to prevent accumulated blurring during successive reprojections. We present a theoretical analysis of the blur introduced by reprojection methods. Based on this analysis, we introduce a nonuniform spatial filter, an adaptive recursive temporal filter, and a principled scheme for locally estimating the spatial blur. Our scheme is appropriate for antialiasing shading attributes that vary slowly over time. It works in a single rendering pass on commodity graphics hardware, and offers results that surpass 4x4 stratified supersampling in quality, at a fraction of the cost.
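The recursive accumulation underlying such schemes reduces to an exponentially weighted running average; the sketch below shows only that update step, whereas the paper's contribution is the blur analysis that drives the spatial filter and the adaptive choice of alpha:

def temporal_update(history_reproj, new_sample, alpha=0.125):
    # Small alpha keeps a long history (less noise, more reprojection
    # blur); large alpha refreshes quickly (sharper, but noisier).
    return alpha * new_sample + (1.0 - alpha) * history_reproj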
In this article, we present our design, implementation, and evaluation of an in-memory relational query coprocessing system, GDB, on the GPU. Taking advantage of the GPU hardware features, we design a set of highly optimized data-parallel primitives such as split and sort, and use these primitives to implement common relational query processing algorithms. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. Furthermore, we propose coprocessing techniques that take into account both the computation resources and the GPU-CPU data transfer cost so that each operator in a query can utilize suitable processors (the CPU, the GPU, or both) for an optimized overall performance. We have evaluated our GDB system on a machine with an Intel quad-core CPU and an NVIDIA GeForce 8800 GTX GPU. Our workloads include microbenchmark queries on memory-resident data as well as TPC-H queries that involve complex data types and multiple query operators on data sets larger than the GPU memory. Our results show that our GPU-based algorithms are 2–27x faster than their optimized CPU-based counterparts on in-memory data. Moreover, the performance of our coprocessing scheme is similar to, or better than, both the GPU-only and the CPU-only schemes.
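As an illustration of the primitive-based design, here is a sequential sketch of a split operation with the same histogram/scan/scatter structure as the data-parallel version (names are illustrative):

import numpy as np

def split(keys, values, num_partitions):
    # Count keys per partition, take an exclusive prefix sum to get
    # partition start offsets, then scatter each value into place.
    hist = np.bincount(keys, minlength=num_partitions)
    offsets = np.concatenate(([0], np.cumsum(hist)[:-1]))  # exclusive scan
    out = np.empty_like(values)
    cursor = offsets.copy()
    for k, v in zip(keys, values):
        out[cursor[k]] = v
        cursor[k] += 1
    return out, offsets

A radix sort, for instance, can be built by applying split repeatedly to successive digits of the key.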
We present a scheme for view-dependent level-of-detail control that is implemented entirely on programmable graphics hardware. Our scheme selectively refines and coarsens an arbitrary triangle mesh at the granularity of individual vertices to create meshes that are highly adapted to dynamic view parameters. Such fine-grain control has previously been demonstrated using sequential CPU algorithms. However, these algorithms involve pointer-based structures with intricate dependencies that cannot be handled efficiently within the restricted framework of GPU parallelism. We show that by introducing new data structures and dependency rules, one can realize fine-grain progressive mesh updates as a sequence of parallel streaming passes over the mesh elements. A major design challenge is that the GPU processes stream elements in isolation. The mesh update algorithm has time complexity proportional to the selectively refined mesh, and moreover can be amortized across several frames. The result is a single standard index buffer that can be used directly for rendering. The static data structure is remarkably compact, requiring only 57% more memory than an indexed triangle list. We demonstrate real-time exploration of complex models with normals and textures, as well as shadowing and semitransparent surface rendering applications that make direct use of the resulting dynamic index buffer.
We present a scheme for view-dependent level-of-detail control that is implemented entirely on programmable graphics hardware. Our scheme selectively refines and coarsens an arbitrary triangle mesh at the granularity of individual vertices, to create meshes that are highly adapted to dynamic view parameters. Such fine-grain control has previously been demonstrated using sequential CPU algorithms. However, these algorithms involve pointer-based structures with intricate dependencies that cannot be handled efficiently within the restricted framework of GPU parallelism. We show that by introducing new data structures and dependency rules, one can realize fine-grain progressive mesh updates as a sequence of parallel streaming passes over the mesh elements. A major design challenge is that the GPU processes stream elements in isolation. The mesh update algorithm has time complexity proportional to the selectively refined mesh, and moreover can be amortized across several frames. The static data structure is remarkably compact, requiring only 57% more memory than an indexed triangle list. We demonstrate real-time exploration of complex models with normals and textures.
Processing of mesh edges lies at the core of many advanced real-time rendering techniques, ranging from shadow and silhouette computations, to motion blur and fur rendering. We present a scheme for efficient traversal of mesh edges that builds on the adjacency primitives and programmable geometry shaders introduced in recent graphics hardware. Our scheme aims to minimize the number of primitives while maximizing SIMD parallelism. These objectives reduce to a set of discrete optimization problems on the dual graph of the mesh, and we develop practical solutions to these graph problems. In addition, we extend two existing vertex cache optimization algorithms to produce cache-efficient traversal orderings for adjacency primitives. We demonstrate significant runtime speedups for several practical real-time rendering algorithms.
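The adjacency primitive in question packs six indices per triangle; a sketch of building that layout from a manifold triangle list (the unpaired-edge fallback and naming are assumptions):

def triangles_with_adjacency(tris):
    # Layout (v0, a01, v1, a12, v2, a20), where aXY is the vertex
    # opposite edge (vX, vY) in the neighboring triangle; falls back
    # to vX when the edge has no neighbor.
    edge_to_opp = {}
    for a, b, c in tris:
        edge_to_opp[(a, b)] = c
        edge_to_opp[(b, c)] = a
        edge_to_opp[(c, a)] = b
    out = []
    for a, b, c in tris:
        out.append([a, edge_to_opp.get((b, a), a),
                    b, edge_to_opp.get((c, b), b),
                    c, edge_to_opp.get((a, c), c)])
    return out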
We present a framework and supporting algorithms to automate the use of data reprojection as a general tool for optimizing procedural shaders. Although the general strategy of caching and reusing expensive intermediate shading calculations across consecutive frames has previously been shown to provide an effective trade-off between speed and accuracy, the critical choices of what to reuse and at what rate to refresh cached entries have been left to a designer. The fact that these decisions require a deep understanding of a procedure’s semantic structure makes it challenging to select optimal candidates among possibly hundreds of alternatives. Our automated approach relies on parametric models of the way possible caching decisions affect the shader’s performance and visual fidelity. These models are trained using a sample rendering session and drive an interactive profiler in which the user can explore the error/performance trade-offs associated with incorporating temporal reprojection. We evaluate the proposed models and selection algorithm with a prototype system used to optimize several complex shaders and compare our approach to current alternatives.
This paper introduces a framebuffer level of detail algorithm for controlling the pixel workload in an interactive rendering application. Our basic strategy is to evaluate the shading in a low resolution buffer and, in a second rendering pass, resample this buffer at the desired screen resolution. The size of the lower resolution buffer provides a trade-off between rendering time and the level of detail in the final shading. In order to reduce approximation error we use a feature-preserving reconstruction technique that more faithfully approximates the shading near depth and normal discontinuities. We also demonstrate how intermediate components of the shading can be selectively resized to provide finer-grained control over resource allocation. Finally, we introduce a simple control mechanism that continuously adjusts the amount of resizing necessary to maintain a target framerate. These techniques do not require any preprocessing, are straightforward to implement on modern GPUs, and are shown to provide significant performance gains for several pixel-bound scenes.
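The control mechanism can be as simple as a proportional controller on the resize factor; a sketch with made-up gain and clamp values:

def update_scale(scale, frame_ms, target_ms, gain=0.05, lo=0.25, hi=1.0):
    # Shrink the shading buffer when a frame runs long, grow it back
    # when there is headroom; clamp to a sane resolution range.
    scale *= 1.0 + gain * (target_ms - frame_ms) / target_ms
    return min(max(scale, lo), hi)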
Several recently proposed techniques based on the principle of data reprojection allow reusing shading information generated in one frame to accelerate the calculation of the shading in the following frame. This strategy can significantly reduce the average rendering cost for many important real-time effects at an acceptable level of approximation error. This paper analyzes the overhead associated with incorporating temporal data reprojection on modern GPUs. Based on this analysis, we propose an alternative algorithm to those previously described in the literature and measure its efficiency for multiple scenes and hardware platforms.
We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). The most recent GPU features include support for writing to random memory locations, efficient inter-processor communication, and a programming model for general-purpose computing. Taking advantage of these new features, we design a set of data-parallel primitives such as split and sort, and use these primitives to implement indexed or non-indexed nested-loop, sort-merge and hash joins. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. We have implemented our algorithms on a PC with an NVIDIA G80 GPU and an Intel quad-core CPU. Our GPU-based join algorithms are able to achieve a performance improvement of 2-7X over their optimized CPU-based counterparts.
We present novel algorithms that optimize the order in which triangles are rendered, to improve post-transform vertex cache efficiency as well as for view-independent overdraw reduction. The resulting triangle orders perform on par with previous methods, but are orders of magnitude faster to compute. The improvements in processing speed allow us to perform the optimization right after a model is loaded, when more information on the host hardware is available. This allows our vertex cache optimization to often outperform other methods. In fact, our algorithms can even be executed interactively, allowing for re-optimization in case of changes to geometry or topology, which happen often in CAD/CAM applications. We believe that most real-time rendering applications will immediately benefit from these new results.
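Vertex cache efficiency of a triangle order is commonly reported as the average cache miss ratio (ACMR) under a FIFO post-transform cache model, which a few lines can simulate:

from collections import deque

def acmr(indices, cache_size=32):
    # Misses per triangle under a FIFO cache; 3.0 is the worst case,
    # and well-ordered meshes approach the vertex-to-triangle ratio.
    cache, inside = deque(), set()
    misses = 0
    for v in indices:
        if v not in inside:
            misses += 1
            cache.append(v)
            inside.add(v)
            if len(cache) > cache_size:
                inside.discard(cache.popleft())
    return misses / (len(indices) / 3)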
Evaluating pixel shaders consumes a growing share of the computational budget for real-time applications. However, the significant temporal coherence in visible surface regions, lighting conditions, and camera location allows reusing computationally intensive shading calculations between frames to achieve significant performance improvements at little degradation in visual quality. This paper investigates a caching scheme based on reverse reprojection which allows pixel shaders to store and reuse calculations performed at visible surface points. We provide guidelines to help programmers select appropriate values to cache and present several policies for keeping cached entries up-to-date. Our results confirm that this approach offers substantial performance gains for many common real-time effects, including precomputed global lighting effects, stereoscopic rendering, motion blur, depth of field, and shadow mapping.
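A sketch of the cache lookup at the core of reverse reprojection, assuming the history buffer stores depth as z/w and using an illustrative depth tolerance (names and conventions are assumptions, not the paper's exact formulation):

def lookup(cache_color, cache_depth, prev_clip, eps=1e-3):
    # prev_clip: the current surface point transformed by the
    # *previous* frame's view-projection matrix, as (x, y, z, w).
    x, y, z, w = prev_clip
    u, v, d = x / w * 0.5 + 0.5, y / w * 0.5 + 0.5, z / w
    if not (0.0 <= u < 1.0 and 0.0 <= v < 1.0):
        return None                         # off-screen last frame: miss
    i = int(v * cache_depth.shape[0])
    j = int(u * cache_depth.shape[1])
    if abs(cache_depth[i, j] - d) > eps:
        return None                         # occluded last frame: miss
    return cache_color[i, j]                # hit: reuse cached shading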
Recently, graphics processing units, or GPUs, have become a viable alternative as commodity, parallel hardware for general-purpose computing, due to their massive data-parallelism, high memory bandwidth, and improved general-purpose programming interface. In this paper, we explore the use of GPU on the grid file, a traditional multidimensional access method. Considering the hardware characteristics of GPUs, we design a massively multi-threaded GPU-based grid file for static, memory-resident multidimensional point data. Moreover, we propose a hierarchical grid file variant to handle data skews efficiently. Our implementations on the NVIDIA G80 GTX graphics card are able to achieve two to eight times higher performance than their CPU counterparts on a single PC.
This paper introduces a new real-time shading model that uses spherical cap intersections to approximate a surface's incident lighting from dynamic area light sources. Our method uses precomputed visibility information for static meshes to compute illumination with approximate high-frequency shadows in a single rendering pass. Because this technique relies on precomputed visibility data, the mesh is assumed to be static at render time. Due to its high efficiency and low memory footprint, this method is highly suitable for games.
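A sketch in the spirit of the spherical-cap machinery (not the paper's exact formula): the solid angle of a cap, and a crude linear approximation of the intersection of two caps whose axes are gamma radians apart:

import math

def cap_solid_angle(theta):
    # Solid angle of a spherical cap with half-angle theta.
    return 2.0 * math.pi * (1.0 - math.cos(theta))

def cap_intersection(theta1, theta2, gamma):
    lo = abs(theta1 - theta2)               # axes close: full containment
    hi = theta1 + theta2                    # axes far: disjoint caps
    full = min(cap_solid_angle(theta1), cap_solid_angle(theta2))
    if gamma <= lo:
        return full
    if gamma >= hi:
        return 0.0
    return full * (hi - gamma) / (hi - lo)  # linear falloff in between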

We describe an automatic preprocessing algorithm that reorders triangles in a mesh so as to enable the graphics hardware to efficiently cull vertex and pixel processing at rendering time. Our method starts by dividing the mesh into planar clusters, which are subsequently sorted into a view-independent order that greatly reduces overdraw. The result is an increase in the opportunities for early Z-culling, reducing pixel processing time. The clusters are then optimized for mesh locality. This produces high rates of vertex cache hits, reducing vertex processing time. We have found that our method brings the overdraw rates of a wide range of models close to that of front-to-back order, while preserving state-of-the-art vertex cache performance. This results in higher frame rates for pixel-bound applications with no penalty to vertex-bound applications.
We introduce a view-dependent level of detail rendering system designed with modern GPU architectures in mind. Our approach keeps the data in static buffers and geomorphs between different LODs using per-vertex weights for seamless transition. Our method is the first out-of-core system to support texture mapping, including a mechanism for texture LOD. This approach completely avoids LOD pops and boundary cracks while gracefully adapting to a specified framerate or level of detail. Our method is suitable for all classes of GPUs that provide basic vertex shader programmability, and is applicable for both out-of-core or instanced geometry. The contributions of our work include a preprocessing and rendering system for view-dependent LOD rendering by geomorphing static buffers using per-vertex weights, a vertex buffer tree to minimize the number of API draw calls when rendering coarse-level geometry, and automatic methods for efficient, transparent LOD control.
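A sketch of the geomorphing step: a distance-driven per-vertex weight and the corresponding lerp between coarse and fine positions (the real system evaluates this in the vertex shader; names are illustrative):

def geomorph_weight(dist, d_fine, d_coarse):
    # 1 at/below the fine-LOD distance, 0 at/beyond the coarse-LOD
    # distance (d_fine < d_coarse assumed), linear in between.
    t = (d_coarse - dist) / (d_coarse - d_fine)
    return min(max(t, 0.0), 1.0)

def geomorph(v_fine, v_coarse, w):
    return w * v_fine + (1.0 - w) * v_coarse   # the per-vertex lerp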
We propose a metric for surface parameterization specialized to its signal that can be used to create more efficient, high-quality texture maps. Derived from Taylor expansion of signal error, our metric predicts the signal approximation error - the difference between the original surface signal and its reconstruction from the sampled texture. Unlike previous methods, our metric assumes piecewise-linear reconstruction, and thus makes a good approximation to bilinear reconstruction employed in graphics hardware. We achieve significant savings in texture area for a desired signal accuracy compared to the signal-specialized parameterization metric proposed by Sander et al. in the 2002 Eurographics Workshop on Rendering.
We present the “Geometry Video,” a new data structure to encode animated meshes. Being able to encode animated meshes in a generic, source-independent format allows people to share experiences. Changing the viewpoint allows more interaction than the fixed view supported by 2D video. Geometry videos are based on the “Geometry Image” mesh representation introduced by Gu et al. Our novel data structure provides a way to treat an animated mesh as a video sequence (i.e., a 3D image) and is well suited for network streaming. This representation also offers the possibility of applying and adapting existing mature video processing and compression techniques (such as MPEG encoding) to animated meshes. This paper describes an algorithm to generate geometry videos from animated meshes. The main insight of this paper is that geometry videos resample and reorganize the geometry information in such a way that it becomes very compressible. They provide a unified and intuitive method for level-of-detail control, both in terms of mesh resolution (by scaling the two spatial dimensions) and of frame rate (by scaling the temporal dimension). Geometry videos have a very uniform and regular structure. Their resource and computational requirements can be calculated exactly, hence making them also suitable for applications requiring level-of-service guarantees.
We introduce multi-chart geometry images, a new representation for arbitrary surfaces. It is created by resampling a surface onto a regular 2D grid. Whereas the original scheme of Gu et al. maps the entire surface onto a single square, we use an atlas construction to map the surface piecewise onto charts of arbitrary shape. We demonstrate that this added flexibility reduces parametrization distortion and thus provides greater geometric fidelity, particularly for shapes with long extremities, high genus, or disconnected components. Traditional atlas constructions suffer from discontinuous reconstruction across chart boundaries, which in our context create unacceptable surface cracks. Our solution is a novel zippering algorithm that creates a watertight surface. In addition, we present a new atlas chartification scheme based on clustering optimization.
Complex meshes tend to have intricate, detailed silhouettes. This paper proposes two algorithms for extracting a simpler, approximate silhouette from a high-resolution model. Our methods preserve the important features of the silhouette by using the silhouette of a coarser, simplified mesh as a guide. Our simple silhouettes have significantly fewer edges than the original silhouette, while still preserving its appearance.
To reduce memory requirements for texture mapping a model, we build a surface parametrization specialized to its signal (such as color or normal). Intuitively, we want to allocate more texture samples in regions with greater signal detail. Our approach is to minimize signal approximation error — the difference between the original surface signal and its reconstruction from the sampled texture. Specifically, our signal-stretch parametrization metric is derived from a Taylor expansion of signal error. For fast evaluation, this metric is pre-integrated over the surface as a metric tensor. We minimize this nonlinear metric using a novel coarse-to-fine hierarchical solver, further accelerated with a fine-to-coarse propagation of the integrated metric tensor. Use of metric tensors permits anisotropic squashing of the parametrization along directions of low signal gradient. Texture area can often be reduced by a factor of 4 for a desired signal accuracy compared to non-specialized parametrizations.
Given an arbitrary mesh, we present a method to construct a progressive mesh (PM) such that all meshes in the PM sequence share a common texture parametrization. Our method considers two important goals simultaneously. It minimizes texture stretch (small texture distances mapped onto large surface distances) to balance sampling rates over all locations and directions on the surface. It also minimizes texture deviation (“slippage” error based on parametric correspondence) to obtain accurate textured mesh approximations. The method begins by partitioning the mesh into charts using planarity and compactness heuristics. It creates a stretch-minimizing parametrization within each chart, and resizes the charts based on the resulting stretch. Next, it simplifies the mesh while respecting the chart boundaries. The parametrization is re-optimized to reduce both stretch and deviation over the whole PM sequence. Finally, the charts are packed into a texture atlas. We demonstrate using such atlases to sample color and normal maps over several models.
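The per-triangle L2 stretch can be computed from the Jacobian of the texture-to-surface map; a sketch following the metric's published form (degenerate texture triangles are not handled):

import numpy as np

def l2_stretch(p, uv):
    # p: 3x3 array of 3D triangle corners; uv: 3x2 texture corners.
    # Returns sqrt((Gamma^2 + gamma^2)/2), where Gamma and gamma are
    # the singular values of the Jacobian of the uv -> surface map.
    (s1, t1), (s2, t2), (s3, t3) = uv
    A = ((s2 - s1) * (t3 - t1) - (s3 - s1) * (t2 - t1)) / 2.0  # uv area
    Ss = (p[0] * (t2 - t3) + p[1] * (t3 - t1) + p[2] * (t1 - t2)) / (2 * A)
    St = (p[0] * (s3 - s2) + p[1] * (s1 - s3) + p[2] * (s2 - s1)) / (2 * A)
    a, c = Ss @ Ss, St @ St
    return np.sqrt((a + c) / 2.0)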
Aliasing is an important problem when rendering triangle meshes. Efficient antialiasing techniques such as mipmapping greatly improve the filtering of textures defined over a mesh. A major component of the remaining aliasing occurs along discontinuity edges such as silhouettes, creases, and material boundaries. Framebuffer supersampling is a simple remedy, but 2×2 supersampling leaves behind significant temporal artifacts, while greater supersampling demands even more fill-rate and memory. We present an alternative that focuses effort on discontinuity edges by overdrawing such edges as antialiased lines. Although the idea is simple, several subtleties arise. Visible silhouette edges must be detected efficiently. Discontinuity edges need consistent orientations. They must be blended as they approach the silhouette to avoid popping. Unfortunately, edge blending results in blurriness. Our technique balances these two competing objectives of temporal smoothness and spatial sharpness. Finally, the best results are obtained when discontinuity edges are sorted by depth. Our approach proves surprisingly effective at reducing temporal artifacts commonly referred to as "crawling jaggies", with little added cost.
Approximating detailed models with coarse, texture-mapped meshes results in polygonal silhouettes. To eliminate this artifact, we introduce silhouette clipping, a framework for efficiently clipping the rendering of coarse geometry to the exact silhouette of the original model. The coarse mesh is obtained using progressive hulls, a novel representation with the nesting property required for proper clipping. We describe an improved technique for constructing texture and normal maps over this coarse mesh. Given a perspective view, silhouettes are efficiently extracted from the original mesh using a precomputed search tree. Within the tree, hierarchical culling is achieved using pairs of anchored cones. The extracted silhouette edges are used to set the hardware stencil buffer and alpha buffer, which in turn clip and antialias the rendered coarse geometry. Results demonstrate that silhouette clipping can produce renderings of similar quality to high-resolution meshes in less rendering time.

Conference Demos, Posters and Sketches

Fast capacity constrained Voronoi tessellation.
H. Li, D. Nehab, L.-Y. Wei, P. V. Sander, C.-W. Fu.
I3D 2010 poster.

I3DC: Interactive Three-Dimensional Cubes.
K. Yang, Y. Li, Q. Luo, P. V. Sander, J. Shi.
ICDE 2009 poster paper.

Stack-based parallel recursion on graphics processors.
K. Yang, B. He, Q. Luo, P. V. Sander, J. Shi.
PPOPP 2009 poster.

GPUQP: Query Co-Processing using Graphics Processors.
R. Fang, B. He, M. Lu, K. Yang, N. K. Govindaraju, Q. Luo, P. V. Sander.
SIGMOD 2007 demo paper.

Real-Time Reprojection Cache.
D. Nehab, P. V. Sander, J. Isidoro.
SIGGRAPH 2006 sketch.
Long version available in GH 2007 paper.

Compressing and Managing Large Datasets for The Real-Time Parthenon Demo.
P. V. Sander, J. Isidoro.
I3D 2006 poster.

Early-Z Culling for Efficient Fluid Flow Simulation.
P. V. Sander, N. Tatarchuk, J. L. Mitchell.
I3D 2005 poster.
Long version available in ShaderX5 book.

Real-Time Skin Rendering on Graphics Hardware.
P. V. Sander, D. Gosselin, J. L. Mitchell.
SIGGRAPH 2004 sketch.
Long version available in ShaderX3 book.

Book Chapters

Early-Z Culling for Efficient GPU-Based Fluid Simulation
P. V. Sander, N. Tatarchuk, J. L. Mitchell.
ShaderX5: Advanced Rendering Techniques, Charles River Media, 2006.

Methods for Real-Time Skin Rendering
D. Gosselin, P. V. Sander, J. L. Mitchell.
ShaderX3: Advanced Rendering With DirectX And OpenGL, Charles River Media, 2004.

Drawing a Crowd
D. Gosselin, P. V. Sander, J. L. Mitchell.
ShaderX3: Advanced Rendering With DirectX And OpenGL, Charles River Media, 2004.

Thesis

Sampling-Efficient Mesh Parametrization
P. V. Sander.
Ph.D. Dissertation. Harvard University, May 2003.

Gigapixel Panoramas

Below are some of my gigapixel panorama projects. These panoramas were taken with the RioHK group. Navigate through the images below by clicking to zoom in, shift-clicking to zoom out, and dragging to pan. To view an image in full screen, select one of the "View in" options below the image. For additional panoramas, see the Panorama project home page.

Corcovado 67GP, 2010 [site]
Former world record for largest digital photograph
(newly processed version with vignetting and exposure correction)

Hong Kong 20GP, 2011 [site]
Largest image of Hong Kong

Technical Demos

These demos were developed when I worked at ATI's Application Research Group. They include several rendering effects, some of which are based on techniques from my papers. The demos are copyright of AMD (2004-2006).

Real-Time Parthenon

This demo, based on the SIGGRAPH 2004 movie by Paul Debevec, consists of over 15 million polygons, derived from a real-world laser capture of the actual Parthenon in Athens, Greece. Image-based lighting techniques and a novel LOD algorithm are used to render this dataset in real time on the Radeon® X1800 graphics processor. (This demo uses the algorithm described in the 'Progressive Buffers' paper to render the Parthenon.)


Ruby: The DoubleCross

Through the use of motion-captured animation, depth of field, realistic image-based lighting, and dynamic shadows, 'DoubleCross' borrows heavily from both gaming and movie genres to create a compelling demo that further raises the expectations for real-time graphics. (This demo uses the algorithm described in the 'Methods for Real-Time Skin Rendering' book chapter and SIGGRAPH sketch.)


Crowd

This demonstration shows the vertex shader processing power of the X800 being used to render a large crowd of soldiers (1400 in total) running across a rocky terrain. All of the models feature weighted skinned vertices and are independently animated. The behavior of the crowd is simulated using AI.implant (http://www.ai-implant.com). Additional techniques used in this demo are ambient occlusion for shadowing and fluid simulation on the GPU for the smoke. (This demo uses algorithms described in the 'Drawing a Crowd' book chapter and the 'Explicit Early-Z Culling for Efficient Fluid Flow Simulation and Rendering' technical report.)
