
Excerpt

Below we have included links to and excerpts from the Section Introductions for all 10 Sections of GPU Computing Gems, Emerald Edition, as well as links to and excerpts from Chapters 1, 19, and 39. These excerpts were selected for sharing by the Editor-in-Chief, Wen-mei W. Hwu. We hope they give you a comprehensive glimpse into the contents of this 865-page book.

Front Matter

Section 1: Scientific Simulation
Area Editor’s Introduction
Robert M. Farber

THE STATE OF GPU COMPUTING IN SCIENTIFIC SIMULATION
GPU computing is revolutionizing scientific simulation by providing one to two orders of magnitude of increased computing performance per GPU at price points even students can afford. Exciting things are happening with this technology in the hands of the masses, as reflected by the applications, CUDA Gems, and the extraordinary number of papers that have appeared in the literature since CUDA was first introduced in February 2007.
Technology that provides two or more orders of magnitude of increased computational capability is disruptive and has the potential to fundamentally affect scientific research by removing time-to-discovery barriers. I cannot help getting excited by the potential as simulations that previously would have taken a year or more to complete can now be finished in days. Better scientific insight also becomes possible because researchers can work with more data and have the ability to utilize more accurate, albeit computationally expensive, approximations and numerical methods. We are now entering the era where hybrid clusters and supercomputers containing large numbers of GPUs are being built and used around the world. As a result, many researchers (and funding agencies) now have to rethink their computational models and invest in software to create scalable, high-performance applications based on this technology. The potential is there, and some lucky researchers may find themselves with a Galilean first opportunity to see, study, and model using exquisitely detailed data from projects utilizing GPU technology and these hybrid systems. READ MORE

Chapter 1: GPU-Accelerated Computation and Interactive Display of Molecular Orbitals
John E. Stone, David J. Hardy, Jan Saam, Kirby L. Vandivort, Klaus Schulten

In this chapter, we present several graphics processing unit (GPU) algorithms for evaluating molecular orbitals on three-dimensional lattices, as is commonly used for molecular visualization. For each kernel, we describe necessary design trade-offs, applicability to various problem sizes, and performance on different generations of GPU hardware. We then demonstrate the appropriate and effective use of fast on-chip GPU memory subsystems for access to key data structures, show several GPU kernel optimization principles, and explore the application of advanced techniques such as dynamic kernel generation and just-in-time (JIT) kernel compilation techniques. READ MORE

Section 2: Life Sciences
Area Editor’s Introduction
Bertil Schmidt

STATE OF GPU COMPUTING IN LIFE SCIENCES
Life sciences have emerged as a primary application area for the use of GPU computing. This is mainly driven by the large amount of publicly available sequence, expression, and structure data. The amount of available data will grow even further in the near future owing to advances in high-throughput technologies leading to a data explosion. Because GPU performance grows faster than CPU performance, GPUs are a perfect match for the life sciences.
A particular area of interest in this context is next-generation sequencing (NGS) technology, which can now produce billions of sequences (reads) on a daily basis. The usage of GPUs can thus play a key role in NGS, and its future applications (such as personal genomics) by providing the necessary computing power to process and analyze this data. READ MORE

Section 3: Statistical Modeling
Area Editor’s Introduction
Mike Giles

STATE OF GPU COMPUTING IN STATISTICAL MODELING
Many kinds of statistical modeling are naturally suited for GPU computing owing to their inherent parallelism. For example, in Monte Carlo simulation there are usually many independent simulations to be done, and each can be performed by a separate thread. The challenges lie in coping with unusual features that could lead to warp divergence or poor memory coalescence, or require a global reduction operation. Nevertheless, this is an area with rapid adoption of GPU technology within universities.
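The pattern described above, many independent simulations followed by one global reduction, can be sketched on the CPU as follows. This is a minimal illustration rather than code from the book: the random-walk model and the names `simulate_path` and `monte_carlo_mean_square` are assumptions chosen for the example, and a GPU version would assign each path to its own thread.

```python
import math
import random


def simulate_path(seed, steps=64):
    """One independent simulation: the end point of a Gaussian random walk."""
    rng = random.Random(seed)  # private generator, so paths share no state
    position = 0.0
    for _ in range(steps):
        position += rng.gauss(0.0, 1.0)
    return position


def monte_carlo_mean_square(num_paths, steps=64):
    """Estimate E[position^2] by averaging over independent paths."""
    # Map step: every path is independent work (one thread per path on a GPU).
    results = [simulate_path(seed, steps) for seed in range(num_paths)]
    # Reduce step: the only point where results from all paths are combined.
    return math.fsum(x * x for x in results) / num_paths


if __name__ == "__main__":
    print(monte_carlo_mean_square(2000))
```

For this toy model the exact expectation equals the number of steps, so the printed estimate should land near 64. Because every path runs the same fixed-length loop, there is no divergence here; models with early exits or data-dependent branches are where the challenges mentioned above would appear.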
Banks and other financial organizations have also been actively investigating the potential of GPUs. Computational finance has been a major HPC growth area in the past 20 years, and with the development of commercial random number-generation software for GPUs, I think we are likely to see widespread use of GPUs in the near future. READ MORE

Section 4: Emerging Data-Intensive Applications
Area Editor’s Introduction
Volodymyr Kindratenko

THE STATE OF GPU COMPUTING IN DATA-INTENSIVE APPLICATIONS
Many of today’s data-intensive problems, such as data mining and machine learning, push the boundaries of conventional computing architectures with ever-increasing requirements for greater performance. GPUs are a good match for these applications because of their high memory bandwidth and massive computation power. But achieving high performance for this class of applications using GPUs can be quite challenging because of irregular data access patterns and complex heuristics employed in many of the data processing algorithms. Only a few recent efforts have resulted in productive GPU implementations of data-intensive applications, some of which are included in this section.
The use of GPUs in data-intensive applications is poised to explode in the near future. Although in the past many scientific communities focused on how to obtain more experimental or observed data, today scientists are concerned about what to do with the flood of data produced by modern scientific instruments. Fast analysis of very large volumes of data is now of paramount importance in astronomy, biology, finance, and medicine, and software developers are more and more inclined to take advantage of the GPU hardware to achieve the desirable level of performance. READ MORE

Chapter 19: Large-Scale Machine Learning
Jerod J. Weinman, Augustus Lidaka, Shitanshu Aggarwal

A typical machine-learning algorithm creates a classification function that inductively generalizes from training examples — input features and associated classification labels — to previously unseen examples requiring labels. Optimizing the prediction accuracy of the learned function for complex problems can require massive amounts of training data. This chapter describes a GPU-based implementation of a discriminative maximum entropy learning algorithm that can improve runtime on large datasets by a factor of over 200. READ MORE
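As a rough illustration of what a discriminative maximum entropy learner computes, the sketch below trains a small multiclass model with plain gradient ascent on the conditional log-likelihood. It is an assumption-laden toy, not the chapter's implementation: the function names, the learning rate, and the choice of optimizer are all hypothetical. The per-example loop is the part that parallelizes naturally, since each example contributes to the gradient independently.

```python
import math


def class_probabilities(weights, features):
    """p(y | x) proportional to exp(w_y . x) for each class y."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood_gradient(weights, examples):
    """Observed feature totals minus the totals the model expects."""
    grad = [[0.0] * len(row) for row in weights]
    for features, label in examples:  # each example is independent work
        probs = class_probabilities(weights, features)
        for y, p in enumerate(probs):
            indicator = 1.0 if y == label else 0.0
            for j, f in enumerate(features):
                grad[y][j] += (indicator - p) * f
    return grad


def train(examples, num_classes, num_features, rate=0.5, iterations=200):
    """Plain gradient ascent; an assumed optimizer chosen for brevity."""
    weights = [[0.0] * num_features for _ in range(num_classes)]
    for _ in range(iterations):
        grad = log_likelihood_gradient(weights, examples)
        for y in range(num_classes):
            for j in range(num_features):
                weights[y][j] += rate * grad[y][j] / len(examples)
    return weights


def predict(weights, features):
    probs = class_probabilities(weights, features)
    return probs.index(max(probs))
```

Calling `train` on a list of `(features, label)` pairs returns one weight row per class, and `predict` picks the most probable class for a new feature vector.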

Section 5: Electronic Design Automation
Area Editor’s Introduction
Sunil P. Khatri

THE STATE OF GPU COMPUTING IN ELECTRONIC DESIGN AUTOMATION
The success of very large-scale integrated (VLSI) design hinges heavily on design automation techniques to speed up the design process. Electronic design automation (EDA) software utilizes several key underlying algorithms, and an efficient implementation of these algorithms holds the key to our ability to design highly integrated, complex integrated circuits (ICs) of the future. Over the last few years, GPUs have received close attention from EDA practitioners, and significant speedups have been obtained by implementations of several key algorithms on the GPU. Some of these algorithms include logic simulation, Boolean satisfiability, fault simulation, and state space exploration.
In the future, it is expected that GPU computing will hold a key role in EDA advances. As more EDA algorithms are sped up using GPUs, it is conceivable that users of EDA tools will have the ability to quickly perform what-if analyses when different optimization options are invoked. Such an ability is not available today owing to the large runtimes associated with several steps of the EDA flow. More flexible GPU architectures will hold the key to making this possible. READ MORE

Section 6: Ray Tracing and Rendering
Area Editor’s Introduction
Austin Robison

THE STATE OF GPU COMPUTING IN RAY TRACING AND RENDERING
We are on the cusp of a rendering renaissance. Although GPUs have traditionally been used to drive hardware rasterization pipelines, and GPU computing has ventured into many different fields as a general computing platform, it is not to be forgotten that computer graphics stands to greatly benefit from these new programming models.
As GPUs continue to become more powerful and flexible, we will see an explosion of hybrid rendering techniques running on these processors: graphics pipelines that do not rely on a single rendering algorithm to produce their final images, but rather use the algorithms that are best suited to the needs of the desired images. As we are more and more able to use the right rendering tool for the job, the results will be spectacular, and GPUs will take us there.
We have begun to see the components of these pipelines being created, several of which are detailed in this section, and as new algorithms are developed, we will continue to experience increases in rendering quality and speed, enabling a new generation of fantastic imagery.  READ MORE

Section 7: Computer Vision
Area Editor’s Introduction
James Fung

THE STATE OF GPU COMPUTING IN COMPUTER VISION
The GPU has found a natural fit for accelerating computer vision algorithms. With its high performance and flexibility, GPU computing has seen its application in computer vision evolve from providing fast early vision results to new applications in the middle and late stages of vision algorithms. Completely “GPU-resident” computer vision pipelines are being constructed owing to the high degree of programmability of the GPU. The GPU is now allowing high-quality vision algorithms to operate at interactive frame rates.
Real-time computation aids the developer by providing faster algorithm testing and feedback and by bringing previously impractically large datasets or complex algorithms into the realm of possibility. As a widely adopted commodity processor, the GPU makes the previously intractable real-time computation required in computer vision achievable in a home PC or even portable laptop computer, and this brings computer vision out of the lab and into everyday application. As a result, GPU computing is enabling fast, intelligent image analysis and interpretation of the personal images, video, and media that we produce and view each day. In the context of larger applications, the GPU is providing the platform for creating interactive computer vision-based experiences and interfaces. READ MORE

Section 8: Video and Image Processing
Area Editor’s Introduction
Timo Stich

THE STATE OF GPU COMPUTING IN VIDEO AND IMAGE PROCESSING
GPUs have played a role in video and image processing for a long time. In the beginning they were used to display the processed results. Quickly, application developers picked up GPU computing, and GPUs are becoming the main processing devices in today’s video- and image-processing applications. The ever-increasing amount of video and image data demands ever-increasing computational power while offering at the same time more potential for parallel computation. The GPU, with its manycore architecture, is the perfect match to these challenges and delivers the computational performance necessary to drive the image- and video-processing applications of today and the future.
GPU computing not only significantly speeds up existing workflows in video and image processing but also allows for more creativity by using the additional computational power to transform the workflows themselves. For example, with GPU computing, filters and operators can be performed in real time on full HD video, making low-resolution preview windows obsolete. At present, sophisticated video- and image-processing applications are only used off-line owing to long runtimes. When those applications take advantage of GPU computing, I expect even more breakthroughs. The transitioning of these applications into the real-time processing domain offers the opportunity for additional user interaction to be added, enabling the creation of new and more intelligent interactive tools in video- and image-processing applications. READ MORE

Section 9: Signal and Audio Processing
Area Editor’s Introduction
John Roberts

THE STATE OF GPU COMPUTING IN SIGNAL AND AUDIO PROCESSING
Inexorable growth in the volume of digital data available and the stunning advance of parallel computational power in GPUs are leading to the application of GPU computing for signal processing in such areas as telecommunications, networking, multimedia, man-machine interfaces, signal intelligence, and data analytics.
Many of the computations involved in these domains lend themselves naturally to parallel computing, but others present challenges inherent in the organization, scale, or distribution of the data. Developers and authors such as those featured in this chapter find innovative approaches to address these and other issues to achieve unprecedented performance. Based on these results and the breadth of their applicability, extensive use of GPUs for computation in signal and audio processing should be expected in the future. READ MORE

Chapter 39: Large-Scale Fast Fourier Transform
Yifeng Chen, Xiang Cui, Hong Mei

Bandwidth-intensive tasks such as large-scale fast Fourier transforms (FFTs) without data locality are hard to accelerate on GPU clusters because the bottleneck often lies with the PCI bus or the communication network. Optimizing FFT for a single-GPU device will not improve the overall performance.
This chapter shows how to achieve substantial speedups for these tasks. Three GPU-related factors contribute to better performance: first, the use of GPU devices improves the sustained memory bandwidth for processing large-size data; second, GPU device memory allows larger subtasks to be processed in whole and hence reduces repeated data transfers between memory and processors; and finally, some costly main-memory operations such as matrix transposition can be significantly sped up by GPUs if necessary data adjustment is performed during data transfers. The technique of manipulating array dimensions during data transfer is the main technical contribution. These factors (as well as the improved communication library in our implementation) contribute to a 24.3x speedup with respect to FFTW and a 7x speedup with respect to Intel MKL for 4096 3-D single-precision FFT on a 16-node cluster with 32 GPUs. Around 5x speedups with respect to both standard libraries are achieved for double precision. READ MORE
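To see why transposition matters for multidimensional transforms, the following host-only sketch computes a 2-D transform from row-wise 1-D transforms plus two transposes. It is an illustrative assumption, not the chapter's method: a naive O(n^2) DFT stands in for a real FFT, the function names are hypothetical, and the sketch does not model the chapter's idea of rearranging array dimensions during host-to-device transfers.

```python
import cmath


def dft(signal):
    """Naive O(n^2) 1-D discrete Fourier transform (stand-in for an FFT)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def dft2_by_rows(matrix):
    """2-D DFT using only row-wise 1-D transforms and two transposes."""
    rows_done = [dft(row) for row in matrix]        # pass over contiguous rows
    columns_as_rows = transpose(rows_done)          # the costly rearrangement
    columns_done = [dft(row) for row in columns_as_rows]
    return transpose(columns_done)                  # restore the original layout


def dft2_direct(matrix):
    """Reference: the 2-D DFT definition evaluated term by term."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            sum(
                matrix[r][c]
                * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                for r in range(rows)
                for c in range(cols)
            )
            for v in range(cols)
        ]
        for u in range(rows)
    ]
```

Both functions should agree to within rounding error on any rectangular input. The transposes exist only so that each 1-D pass reads contiguous data, which is why folding that rearrangement into a transfer that must happen anyway can be attractive.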

Section 10: Medical Imaging
Area Editor’s Introduction
Lawrence Tarbox

THE STATE OF GPU COMPUTING IN MEDICAL IMAGING
The use of GPU computing in medical imaging has exploded over the last few years. Early uses centered on using the texture-mapping capabilities of GPUs to do volume visualization of the 3-D datasets routinely created by modern medical imaging equipment. As people gained more experience with GPU computing and as GPUs became more capable, activity shifted to more sophisticated postprocessing techniques to better support medical diagnosis and research. Many of the visualization and image-processing techniques utilized in medical imaging, such as registration, segmentation, and classification, share methodologies with other disciplines, such as computer vision, and there is a significant amount of cross-pollination between those communities.
A flurry of recent activity has been in using GPU computing to do the initial reconstruction of measurement data into three-dimensional volume sets suitable for visualization and further processing. Before the advent of GPU computing, image reconstruction was done using digital signal and vector processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and so on. The cost-effectiveness, speed, and programmability of GPU computing are a major driving force in this shift away from these more expensive, less flexible technologies towards image reconstruction on GPUs.
The processing techniques of medical imaging often include repetitive calculations done on large multidimensional arrays of data that are highly suited to the parallel-processing capabilities of GPUs. In fact, the speed and power of GPU computing have facilitated the clinical use of algorithms that previously were relegated to research laboratories due to their immense computational load. The principal challenge to using GPU computing in medical imaging is managing the massive amounts of data involved, which often overwhelm the memory capacity of existing GPU-based compute engines. READ MORE