GPU Gems 3 Source Code Download: How to Access the CD Content and Demos Online
- connorthtalico
- Aug 14, 2023
- 6 min read
Thank you for purchasing GPU Gems 3. This DVD contains sample code and demonstrations for many of the book's chapters, as provided by the contributors. Updates and additional material can be found on the book's Web site, developer.nvidia.com/gpugems3.
In the list below, each chapter that has an accompanying code sample or demonstration is linked to the corresponding ZIP file, installer EXE, movie clip, or folder containing the unzipped data. In general, a ZIP file is present only if the unzipped folder would not have fit on the DVD. Where possible, the contributors have included executable versions of their samples in addition to source code.
The CUDA Developer SDK provides examples with source code, utilities, and white papers to help you get started writing software with CUDA. The SDK includes dozens of code samples covering a wide range of applications, including:
Simple techniques such as C++ code integration and efficient loading of custom datatypes
How-To examples covering CUDA BLAS and FFT libraries, texture fetching in CUDA, and CUDA interoperation with the OpenGL and Direct3D graphics APIs
Linear algebra primitives such as matrix transpose and matrix-matrix multiplication
Data-parallel algorithms such as parallel prefix sum of large arrays
Performance: profiling using timers and bandwidth tests
Advanced application examples such as image convolution, Black-Scholes options pricing and binomial options pricing
Refer to the following READMEs for more information (Linux, Windows).
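The parallel prefix sum mentioned in the list above can be illustrated with a short sketch. This is not the SDK sample itself; it is a plain Python simulation of a work-efficient (Blelloch-style) exclusive scan, and the function name exclusive_scan is a hypothetical choice made for this illustration. Each inner loop touches disjoint array elements, which is why a GPU version could assign one thread to each iteration.

```python
def exclusive_scan(values):
    """Work-efficient (Blelloch-style) exclusive prefix sum, simulated serially.

    Illustrative sketch only: on a GPU, every iteration of an inner loop
    below could run in its own thread, because the iterations touch
    disjoint elements.
    """
    n = len(values)
    if n == 0:
        return []

    # Pad to the next power of two so the implicit tree is balanced.
    size = 1
    while size < n:
        size *= 2
    data = list(values) + [0] * (size - n)

    # Up-sweep (reduce): build partial sums in place.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefix totals down the tree.
    data[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2

    return data[:n]
```

Calling exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]) returns the running totals that precede each element. The padding step is a simplification for this sketch; production scans on large arrays typically split the input into blocks instead.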
CUDPP is implemented in CUDA C/C++. It requires the CUDA Toolkit. Please see the NVIDIA CUDA homepage to download CUDA as well as the CUDA Programming Guide and CUDA SDK, which includes many CUDA code examples.
CUDPP is copyright The Regents of the University of California, Davis campus and NVIDIA Corporation. The library, examples, and all source code are released under the BSD license, designed to encourage reuse of this software in other projects, both commercial and non-commercial. For details, please see the CUDPP License page.
Note that prior to release 1.1 of CUDPP, the license used was a modified BSD license. With release 1.1, this license was replaced with the pure BSD license to facilitate the use of open source hosting of the code.
Having all shader code for one effect in a single place allows us to share as much of that code as possible across all of the different techniques. Rather than using a single, monolithic effect file, we broke it down into multiple shader libraries, source files that contain shared vertex and pixel programs and generic functions, that are used by many effects. This approach minimized shader code duplication, making maintenance easier, decreasing the number of bugs, and improving consistency across shaders.
We present a general purpose, open-source software library for estimation of non-linear parameters by the Levenberg-Marquardt algorithm. The software, Gpufit, runs on a Graphics Processing Unit (GPU) and executes computations in parallel, resulting in a significant gain in performance. We measured a speed increase of up to 42 times when comparing Gpufit with an identical CPU-based algorithm, with no loss of precision or accuracy. Gpufit is designed such that it is easily incorporated into existing applications or adapted for new ones. Multiple software interfaces, including to C, Python, and Matlab, ensure that Gpufit is accessible from most programming environments. The full source code is published as an open source software repository, making its function transparent to the user and facilitating future improvements and extensions. As a demonstration, we used Gpufit to accelerate an existing scientific image analysis package, yielding significantly improved processing times for super-resolution fluorescence microscopy datasets.
Here, we present Gpufit: a GPU-accelerated implementation of the Levenberg-Marquardt algorithm. Gpufit was developed to meet the need for a high performance, general-purpose nonlinear curve fitting library which is publicly available and open source. As expected, this software exhibits significantly faster execution than the equivalent CPU-based code, with no loss of precision or accuracy. In this report we discuss the design of Gpufit, characterize its performance in comparison to other CPU-based and GPU-based algorithms, and demonstrate its use in a scientific data analysis application.
The Gpufit library was designed to meet several criteria: (i) the software should make efficient use of the GPU resources in order to maximize execution speed, (ii) the interface should not require detailed knowledge of the GPU hardware, (iii) the source code should be modular and extendable, and (iv) the software should be accessible from multiple programming environments.
GPU architecture is based on a set of parallel multiprocessors, which divide computations between blocks of processing threads, as illustrated in Supplementary Fig. S1. The efficiency of a GPU-based program depends on how these computing resources are used. While determining how best to parallelize the LMA, we found that different parts of the algorithm were most efficiently implemented with different parallelization strategies. For example, point-wise operations such as computation of the model function and its partial derivatives were more efficiently parallelized along the data coordinate index, meaning that each thread computes one model and derivative value at a particular coordinate. Other steps, such as the calculation of the Hessian matrix, were more efficiently parallelized along the index of the matrix element, i.e. with each thread assigned to calculate one element of the matrix. To accommodate the necessity for parallelizing different parts of the LMA in different ways, we structured Gpufit as a set of independent CUDA kernels, each responsible for a section of the algorithm. In this way, the blocks and threads of the GPU multiprocessors could be optimally allocated at each step. The details of the various parallelization schemes are documented in the Gpufit source code (see Code Availability).
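The two parallelization strategies described above can be sketched in plain Python. This is not Gpufit code: the Gaussian model, the function names, and the "thread index" loops are all assumptions made for illustration, and the Hessian here is the usual Levenberg-Marquardt approximation built from weighted products of the model's partial derivatives. The first stand-in kernel maps one thread to one data coordinate; the second maps one thread to one matrix element.

```python
import math


def model_and_derivatives(params, x):
    """Hypothetical 1D Gaussian model: value and partial derivatives at x."""
    amp, center, width = params
    arg = (x - center) / width
    e = math.exp(-0.5 * arg * arg)
    # Partial derivatives with respect to amp, center, width.
    derivs = (e, amp * e * arg / width, amp * e * arg * arg / width)
    return amp * e, derivs


def point_kernel(thread_idx, params, xs, values, jac):
    """Stand-in for one GPU thread that handles a single data coordinate."""
    values[thread_idx], jac[thread_idx] = model_and_derivatives(
        params, xs[thread_idx]
    )


def hessian_kernel(thread_idx, jac, weights, hessian, n_params):
    """Stand-in for one GPU thread that computes a single matrix element."""
    row, col = divmod(thread_idx, n_params)
    hessian[row][col] = sum(
        w * j[row] * j[col] for j, w in zip(jac, weights)
    )


def build_hessian(params, xs, weights):
    n_points, n_params = len(xs), len(params)
    values = [0.0] * n_points
    jac = [None] * n_points

    # "Launch" 1: parallel along the data coordinate index.
    for t in range(n_points):
        point_kernel(t, params, xs, values, jac)

    # "Launch" 2: parallel along the matrix element index.
    hessian = [[0.0] * n_params for _ in range(n_params)]
    for t in range(n_params * n_params):
        hessian_kernel(t, jac, weights, hessian, n_params)

    return values, hessian
```

Splitting the work into two separate launches mirrors the idea of independent kernels: the number of simulated threads is the number of data points in the first step and the number of matrix elements in the second, so each step can be sized to its own workload.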
The Gpufit source code is modular, such that fit functions and goodness-of-fit estimators are separate from the core sections of the code, and new functions or estimators may be added simply (see Supplementary Information). In its initial release, Gpufit includes two different fit estimators: the standard weighted least-squares estimator (LSE), and a maximum likelihood estimator (MLE) which provides better fit results when the input data is characterized by Poisson statistics [6]. The modular concept is illustrated schematically in Supplementary Fig. S2. This modularity allows Gpufit to be quickly adapted to new applications, or modified to accommodate future developments.
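To make the difference between the two estimators concrete, here is a minimal Python sketch. It uses textbook formulations: a weighted sum of squared residuals for the LSE, and the Poisson deviance for the MLE. Whether Gpufit's internal expressions match these term for term is an assumption, and the function names weighted_lse and poisson_mle are hypothetical.

```python
import math


def weighted_lse(data, model, weights=None):
    """Weighted least-squares estimator: sum of w * (y - f)^2."""
    if weights is None:
        weights = [1.0] * len(data)
    return sum(w * (y - f) ** 2 for y, f, w in zip(data, model, weights))


def poisson_mle(data, model):
    """Poisson maximum likelihood estimator, written as a deviance.

    Computes 2 * sum(f - y) - 2 * sum(y * ln(f / y)), where the second
    sum skips points with y == 0. Model values must be positive wherever
    the data is positive.
    """
    total = 0.0
    for y, f in zip(data, model):
        total += 2.0 * (f - y)
        if y > 0:
            total -= 2.0 * y * math.log(f / y)
    return total
```

Both functions return zero when the model reproduces the data exactly and grow as the fit worsens, so either one can serve as the quantity a fitting routine minimizes. Keeping them as separate functions with the same calling shape is one simple way to express the modular separation between estimator and core algorithm.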
Given the parallel computing capability of the GPU, it was not surprising that Gpufit outperformed an equivalent algorithm running on the CPU. In order to verify that our code is efficiently implemented, we therefore tested Gpufit against another GPU-based fitting library: GPU-LMFit [8]. These tests were limited to smaller datasets because GPU-LMFit is available only as a closed-source, 32-bit binary package, restricting the size of the memory it can address. Figure 3a shows the speed of the Gpufit and GPU-LMFit libraries measured as a function of the number of fits per function call (N), with the speed of the MINPACK library shown for reference. Both packages exhibited similar scaling in speed as the number of fits and the data size varied; however, Gpufit showed faster performance for all conditions tested. As the data size per fit was increased (Fig. 3b), the speeds became more comparable, indicating that Gpufit makes more efficient use of GPU resources for smaller fits. The increased speed comes with no loss of precision, as the fit results returned by Gpufit and GPU-LMFit were also shown to have virtually identical numerical precision (see Supplementary Fig. S7).
In terms of performance, Gpufit exhibits similar precision and accuracy to other fitting libraries, but with significantly faster execution. In our measurements, curve fitting with Gpufit was approximately 42 times faster than the same algorithm running on the CPU. Gpufit also outperformed another GPU-based implementation of the LMA, GPU-LMFit, for all conditions tested. The absolute values of the timing results depend on the details of the fit and the computer hardware. Higher performance would be expected with a more powerful GPU (e.g. an Nvidia Tesla), or with multiple GPUs running in parallel. In addition to its speed, the principal advantages of Gpufit are its general purpose design, which may incorporate any model function or modified estimator, and the availability of the source code, which allows it to be compiled and run on multiple computing architectures.
As of its initial release, the Gpufit package has several limitations. First, the fit model functions are built into the code at compilation time, and the addition or modification of a model function requires re-compilation of the source code. We also note that Gpufit requires the explicit calculation of the partial derivatives of the model, and expressions for these functions must be present in the code embodying the fit model function. However, because Gpufit is an open-source software project, we expect that it will continue to develop and improve, potentially removing these limitations in future versions. For example, runtime compilation of model functions written in CUDA would lift the requirement for re-building the source, and methods to approximate the derivatives numerically could also be introduced. Finally, there is the potential for porting Gpufit to other general-purpose GPU computing languages, such as OpenCL, thereby allowing the software to function on other GPU hardware platforms.
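As an illustration of what approximating the derivatives numerically could look like, the sketch below compares a central finite-difference gradient with hand-written partial derivatives. It is not part of Gpufit: the exponential-decay model, the function names, and the step-size rule are all assumptions chosen for this example.

```python
import math


def exp_decay(params, x):
    """Hypothetical model: amplitude * exp(-rate * x)."""
    amplitude, rate = params
    return amplitude * math.exp(-rate * x)


def analytic_gradient(params, x):
    """Explicit partial derivatives, as a fit model currently must supply."""
    amplitude, rate = params
    e = math.exp(-rate * x)
    return [e, -amplitude * x * e]


def numerical_gradient(model, params, x, rel_step=1e-6):
    """Central finite differences: (f(p + h) - f(p - h)) / (2 * h)."""
    grad = []
    for i, p in enumerate(params):
        # Scale the step with the parameter, but never let it vanish.
        h = rel_step * max(abs(p), 1.0)
        up = list(params)
        down = list(params)
        up[i] = p + h
        down[i] = p - h
        grad.append((model(up, x) - model(down, x)) / (2.0 * h))
    return grad
```

The numerical version needs two extra model evaluations per parameter, so it trades speed for convenience: a new model would only have to provide its value, not its derivatives.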



