Programming massively parallel processors : a hands-on approach / David B. Kirk and Wen-mei W. Hwu.
- Format:
- Book
- Author/Creator:
- Kirk, David, 1960-
- Language:
- English
- Subjects (All):
- Parallel programming (Computer science).
- Parallel processing (Electronic computers).
- Multiprocessors.
- Computer architecture.
- Physical Description:
- 1 online resource (519 p.)
- Edition:
- 2nd ed.
- Place of Publication:
- Burlington, MA : Morgan Kaufmann Publishers, c2013.
- Language Note:
- English
- System Details:
- text file
- Summary:
- Programming Massively Parallel Processors: A Hands-on Approach shows both student and professional alike the basic concepts of parallel programming and GPU architecture. Various techniques for constructing parallel programs are explored in detail. Case studies demonstrate the development process, which begins with computational thinking and ends with effective and efficient parallel programs. Topics of performance, floating-point format, parallel patterns, and dynamic parallelism are covered in depth. This best-selling guide to CUDA and GPU parallel programming has been revis
- Contents:
- Front Cover; Programming Massively Parallel Processors; Copyright Page; Contents; Preface; Target Audience; How to Use the Book; A Three-Phased Approach; Tying It All Together: The Final Project; Project Workshop; Design Document; Project Report; Online Supplements; Acknowledgements; Dedication; 1 Introduction; 1.1 Heterogeneous Parallel Computing; 1.2 Architecture of a Modern GPU; 1.3 Why More Speed or Parallelism?; 1.4 Speeding Up Real Applications; 1.5 Parallel Programming Languages and Models; 1.6 Overarching Goals; 1.7 Organization of the Book; References; 2 History of GPU Computing
- 2.1 Evolution of Graphics Pipelines; The Era of Fixed-Function Graphics Pipelines; Evolution of Programmable Real-Time Graphics; Unified Graphics and Computing Processors; 2.2 GPGPU: An Intermediate Step; 2.3 GPU Computing; Scalable GPUs; Recent Developments; Future Trends; References and Further Reading; 3 Introduction to Data Parallelism and CUDA C; 3.1 Data Parallelism; 3.2 CUDA Program Structure; 3.3 A Vector Addition Kernel; 3.4 Device Global Memory and Data Transfer; 3.5 Kernel Functions and Threading; 3.6 Summary; Function Declarations; Kernel Launch; Predefined Variables; Runtime API
- 3.7 Exercises; References; 4 Data-Parallel Execution Model; 4.1 CUDA Thread Organization; 4.2 Mapping Threads to Multidimensional Data; 4.3 Matrix-Matrix Multiplication-A More Complex Kernel; 4.4 Synchronization and Transparent Scalability; 4.5 Assigning Resources to Blocks; 4.6 Querying Device Properties; 4.7 Thread Scheduling and Latency Tolerance; 4.8 Summary; 4.9 Exercises; 5 CUDA Memories; 5.1 Importance of Memory Access Efficiency; 5.2 CUDA Device Memory Types; 5.3 A Strategy for Reducing Global Memory Traffic; 5.4 A Tiled Matrix-Matrix Multiplication Kernel
- 5.5 Memory as a Limiting Factor to Parallelism; 5.6 Summary; 5.7 Exercises; 6 Performance Considerations; 6.1 Warps and Thread Execution; 6.2 Global Memory Bandwidth; 6.3 Dynamic Partitioning of Execution Resources; 6.4 Instruction Mix and Thread Granularity; 6.5 Summary; 6.6 Exercises; References; 7 Floating-Point Considerations; 7.1 Floating-Point Format; Normalized Representation of M; Excess Encoding of E; 7.2 Representable Numbers; 7.3 Special Bit Patterns and Precision in IEEE Format; 7.4 Arithmetic Accuracy and Rounding; 7.5 Algorithm Considerations; 7.6 Numerical Stability; 7.7 Summary
- 7.8 Exercises; References; 8 Parallel Patterns: Convolution; 8.1 Background; 8.2 1D Parallel Convolution-A Basic Algorithm; 8.3 Constant Memory and Caching; 8.4 Tiled 1D Convolution with Halo Elements; 8.5 A Simpler Tiled 1D Convolution-General Caching; 8.6 Summary; 8.7 Exercises; 9 Parallel Patterns: Prefix Sum; 9.1 Background; 9.2 A Simple Parallel Scan; 9.3 Work Efficiency Considerations; 9.4 A Work-Efficient Parallel Scan; 9.5 Parallel Scan for Arbitrary-Length Inputs; 9.6 Summary; 9.7 Exercises; Reference; 10 Parallel Patterns: Sparse Matrix-Vector Multiplication; 10.1 Background
- 10.2 Parallel SpMV Using CSR
- Notes:
- Description based upon print version of record.
- Includes bibliographical references and index.
- ISBN:
- 9780123914187
- 0123914183
- OCLC:
- 823723538