Programming Your GPU with OpenMP : Performance Portability for GPUs / Tom Deakin and Timothy G. Mattson.
- Format:
- Book
- Author/Creator:
- Deakin, Tom, author.
- Mattson, Timothy G., 1958- author.
- Series:
- Scientific and engineering computation.
- Scientific and Engineering Computation Series
- Language:
- English
- Subjects (All):
- Graphics processing units--Programming.
- Graphics processing units.
- OpenMP (Application program interface).
- Physical Description:
- 1 online resource (221 pages)
- Edition:
- First edition.
- Place of Publication:
- Cambridge, Massachusetts : The MIT Press, [2023]
- Summary:
- "OpenMP is a widely used language for programming the nodes in a parallel computer. Those nodes are now heterogeneous, including a GPU alongside the traditional CPU"-- Provided by publisher.
- Contents:
- Intro
- Series Page
- Title Page
- Copyright
- Dedication
- Table of Contents
- Series Foreword
- Preface
- Acknowledgments
- I. Setting the Stage
- 1. Heterogeneity and the Future of Computing
- 1.1. The Basic Building Blocks of Modern Computing
- 1.1.1. The CPU
- 1.1.2. The SIMD Vector Unit
- 1.1.3. The GPU
- 1.2. OpenMP: A Single Code-Base for Heterogeneous Hardware
- 1.3. The Structure of This Book
- 1.4. Supplementary Materials
- 2. OpenMP Overview
- 2.1. Threads: Basic Concepts
- 2.2. OpenMP: Basic Syntax
- 2.3. The Fundamental Design Patterns of OpenMP
- 2.3.1. The SPMD Pattern
- 2.3.2. The Loop-Level Parallelism Pattern
- 2.3.3. The Divide-and-Conquer Pattern
- 2.3.3.1. Tasks in OpenMP
- 2.3.3.2. Parallelizing Divide-and-Conquer
- 2.4. Task Execution
- 2.5. Our Journey Ahead
- II. The GPU Common Core
- 3. Running Parallel Code on a GPU
- 3.1. Target Construct: Offloading Execution onto a Device
- 3.2. Moving Data between the Host and a Device
- 3.2.1. Scalar Variables
- 3.2.2. Arrays on the Stack
- 3.2.3. Derived Types
- 3.3. Parallel Execution on the Target Device
- 3.4. Concurrency and the Loop Construct
- 3.5. Example: Walking through Matrix Multiplication
- 4. Memory Movement
- 4.1. OpenMP Array Syntax
- 4.2. Sharing Data Explicitly with the Map Clause
- 4.2.1. The Map Clause
- 4.2.2. Example: Vector Add on the Heap
- 4.2.3. Example: Mapping Arrays in Matrix Multiplication
- 4.3. Reductions and Mapping the Result from the Device
- 4.4. Optimizing Data Movement
- 4.4.1. Target Data Construct
- 4.4.2. Target Update Directive
- 4.4.3. Target Enter/Exit Data
- 4.4.4. Pointer Swapping
- 4.5. Summary
- 5. Using the GPU Common Core
- 5.1. Recap of the GPU Common Core
- 5.2. The Eightfold Path to Performance
- 5.2.1. Portability
- 5.2.2. Libraries
- 5.2.3. The Right Algorithm
- 5.2.4. Occupancy
- 5.2.5. Converged Execution Flow
- 5.2.6. Data Movement
- 5.2.7. Memory Coalescence
- 5.2.8. Load Balance
- 5.3. Concluding the GPU Common Core
- III. Beyond the Common Core
- 6. Managing a GPU's Hierarchical Parallelism
- 6.1. Parallel Threads
- 6.2. League of Teams of Threads
- 6.2.1. Controlling the Number of Teams and Threads
- 6.2.2. Distributing Work between Teams
- 6.3. Hierarchical Parallelism in Practice
- 6.3.1. Example: Batched Matrix Multiplication
- 6.3.2. Example: Batched Gaussian Elimination
- 6.4. Hierarchical Parallelism and the Loop Directive
- 6.4.1. Combined Constructs that Include Loop
- 6.4.2. Reductions and Combined Constructs
- 6.4.3. The Bind Clause
- 6.5. Summary
- 7. Revisiting Data Movement
- 7.1. Manipulating the Device Data Environment
- 7.1.1. Allocating and Deleting Variables
- 7.1.2. Map Type Modifiers
- 7.1.3. Changing the Default Mapping
- 7.2. Compiling External Functions and Static Variables for the Device
- 7.3. User-Defined Mappers
- 7.4. Team-Only Memory
- 7.5. Becoming a Cartographer: Mapping Device Memory by Hand
- 7.6. Unified Shared Memory for Productivity
- 7.7. Summary
- 8. Asynchronous Offload to Multiple GPUs
- 8.1. Device Discovery
- 8.2. Selecting a Default Device
- 8.3. Offload to Multiple Devices
- 8.3.1. Reverse Offload
- 8.4. Conditional Offload
- 8.5. Asynchronous Offload
- 8.5.1. Task Dependencies
- 8.5.2. Asynchronous Data Transfers
- 8.5.3. Task Reductions
- 8.6. Summary
- 9. Working with External Runtime Environments
- 9.1. Calling External Library Routines from OpenMP
- 9.2. Sharing OpenMP Data with Foreign Functions
- 9.2.1. The Need for Synchronization
- 9.2.2. Example: Sharing OpenMP Data with cuBLAS
- 9.3. Using Data from a Foreign Runtime with OpenMP
- 9.3.1. Example: Sharing cuBLAS Data with OpenMP
- 9.3.2. Avoiding Unportable Code
- 9.4. Direct Control of Foreign Runtimes
- 9.4.1. Query Properties of the Foreign Runtime
- 9.4.2. Using the Interop Construct to Correctly Synchronize with Foreign Functions
- 9.4.3. Non-blocking Synchronization with a Foreign Runtime
- 9.4.4. Example: Calling CUDA Kernels without Blocking
- 9.5. Enhanced Portability Using Variant Directives
- 9.5.1. Declaring Function Variants
- 9.5.1.1. OpenMP Context and the Match Clause
- 9.5.1.2. Modifying Variant Function Arguments
- 9.5.2. Controlling Variant Substitution with the Dispatch Construct
- 9.5.3. Putting It All Together
- 10. OpenMP and the Future of Heterogeneous Computing
- Appendix: Reference Guide
- A.1. Programming a CPU with OpenMP
- A.2. Directives and Constructs for the GPU
- A.2.1. Parallelism with Loop, Teams, and Worksharing Constructs
- A.2.2. Constructs for Interoperability
- A.2.3. Constructs for Device Data Environment Manipulation
- A.3. Combined Constructs
- A.4. Internal Control Variables, Environment Variables, and OpenMP API Functions
- Glossary
- References
- Subject Index
- Series List.
- Notes:
- Description based on print version record.
- Includes bibliographical references.
- Other Format:
- Print version: Deakin, Tom Programming Your GPU with OpenMP
- ISBN:
- 9780262377720
- 0262377721
- 9780262377737
- 026237773X
- OCLC:
- 1379240036