My Account Log in

1 option

Advances in GPU research and practice / edited by Hamid Sarbazi-Azad.

O'Reilly Online Learning: Academic/Public Library Edition Available online

View online
Format:
Book
Author/Creator:
Sarbazi-Azad, Hamid, author.
Contributor:
Sarbazi-Azad, Hamid, editor.
Series:
Emerging trends in computer science & applied computing.
Emerging trends in computer science and applied computing
Language:
English
Subjects (All):
Graphics processing units--Programming.
Graphics processing units.
Imaging systems.
Computer graphics.
Image processing--Digital techniques.
Image processing.
Physical Description:
1 online resource (776 pages) : illustrations (some color)
Edition:
First edition.
Place of Publication:
Amsterdam : Elsevier, [2017]
System Details:
text file
Summary:
Advances in GPU Research and Practice focuses on research and practices in GPU based systems. The topics treated cover a range of issues, ranging from hardware and architectural issues, to high level issues, such as application systems, parallel programming, middleware, and power and energy issues. Divided into six parts, this edited volume provides the latest research on GPU computing. Part I: Architectural Solutions focuses on the architectural topics that improve on performance of GPUs, Part II: System Software discusses OS, compilers, libraries, programming environment, languages, and paradigms that are proposed and analyzed to help and support GPU programmers. Part III: Power and Reliability Issues covers different aspects of energy, power, and reliability concerns in GPUs. Part IV: Performance Analysis illustrates mathematical and analytical techniques to predict different performance metrics in GPUs. Part V: Algorithms presents how to design efficient algorithms and analyze their complexity for GPUs. Part VI: Applications and Related Topics provides use cases and examples of how GPUs are used across many sectors. Discusses how to maximize power and obtain peak reliability when designing, building, and using GPUs Covers system software (OS, compilers), programming environments, languages, and paradigms proposed to help and support GPU programmers Explains how to use mathematical and analytical techniques to predict different performance metrics in GPUs Illustrates the design of efficient GPU algorithms in areas such as bioinformatics, complex systems, social networks, and cryptography Provides applications and use case scenarios in several different verticals, including medicine, social sciences, image processing, and telecommunications
Contents:
Front Cover
Advances in GPU Research and Practice
Copyright
Dedication
Contents
List of Contributors
Preface
Acknowledgments
Part 1: Programming and tools
Chapter 1: Formal analysis techniques for reliable GPU programming: current solutions and call to action
1 GPUs in Support of Parallel Computing
Bugs in parallel and GPU code
2 A quick introduction to GPUs
Organization of threads
Memory spaces
Barrier synchronization
Warps and lock-step execution
Dot product example
3 Correctness issues in GPU programming
Data races
Lack of forward progress guarantees
Floating-point accuracy
4 The need for effective tools
4.1 A Taxonomy of Current Tools
4.2 Canonical Schedules and the Two-Thread Reduction
Race freedom implies determinism
Detecting races: ``all for one and one for all''
Restricting to a canonical schedule
Reduction to a pair of threads
4.3 Symbolic Bug-Finding Case Study: GKLEE
4.4 Verification Case Study: GPUVerify
5 Call to Action
GPUs will become more pervasive
Current tools show promise
Solving basic correctness issues
Equivalence checking
Clarity from vendors and standards bodies
User validation of tools
References
Chapter 2: SnuCL: A unified OpenCL framework for heterogeneous clusters
1 Introduction
2 OpenCL
2.1 Platform Model
2.2 Execution Model
2.3 Memory Model
2.4 Synchronization
2.5 Memory Consistency
2.6 OpenCL ICD
3 Overview of SnuCL framework
3.1 Limitations of OpenCL
3.2 SnuCL CPU
3.3 SnuCL Single
3.4 SnuCL Cluster
3.4.1 Processing synchronization commands
4 Memory management in SnuCL Cluster
4.1 Space Allocation to Memory Objects
4.2 Minimizing Copying Overhead
4.3 Processing Memory Commands
4.4 Consistency Management.
4.5 Detecting Memory Objects Written by a Kernel
5 SnuCL extensions to OpenCL
6 Performance evaluation
6.1 Evaluation Methodology
6.2 Performance
6.2.1 Scalability on the medium-scale GPU cluster
6.2.2 Scalability on the large-scale CPU cluster
7 Conclusions
Chapter 3: Thread communication and synchronization on massively parallel GPUs
2 Coarse-Grained Communication and Synchronization
2.1 Global Barrier at the Kernel Level
2.2 Local Barrier at the Work-Group Level
2.3 Implicit Barrier at the Wavefront Level
3 Built-In Atomic Functions on Regular Variables
4 Fine-Grained Communication and Synchronization
4.1 Memory Consistency Model
4.1.1 Sequential consistency
4.1.2 Relaxed consistency
4.2 The OpenCL 2.0 Memory Model
4.2.1 Relationships between two memory operations
4.2.2 Special atomic operations and stand-alone memory fence
4.2.3 Release and acquire semantics
4.2.4 Memory order parameters
4.2.5 Memory scope parameters
5 Conclusion and Future Research Direction
Chapter 4: Software-level task scheduling on GPUs
1 Introduction, Problem Statement, and Context
2 Nondeterministic behaviors caused by the hardware
2.1 P1: Greedy
2.2 P2: Not Round-Robin
2.3 P3: Nondeterministic Across Runs
2.4 P4: Oblivious to Nonuniform Data Sharing
2.5 P5: Serializing Multikernel Co-Runs
3 SM-centric transformation
3.1 Core Ideas
3.1.1 SM-centric task selection
Correctness issues
3.1.2 Filling-retreating scheme
3.2 Implementation
3.2.1 Details
4 Scheduling-enabled optimizations
4.1 Parallelism Control
4.2 Affinity-Based Scheduling
4.2.1 Evaluation
4.3 SM Partitioning for Multi-Kernel Co-Runs
4.3.1 Evaluation
5 Other scheduling work on GPUs
5.1 Software Approaches.
5.2 Hardware Approaches
6 Conclusion and future work
Chapter 5: Data placement on GPUs
2 Overview
3 Memory specification through MSL
4 Compiler support
4.1 Deriving Data Access Patterns: A Hybrid Approach
4.1.1 Reuse distance model
4.1.2 Staging code to be agnostic to placement
5 Runtime support
5.1 Lightweight Performance Model
5.2 Finding the Best Placements
5.2.1 Dealing with phases
6 Results
6.1 Results With Irregular Benchmarks
6.2 Results With Regular Benchmarks
7 Related Work
8 Summary
Part 2: Algorithms and applications
Chapter 6: Biological sequence analysis on GPU
2 Pairwise Sequence Comparison and Sequence-Profile Comparison
2.1 Pairwise Sequence Comparison
2.1.1 Smith-Waterman algorithm
Phase 1: Create the similarity matrix
Phase 2: Obtain the best local alignment
Smith-Waterman variations
2.1.2 Basic Local Alignment Tool
Phase 1: Seeding
Phase 2: Extension
Phase 3: Evaluation
BLAST
2.2 Sequence-Profile Comparison
2.2.1 Hidden Markov models
2.2.2 The Viterbi algorithm
2.2.3 The MSV algorithm
3 Design Aspects of GPU Solutions for Biological Sequence Analysis
3.1 Task-Parallelism vs Data-Parallelism
3.2 Sorting Sequence Database to Achieve Load Balance
3.3 Use of GPU Memory Hierarchy
3.4 GPU Solution Used as a Filter
4 GPU Solutions for Pairwise Sequence Comparison
4.1 GPU Solutions Using Exact Algorithms
4.1.1 Manavski and Valle [27]
4.1.2 Ligowski and Rudnicki [38]
4.1.3 Blazwicz et al. [29]
4.1.4 Li et al. [52]
4.1.5 Ino et al. [28,30]
4.1.6 Liu et al. [45-47]
4.1.7 Sandes and de Melo [39-41] and Sandes et al. [42]
4.1.8 Comparative overview
4.2 GPU Solutions Using BLAST
4.2.1 Vouzis and Sahinidis [31].
4.2.2 Liu et al. [48]
4.2.3 Zhang et al. [43]
4.2.4 Comparative overview
5 GPU Solutions for Sequence-Profile Comparison
5.1 GPU Solutions Using the Viterbi Algorithm
5.1.1 Horn et al. [32]
5.1.2 Du et al. [44]
5.1.3 Walters et al. [33]
5.1.4 Yao et al. [34]
5.1.5 Ganesan et al. [49]
5.1.6 Ferraz and Moreano [36]
5.2 GPU Solutions Using the MSV Algorithm
5.2.1 Li et al. [35]
5.2.2 Cheng and Butler [37]
5.2.3 Araújo Neto and Moreano [50]
5.3 Comparative Overview
6 Conclusion and Perspectives
Chapter 7: Graph algorithms on GPUs
1 Graph representation for GPUs
1.1 Adjacency Matrices
1.2 Adjacency Lists
1.3 Edge Lists
2 Graph traversal algorithms: the breadth first search (BFS)
2.1 The Frontier-Based Parallel Implementation of BFS
2.2 BFS-4K
3 The single-source shortest path (SSSP) problem
3.1 The SSSP Implementations for GPUs
3.2 H-BF: An Efficient Implementation of the Bellman-Ford Algorithm
4 The APSP problem
4.1 The APSP Implementations for GPUs
5 Load Balancing and Memory Accesses: Issuesand Management Techniques
5.1 Static Mapping Techniques
5.1.1 Work-items to threads
5.1.2 Virtual warps
5.2 Semidynamic Mapping Techniques
5.2.1 Dynamic virtual warps + dynamic parallelism
5.2.2 CTA + warp + scan
5.3 Dynamic Mapping Techniques
5.3.1 Direct search
5.3.2 Local warp search
5.3.3 Block search
5.3.4 Two-phase search
5.4 The Multiphase Search Technique
5.5 Coalesced Expansion
5.6 Iterated Searches
Chapter 8: GPU alignment of two and three sequences
1.1 Pairwise alignment
1.2 Alignment of Three Sequences
2 GPU architecture
3 Pairwise alignment
3.1 Smith-Waterman Algorithm
3.2 Computing the Score of the Best Local Alignment.
3.3 Computing the Best Local Alignment
3.3.1 StripedAlignment
3.3.2 ChunkedAlignment1
3.3.3 ChunkedAlignment2
3.3.4 Memory requirements
3.4 Experimental Results
StripedScore
StripedAlignment
ChunkedAlignment1
4 Alignment of three sequences
4.1 Three-Sequence Alignment Algorithm
4.2 Computing the Score of the Best Alignment
4.2.1 Layered algorithm
GPU computational strategy
Analysis
4.2.2 Sloped algorithm
4.3 Computing the Best Alignment
4.3.1 Layered-BT1
4.3.2 Layered-BT2
4.3.3 Layered-BT3
4.4 Experimental Results
4.4.1 Computing the score of the best alignment
4.4.2 Computing the alignment
5 Conclusion
Chapter 9: Augmented block cimmino distributed algorithm for solving tridiagonal systems on GPU
2 ABCD Solver for tridiagonal systems
3 GPU implementation and optimization
3.1 QR Method and Givens Rotation
3.2 Sparse Storage Format
3.3 Coalesced Memory Access
3.4 Boundary Padding
3.4.1 Padding of the augmented matrix
3.4.2 Padding for Givens rotation
4 Performance evaluation
4.1 Comparison With CPU Implementation
4.2 Speedup by Memory Coalescing
4.3 Boundary Padding
5 Conclusion and future work
Chapter 10: GPU computing applied to linear and mixed-integer programming
2 Operations Research in Practice
3 Exact Optimization Algorithms
3.1 The Simplex Method
3.2 Dynamic Programming
3.2.1 Knapsack problems
3.2.2 Multiple-choice knapsack problem
3.3 Branch-and-Bound
3.3.1 Knapsack problem
3.3.2 Flow-shop scheduling problem
3.3.3 Traveling salesman problem
4 Metaheuristics
4.1 Genetic Algorithms
4.1.1 The traveling salesman problem
4.1.2 Scheduling problems
4.1.3 Knapsack problems.
4.2 Ant Colony Optimization.
Notes:
Includes index.
Includes bibliographical references and indexes.
Description based on online resource; title from PDF title page (ebrary, viewed October 5, 2016).
ISBN:
9780128037881
0128037881
OCLC:
960211532

The Penn Libraries is committed to describing library materials using current, accurate, and responsible language. If you discover outdated or inaccurate language, please fill out this feedback form to report it and suggest alternative language.

Find

Home Release notes

My Account

Shelf Request an item Bookmarks Fines and fees Settings

Guides

Using the Find catalog Using Articles+ Using your account