
Advances in GPU research and practice / edited by Hamid Sarbazi-Azad.

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:: Book
Author/Creator:: Sarbazi-Azad, Hamid, author.
Contributor:: Sarbazi-Azad, Hamid, editor.
Series:: Emerging trends in computer science & applied computing.; Emerging trends in computer science and applied computing
Language:: English
Subjects (All):: Graphics processing units--Programming.; Graphics processing units.; Imaging systems.; Computer graphics.; Image processing--Digital techniques.; Image processing.
Physical Description:: 1 online resource (776 pages) : illustrations (some color)
Edition:: First edition.
Place of Publication:: Amsterdam : Elsevier, [2017]
System Details:: text file
Summary:: Advances in GPU Research and Practice focuses on research and practice in GPU-based systems. The topics range from hardware and architectural issues to high-level issues such as application systems, parallel programming, middleware, and power and energy. Divided into six parts, this edited volume provides the latest research on GPU computing. Part I: Architectural Solutions focuses on architectural topics that improve the performance of GPUs. Part II: System Software discusses the OS, compilers, libraries, programming environments, languages, and paradigms that are proposed and analyzed to help and support GPU programmers. Part III: Power and Reliability Issues covers different aspects of energy, power, and reliability concerns in GPUs. Part IV: Performance Analysis illustrates mathematical and analytical techniques to predict different performance metrics in GPUs. Part V: Algorithms presents how to design efficient algorithms and analyze their complexity for GPUs. Part VI: Applications and Related Topics provides use cases and examples of how GPUs are used across many sectors. Discusses how to maximize power and obtain peak reliability when designing, building, and using GPUs. Covers system software (OS, compilers), programming environments, languages, and paradigms proposed to help and support GPU programmers. Explains how to use mathematical and analytical techniques to predict different performance metrics in GPUs. Illustrates the design of efficient GPU algorithms in areas such as bioinformatics, complex systems, social networks, and cryptography. Provides applications and use case scenarios in several different verticals, including medicine, social sciences, image processing, and telecommunications.
Contents:: Front Cover; Advances in GPU Research and Practice; Copyright; Dedication; Contents; List of Contributors; Preface; Acknowledgments; Part 1: Programming and tools; Chapter 1: Formal analysis techniques for reliable GPU programming: current solutions and call to action; 1 GPUs in Support of Parallel Computing; Bugs in parallel and GPU code; 2 A quick introduction to GPUs; Organization of threads; Memory spaces; Barrier synchronization; Warps and lock-step execution; Dot product example; 3 Correctness issues in GPU programming; Data races; Lack of forward progress guarantees; Floating-point accuracy; 4 The need for effective tools; 4.1 A Taxonomy of Current Tools; 4.2 Canonical Schedules and the Two-Thread Reduction; Race freedom implies determinism; Detecting races: "all for one and one for all"; Restricting to a canonical schedule; Reduction to a pair of threads; 4.3 Symbolic Bug-Finding Case Study: GKLEE; 4.4 Verification Case Study: GPUVerify; 5 Call to Action; GPUs will become more pervasive; Current tools show promise; Solving basic correctness issues; Equivalence checking; Clarity from vendors and standards bodies; User validation of tools; References; Chapter 2: SnuCL: A unified OpenCL framework for heterogeneous clusters; 1 Introduction; 2 OpenCL; 2.1 Platform Model; 2.2 Execution Model; 2.3 Memory Model; 2.4 Synchronization; 2.5 Memory Consistency; 2.6 OpenCL ICD; 3 Overview of SnuCL framework; 3.1 Limitations of OpenCL; 3.2 SnuCL CPU; 3.3 SnuCL Single; 3.4 SnuCL Cluster; 3.4.1 Processing synchronization commands; 4 Memory management in SnuCL Cluster; 4.1 Space Allocation to Memory Objects; 4.2 Minimizing Copying Overhead; 4.3 Processing Memory Commands; 4.4 Consistency Management; 4.5 Detecting Memory Objects Written by a Kernel; 5 SnuCL extensions to OpenCL; 6 Performance evaluation; 6.1 Evaluation Methodology; 6.2 Performance; 6.2.1 Scalability on the medium-scale GPU cluster; 6.2.2 Scalability on the large-scale CPU cluster; 7 Conclusions;
Chapter 3: Thread communication and synchronization on massively parallel GPUs; 2 Coarse-Grained Communication and Synchronization; 2.1 Global Barrier at the Kernel Level; 2.2 Local Barrier at the Work-Group Level; 2.3 Implicit Barrier at the Wavefront Level; 3 Built-In Atomic Functions on Regular Variables; 4 Fine-Grained Communication and Synchronization; 4.1 Memory Consistency Model; 4.1.1 Sequential consistency; 4.1.2 Relaxed consistency; 4.2 The OpenCL 2.0 Memory Model; 4.2.1 Relationships between two memory operations; 4.2.2 Special atomic operations and stand-alone memory fence; 4.2.3 Release and acquire semantics; 4.2.4 Memory order parameters; 4.2.5 Memory scope parameters; 5 Conclusion and Future Research Direction; Chapter 4: Software-level task scheduling on GPUs; 1 Introduction, Problem Statement, and Context; 2 Nondeterministic behaviors caused by the hardware; 2.1 P1: Greedy; 2.2 P2: Not Round-Robin; 2.3 P3: Nondeterministic Across Runs; 2.4 P4: Oblivious to Nonuniform Data Sharing; 2.5 P5: Serializing Multikernel Co-Runs; 3 SM-centric transformation; 3.1 Core Ideas; 3.1.1 SM-centric task selection; Correctness issues; 3.1.2 Filling-retreating scheme; 3.2 Implementation; 3.2.1 Details; 4 Scheduling-enabled optimizations; 4.1 Parallelism Control; 4.2 Affinity-Based Scheduling; 4.2.1 Evaluation; 4.3 SM Partitioning for Multi-Kernel Co-Runs; 4.3.1 Evaluation; 5 Other scheduling work on GPUs; 5.1 Software Approaches; 5.2 Hardware Approaches; 6 Conclusion and future work; Chapter 5: Data placement on GPUs; 2 Overview; 3 Memory specification through MSL; 4 Compiler support; 4.1 Deriving Data Access Patterns: A Hybrid Approach; 4.1.1 Reuse distance model; 4.1.2 Staging code to be agnostic to placement; 5 Runtime support; 5.1 Lightweight Performance Model; 5.2 Finding the Best Placements; 5.2.1 Dealing with phases; 6 Results; 6.1 Results With Irregular Benchmarks; 6.2 Results With Regular Benchmarks; 7 Related Work; 8 Summary;
Part 2: Algorithms and applications; Chapter 6: Biological sequence analysis on GPU; 2 Pairwise Sequence Comparison and Sequence-Profile Comparison; 2.1 Pairwise Sequence Comparison; 2.1.1 Smith-Waterman algorithm; Phase 1: Create the similarity matrix; Phase 2: Obtain the best local alignment; Smith-Waterman variations; 2.1.2 Basic Local Alignment Tool; Phase 1: Seeding; Phase 2: Extension; Phase 3: Evaluation; BLAST; 2.2 Sequence-Profile Comparison; 2.2.1 Hidden Markov models; 2.2.2 The Viterbi algorithm; 2.2.3 The MSV algorithm; 3 Design Aspects of GPU Solutions for Biological Sequence Analysis; 3.1 Task-Parallelism vs Data-Parallelism; 3.2 Sorting Sequence Database to Achieve Load Balance; 3.3 Use of GPU Memory Hierarchy; 3.4 GPU Solution Used as a Filter; 4 GPU Solutions for Pairwise Sequence Comparison; 4.1 GPU Solutions Using Exact Algorithms; 4.1.1 Manavski and Valle [27]; 4.1.2 Ligowski and Rudnicki [38]; 4.1.3 Blazwicz et al. [29]; 4.1.4 Li et al. [52]; 4.1.5 Ino et al. [28,30]; 4.1.6 Liu et al. [45-47]; 4.1.7 Sandes and de Melo [39-41] and Sandes et al. [42]; 4.1.8 Comparative overview; 4.2 GPU Solutions Using BLAST; 4.2.1 Vouzis and Sahinidis [31]; 4.2.2 Liu et al. [48]; 4.2.3 Zhang et al. [43]; 4.2.4 Comparative overview; 5 GPU Solutions for Sequence-Profile Comparison; 5.1 GPU Solutions Using the Viterbi Algorithm; 5.1.1 Horn et al. [32]; 5.1.2 Du et al. [44]; 5.1.3 Walters et al. [33]; 5.1.4 Yao et al. [34]; 5.1.5 Ganesan et al. [49]; 5.1.6 Ferraz and Moreano [36]; 5.2 GPU Solutions Using the MSV Algorithm; 5.2.1 Li et al. [35]; 5.2.2 Cheng and Butler [37]; 5.2.3 Araújo Neto and Moreano [50]; 5.3 Comparative Overview; 6 Conclusion and Perspectives;
Chapter 7: Graph algorithms on GPUs; 1 Graph representation for GPUs; 1.1 Adjacency Matrices; 1.2 Adjacency Lists; 1.3 Edge Lists; 2 Graph traversal algorithms: the breadth first search (BFS); 2.1 The Frontier-Based Parallel Implementation of BFS; 2.2 BFS-4K; 3 The single-source shortest path (SSSP) problem; 3.1 The SSSP Implementations for GPUs; 3.2 H-BF: An Efficient Implementation of the Bellman-Ford Algorithm; 4 The APSP problem; 4.1 The APSP Implementations for GPUs; 5 Load Balancing and Memory Accesses: Issues and Management Techniques; 5.1 Static Mapping Techniques; 5.1.1 Work-items to threads; 5.1.2 Virtual warps; 5.2 Semidynamic Mapping Techniques; 5.2.1 Dynamic virtual warps + dynamic parallelism; 5.2.2 CTA + warp + scan; 5.3 Dynamic Mapping Techniques; 5.3.1 Direct search; 5.3.2 Local warp search; 5.3.3 Block search; 5.3.4 Two-phase search; 5.4 The Multiphase Search Technique; 5.5 Coalesced Expansion; 5.6 Iterated Searches; Chapter 8: GPU alignment of two and three sequences; 1.1 Pairwise alignment; 1.2 Alignment of Three Sequences; 2 GPU architecture; 3 Pairwise alignment; 3.1 Smith-Waterman Algorithm; 3.2 Computing the Score of the Best Local Alignment; 3.3 Computing the Best Local Alignment; 3.3.1 StripedAlignment; 3.3.2 ChunkedAlignment1; 3.3.3 ChunkedAlignment2; 3.3.4 Memory requirements; 3.4 Experimental Results; StripedScore; StripedAlignment; ChunkedAlignment1; 4 Alignment of three sequences; 4.1 Three-Sequence Alignment Algorithm; 4.2 Computing the Score of the Best Alignment; 4.2.1 Layered algorithm; GPU computational strategy; Analysis; 4.2.2 Sloped algorithm; 4.3 Computing the Best Alignment; 4.3.1 Layered-BT1; 4.3.2 Layered-BT2; 4.3.3 Layered-BT3; 4.4 Experimental Results; 4.4.1 Computing the score of the best alignment; 4.4.2 Computing the alignment; 5 Conclusion;
Chapter 9: Augmented block Cimmino distributed algorithm for solving tridiagonal systems on GPU; 2 ABCD Solver for tridiagonal systems; 3 GPU implementation and optimization; 3.1 QR Method and Givens Rotation; 3.2 Sparse Storage Format; 3.3 Coalesced Memory Access; 3.4 Boundary Padding; 3.4.1 Padding of the augmented matrix; 3.4.2 Padding for Givens rotation; 4 Performance evaluation; 4.1 Comparison With CPU Implementation; 4.2 Speedup by Memory Coalescing; 4.3 Boundary Padding; 5 Conclusion and future work; Chapter 10: GPU computing applied to linear and mixed-integer programming; 2 Operations Research in Practice; 3 Exact Optimization Algorithms; 3.1 The Simplex Method; 3.2 Dynamic Programming; 3.2.1 Knapsack problems; 3.2.2 Multiple-choice knapsack problem; 3.3 Branch-and-Bound; 3.3.1 Knapsack problem; 3.3.2 Flow-shop scheduling problem; 3.3.3 Traveling salesman problem; 4 Metaheuristics; 4.1 Genetic Algorithms; 4.1.1 The traveling salesman problem; 4.1.2 Scheduling problems; 4.1.3 Knapsack problems; 4.2 Ant Colony Optimization.
Notes:: Includes index.; Includes bibliographical references and indexes.; Description based on online resource; title from PDF title page (ebrary, viewed October 5, 2016).
ISBN:: 9780128037881; 0128037881
OCLC:: 960211532

