High-performance computing on complex environments / Emmanuel Jeannot, Julius Zilinskas.
- Format:
- Book
- Author/Creator:
- Jeannot, Emmanuel, author.
- Žilinskas, J. (Julius), 1973- author.
- Series:
- Wiley series on parallel and distributed computing.
- Language:
- English
- Subjects (All):
- High performance computing.
- Physical Description:
- 1 online resource (470 p.)
- Edition:
- 1st ed.
- Place of Publication:
- Hoboken, New Jersey : Wiley, 2014.
- Summary:
- With recent changes in multicore and general-purpose computing on graphics processing units, the way parallel computers are used and programmed has drastically changed. A comprehensive study of how to use such machines, written by specialists in the domain, is therefore needed. The book provides recent research results in high-performance computing on complex environments, information on how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems, detailed studies on the impact of applying heterogeneous computing practices to real problems, and applications varying from remote sensing to tomography. The content spans topics such as Numerical Analysis for Heterogeneous and Multicore Systems; Optimization of Communication for High Performance Heterogeneous and Hierarchical Platforms; Efficient Exploitation of Heterogeneous Architectures, Hybrid CPU+GPU, and Distributed Systems; Energy Awareness in High-Performance Computing; and Applications of Heterogeneous High-Performance Computing.
- Covers cutting-edge research in HPC on complex environments, following an international collaboration of members of the ComplexHPC
- Explains how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems
- Twenty-three chapters and over 100 illustrations cover domains such as numerical analysis, communication and storage, applications, GPUs and accelerators, and energy efficiency
- Contents:
- Cover
- Title Page
- Contents
- Contributors
- Preface
- European Science Foundation
- Part I Introduction
- Chapter 1 Summary of the Open European Network for High-Performance Computing in Complex Environments
- 1.1 Introduction and Vision
- 1.2 Scientific Organization
- 1.2.1 Scientific Focus
- 1.2.2 Working Groups
- 1.3 Activities of the Project
- 1.3.1 Spring Schools
- 1.3.2 International Workshops
- 1.3.3 Working Groups Meetings
- 1.3.4 Management Committee Meetings
- 1.3.5 Short-Term Scientific Missions
- 1.4 Main Outcomes of the Action
- 1.5 Contents of the Book
- Acknowledgment
- Part II Numerical Analysis for Heterogeneous and Multicore Systems
- Chapter 2 On the Impact of the Heterogeneous Multicore and Many-Core Platforms on Iterative Solution Methods and Preconditioning Techniques
- 2.1 Introduction
- 2.2 General Description of Iterative Methods and Preconditioning
- 2.2.1 Basic Iterative Methods
- 2.2.2 Projection Methods: CG and GMRES
- 2.3 Preconditioning Techniques
- 2.4 Defect-Correction Technique
- 2.5 Multigrid Method
- 2.6 Parallelization of Iterative Methods
- 2.7 Heterogeneous Systems
- 2.7.1 Heterogeneous Computing
- 2.7.2 Algorithm Characteristics and Resource Utilization
- 2.7.3 Exposing Parallelism
- 2.7.4 Heterogeneity in Matrix Computation
- 2.7.5 Setup of Heterogeneous Iterative Solvers
- 2.8 Maintenance and Portability
- 2.9 Conclusion
- Acknowledgments
- References
- Chapter 3 Efficient Numerical Solution of 2D Diffusion Equation on Multicore Computers
- 3.1 Introduction
- 3.2 Test Case
- 3.2.1 Governing Equations
- 3.2.2 Solution Procedure
- 3.3 Parallel Implementation
- 3.3.1 Intel PCM Library
- 3.3.2 OpenMP
- 3.4 Results
- 3.4.1 Results of Numerical Integration
- 3.4.2 Parallel Efficiency
- 3.5 Discussion
- 3.6 Conclusion
- References
- Chapter 4 Parallel Algorithms for Parabolic Problems on Graphs in Neuroscience
- 4.1 Introduction
- 4.2 Formulation of the Discrete Model
- 4.2.1 The theta-Implicit Discrete Scheme
- 4.2.2 The Predictor-Corrector Algorithm I
- 4.2.3 The Predictor-Corrector Algorithm II
- 4.3 Parallel Algorithms
- 4.3.1 Parallel theta-Implicit Algorithm
- 4.3.2 Parallel Predictor-Corrector Algorithm I
- 4.3.3 Parallel Predictor-Corrector Algorithm II
- 4.4 Computational Results
- 4.4.1 Experimental Comparison of Predictor-Corrector Algorithms
- 4.4.2 Numerical Experiment of Neuron Excitation
- 4.5 Conclusions
- Part III Communication and Storage Considerations in High-Performance Computing
- Chapter 5 An Overview of Topology Mapping Algorithms and Techniques in High-Performance Computing
- 5.1 Introduction
- 5.2 General Overview
- 5.2.1 A Key to Scalability: Data Locality
- 5.2.2 Data Locality Management in Parallel Programming Models
- 5.2.3 Virtual Topology: Definition and Characteristics
- 5.2.4 Understanding the Hardware
- 5.3 Formalization of the Problem
- 5.4 Algorithmic Strategies for Topology Mapping
- 5.4.1 Greedy Algorithm Variants
- 5.4.2 Graph Partitioning
- 5.4.3 Schemes Based on Graph Similarity
- 5.4.4 Schemes Based on Subgraph Isomorphism
- 5.5 Mapping Enforcement Techniques
- 5.5.1 Resource Binding
- 5.5.2 Rank Reordering
- 5.5.3 Other Techniques
- 5.6 Survey of Solutions
- 5.6.1 Algorithmic Solutions
- 5.6.2 Existing Implementations
- 5.7 Conclusion and Open Problems
- Chapter 6 Optimization of Collective Communication for Heterogeneous HPC Platforms
- 6.1 Introduction
- 6.2 Overview of Optimized Collectives and Topology-Aware Collectives
- 6.3 Optimizations of Collectives on Homogeneous Clusters
- 6.4 Heterogeneous Networks
- 6.4.1 Comparison to Homogeneous Clusters
- 6.5 Topology- and Performance-Aware Collectives
- 6.6 Topology as Input
- 6.7 Performance as Input
- 6.7.1 Homogeneous Performance Models
- 6.7.2 Heterogeneous Performance Models
- 6.7.3 Estimation of Parameters of Heterogeneous Performance Models
- 6.7.4 Other Performance Models
- 6.8 Non-MPI Collective Algorithms for Heterogeneous Networks
- 6.8.1 Optimal Solutions with Multiple Spanning Trees
- 6.8.2 Adaptive Algorithms for Efficient Large-Message Transfer
- 6.8.3 Network Models Inspired by BitTorrent
- 6.9 Conclusion
- Chapter 7 Effective Data Access Patterns on Massively Parallel Processors
- 7.1 Introduction
- 7.2 Architectural Details
- 7.3 K-Model
- 7.3.1 The Architecture
- 7.3.2 Cost and Complexity Evaluation
- 7.3.3 Efficiency Evaluation
- 7.4 Parallel Prefix Sum
- 7.4.1 Experiments
- 7.5 Bitonic Sorting Networks
- 7.5.1 Experiments
- 7.6 Final Remarks
- Chapter 8 Scalable Storage I/O Software for Blue Gene Architectures
- 8.1 Introduction
- 8.2 Blue Gene System Overview
- 8.2.1 Blue Gene Architecture
- 8.2.2 Operating System Architecture
- 8.3 Design and Implementation
- 8.3.1 The Client Module
- 8.3.2 The I/O Module
- 8.4 Conclusions and Future Work
- Part IV Efficient Exploitation of Heterogeneous Architectures
- Chapter 9 Fair Resource Sharing for Dynamic Scheduling of Workflows on Heterogeneous Systems
- 9.1 Introduction
- 9.1.1 Application Model
- 9.1.2 System Model
- 9.1.3 Performance Metrics
- 9.2 Concurrent Workflow Scheduling
- 9.2.1 Offline Scheduling of Concurrent Workflows
- 9.2.2 Online Scheduling of Concurrent Workflows
- 9.3 Experimental Results and Discussion
- 9.3.1 DAG Structure
- 9.3.2 Simulated Platforms
- 9.3.3 Results and Discussion
- 9.4 Conclusions
- Chapter 10 Systematic Mapping of Reed-Solomon Erasure Codes on Heterogeneous Multicore Architectures
- 10.1 Introduction
- 10.2 Related Works
- 10.3 Reed-Solomon Codes and Linear Algebra Algorithms
- 10.4 Mapping Reed-Solomon Codes on Cell/B.E. Architecture
- 10.4.1 Cell/B.E. Architecture
- 10.4.2 Basic Assumptions for Mapping
- 10.4.3 Vectorization Algorithm and Increasing its Efficiency
- 10.4.4 Performance Results
- 10.5 Mapping Reed-Solomon Codes on Multicore GPU Architectures
- 10.5.1 Parallelization of Reed-Solomon Codes on GPU Architectures
- 10.5.2 Organization of GPU Threads
- 10.6 Methods of Increasing the Algorithm Performance on GPUs
- 10.6.1 Basic Modifications
- 10.6.2 Stream Processing
- 10.6.3 Using Shared Memory
- 10.7 GPU Performance Evaluation
- 10.7.1 Experimental Results
- 10.7.2 Performance Analysis using the Roofline Model
- 10.8 Conclusions and Future Works
- Chapter 11 Heterogeneous Parallel Computing Platforms and Tools for Compute-Intensive Algorithms: A Case Study
- 11.1 Introduction
- 11.2 A Low-Cost Heterogeneous Computing Environment
- 11.2.1 Adopted Computing Environment
- 11.3 First Case Study: The N-Body Problem
- 11.3.1 The Sequential N-Body Algorithm
- 11.3.2 The Parallel N-Body Algorithm for Multicore Architectures
- 11.3.3 The Parallel N-Body Algorithm for CUDA Architectures
- 11.4 Second Case Study: The Convolution Algorithm
- 11.4.1 The Sequential Convolver Algorithm
- 11.4.2 The Parallel Convolver Algorithm for Multicore Architectures
- 11.4.3 The Parallel Convolver Algorithm for GPU Architectures
- 11.5 Conclusions
- Chapter 12 Efficient Application of Hybrid Parallelism in Electromagnetism Problems
- 12.1 Introduction
- 12.2 Computation of Green's Functions in Hybrid Systems
- 12.2.1 Computation in a Heterogeneous Cluster
- 12.2.2 Experiments
- 12.3 Parallelization in NUMA Systems of a Volume Integral Equation Technique
- 12.3.1 Experiments
- 12.4 Autotuning Parallel Codes
- 12.4.1 Empirical Autotuning
- 12.4.2 Modeling the Linear Algebra Routines
- 12.5 Conclusions and Future Research
- Part V CPU + GPU Coprocessing
- Chapter 13 Design and Optimization of Scientific Applications for Highly Heterogeneous and Hierarchical HPC Platforms Using Functional Computation Performance Models
- 13.1 Introduction
- 13.2 Related Work
- 13.3 Data Partitioning Based on Functional Performance Model
- 13.4 Example Application: Heterogeneous Parallel Matrix Multiplication
- 13.5 Performance Measurement on CPUs/GPUs System
- 13.6 Functional Performance Models of Multiple Cores and GPUs
- 13.7 FPM-Based Data Partitioning on CPUs/GPUs System
- 13.8 Efficient Building of Functional Performance Models
- 13.9 FPM-Based Data Partitioning on Hierarchical Platforms
- 13.10 Conclusion
- Chapter 14 Efficient Multilevel Load Balancing on Heterogeneous CPU + GPU Systems
- 14.1 Introduction: Heterogeneous CPU + GPU Systems
- 14.1.1 Open Problems and Specific Contributions
- 14.2 Background and Related Work
- 14.2.1 Divisible Load Scheduling in Distributed CPU-Only Systems
- 14.2.2 Scheduling in Multicore CPU and Multi-GPU Environments
- 14.3 Load Balancing Algorithms for Heterogeneous CPU + GPU Systems
- 14.3.1 Multilevel Simultaneous Load Balancing Algorithm
- 14.3.2 Algorithm for Multi-Installment Processing with Multidistributions
- 14.4 Experimental Results
- 14.4.1 MSLBA Evaluation: Dense Matrix Multiplication Case Study
- 14.4.2 AMPMD Evaluation: 2D FFT Case Study
- 14.5 Conclusions
- Acknowledgments
- Notes:
- Includes bibliographical references at the end of each chapter and index.
- Description based on print version record.
- ISBN:
- 1-118-71189-0
- 1-118-86667-3
- OCLC:
- 870336445