High-performance computing on complex environments / Emmanuel Jeannot, Julius Zilinskas.

Ebook Central Academic Complete Available online

Format:
Book
Author/Creator:
Jeannot, Emmanuel, author.
Žilinskas, J. (Julius), 1973- author.
Series:
Wiley series on parallel and distributed computing.
Language:
English
Subjects (All):
High performance computing.
Physical Description:
1 online resource (470 p.)
Edition:
1st ed.
Place of Publication:
Hoboken, New Jersey : Wiley, 2014.
Summary:
With recent changes in multicore and general-purpose computing on graphics processing units, the way parallel computers are used and programmed has changed drastically. It is therefore important to provide a comprehensive study, written by specialists in the domain, of how to use such machines. The book provides recent research results in high-performance computing on complex environments; information on how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems; detailed studies on the impact of applying heterogeneous computing practices to real problems; and applications ranging from remote sensing to tomography. The content spans topics such as Numerical Analysis for Heterogeneous and Multicore Systems; Optimization of Communication for High Performance Heterogeneous and Hierarchical Platforms; Efficient Exploitation of Heterogeneous Architectures, Hybrid CPU+GPU, and Distributed Systems; Energy Awareness in High-Performance Computing; and Applications of Heterogeneous High-Performance Computing.
* Covers cutting-edge research in HPC on complex environments, following an international collaboration of members of the ComplexHPC network
* Explains how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems
* Twenty-three chapters and over 100 illustrations cover domains such as numerical analysis, communication and storage, applications, GPUs and accelerators, and energy efficiency
Contents:
Cover
Title Page
Contents
Contributors
Preface
European Science Foundation
Part I Introduction
Chapter 1 Summary of the Open European Network for High-Performance Computing in Complex Environments
1.1 Introduction and Vision
1.2 Scientific Organization
1.2.1 Scientific Focus
1.2.2 Working Groups
1.3 Activities of the Project
1.3.1 Spring Schools
1.3.2 International Workshops
1.3.3 Working Groups Meetings
1.3.4 Management Committee Meetings
1.3.5 Short-Term Scientific Missions
1.4 Main Outcomes of the Action
1.5 Contents of the Book
Acknowledgment
Part II Numerical Analysis for Heterogeneous and Multicore Systems
Chapter 2 On the Impact of the Heterogeneous Multicore and Many-Core Platforms on Iterative Solution Methods and Preconditioning Techniques
2.1 Introduction
2.2 General Description of Iterative Methods and Preconditioning
2.2.1 Basic Iterative Methods
2.2.2 Projection Methods: CG and GMRES
2.3 Preconditioning Techniques
2.4 Defect-Correction Technique
2.5 Multigrid Method
2.6 Parallelization of Iterative Methods
2.7 Heterogeneous Systems
2.7.1 Heterogeneous Computing
2.7.2 Algorithm Characteristics and Resource Utilization
2.7.3 Exposing Parallelism
2.7.4 Heterogeneity in Matrix Computation
2.7.5 Setup of Heterogeneous Iterative Solvers
2.8 Maintenance and Portability
2.9 Conclusion
Acknowledgments
References
Chapter 3 Efficient Numerical Solution of 2D Diffusion Equation on Multicore Computers
3.1 Introduction
3.2 Test Case
3.2.1 Governing Equations
3.2.2 Solution Procedure
3.3 Parallel Implementation
3.3.1 Intel PCM Library
3.3.2 OpenMP
3.4 Results
3.4.1 Results of Numerical Integration
3.4.2 Parallel Efficiency
3.5 Discussion
3.6 Conclusion
References
Chapter 4 Parallel Algorithms for Parabolic Problems on Graphs in Neuroscience
4.1 Introduction
4.2 Formulation of the Discrete Model
4.2.1 The θ-Implicit Discrete Scheme
4.2.2 The Predictor-Corrector Algorithm I
4.2.3 The Predictor-Corrector Algorithm II
4.3 Parallel Algorithms
4.3.1 Parallel θ-Implicit Algorithm
4.3.2 Parallel Predictor-Corrector Algorithm I
4.3.3 Parallel Predictor-Corrector Algorithm II
4.4 Computational Results
4.4.1 Experimental Comparison of Predictor-Corrector Algorithms
4.4.2 Numerical Experiment of Neuron Excitation
4.5 Conclusions
Part III Communication and Storage Considerations in High-Performance Computing
Chapter 5 An Overview of Topology Mapping Algorithms and Techniques in High-Performance Computing
5.1 Introduction
5.2 General Overview
5.2.1 A Key to Scalability: Data Locality
5.2.2 Data Locality Management in Parallel Programming Models
5.2.3 Virtual Topology: Definition and Characteristics
5.2.4 Understanding the Hardware
5.3 Formalization of the Problem
5.4 Algorithmic Strategies for Topology Mapping
5.4.1 Greedy Algorithm Variants
5.4.2 Graph Partitioning
5.4.3 Schemes Based on Graph Similarity
5.4.4 Schemes Based on Subgraph Isomorphism
5.5 Mapping Enforcement Techniques
5.5.1 Resource Binding
5.5.2 Rank Reordering
5.5.3 Other Techniques
5.6 Survey of Solutions
5.6.1 Algorithmic Solutions
5.6.2 Existing Implementations
5.7 Conclusion and Open Problems
Chapter 6 Optimization of Collective Communication for Heterogeneous HPC Platforms
6.1 Introduction
6.2 Overview of Optimized Collectives and Topology-Aware Collectives
6.3 Optimizations of Collectives on Homogeneous Clusters
6.4 Heterogeneous Networks
6.4.1 Comparison to Homogeneous Clusters
6.5 Topology- and Performance-Aware Collectives
6.6 Topology as Input
6.7 Performance as Input
6.7.1 Homogeneous Performance Models
6.7.2 Heterogeneous Performance Models
6.7.3 Estimation of Parameters of Heterogeneous Performance Models
6.7.4 Other Performance Models
6.8 Non-MPI Collective Algorithms for Heterogeneous Networks
6.8.1 Optimal Solutions with Multiple Spanning Trees
6.8.2 Adaptive Algorithms for Efficient Large-Message Transfer
6.8.3 Network Models Inspired by BitTorrent
6.9 Conclusion
Chapter 7 Effective Data Access Patterns on Massively Parallel Processors
7.1 Introduction
7.2 Architectural Details
7.3 K-Model
7.3.1 The Architecture
7.3.2 Cost and Complexity Evaluation
7.3.3 Efficiency Evaluation
7.4 Parallel Prefix Sum
7.4.1 Experiments
7.5 Bitonic Sorting Networks
7.5.1 Experiments
7.6 Final Remarks
Chapter 8 Scalable Storage I/O Software for Blue Gene Architectures
8.1 Introduction
8.2 Blue Gene System Overview
8.2.1 Blue Gene Architecture
8.2.2 Operating System Architecture
8.3 Design and Implementation
8.3.1 The Client Module
8.3.2 The I/O Module
8.4 Conclusions and Future Work
Part IV Efficient Exploitation of Heterogeneous Architectures
Chapter 9 Fair Resource Sharing for Dynamic Scheduling of Workflows on Heterogeneous Systems
9.1 Introduction
9.1.1 Application Model
9.1.2 System Model
9.1.3 Performance Metrics
9.2 Concurrent Workflow Scheduling
9.2.1 Offline Scheduling of Concurrent Workflows
9.2.2 Online Scheduling of Concurrent Workflows
9.3 Experimental Results and Discussion
9.3.1 DAG Structure
9.3.2 Simulated Platforms
9.3.3 Results and Discussion
9.4 Conclusions
Chapter 10 Systematic Mapping of Reed-Solomon Erasure Codes on Heterogeneous Multicore Architectures
10.1 Introduction
10.2 Related Works
10.3 Reed-Solomon Codes and Linear Algebra Algorithms
10.4 Mapping Reed-Solomon Codes on Cell/B.E. Architecture
10.4.1 Cell/B.E. Architecture
10.4.2 Basic Assumptions for Mapping
10.4.3 Vectorization Algorithm and Increasing its Efficiency
10.4.4 Performance Results
10.5 Mapping Reed-Solomon Codes on Multicore GPU Architectures
10.5.1 Parallelization of Reed-Solomon Codes on GPU Architectures
10.5.2 Organization of GPU Threads
10.6 Methods of Increasing the Algorithm Performance on GPUs
10.6.1 Basic Modifications
10.6.2 Stream Processing
10.6.3 Using Shared Memory
10.7 GPU Performance Evaluation
10.7.1 Experimental Results
10.7.2 Performance Analysis using the Roofline Model
10.8 Conclusions and Future Works
Chapter 11 Heterogeneous Parallel Computing Platforms and Tools for Compute-Intensive Algorithms: A Case Study
11.1 Introduction
11.2 A Low-Cost Heterogeneous Computing Environment
11.2.1 Adopted Computing Environment
11.3 First Case Study: The N-Body Problem
11.3.1 The Sequential N-Body Algorithm
11.3.2 The Parallel N-Body Algorithm for Multicore Architectures
11.3.3 The Parallel N-Body Algorithm for CUDA Architectures
11.4 Second Case Study: The Convolution Algorithm
11.4.1 The Sequential Convolver Algorithm
11.4.2 The Parallel Convolver Algorithm for Multicore Architectures
11.4.3 The Parallel Convolver Algorithm for GPU Architectures
11.5 Conclusions
Chapter 12 Efficient Application of Hybrid Parallelism in Electromagnetism Problems
12.1 Introduction
12.2 Computation of Green's functions in Hybrid Systems
12.2.1 Computation in a Heterogeneous Cluster
12.2.2 Experiments
12.3 Parallelization in NUMA Systems of a Volume Integral Equation Technique
12.3.1 Experiments
12.4 Autotuning Parallel Codes
12.4.1 Empirical Autotuning
12.4.2 Modeling the Linear Algebra Routines
12.5 Conclusions and Future Research
Part V CPU + GPU Coprocessing
Chapter 13 Design and Optimization of Scientific Applications for Highly Heterogeneous and Hierarchical HPC Platforms Using Functional Computation Performance Models
13.1 Introduction
13.2 Related Work
13.3 Data Partitioning Based on Functional Performance Model
13.4 Example Application: Heterogeneous Parallel Matrix Multiplication
13.5 Performance Measurement on CPUs/GPUs System
13.6 Functional Performance Models of Multiple Cores and GPUs
13.7 FPM-Based Data Partitioning on CPUs/GPUs System
13.8 Efficient Building of Functional Performance Models
13.9 FPM-Based Data Partitioning on Hierarchical Platforms
13.10 Conclusion
Chapter 14 Efficient Multilevel Load Balancing on Heterogeneous CPU + GPU Systems
14.1 Introduction: Heterogeneous CPU + GPU Systems
14.1.1 Open Problems and Specific Contributions
14.2 Background and Related Work
14.2.1 Divisible Load Scheduling in Distributed CPU-Only Systems
14.2.2 Scheduling in Multicore CPU and Multi-GPU Environments
14.3 Load Balancing Algorithms for Heterogeneous CPU + GPU Systems
14.3.1 Multilevel Simultaneous Load Balancing Algorithm
14.3.2 Algorithm for Multi-Installment Processing with Multidistributions
14.4 Experimental Results
14.4.1 MSLBA Evaluation: Dense Matrix Multiplication Case Study
14.4.2 AMPMD Evaluation: 2D FFT Case Study
14.5 Conclusions
Acknowledgments
Notes:
Includes bibliographical references at the end of each chapter and index.
Description based on print version record.
ISBN:
1-118-71189-0
1-118-86667-3
OCLC:
870336445