High performance parallelism pearls. Volume two : multicore and many-core programming approaches / James Reinders, Jim Jeffers ; contributors, Jefferson Amstutz [and seventy one others].
- Format:
- Book
- Author/Creator:
- Reinders, James, author.
- Jeffers, Jim, author.
- Language:
- English
- Subjects (All):
- Parallel programming (Computer science)--Data processing.
- Parallel programming (Computer science).
- Coprocessors.
- Computer programming.
- Physical Description:
- 1 online resource (574 p.)
- Edition:
- First edition.
- Other Title:
- Multicore and many-core programming approaches
- Place of Publication:
- Amsterdam, [Netherlands] : Morgan Kaufmann, 2015.
- Language Note:
- English
- System Details:
- text file
- Summary:
- High Performance Parallelism Pearls Volume 2 offers another set of examples that demonstrate how to leverage parallelism. Similar to Volume 1, the techniques included here explain how to use processors and coprocessors with the same programming - illustrating the most effective ways to combine Xeon Phi coprocessors with Xeon and other multicore processors. The book includes examples of successful programming efforts, drawn from across industries and domains such as biomed, genetics, finance, manufacturing, imaging, and more. Each chapter in this edited work includes detailed explanations of t
- Contents:
- Front Cover; High Performance Parallelism Pearls: Multicore and Many-core Programming Approaches; Copyright; Contents; Contributors; Acknowledgments; Foreword; Making a bet on many-core; 2013 Stampede-Intel Many-Core System - A First; HPC journey and revelation; Stampede users discover: It's parallel programming; This book is timely and important; Preface; Inspired by 61 cores: A new era in programming; Chapter 1: Introduction; Applications and techniques; SIMD and vectorization; OpenMP and nested parallelism; Latency optimizations; Python; Streams; Ray tracing; Tuning prefetching
- MPI shared memory; Using every last core; OpenCL vs. OpenMP; Power analysis for nodes and clusters; The future of many-core; Downloads; For more information; Chapter 2: Numerical Weather Prediction Optimization; Numerical weather prediction: Background and motivation; WSM6 in the NIM; Shared-memory parallelism and controlling horizontal vector length; Array alignment; Loop restructuring; Compile-time constants for loop and array bounds; Performance improvements; Summary; For more information; Chapter 3: WRF Goddard Microphysics Scheme Optimization; The motivation and background
- WRF Goddard microphysics scheme; Goddard microphysics scheme; Benchmark setup; Code optimization; Removal of the vertical dimension from temporary variables for a reduced memory footprint; Collapse i- and j-loops into smaller cells for smaller footprint per thread; Addition of vector alignment directives; Summary of the code optimizations; Analysis using an instruction Mix report; VTune performance metrics; Performance effects of the optimization of Goddard microphysics scheme on the WRF; Summary; Acknowledgments; For more information; Chapter 4: Pairwise DNA Sequence Alignment Optimization
- Pairwise sequence alignment; Parallelization on a single coprocessor; Multi-threading using OpenMP; Vectorization using SIMD intrinsics; Parallelization across multiple coprocessors using MPI; Performance results; Summary; For more information; Chapter 5: Accelerated Structural Bioinformatics for Drug Discovery; Parallelism enables proteome-scale structural bioinformatics; Overview of eFindSite; Benchmarking dataset; Code profiling; Porting eFindSite for coprocessor offload; Parallel version for a multicore processor; Task-level scheduling for processor and coprocessor; Case study; Summary
- For more information; Chapter 6: Amber PME Molecular Dynamics Optimization; Theory of MD; Acceleration of neighbor list building using the coprocessor; Acceleration of direct space sum using the coprocessor; Additional optimizations in coprocessor code; Removing locks whenever possible; Exclusion list optimization; Reduce data transfer and computation in offload code; Modification of load balance algorithm; PME direct space sum and neighbor list work; PME reciprocal space sum work; Bonded force work; Compiler optimization flags; Results; Conclusions; For more information
- Chapter 7: Low-Latency Solutions for Financial Services Applications
- Notes:
- Includes indexes.
- Description based on online resource; title from PDF title page (ebrary, viewed August 8, 2015).
- ISBN:
- 9780128038901
- 012803890X
- 9780128038192
- 0128038195
- OCLC:
- 921845676