
Euro-Par 2024: Parallel Processing : 30th European Conference on Parallel and Distributed Processing, Madrid, Spain, August 26–30, 2024, Proceedings, Part II / edited by Jesus Carretero, Sameer Shende, Javier Garcia-Blas, Ivona Brandic, Katzalin Olcoz, Martin Schreiber.

SpringerLink Books Lecture Notes In Computer Science (LNCS) (1997-2024) Available online

Format:: Book
Author/Creator:: Carretero, Jesus.
Contributor:: Shende, Sameer.; Garcia-Blas, Javier.; Brandic, Ivona.; Olcoz, Katzalin.; Schreiber, Martin.
Series:: Lecture Notes in Computer Science, 1611-3349 ; 14802
Language:: English
Subjects (All):: Software engineering.; Microprogramming.; Computer input-output equipment.; Microprocessors.; Computer architecture.; Computer networks.; Computers, Special purpose.; Software Engineering.; Control Structures and Microprogramming.; Input/Output and Data Communications.; Processor Architectures.; Computer Communication Networks.; Special Purpose and Application-Based Systems.
Local Subjects:: Software Engineering.; Control Structures and Microprogramming.; Input/Output and Data Communications.; Processor Architectures.; Computer Communication Networks.; Special Purpose and Application-Based Systems.
Physical Description:: 1 online resource (520 pages)
Edition:: 1st ed. 2024.
Place of Publication:: Cham : Springer Nature Switzerland : Imprint: Springer, 2024.
Summary:: The three-volume set LNCS 14801, 14802, and 14803 constitutes the proceedings of the 30th European Conference on Parallel and Distributed Processing, Euro-Par 2024, which took place in Madrid, Spain, during August 26–30, 2024. The 88 full papers included in the proceedings were carefully reviewed and selected from 293 submissions. They were organized in topical sections as follows: Part I: Programming, compilers, and performance; scheduling, resource management, cloud, edge computing, and workflows; Part II: Architectures and accelerators; data analytics, AI and computational science; Part III: Theory and algorithms; multidisciplinary, domain-specific and applied parallel and distributed computing.
Contents:: Intro; Preface; Organization; Contents - Part II; Architectures and Accelerators; Efficient RNIC Cache Side-Channel Attack Detection Through DPU-Driven Architecture; 1 Introduction; 2 Threat Model; 3 Background; 3.1 RNIC Cache Side-Channel Attack; 3.2 Switch-Centric RCSCA Detection; 3.3 DPU Characterization on Network and Compute; 4 DPU-Driven RCSCA Detector; 4.1 Design Overview; 4.2 Implementation Strategies of RCSCA Detector; 5 Evaluation Method and Results; 5.1 The Performance of Switch-Centric RCSCA Detector; 5.2 The Performance of DPU-Driven RCSCA Detector; 5.3 Performance Overhead and FPGA Resource Consumption; 6 Related Work; 7 Conclusion; References; Parallel Writing of Nested Data in Columnar Formats; 2 Related Work; 3 RNTuple Overview; 4 Concepts for Parallel Writing of Columnar Data; 4.1 Serialization and Compression; 4.2 Writing Into Container Format; 4.3 Updating Format Metadata; 5 Implementation of Parallel Writing in RNTuple; 6 Evaluation of Parallel RNTuple Writing; 6.1 Scalability on Different Storage Media; 6.2 Dataset Skimming of the Analysis Grand Challenge; 7 Conclusions and Future Work; FakeGuard: Novel Architecture Support for Deepfake Detection Networks; 2 Background and Motivation; 2.1 Understanding Deepfake Detection Methods; 2.2 Inefficiency of GPUs and Domain-Specific Accelerators; 3 FakeGuard Architecture; 3.1 Proposed Overall Architecture; 3.2 Hardware Architecture of SA-DPE; 3.3 Detailed Fused PE Design; 3.4 Reconfigurable Adder-Tree Topology; 4 Tile-Level Scheduling Mechanism; 4.1 Underutilization and Imbalanced Use of Hardware Resources; 4.2 Tile Segmentation and Execution Order Adjustment; 4.3 Tile-Level Task Preemption; 5 Evaluation; 5.1 Methodology; 5.2 Experimental Results; 6 Conclusion; Exploring Processor Micro-architectures Optimised for BLAS3 Micro-kernels; 2 Background and Related Work; 3 Methodology; 3.1 Processor Core Modelling; 3.2 Relevant Design Space Region; 3.3 Observables; 3.4 Micro-kernel Code Generation; 4 Results; 4.1 Comparison Between Model and Existing Hardware; 4.2 Comparison Arm SVE Versus RISC-V RVV; 5 Conclusions and Outlook; Fault Tolerant in the Expand Ad-Hoc Parallel File System; 2 State of the Art; 3 Expand Ad-Hoc; 3.1 Expand Ad-Hoc Design; 3.2 Metadata Management; 3.3 Parallel Access; 3.4 Data Locality; 3.5 System Call Interception Library; 4 Expand Ad-Hoc Fault Tolerance Model; 4.1 Metadata Management; 4.2 Read Optimizations; 5.1 IOR Evaluation; 5.2 DLIO Evaluation; 5.3 Real Deep Learning Application Evaluation; 6 Conclusions and Future Works; ImSPU: Implicit Sharing of Computation Resources Between Vector and Scalar Processing Units; 3 Implicit Sharing Architecture; 3.1 Implicit Sharing Across Different ISAs; 3.2 Datapath for Implicit Sharing; 4 Hardware Implementation; 4.1 Configuration; 4.2 Memory Interface; 4.3 Implicit Sharing of Vector Instructions; 4.4 Management of SPUs; 5 Results; 5.1 Synthesis Results; 5.2 Performance Evaluation; 5.3 Comparison with Previous Works; 6 Conclusions; ADE-HGNN: Accelerating HGNNs Through Attention Disparity Exploitation; 2 Background; 2.1 Heterogeneous Graph and Semantic Graph; 2.2 Heterogeneous Graph Neural Networks; 3 Motivation; 3.1 Attention Disparity; 3.2 Challenge to Exploit the Opportunity; 4 Optimized HGNN Execution Flow; 4.1 Decomposition of Attention Computation; 4.2 Neighbor Pruning Method Based on Min-Heap; 4.3 Parallel Execution with Operation Fusion; 5 Architecture Design; 5.1 Hardware Components; 5.2 Design of Pruner; 6 Experimental Results; 6.1 Methodology; 6.2 Overall Results; 6.3 Effects of Proposed Optimizations; 7 Related Work; 8 Conclusion; Watt: A Write-Optimized RRAM-Based Accelerator for Attention; 2 Preliminary and Motivation; 2.1 Attention; 2.2 Resistive Random Access Memory; 2.3 Related Works; 2.4 Motivation; 3 Architecture Design; 3.1 Overall Architecture; 3.2 Importance Detector; 3.3 Similarity Detector; 3.4 Workload-Aware Dynamic Scheduler; 4 Evaluation; 4.1 Experimental Setup; 4.2 Result Analysis; 5 Conclusion; Optimizing Communication for Latency Sensitive HPC Applications on up to 48 FPGAs Using ACCL; 3 Synthetic Benchmarking of Communication Approaches; 3.1 ACCL Communication Approaches; 3.2 Evaluation Infrastructure; 3.3 Resource Utilization of the Network Stack; 3.4 Modelling and Measurement of Throughput And Latency; 4 Acceleration of Shallow Water Simulation Using ACCL; 4.1 Implementation; 4.2 Performance Model; 4.3 Evaluation; A Folded Computation-in-Memory Accelerator for Fast Polynomial Multiplication in BIKE; 3.1 Hardware Structure of Accelerator; 3.2 Storage Mapping Strategy; 3.3 Data Flow; 3.4 Calculation Process; 4 Experiments and Results; 4.1 Implementation and Simulation; 4.2 Validity Verification of the Architecture; 4.3 Compared with Previous Work; (re)Assessing PiM Effectiveness for Sequence Alignment; 2 Background; 3 Objectives and Methodology; 4 Experimental Setup and Evaluation Results; 4.1 Performance Comparison; 4.2 Power Comparison; 4.3 Roofline Model; 4.4 Normalized Comparison; 5 Related Work; MEPAD: A Memory-Efficient Parallelized Direct Convolution Algorithm for Deep Neural Networks; 2 Reference Architecture; 3 Mapping of Convolutional Approaches; 3.1 Matrix-Matrix Multiplication; 3.2 Description of State-of-the-Art Convolutional Approaches; 4 MEPAD; 5 Experimental Results; A High-Performance Collective I/O Framework Leveraging Node-Local Persistent Memory; 2.1 Two-Phase Collective I/Os; 2.2 Persistent Memory; 2.3 Empirical Study; 3 Related Work; 4 Design of PMIO; 4.1 PMIO Buffers; 4.2 Basic Operations; 4.3 Log Merging; 4.4 Failure Recovery; 5.1 Experimental Setup; 5.2 Overall Results; 5.3 Impact of Log Merging; 5.4 Failure Recovery; 5.5 Performance with Intel Persistent Memory; PCTC: Hardware and Software Co-design for Pruned Capsule Networks on Tensor Cores; 2.1 Overview of CapsNets; 2.2 TC Architecture; 3 Capsule Tensor Core (CTC); 4 Pruned Capsule Tensor Core (PCTC); Harnessing Data Movement Strategies to Optimize Performance-Energy Efficiency of Oil & Gas Simulations in HPC; 2.1 Data Movement Between CPU and GPU; 2.2 Fletcher Modeling; 3 Optimizing Data Transfer on Fletcher Modeling; 3.1 Original Version; 3.2 Asynchronous Communication; 3.3 Unified Memory Version; 3.4 Unified Memory with Prefetching Asynchronous Version; 4 Evaluation; 4.1 Execution Environment; 4.2 Energy-Delay Product Evaluation; 4.3 Impact of Data Movement Strategies on the Performance and Energy Consumption; 4.4 Optimized Used of Data Movement Strategies; 6 Conclusions and Future Work; Compact Parallel Hash Tables on the GPU; 2.1 Hash Tables; 2.2 Cuckoo Hashing; 2.3 Iceberg Hashing; 2.4 Compact Hashing; 3 A Parallel Compact Iceberg Hash Table; 4 Implementation; 4.1 Architectural Considerations; 4.2 Iceberg Find-or-Put; 4.3 Permutations; 4.4 Parallel Compact Cuckoo Implementation; 4.5 CUDA Code; 5 Experimental Evaluation; 5.1 Synthetic Benchmarks; 5.2 Results; 5.3 Experiments with Real-World Data; Hybrid Congestion Control for BXI-Based Interconnection Networks; 1 Motivation; 2.1 Interconnection Networks; 2.2 Congestion Control Based on Injection Throttling; 2.3 Static Queuing Schemes to Reduce HoL Blocking; 3 Hybrid Congestion Control in BXI; 3.1 Assumed Switch Architecture; 3.2 Flow Control; 3.3 DCQCN in BXI Networks; 3.4 Discussion on Parameters Tuning; 4.1 Experiments Setup; 4.2 Results Analysis; 5 Conclusions; Data Analytics, AI and Computational Science; Athena: Add More Intelligence to RMT-Based Network Data Plane with Low-Bit Quantization; 2.1 RMT Pipeline Architecture; 2.2 Quantization and Pruning; 3 Athena Design; 3.1 Overview of Athena; 3.2 Dataflow Mapping of Athena Compiler; 3.3 The Reason for Choosing Column-Wise 2:4 Sparsity; 3.4 The Reason for Introducing Athena Filter Extension; 4 Experimental Results; 4.1 Accuracy of Sparse Low-Bit Model; 4.2 Inference Latency Overhead; 4.3 Filter Extension Overhead; 5 Adapting Athena to Many-Core Architectures.
ISBN:: 3-031-69766-9

