Euro-Par 2024: Parallel Processing : 30th European Conference on Parallel and Distributed Processing, Madrid, Spain, August 26–30, 2024, Proceedings, Part I / edited by Jesus Carretero, Sameer Shende, Javier Garcia-Blas, Ivona Brandic, Katzalin Olcoz, Martin Schreiber.
- Format:
- Book
- Author/Creator:
- Carretero, Jesus.
- Series:
- Lecture Notes in Computer Science, 1611-3349 ; 14801
- Language:
- English
- Subjects (All):
- Computer input-output equipment.
- Microprogramming.
- Microprocessors.
- Computer architecture.
- Computer networks.
- Computers, Special purpose.
- Input/Output and Data Communications.
- Control Structures and Microprogramming.
- Processor Architectures.
- Computer Communication Networks.
- Special Purpose and Application-Based Systems.
- Local Subjects:
- Input/Output and Data Communications.
- Control Structures and Microprogramming.
- Processor Architectures.
- Computer Communication Networks.
- Special Purpose and Application-Based Systems.
- Physical Description:
- 1 online resource (430 pages)
- Edition:
- 1st ed. 2024.
- Place of Publication:
- Cham : Springer Nature Switzerland : Imprint: Springer, 2024.
- Summary:
- The three-volume set LNCS 14801, 14802, and 14803 constitutes the proceedings of the 30th European Conference on Parallel and Distributed Processing, Euro-Par 2024, which took place in Madrid, Spain, during August 26–30, 2024. The 88 full papers included in the proceedings were carefully reviewed and selected from 293 submissions. They were organized in topical sections as follows: Part I: Programming, compilers, and performance; scheduling, resource management, cloud, edge computing, and workflows; Part II: Architectures and accelerators; data analytics, AI and computational science; Part III: Theory and algorithms; multidisciplinary, domain-specific and applied parallel and distributed computing.
- Contents:
- Intro
- Preface
- Organization
- Contents - Part I
- Contents - Part II
- Contents - Part III
- Programming, Compilers and Performance
- FlexiGran: Flexible Granularity Locking in Hierarchies
- 1 Introduction
- 2 Background and Motivation
- 2.1 DomLock and HiFi
- 3 Flexible Granularity Locking
- 3.1 Checking for Overlaps
- 3.2 FlexiGran with Dynamic Hierarchies
- 3.3 Space-Time Trade-offs with Bitsets
- 4 Experimental Evaluation
- 4.1 Varying Fine-Grained Operation Percentage
- 4.2 Varying the Number of Threads
- 4.3 Varying the Distribution of Locked Nodes and Hierarchy Size
- 4.4 Structural Modification Operations
- 4.5 False Subsumptions in FlexiGran
- 5 Related Work
- 6 Conclusion
- References
- Efficient Code Region Characterization Through Automatic Performance Counters Reduction Using Machine Learning Techniques
- 2 Motivation
- 3 Supervised Machine Learning Algorithms
- 4 Performance Counters Reduction Using ML Ensembles
- 5 Evaluation
- 5.1 Comparison with PCA and Correlation-Based Methodology
- 5.2 Comprehensive Dataset of OpenMP Regions
- 5.3 Applying the Methodology for Characterizing GPU Kernels
- 6 Related Work
- 7 Conclusions and Future Work
- ESIMD GPU Implementations of Deep Learning Sparse Matrix Kernels
- 2 Overview of the Algorithms
- 2.1 Parallelizing SDDMM
- 2.2 Parallelizing SPMM
- 2.3 Parallelizing FusedMM
- 3 ESIMD Implementation
- 4 Experiments
- 4.1 Randomly Generated Sparse Matrices
- 4.2 ResNet-50 Data Set
- 5 Conclusion and Future Work
- Deconstructing HPL-MxP Benchmark: A Numerical Perspective
- 2 Related Works
- 3 HPL-MxP Description and Implementation Notes
- 4.1 Exploring Low Precision
- 4.2 Improving the Input Data
- 5 Discussion and Perspective
- 6 Conclusion
- References
- ImageMap: Enabling Efficient Mapping from Image Processing DSL to CGRA
- 2 Background
- 2.1 Halide
- 2.2 CGRA Mapping
- 3 Methods
- 3.1 Multi-level Partitioning and Halide Primitives Extensions
- 3.2 Auto-scheduling Algorithm
- 3.3 Performance Profiling
- 3.4 ImageMap Framework and Compilation Optimizations
- 4 Evaluations
- 4.1 Experiment Setup
- 4.2 Overall Performance Evaluation
- 4.3 Detailed Evaluations
- 5 Conclusion
- Predicting GPU Kernel's Performance on Upcoming Architectures
- 2 Roofline Model of a GPU with Kernel-Specific Ceilings
- 3 Projecting the Roofline Model with Ceilings to a Target GPU
- 4 Implementation
- 4.1 Collecting Metrics
- 4.2 Estimating the Target Operational Intensity
- 5 Experiments
- 5.1 GPU Test-Bed Description
- 5.2 Hydro1D
- 5.3 UVMBench
- 5.4 Quicksilver
- 5.5 LULESH
- 5.6 MiniMDock
- 7 Conclusion
- Bringing Auto-Tuning to HIP: Analysis of Tuning Impact and Difficulty on AMD and Nvidia GPUs
- 2 Related Work
- 3 Design and Implementation
- 4 Evaluation Metrics
- 5 Experimental Setup
- 6 Evaluation
- 6.1 Convolution
- 6.2 Hotspot
- 6.3 Dedispersion
- 6.4 GEMM (General Matrix Multiplication)
- 6.5 Performance Portability
- 7 Discussion
- 8 Conclusions
- A Mechanism to Generate Interception Based Tools for HPC Libraries
- 2 Gaps in Current Library Introspection Methods
- 2.1 Automatic Generation of Library Function Wrappers
- 2.2 Parameter Tracking
- 2.3 Tool Chaining
- 2.4 Analysis of Different Library Calls by Collaborative Tools
- 3 Tool and Interface Architecture Design
- 3.1 Generated Tools
- 3.2 Generated Tools Interface
- 3.3 Generation Infrastructure
- 4.1 Generated Tools Interface
- 4.2 Generated Tools Implementation
- 4.3 Generator Implementation
- 5 Use Case Analysis
- 5.1 Automatic Generation of Library Function Wrappers
- 5.2 Parameter Tracking
- 5.3 Tool Chaining
- 5.4 Collaborative Tool Functionality
- 6 Overhead Analysis
- 6.1 Generated Interface Overhead
- 6.2 Toolchain Scaling
- 7 Related Work
- 8 Conclusion
- OMPGPT: A Generative Pre-trained Transformer Model for OpenMP
- 2.1 Generative Pre-trained Transformers and Code LLMs
- 2.2 LLMs on Code-Related Tasks for HPC
- 2.3 Prompt Engineering
- 3 Approach
- 3.1 OMPGPT Design
- 3.2 OMPGPT Training & Inference
- 3.3 Chain-of-OMP
- 3.4 Fine-Tuning
- 4 Evaluation
- 4.1 Model Perplexity
- 4.2 OpenMP Pragma Generation with OMPGPT Base Model
- 4.3 OpenMP Pragma Generation Using Chain-of-OMP
- 4.4 Fine-Tuning OMPGPT for Specific Pragma Generation
- 6 Conclusion and Future Work
- Scheduling, Resource Management, Cloud, Edge Computing, and Workflows
- Scheduling Distributed I/O Resources in HPC Systems
- 1 Introduction and Related Work
- 2 Model
- 2.1 Platform Model
- 2.2 Application Model and I/O Behavior
- 2.3 Measuring Performance
- 3 Algorithms for Scheduling I/O resources
- 3.1 Allocation Algorithms
- 3.2 Placement Algorithms
- 3.3 On the Difficulty of Instantiating an Algorithm
- 4.1 Evaluation Methodology
- 4.2 Results
- Light-Weight Prediction for Improving Energy Consumption in HPC Platforms
- 3 Preliminary Concepts
- 4 Predicting the Power Consumption of HPC Jobs
- 4.1 The First Method: Predicting Power Consumption with Users' Jobs History
- 4.2 The Second Method: Predicting Using Supervised Regression
- 5 Scheduling with Jobs' Power Profile Prediction
- 6 Results
- 6.1 Jobs' Power Prediction: Dataset Description
- 6.2 Jobs' Power Prediction: Experimental Setting
- 6.3 How to Predict Jobs' Power Information before the Jobs' Execution? How much prediction accuracy can we achieve?
- 6.4 Jobs Scheduling with Power Prediction: Experimental Setup
- 6.5 Which Jobs' Power Information Contribute to Better Scheduling Under Power Constraints?
- EKRM: Efficient Key-Value Retrieval Method to Reduce Data Lookup Overhead for Redis
- 3 Motivation
- 4 EKRM Design
- 4.1 Design Overview
- 4.2 EKRM Operations
- 4.3 Other Supports
- 5.1 System Configurations
- 5.2 Benchmarks and Datasets
- 5.3 Speedup on Redis
- 5.4 Case Studies
- Automated Data Management and Learning-Based Scheduling for Ray-Based Hybrid HPC-Cloud Systems
- 3.1 Ray Cluster Background
- 3.2 Overall System Design
- 3.3 Data Movement and Dynamic Labeling
- 3.4 Scheduling System and Dynamic Profiling
- 3.5 The Overall Workflow
- 4 Use Cases and Evaluation
- 4.1 Machine Learning Model Training
- 4.2 Image Processing
- 4.3 Further Performance Analysis
- 5 Conclusions
- Solving the Restricted Assignment Problem to Schedule Multi-get Requests in Key-Value Stores
- 2 Applicative Context and Formal Model
- 3 An Algorithm for the Restricted Assignment Problem on Regular Intervals
- 4 A General Framework for Circular Intervals
- 5 An Approximation for the Restricted Assignment Problem on Circular Intervals
- 6 Experimental Evaluation
- PriCE: Privacy-Preserving and Cost-Effective Scheduling for Parallelizing the Large Medical Image Processing Workflow over Hybrid Clouds
- 1 Introduction
- 3 Problem Formulation
- 3.1 System Model
- 3.2 Privacy-Preserving and Cost-Effective Metrics
- 3.3 Multi-objective Optimization
- 4 PriCE: Privacy-Preserving and Cost-Effective Solution
- 4.1 Privacy-Preserving Image Splitting with Graph-Coloring
- 4.2 Pareto Trade-Off Solution Among Privacy, Cost, and Time
- 5 Experiments and Evaluation
- 5.1 Experimental Setup
- 5.2 Visualization
- 5.3 Reduction of the Average Lower Bound on Privacy Risk
- 5.4 Evaluation by Simulations
- A 1.25(1+ε)-Approximation Algorithm for Scheduling with Rejection Costs Proportional to Processing Times
- 4 Scheduling with a Bound on Makespan
- 4.1 Job Types
- 4.2 Algorithm
- 4.3 Proof
- 5 BEKP Approximation Algorithm
- 5.1 Computing Bounds on the Optimal Makespan
- 5.2 BEKP Algorithm
- 5.3 Complexity
- 6 Experiments
- 7 Conclusion and Perspectives
- DProbe: Profiling and Predicting Multi-tenant Deep Learning Workloads for GPU Resource Scaling
- 2 System Design
- 2.1 Job Profiler of DProbe
- 2.2 Demand Predictor of DProbe
- 3 Evaluation
- 3.1 Experimental Setup
- 3.2 Evaluation of Performance Metric Prediction
- 3.3 Evaluation of Resource Demand Prediction
- 3.4 Evaluation of Resource Scaling Optimization
- 4 Related Work
- sAirflow: Adopting Serverless in a Legacy Workflow Scheduler
- 3 System Context: FaaS and Airflow
- 4 sAirflow: Design and Implementation
- 4.1 Control Flow
- 4.2 Change Data Capture (CDC)
- 4.3 Scheduler
- 4.4 Executors and Workers
- 5 Deployment and Evaluation Method
- 6 Experimental Evaluation Results
- 6.1 Function Executor and Cold Starts
- 6.2 Function Executor and Warm Starts
- ISBN:
- 3-031-69577-1