Reproducibility and Performance Optimizations for Unmodified Linux Programs / Kelly R Shiptoski.

Dissertations & Theses @ University of Pennsylvania Available online

Format:
Book
Thesis/Dissertation
Author/Creator:
Shiptoski, Kelly R., author.
Contributor:
University of Pennsylvania. Computer and Information Science, degree granting institution.
Language:
English
Subjects (All):
Computer science.
Computer engineering.
Electrical engineering.
Computer and Information Science--Penn dissertations.
Penn dissertations--Computer and Information Science.
Physical Description:
1 online resource (132 pages)
Distribution:
Ann Arbor : ProQuest Dissertations & Theses, 2023
Contained In:
Dissertations Abstracts International 85-08B.
Place of Publication:
[Philadelphia, Pennsylvania] : University of Pennsylvania, 2022.
Summary:
The demands of modern computing continue to escalate each year. Consumers expect increased performance from systems which must also operate perfectly and never experience issues. These goals are difficult to achieve for any single system, let alone systems in general.
System calls are the means by which applications interact with the OS (operating system). Program tracing that takes place at the system call level is an incredibly powerful tool, allowing us to abstract over many details of program execution and reduce programs to the system calls they perform. It allows us to work between the kernel and application levels, which is an easier level of abstraction to reason about, but still general enough that we can look at any program as a series of system calls.
In this dissertation, we take a step beyond read-only tracing, and delve into program manipulation via system call interposition, its many applications, and how it allows us to produce program agnostic systems. We fully analyze ptrace, a built-in Linux tool for program tracing and manipulation, and describe its strengths and shortcomings. We also explain how we developed an asynchronous wrapper around the ptrace API which allowed us to circumvent both programmability and performance issues inherent to ptrace. We demonstrate ptrace's utility as a key component of two systems we created: DetTrace and ProcessCache.
DetTrace is a reproducible container abstraction for Linux implemented entirely in userspace. All computation that occurs inside a DetTrace container is a pure function of the initial file system state of the container. Reproducible containers can be used for a variety of purposes, including replication for fault-tolerance, reproducible software builds, and reproducible data analytics. We use DetTrace to achieve, in an automatic fashion, reproducibility for 12,130 Debian package builds, containing over 800 million lines of code, as well as bioinformatics and machine learning workflows. We show that, while software in each of these domains is initially irreproducible, DetTrace brings reproducibility without requiring any hardware, OS, or application changes. DetTrace's performance is dictated by the frequency of system calls: I/O-intensive software builds have an average overhead of 3.49x, while compute-bound bioinformatics workflows are under 2%.
The ProcessCache system provides a generic facility for automatically memoizing the work of a broad class of multi-process Linux programs. ProcessCache caches results and transparently determines when cached results can be used and when re-execution is necessary. ProcessCache generalizes previous work on forward build systems, to go beyond software builds to other multiprocess programs like shell scripts and bioinformatics workflows. ProcessCache supports unmodified Linux binaries, using the ptrace mechanism to trace system calls and determine program inputs. Our experiments show that ProcessCache can automatically provide incremental computation to existing programs, accelerating workloads from 1.06x to 65x.
We conclude with an in-depth analysis of future work directions for the two systems. We focus this section on ProcessCache because it comprises the bulk of this dissertation, but first we propose an addition to DetTrace that could alleviate performance and correctness issues it suffers when handling threads. For ProcessCache, we examine potential avenues to improve its performance, space utilization, and correctness guarantees, and also discuss why some previously proposed improvements are not viable solutions.
Notes:
Source: Dissertations Abstracts International, Volume: 85-08, Section: B.
Advisors: Devietti, Joseph; Committee members: Lee, Benjamin C.; Loo, Boon Thau; Liu, Vincent; Newton, Ryan.
Department: Computer and Information Science.
Ph.D. University of Pennsylvania 2023.
Local Notes:
School code: 0175
ISBN:
9798381509625
Access Restriction:
Restricted for use by site license.
