Efficiently tracking provenance in scientific workflows / Zhuowei Bao.

LIBRA QA003 2012 .B221

Available from offsite location. This item is stored in our repository but can be checked out.

Format:
Book
Manuscript
Thesis/Dissertation
Author/Creator:
Bao, Zhuowei.
Contributor:
Davidson, Susan B., advisor.
Ives, Zachary G., committee member.
Khanna, Sanjeev, committee member.
Alur, Rajeev, 1966- committee member.
Milo, Tova, committee member.
University of Pennsylvania. Computer and Information Science.
Language:
English
Subjects (All):
Penn dissertations--Computer and information science.
Computer and information science--Penn dissertations.
Local Subjects:
Penn dissertations--Computer and information science.
Computer and information science--Penn dissertations.
Physical Description:
xiv, 188 pages : illustrations ; 29 cm
Production:
2012.
Summary:
Tracking the provenance of data produced by a workflow execution involves answering reachability queries over large provenance graphs, which can be expensive. To address this, we present compact labeling schemes for efficiently answering reachability queries over provenance graphs that are derived from executions of a given workflow specification. The idea is to assign each node a reachability label such that, using only the labels of any two nodes, we can quickly decide whether one can reach the other. Our proposed schemes build logarithmic-size labels in linear time and answer any query in constant time.
In this dissertation, we consider the reachability labeling problem for a variety of workflow settings. First, we study the static labeling problem, where the entire provenance graph is given as input. For workflows with well-nested loops and forks (i.e., parallel executions), we develop a skeleton-based labeling approach which uses the labeling for the specification as an effective skeleton for designing the labeling for its executions. Next, we turn to the dynamic labeling problem, where the input provenance graph grows over time but the nodes must be labeled on-the-fly. We first show that, in general, for workflows that contain arbitrary recursion, dynamic labeling of their executions requires long (linear-size) labels. Nevertheless, we identify a natural class of workflows with linear recursion, for which dynamic, yet compact (logarithmic-size) labeling is possible. Finally, we revisit the dynamic labeling problem when fine-grained dependencies between inputs and outputs of modules are defined over multiple workflow views. It turns out that the restriction of linear recursion, which sufficed to reduce the label length in the earlier setting, is no longer helpful. However, for a more restricted class of workflows with strictly linear recursion and safe views, we propose a novel view-adaptive dynamic labeling approach.
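To make the notion of a reachability label concrete, the sketch below shows the classic interval labeling for a tree. It is not one of the dissertation's schemes. It is a simpler, well-known technique chosen only to illustrate the general idea of deciding reachability from two labels alone. It assumes the graph is a rooted tree, whereas real provenance graphs are generally DAGs. The function names `label_tree` and `reaches` and the module names in the example are hypothetical.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, int]  # (preorder number, largest preorder number in subtree)


def label_tree(children: Dict[str, List[str]], root: str) -> Dict[str, Label]:
    """Assign each node an interval label with one depth-first pass.

    Assumption: `children` describes a rooted tree (illustration only).
    """
    labels: Dict[str, Label] = {}
    start: Dict[str, int] = {}
    counter = 0
    stack = [(root, False)]  # (node, whether its subtree is finished)
    while stack:
        node, finished = stack.pop()
        if finished:
            # All descendants were numbered in start[node]..counter-1.
            labels[node] = (start[node], counter - 1)
        else:
            start[node] = counter
            counter += 1
            stack.append((node, True))
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
    return labels


def reaches(a: Label, b: Label) -> bool:
    """Return True if the node labeled `a` reaches the node labeled `b`.

    Uses only the two labels: b's interval must lie inside a's interval.
    """
    return a[0] <= b[0] and b[1] <= a[1]


# Hypothetical four-module tree: input -> align -> merge, input -> filter.
example = {"input": ["align", "filter"], "align": ["merge"]}
labels = label_tree(example, "input")
```

Each label here is a pair of integers no larger than the number of nodes, so it takes a logarithmic number of bits. The labels are built in a single linear-time traversal, and each query is two comparisons. For instance, `reaches(labels["input"], labels["merge"])` is true, while `reaches(labels["align"], labels["filter"])` is false.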
Notes:
Adviser: Susan B. Davidson.
Thesis (Ph.D. in Computer and Information Science) -- University of Pennsylvania, 2012.
Includes bibliographical references.
OCLC:
828770095
