Fine-grained provenance and applications to data analytics computation / Nan Zheng.

Available online: Dissertations & Theses @ University of Pennsylvania

Format:: Book; Thesis/Dissertation
Author/Creator:: Zheng, Nan, author.
Contributor:: Ives, Zachary G., degree supervisor.; University of Pennsylvania. Department of Computer and Information Science, degree granting institution.
Language:: English
Subjects (All):: Computer science.; Computer engineering.; Information science.; Computer and Information Science--Penn dissertations.; Penn dissertations--Computer and Information Science.
Local Subjects:: Computer science.; Computer engineering.; Information science.; Computer and Information Science--Penn dissertations.; Penn dissertations--Computer and Information Science.
Genre:: Academic theses.
Physical Description:: 1 online resource (141 pages)
Contained In:: Dissertations Abstracts International 82-12B.
Place of Publication:: [Philadelphia, Pennsylvania] : University of Pennsylvania ; Ann Arbor : ProQuest Dissertations & Theses, 2021.
Language Note:: English
System Details:: Mode of access: World Wide Web.; text file
Summary:: Data provenance tools seek to facilitate reproducible data science and auditable data analyses by capturing the analytics steps used to generate data analysis results. However, analysts must choose among workflow provenance systems, which allow arbitrary code but track provenance only at the granularity of files; provenance APIs, which provide tuple-level provenance but incur overhead in all computations; and database provenance tools, which track tuple-level provenance through relational operators and support optimization, but cover only a limited subset of data science tasks. None of these solutions is well suited to tracing errors introduced during common ETL, record alignment, and matching tasks for data types such as strings, images, etc. Additionally, we need a provenance archival layer that stores and manages the tracked fine-grained provenance and enables sophisticated future reasoning about why individual output results appear or fail to appear. For reproducibility and auditing, the provenance archival system should be tamper-resistant. Moreover, provenance collected over time or within a single query computation tends to be partially repeated (i.e., the same operation is applied to the same input records in an intermediate computation step). Hence, we desire efficient provenance storage that compresses repeated results. We address these challenges with novel formalisms and algorithms, implemented in the PROVision system, for reconstructing fine-grained provenance for a broad class of ETL-style workflows. We extend database-style provenance techniques to capture equivalences, support optimizations, and enable lazy evaluation. We develop solutions for storing fine-grained provenance in relational storage systems while both compressing it and protecting it via cryptographic hashes. We experimentally validate our proposed solutions using both scientific and OLAP workloads.
Notes:: Source: Dissertations Abstracts International, Volume: 82-12, Section: B.; Advisors: Ives, Zachary G.; Committee members: Susan Davidson; Boon Thau Loo; Andreas Haeberlen; Junhyong Kim.; Department: Computer and Information Science.; Ph.D. University of Pennsylvania 2021.
Local Notes:: School code: 0175
ISBN:: 9798738617621
Access Restriction:: Restricted for use by site license.; This item must not be sold to any third party vendors.
