

Learning to scale up search-driven data integration / Zhepeng Yan.

LIBRA QA003 2016 .Y21

Available from offsite location. This item is stored in our repository but can be checked out.

Format:
Book
Manuscript
Thesis/Dissertation
Author/Creator:
Yan, Zhepeng, author.
Contributor:
Ives, Zachary G., degree supervisor.
Guha, Sudipto, degree committee member.
Tannen, Val, 1953- degree committee member.
Ungar, Lyle, degree committee member.
Yu, Cong, degree committee member.
University of Pennsylvania. Department of Computer and Information Science, degree granting institution.
Language:
English
Subjects (All):
Penn dissertations--Computer and Information Science.
Computer and Information Science--Penn dissertations.
Physical Description:
xii, 125 leaves : illustrations ; 29 cm
Production:
[Philadelphia, Pennsylvania] : University of Pennsylvania, 2016.
Summary:
A recent movement to tackle the long-standing data integration problem is a compositional and iterative approach, termed "pay-as-you-go" data integration. Under this model, the objective is to immediately support queries over "partly integrated" data, and to enable the user community to drive integration of the data that relate to their actual information needs. Over time, data will be gradually integrated.
While the pay-as-you-go vision has been well articulated for some time, only recently have we begun to understand how it can be realized in a system implementation. One branch of this effort has focused on enabling queries through keyword search-driven data integration, in which users pose queries over partly integrated data encoded as a graph, receive ranked answers generated from data and metadata that are linked at query time, and provide feedback on those answers. From this user feedback, the system learns to repair bad schema matches or record links.
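As a rough illustration of how answer feedback can drive repairs, the sketch below uses a generic perceptron-style update. It is not the Q System's actual algorithm, which this record does not describe. It assumes that each query-time link (graph edge) carries a feature vector, that an answer's cost is the sum of its weighted edge features, and that a user marks a preferred answer over the one ranked first. The names `edge_features`, `feedback_update`, and the learning rate `lr` are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Features = Dict[str, float]

def edge_cost(w: Dict[str, float], feats: Features) -> float:
    # Cost of one link: dot product of the learned weights and its features.
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def answer_cost(w: Dict[str, float], answer: List[Edge],
                edge_features: Dict[Edge, Features]) -> float:
    # An answer (e.g. a tree of links joining matched keywords) costs the sum of its edges.
    return sum(edge_cost(w, edge_features[e]) for e in answer)

def answer_features(answer: List[Edge],
                    edge_features: Dict[Edge, Features]) -> Features:
    # Aggregate feature vector of all edges in an answer.
    total: Features = {}
    for e in answer:
        for k, v in edge_features[e].items():
            total[k] = total.get(k, 0.0) + v
    return total

def feedback_update(w: Dict[str, float], preferred: List[Edge],
                    predicted: List[Edge], edge_features: Dict[Edge, Features],
                    lr: float = 0.5) -> Dict[str, float]:
    """Hypothetical perceptron-style update: if the user-preferred answer is not
    strictly cheaper than the system's top answer, shift weights so it becomes cheaper."""
    if answer_cost(w, preferred, edge_features) < answer_cost(w, predicted, edge_features):
        return dict(w)  # ranking already agrees with the feedback
    phi_pref = answer_features(preferred, edge_features)
    phi_pred = answer_features(predicted, edge_features)
    new_w = dict(w)
    for k in set(phi_pref) | set(phi_pred):
        # Raise the cost of features in the wrong answer, lower those in the preferred one.
        delta = phi_pred.get(k, 0.0) - phi_pref.get(k, 0.0)
        # Clamp at zero so edge costs stay non-negative for cost-based search.
        new_w[k] = max(0.0, new_w.get(k, 0.0) + lr * delta)
    return new_w
```

In this toy setup, each link has its own indicator feature plus a shared `default` feature. A bad schema match that ties with a good one is penalized after a single piece of feedback, so the preferred answer ranks first on the next query.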
Many real-world issues of uncertainty and diversity in search-driven integration remain open. Such tasks require a combination of human guidance and machine learning, and the challenge is to make maximal use of limited human input. This thesis develops three methods to scale up search-driven integration through learning from expert feedback: (1) active learning techniques to repair links from small amounts of user feedback; (2) collaborative learning techniques to combine users' conflicting feedback; and (3) debugging techniques to identify where data experts could best improve integration quality. We implement these methods within the Q System, a prototype of search-driven integration, and validate their effectiveness over real-world datasets.
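The first of these methods relies on active learning: asking users only about the links whose feedback is most informative. One common way to choose such links is uncertainty sampling, sketched below. This is a generic illustration and not necessarily the selection strategy used in the thesis. It assumes that a model has already assigned each candidate link a match probability. The function name `select_queries` and the string link identifiers are hypothetical.

```python
from typing import Dict, List

def select_queries(link_probs: Dict[str, float], budget: int) -> List[str]:
    """Hypothetical uncertainty sampling: return up to `budget` candidate links,
    most uncertain first, to present to a user for feedback."""
    # Uncertainty is measured as distance from 0.5: the closer a link's match
    # probability is to 0.5, the less sure the model is about it.
    # Ties are broken by link identifier so the order is deterministic.
    ranked = sorted(link_probs, key=lambda link: (abs(link_probs[link] - 0.5), link))
    return ranked[:budget]

if __name__ == "__main__":
    probs = {"a~b": 0.95, "c~d": 0.52, "e~f": 0.10, "g~h": 0.40}
    print(select_queries(probs, budget=2))
```

With these probabilities, the two links nearest 0.5 (`c~d`, then `g~h`) are chosen. Confident links such as `a~b` are left to the model, which conserves scarce human input.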
Notes:
Ph. D. University of Pennsylvania 2016.
Department: Computer and Information Science.
Supervisor: Zachary G. Ives.
Includes bibliographical references.
OCLC:
982021977

