Content selection in multi-document summarization / Hong, Kai.

Format:
Book
Thesis/Dissertation
Author/Creator:
Hong, Kai, author.
Contributor:
Marcus, Mitchell P., degree supervisor.
Nenkova, Ani, degree supervisor.
Ungar, Lyle, degree committee member.
Liberman, Mark, degree committee member.
Kannan, Sampath, degree committee member.
Conroy, John M., degree committee member.
University of Pennsylvania. Computer and Information Science, degree granting institution.
Language:
English
Subjects (All):
Computer science.
Computer and Information Science--Penn dissertations.
Penn dissertations--Computer and Information Science.
Local Subjects:
Computer science.
Computer and Information Science--Penn dissertations.
Penn dissertations--Computer and Information Science.
Genre:
Academic theses.
Physical Description:
1 online resource (254 pages)
Contained In:
Dissertation Abstracts International 77-06B(E).
Place of Publication:
[Philadelphia, Pennsylvania]: University of Pennsylvania ; Ann Arbor : ProQuest Dissertations & Theses, 2015.
Language Note:
English
System Details:
Mode of access: World Wide Web.
text file
Summary:
Automatic summarization has advanced greatly in the past few decades. However, there remains a huge gap between the content quality of human and machine summaries. There is also a large disparity between the performance of current systems and that of the best possible automatic systems. In this thesis, we explore how the content quality of machine summaries can be improved. First, we introduce a supervised model to predict the importance of words in the input sets, based on a rich set of features. Our model is superior to prior methods in identifying words used in human summaries (i.e., summary keywords). We show that a modular extractive summarizer using the estimates of word importance can generate summaries comparable to those of state-of-the-art systems. Among the features we propose, we highlight global knowledge, which estimates word importance based on information independent of the input. In particular, we explore two kinds of global knowledge: (1) important categories mined from dictionaries, and (2) intrinsic importance of words. We show that global knowledge is very useful in identifying summary keywords that have low frequency in the input. Second, we present a new framework of system combination for multi-document summarization. This is motivated by our observation that different systems generate very different summaries. For each input set, we generate candidate summaries by combining whole sentences produced by different systems. We show that the oracle summary among these candidates is much better than the output from the systems that we have combined. We then introduce a support vector regression model to select among these candidates. The features we employ in this model capture the informativeness of a summary based on the input documents, the outputs of different systems, and global knowledge. Our model achieves considerable improvement over the systems that we have combined while generating summaries up to a certain length.
Furthermore, we study what factors could affect the success of system combination. Experiments show that it is important for the combined systems to have similar performance.
Notes:
Source: Dissertation Abstracts International, Volume: 77-06(E), Section: B.
Advisors: Ani Nenkova; Mitchell P. Marcus; Committee members: John M. Conroy; Sampath Kannan; Mark Liberman; Lyle Ungar.
Department: Computer and Information Science.
Ph.D. University of Pennsylvania 2015.
Local Notes:
School code: 0175
ISBN:
9781339427836
Access Restriction:
Restricted for use by site license.