On an incomplete data problem in modeling: Evidence from Web usage mining and a general purpose solution.

Format:
Book
Thesis/Dissertation
Author/Creator:
Zheng, Zhiqiang.
Contributor:
Padmanabhan, Balaji, advisor.
University of Pennsylvania.
Language:
English
Subjects (All):
Information science.
Business.
0310.
0723.
Penn dissertations--Operations and information management.
Operations and information management--Penn dissertations.
Penn dissertations--Managerial science and applied economics.
Managerial science and applied economics--Penn dissertations.
Physical Description:
122 pages
Contained In:
Dissertation Abstracts International 64-04A.
System Details:
Mode of access: World Wide Web.
text file
Summary:
In business domains, firms often have only incomplete information on their customers, and acquiring complete information for all customers can prove prohibitively expensive. This dissertation shows how selective information acquisition can reduce the amount of information that must be acquired to supplement incomplete customer information.
One example of incomplete customer information stems from the web usage domain. As revealed in this thesis, the data collected locally by a single firm on its customers' accesses to its web site (site-centric data) is inherently incomplete, because it does not capture user behavior across sites. While most users search multiple sites in a session, site-centric data captures only a single tree in the forest. By looking at only one tree, can a site accurately capture consumer online behavior and subsequently build correct customer models? The first half of this thesis investigates this problem and empirically demonstrates that incomplete data not only hurts model performance but, more importantly, can lead to erroneous managerial decisions.
The naive solution to this incomplete data problem, acquiring the complete data for all customers, is often impractical due to cost. A natural alternative is to acquire complete data for some customers and to use it to improve the models built. We define selective data acquisition as the task of determining how many customers, and which ones, to acquire additional data from. Our solution to the problem employs a utility function to discern the value of a specific customer's data to the model. In the second half of this thesis we develop two specific utility functions, for logistic regression and decision trees respectively. We empirically test the methods on web usage data provided by Jupiter Media Metrix and on common UCI datasets. The results show that the methods perform well and indicate that selective data acquisition is a promising area for research.
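The idea of ranking customers by a utility function can be illustrated with a minimal sketch. The summary does not give the dissertation's actual utility functions, so the code below substitutes a common stand-in as an assumption: the entropy of a model's predicted probability, so that customers whose predictions are most uncertain are chosen first. The names `binary_entropy`, `select_customers`, `budget`, and the sample probabilities are all hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) prediction; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_customers(predicted_probs, budget):
    """Return the ids of the `budget` customers with the highest utility.

    Utility here is prediction uncertainty (an assumed stand-in, not the
    dissertation's utility functions). `predicted_probs` maps a customer id
    to the probability predicted by a model built on incomplete data.
    """
    ranked = sorted(
        predicted_probs,
        key=lambda cid: binary_entropy(predicted_probs[cid]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical predictions from a model trained on site-centric data.
probs = {"c1": 0.95, "c2": 0.52, "c3": 0.10, "c4": 0.40}
chosen = select_customers(probs, budget=2)
print(chosen)
```

In this sketch the budget fixes how many customers to acquire data for, and the ranking decides which ones; a utility function tailored to a specific model class would replace `binary_entropy`.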
Notes:
Thesis (Ph.D. in Operations and Information Management) -- University of Pennsylvania, 2003.
Source: Dissertation Abstracts International, Volume: 64-04, Section: A, page: 1323.
Adviser: Balaji Padmanabhan.
Local Notes:
School code: 0175.
ISBN:
9780496352616
Access Restriction:
Restricted for use by site license.
