Weakly supervised learning from multiple modalities: exploiting video, audio and text for video understanding / Timothee Cour.

LIBRA Microfilm P38:2009
Available from offsite location This item is stored in our repository but can be checked out.

LIBRA Diss. POPM2009.156
Available from offsite location This item is stored in our repository but can be checked out.

LIBRA QA003 2009 .C858
Available from offsite location This item is stored in our repository but can be checked out.

Format:
Book
Manuscript
Microformat
Thesis/Dissertation
Author/Creator:
Cour, Timothee.
Contributor:
Alur, Rajeev, 1966- advisor.
University of Pennsylvania.
Language:
English
Subjects (All):
Penn dissertations--Computer and information science.
Computer and information science--Penn dissertations.
Local Subjects:
Penn dissertations--Computer and information science.
Computer and information science--Penn dissertations.
Physical Description:
xvi, 174 pages : illustrations ; 29 cm
Production:
2009.
Summary:
As web and personal content become ever more enriched by videos, there is an increasing need for semantic video search and indexing. A main challenge for this task is the lack of supervised data for learning models. In this dissertation we propose weakly supervised algorithms for video content analysis, focusing on recovering video structure, retrieving actions and identifying people. Key components of the algorithms we present are (1) alignment between multiple modalities: video, audio and text, and (2) a unified convex formulation for learning under weak supervision from easily accessible data.
At a coarse level, we focus on the task of recovering scene structure in movies and TV series. We present a weakly supervised algorithm that parses a movie into a hierarchy of scenes, threads and shots. Movie scene boundaries are aligned with screenplay scenes and shots are reordered into threads. We present a unified generative model and novel hierarchical dynamic program inference.
At a finer level, we aim at resolving person identity in video using images, screenplay and closed captions. We consider a partially-supervised multiclass classification setting where each instance is labeled ambiguously with more than one label. The set of potential labels for each face is the characters' names mentioned in the corresponding screenplay scene. We propose a novel convex formulation based on minimization of a surrogate loss. We show theoretical analysis and strong empirical proof that effective learning is possible even when all examples are ambiguously labeled.
We also investigate the challenging scenario of naming people in video without a screenplay. Our only source of (indirect) supervision is person references mentioned in dialog, such as "Hey, Jack!". We resolve identities by learning a classifier from partial label constraints, incorporating multiple-instance constraints from dialog, gender and local grouping constraints, in a unified convex learning formulation. Grouping constraints are provided by a novel temporal grouping model that integrates appearance, synchrony and film-editing cues to partition faces across multiple shots. We present dynamic programming inference and discriminative learning for this partitioning model.
We have deployed our framework on hundreds of hours of movies and TV, and present quantitative and qualitative results for each component.
Notes:
Adviser: Rajeev Alur.
Thesis (Ph.D. in Computer and Information Science) -- University of Pennsylvania, 2009.
Includes bibliographical references.
Local Notes:
University Microfilms order no.: 3363273.
