Empirical methods for exploiting parallel texts.

Format:
Book
Thesis/Dissertation
Author/Creator:
Melamed, Ilya Dan.
Contributor:
Marcus, Mitchell, advisor.
University of Pennsylvania.
Language:
English
Subjects (All):
Computer science.
0984.
Penn dissertations--Computer and information science.
Computer and information science--Penn dissertations.
Local Subjects:
Penn dissertations--Computer and information science.
Computer and information science--Penn dissertations.
0984.
Physical Description:
204 pages
Contained In:
Dissertation Abstracts International 59-04B.
System Details:
Mode of access: World Wide Web.
text file
Summary:
The translation of a text can be viewed as a detailed annotation of the text's meaning. From this point of view, texts that exist in two languages (bitexts) are the richest accessible source of linguistic knowledge. Such knowledge can be exploited in many ways, if it can be automatically acquired. The acquisition process is invariably based on automatic methods for inducing translational equivalence relations between the two halves of a bitext. At the word token level, these relations are called bitext maps; at the word type level, they are called translation models. This dissertation advances the state of the art in methods for determining both kinds of translational equivalence. It also shows how to integrate these methods to exploit a much wider variety of bitexts than was previously possible.
The dissertation begins by showing that the language-specific aspects of the bitext mapping problem can be encapsulated and modularized away, leaving only a problem of geometric pattern recognition. The best solution is then the one that maximizes the signal-to-noise ratio in the search space and employs the fastest and most accurate search algorithm. The dissertation presents new methods for maximizing the signal strength, for filtering noise, and for searching the resulting scatterplot in linear expected space and time. The unprecedented accuracy of this solution enables a new application of bitext maps--automatic detection of omissions in translations.
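To make the "geometric pattern recognition" framing concrete, here is a minimal illustrative sketch in Python. It is not the dissertation's algorithm. It assumes that candidate points of correspondence can be generated by exact matching of longer tokens (a crude stand-in for a language-specific matching predicate) and that noise can be reduced by discarding points far from the main diagonal of the bitext space. The function names `candidate_points` and `near_diagonal` and the parameters `min_len` and `max_dev` are hypothetical.

```python
def candidate_points(src_tokens, tgt_tokens, min_len=4):
    """Return (i, j) pairs where the same token (case-insensitive,
    at least min_len characters) occurs at source position i and
    target position j. Exact matching is an assumption made for
    illustration; any matching predicate could be plugged in here."""
    positions = {}
    for j, tok in enumerate(tgt_tokens):
        positions.setdefault(tok.lower(), []).append(j)

    points = []
    for i, tok in enumerate(src_tokens):
        if len(tok) >= min_len:
            for j in positions.get(tok.lower(), []):
                points.append((i, j))
    return points


def near_diagonal(points, src_len, tgt_len, max_dev=0.2):
    """Keep points whose relative positions in the two texts differ
    by at most max_dev. This is a simple noise filter chosen for the
    sketch, not a method taken from the dissertation."""
    kept = []
    for i, j in points:
        if abs(i / src_len - j / tgt_len) <= max_dev:
            kept.append((i, j))
    return kept
```

The language-specific part lives entirely in `candidate_points`; the filter only sees coordinates, which is the sense in which the remaining problem is geometric.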
The second half of the dissertation makes a number of advances in statistical translation modeling. First, it proves the feasibility of modeling translational equivalence independently of word order. Second, the dissertation shows why and how translation models can benefit from an explicit noise model. Third, it shows how the noise model can be conditioned on almost any kind of pre-existing language-specific knowledge, and that even simple linguistic clues can significantly improve translation model accuracy. Fourth, the dissertation shows how to automatically determine the sense inventories of words in bitext and how to automatically discover word sequences that are translated as a unit. This information enables translation models that account for polysemy and for phrasal translations.
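As a rough illustration of modeling translational equivalence independently of word order, the sketch below scores word-type pairs by how often they co-occur in aligned segment pairs, using the Dice coefficient. The Dice coefficient is a standard association measure swapped in here for simplicity; it is not claimed to be the dissertation's model, and the function name `dice_scores` and the input layout (a list of tokenized source/target segment pairs) are assumptions.

```python
from collections import Counter


def dice_scores(segment_pairs):
    """Score (source_type, target_type) pairs by the Dice coefficient
    of their co-occurrence in aligned segments. Each segment is
    reduced to a set of word types, so token order plays no role."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in segment_pairs:
        src_types, tgt_types = set(src), set(tgt)
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        for s in src_types:
            for t in tgt_types:
                pair_count[(s, t)] += 1

    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }
```

Because each segment is treated as a set, shuffling the tokens within a segment leaves every score unchanged. Spurious pairs such as a function word co-occurring with an unrelated content word still receive nonzero scores, which is the kind of noise an explicit noise model would need to account for.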
Notes:
Thesis (Ph.D. in Computer and Information Science) -- University of Pennsylvania, 1998.
Source: Dissertation Abstracts International, Volume: 59-04, Section: B, page: 1740.
Supervisor: Mitchell Marcus.
Local Notes:
School code: 0175.
ISBN:
9780591827996
Access Restriction:
Restricted for use by site license.
