Prague Czech-English dependency treebank 1.0.

Available from offsite location. This item is stored in our repository but can be checked out.

Format:
Datafile
Contributor:
Curin, Jan.
Linguistic Data Consortium.
Language:
Czech
English
Subjects (All):
Czech language--Databases.
Czech language.
English language--Databases.
English language.
Machine translating.
Genre:
Databases.
Dictionaries.
Physical Description:
1 CD-ROM ; 4 3/4 in.
Place of Publication:
[Philadelphia, PA] : Linguistic Data Consortium, 2004.
System Details:
digital
optical
data file
Summary:
"The core part of PCEDT 1.0 is a Czech translation of 21,600 English sentences from the Wall Street Journal, which are part of the Penn Treebank corpus. Sentences of the Czech translation were automatically morphologically annotated and parsed into two levels (analytical and tectogrammatical) of dependency structures introduced in the theory of Functional Generative Description and closely related to the Prague Dependency Treebank project. The original English sentences were transformed from the Penn Treebank phrase-structure trees into dependency representations. A heldout (development and evaluation) set of 515 sentence pairs was selected and manually annotated on tectogrammatical level in both Czech and English; for the purposes of quantitative evaluation, this set has been retranslated from Czech into English by 4 different translation companies. PCEDT 1.0 also contains a parallel Czech-English corpus of plain text from Reader's Digest 1993-1996 consisting of 53,000 parallel sentences, and a large monolingual corpus of Czech (2.4 M sentences). The included Czech-English translation dictionary consists of 46,150 translation pairs in its lemmatized version and 496,673 pairs of word forms, where for each entry-translation pair all corresponding word form pairs have been generated. Also included is an English-Czech dictionary provided by Milan Svoboda under GNU/FDL license; this dictionary contains multi-word translations in 115,929 translation pairs." - catalogue.
Notes:
Title from disc label.
"LDC2004T25."
Data type: Text.
Data sources: Dictionaries, newswire.
Authors: Jan Curin and others.
"developed at the Center for Computational Linguistics in cooperation with the Institute of Formal and Applied Linguistics." - catalogue.
ISBN:
1585633216
9781585633210
OCLC:
243508055
