
MDE RT-04 training data text/annotations / [Authors, Christopher Walker, Stephanie Strassel, Elizabeth Shriberg... and others].

LIBRA
Available from offsite location. This item is stored in our repository but can be checked out.

Format:
Datafile
Contributor:
Walker, Christopher.
Strassel, Stephanie.
Shriberg, Elizabeth.
Linguistic Data Consortium.
Language:
English
Subjects (All):
Computational linguistics--Databases.
Computational linguistics.
Metadatabases.
Genre:
Databases.
Physical Description:
1 CD-ROM ; 4 3/4 in.
Other Title:
Metadata extraction RT-04 training data text and annotations
Place of Publication:
[Philadelphia, PA] : Linguistic Data Consortium, [2005]
System Details:
data file
Summary:
"This corpus was created by Linguistic Data Consortium to provide training data for the RT-04 Fall Metadata Extraction (MDE) Evaluation, part of the DARPA EARS (Efficient, Affordable, Reusable Speech-to-Text) Program. This data set has been created and distributed by Linguistic Data Consortium. This data was previously released to the EARS MDE community as LDC2004E31. The goal of MDE is to enable technology that can take raw Speech-to-Text output and refine it into forms that are of more use to humans and to downstream automatic processes. In simple terms, this means the creation of automatic transcripts that are maximally readable. This readability might be achieved in a number of ways: flagging non-content words like filled pauses and discourse markers for optional removal; marking sections of disfluent speech; and creating boundaries between natural breakpoints in the flow of speech so that each sentence or other meaningful unit of speech might be presented on a separate line within the resulting transcript. Natural capitalization, punctuation and standardized spelling, plus sensible conventions for representing speaker turns and identity are further elements in the readable transcript. LDC has defined a SimpleMDE annotation task specification and has annotated English telephone and broadcast news data to provide training data for MDE. In this release, some original annotations contained in LDC2004E31 have been re-mapped to new MDE elements to support better annotation consistency. In particular, the mapping affects Discourse Responses (DR), Discourse Markers (DM) and Backchannel SUs (BC). A description of the original mapping proposed by ICSI appears in 3) below, with complete documentation of the mapping rules contained in the docs/drmap-discussion directory. The scripts used to apply the mapping can be found in the docs/scripts/drmap directory." --index.html
Notes:
Title from index.html on CD.
"LDC2005T24."
ISBN:
1585633585
9781585633586
OCLC:
63518004
