The AQUAINT corpus of English news text / sponsored by NIST.
- Format:
- Book
- Language:
- English
- Subjects (All):
- Newspapers--Language--Databases.
- Newspapers.
- News agencies--Language--Databases.
- News agencies.
- English language--Written English--Databases.
- English language.
- English language--Written English.
- Newspapers--Language.
- Genre:
- Databases.
- Physical Description:
- 1 CD-ROM : color ; 4 3/4 in.
- 4 3/4 in.
- monochrome
- Place of Publication:
- [Philadelphia, PA] : Linguistic Data Consortium, [2002]
- System Details:
- text file
- Summary:
- Consists of newswire text data in English, drawn from three sources: the Xinhua News Service (People's Republic of China), the New York Times News Service, and the Associated Press Worldstream News Service. The data was prepared by the LDC for the AQUAINT Project and will be used in official benchmark evaluations conducted by the National Institute of Standards and Technology (NIST). Each data file contains a stream of SGML-tagged text presenting a series of news stories and is compressed with the GNU "gzip" utility.
- Notes:
- Title from disc label.
- DCMI type(s): Text.
- Data source(s): Newswire.
- Application(s): Tagging, parsing, natural language processing.
- Author(s): David Graff.
- ISBN:
- 1585632406
- 9781585632404
- OCLC:
- 50761900
- Access Restriction:
- Restricted for use by site license.
- Online:
- LDC catalog entry
- Using LDC Data: general information
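The Summary describes each data file as a gzip-compressed stream of SGML-tagged news stories. The following is a minimal sketch of how such a file might be read, not a description of the LDC's own tooling. It assumes that each story is wrapped in `<DOC>...</DOC>` and identified by a `<DOCNO>` element, and that the text decodes as Latin-1; neither the tag names nor the encoding is stated in this record, so check them against the corpus documentation. The helper names `iter_stories` and `make_sample` are hypothetical.

```python
import gzip
import io
import re

# Assumed markup (not confirmed by the catalog record): one <DOC> element
# per story, with the story identifier inside <DOCNO>.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def iter_stories(source):
    """Yield (docno, plain_text) pairs from a gzip-compressed SGML stream.

    `source` may be a file path or a binary file object.
    """
    # Latin-1 is an assumption; it decodes any byte sequence without error.
    with gzip.open(source, mode="rt", encoding="latin-1") as stream:
        data = stream.read()
    for match in DOC_RE.finditer(data):
        body = match.group(1)
        docno = DOCNO_RE.search(body)
        doc_id = docno.group(1) if docno else None
        # Drop the identifier element, strip remaining tags, and
        # collapse runs of whitespace into single spaces.
        without_id = DOCNO_RE.sub(" ", body)
        text = " ".join(TAG_RE.sub(" ", without_id).split())
        yield doc_id, text


def make_sample():
    """Build a tiny in-memory stand-in for a corpus file (made-up content)."""
    sgml = (
        "<DOC>\n<DOCNO> SAMPLE-0001 </DOCNO>\n"
        "<TEXT>\n<P>\nFirst story.\n</P>\n</TEXT>\n</DOC>\n"
        "<DOC>\n<DOCNO> SAMPLE-0002 </DOCNO>\n"
        "<TEXT>\n<P>\nSecond story.\n</P>\n</TEXT>\n</DOC>\n"
    )
    return io.BytesIO(gzip.compress(sgml.encode("latin-1")))


if __name__ == "__main__":
    for doc_id, text in iter_stories(make_sample()):
        print(doc_id, text)
```

Reading the whole file into memory keeps the sketch short; for large files, a line-by-line reader that accumulates text between the opening and closing story tags would use less memory.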