Chinese Corpus Resource Guide

Author(s): Hongyin Tao, UCLA
Publication Year(s): 2006
Size: 13 Pages
A corpus (plural: corpora) is a principled collection of samples of natural language use, either written or spoken, usually stored as computer files. A written corpus can be gathered from sources such as news media, literary works, or personal writings. A spoken corpus can be assembled from audio- or video-recorded narratives, interviews, conversations and the like, which are then transcribed into written text. A corpus can range in size from a few thousand words to tens of millions. Larger corpora are usually required for big research projects such as writing dictionaries and major grammars, but so-called “mini corpora” of several thousand words can be extremely useful for language teachers. Once a corpus is built, software tools can analyze it and produce word frequency lists, concordances and other useful types of output.
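As a rough illustration of the two outputs named above, the following minimal sketch builds a word frequency list and a simple keyword-in-context concordance. It is not taken from the guide: the function names, the sample sentence, and the assumption that the text has already been segmented into words separated by spaces are all hypothetical. Real Chinese text would first need a word segmentation step.

```python
from collections import Counter


def tokenize(text):
    # Assumption: the text is already word-segmented, with spaces between words.
    return text.split()


def word_frequencies(tokens):
    # Return (word, count) pairs, most frequent first.
    return Counter(tokens).most_common()


def concordance(tokens, keyword, width=3):
    # Collect each occurrence of the keyword with up to `width` words of
    # context on either side (a keyword-in-context listing).
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Hypothetical mini corpus, pre-segmented for illustration only.
sample = "我 喜欢 看 书 他 也 喜欢 看 电影"
tokens = tokenize(sample)
freq = word_frequencies(tokens)
kwic = concordance(tokens, "喜欢", width=2)

for left, key, right in kwic:
    print(f"{left:>10} [{key}] {right}")
```

Even on a mini corpus of a few thousand words, a listing like this lets a teacher see at a glance which words recur and in what contexts they appear.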
Type: PDF

Publication Units:

Chinese Corpus Resource Guide_CALPER_2006