Keywords

Keyword analysis allows us to see which words occur more frequently in one corpus than in another. In other words, it tells you which words are 'special' to a group of texts from one particular source, or about one particular topic. The tool works best if you have a large, general benchmark corpus against which you can compare your smaller corpus. It then produces a list of the words that are most significant in your corpus, either because they appear far more frequently or far less frequently than would be expected from their frequency in the benchmark corpus.
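
The comparison described above can be sketched in a few lines of Python. This is only an illustration: the page does not say which statistic the tool uses, so the sketch assumes log-likelihood, one commonly used keyness measure. The function names `log_likelihood` and `keywords` are made up for this example.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Keyness score for a word seen `a` times in a study corpus of `c`
    tokens and `b` times in a benchmark corpus of `d` tokens.
    Assumption: log-likelihood is used here; other statistics exist."""
    e1 = c * (a + b) / (c + d)  # expected count in the study corpus
    e2 = d * (a + b) / (c + d)  # expected count in the benchmark corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, benchmark_tokens, top=20):
    """Return up to `top` (word, score, direction) tuples, highest score first.
    Direction '+' means relatively more frequent in the study corpus,
    '-' means relatively less frequent."""
    study = Counter(study_tokens)
    bench = Counter(benchmark_tokens)
    c, d = sum(study.values()), sum(bench.values())
    rows = []
    for word in set(study) | set(bench):
        a, b = study[word], bench[word]
        direction = "+" if a / c > b / d else "-"
        rows.append((word, log_likelihood(a, b, c, d), direction))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


# Toy data, invented purely for illustration.
study = "forest forest mall funfair the the".split()
benchmark = "the the the the a a of of to to".split()
for word, score, direction in keywords(study, benchmark, top=5):
    print(f"{direction} {word:10s} {score:.2f}")
```

A word with the same relative frequency in both corpora scores zero. The further its observed counts depart from the expected counts, in either direction, the higher the score.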

Look at these twenty keywords from a small corpus of transcribed paired speaking tests. The list was generated by comparing this small corpus with a much larger corpus of general spoken language, and it gives you a good idea of the topic of the speaking test.

  • What do you think the topic is?
  • Why do you think the items at position 2 and position 17 are there?

Interpreting the Results:

The test involved participants looking at three photographs: a forest, a shopping mall and a funfair. They had to discuss which option should be used for the development of some wasteland near their town. These keywords are a good indicator of the topic.

The inclusion of 'um' and 'uh' in the list is a good example of one of the difficulties encountered when we make comparisons between corpora. The same hesitation sounds may have been transcribed differently in the benchmark corpus, for example as 'erm' and 'ah', in which case 'um' and 'uh' would appear to be key simply because of a spelling convention. It is also possible that these vocalizations were transcribed identically in both corpora, but that the speakers in our tests were more nervous and hesitant because they were in a 'test' situation. In either case, it is a good idea to examine concordance lines for these items from both corpora.
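
As a rough illustration of what examining concordance lines involves, the following Python sketch prints each occurrence of a search item with a few words of context on either side. The function name `concordance`, the context width and the sample utterance are all invented for this example. Passing several spellings at once (for instance 'um' and 'erm') is one way to check for differences in transcription conventions.

```python
def concordance(tokens, targets, width=4):
    """Return one key-word-in-context line for every token that matches
    any item in `targets`, compared case-insensitively."""
    targets = {t.lower() for t in targets}
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() in targets:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Invented sample utterance, for illustration only.
sample = "I think um the forest is erm better".split()
for line in concordance(sample, {"um", "erm"}, width=2):
    print(line)
```

Running the same search on both corpora and reading the lines side by side shows whether the items are being used, and spelled, in comparable ways.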

Look at these ten key three-word clusters from a specialist spoken corpus. These phrases were generated when this corpus was compared with a corpus of general spoken language. Can you deduce anything about the nature of the specialist corpus from this list?

Interpreting the Results:

This specialist corpus is in fact a corpus of academic English. Most of these texts came from lectures and other classes, and these phrases represent some of the most typical ones to be encountered in those contexts. Many of them relate to exposition and explanation, while others show how spoken academic language is interactional as well.
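
Three-word clusters of the kind discussed here are counted in much the same way as single words, except that every run of three consecutive words is treated as one item. The Python sketch below shows the counting step only. The function name `clusters` and the sample sentence are invented for illustration, and for simplicity the sketch ignores utterance boundaries, which a real tool would probably respect.

```python
from collections import Counter


def clusters(tokens, n=3):
    """Count every run of `n` consecutive tokens."""
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Invented sample, for illustration only.
sample = "if you look at this if you look at that".split()
for phrase, count in clusters(sample).most_common(3):
    print(count, phrase)
```

The resulting counts can then be compared with cluster counts from a benchmark corpus, just as single-word frequencies are.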

© 2002-2012 CALPER and The Pennsylvania State University. All Rights Reserved.