Corpora can give us valuable insights into which vocabulary items we should teach first and which we should leave until later. This, of course, has an important impact on both syllabus and materials design. Frequency counts across a wide range of corpora seem to show us that there is a core of extremely high frequency words. The existence of this 'core vocabulary' is seen in both written and spoken corpora, although the actual frequency of items does vary. These frequently occurring words are extremely hardworking, and it would therefore seem logical that we should present them to our learners first.

Let us consider whether we can identify a basic vocabulary for spoken English, looking in particular at the frequency of words appearing in the 5 million-word CANCODE spoken corpus and the 10 million-word spoken segment of the British National Corpus. The first hundred words in both corpora are remarkably similar, but that does not make things straightforward. As we have said before, the most frequently occurring items are what we would class as 'grammatical' items (articles, pronouns, demonstratives, conjunctions, etc.). Another interesting thing that we encounter is that there are quite clearly fixed phrases or 'chunks' that occur very frequently. For example, 'know' and 'mean' are very frequent primarily because of their association with 'I' in the chunks 'I know' and 'I mean'. The frequency of these items would seem to indicate that they should be taught sooner, rather than later.
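To make the idea of a frequency count concrete, here is a minimal sketch in Python of how single words and two-word chunks might be counted. The sample sentence is invented for illustration and is not taken from CANCODE or the British National Corpus, and the function names (tokenize, top_words, top_chunks) are our own. Real corpus tools handle tokenization, tagging and transcription conventions far more carefully.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(tokens, n):
    """Return the n most frequent single words with their raw counts."""
    return Counter(tokens).most_common(n)


def top_chunks(tokens, n, size=2):
    """Return the n most frequent sequences of `size` adjacent words."""
    # Slide a window of `size` words across the token list.
    windows = zip(*(tokens[i:] for i in range(size)))
    return Counter(" ".join(window) for window in windows).most_common(n)


# Invented toy sample, standing in for a transcript of spoken English.
sample = "I mean I know what you mean but I know I know it is hard you know"
tokens = tokenize(sample)

print(top_words(tokens, 2))   # the two most frequent words
print(top_chunks(tokens, 1))  # the most frequent two-word chunk
```

Even in this tiny sample, 'know' owes most of its frequency to the chunk 'I know', which is the pattern described above.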

Vocabulary materials design

We saw in the section on core vocabulary that there are about 2,000 words which work the hardest in the (spoken) language. Materials writers such as Mike McCarthy and Felicity O'Dell use the information that frequency counts and collocation statistics give us to define the levels of their Vocabulary in Use series. This series is based around a 2,000-word per level benchmark, and it not only helps students learn new vocabulary, but encourages them to learn collocations, as well as highlighting more subtle areas of meaning. In the higher level books in particular, longer texts (rather than just sentences) are included. This way, learners get a better idea of the context of the words that are being presented.

Look at these sample pages from the Vocabulary in Use series.

  • How do they look different from more traditional vocabulary books?

Limitations of corpus-based materials

There are of course limitations regarding the use of corpora to inform writers of language teaching materials. Different corpora are annotated and tagged in different ways. The way that different genres are identified will vary, and if frequency levels are a criterion for inclusion of a particular feature, these may vary in the way that they are calculated. The result is that different publications (e.g. different dictionaries or grammar books) may present language in different ways.
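One simple reason why frequency figures may differ is corpus size: a raw count from a 5 million-word corpus cannot be compared directly with a raw count from a 10 million-word corpus. A common remedy is to express frequency per million words. The sketch below, in Python, shows that calculation. The two corpus sizes are the ones mentioned earlier, but the raw counts are invented purely for illustration, and the function name per_million is our own.

```python
def per_million(raw_count, corpus_size):
    """Convert a raw count into a frequency per million words."""
    return raw_count * 1_000_000 / corpus_size


# Corpus sizes as given in the text; the raw counts are hypothetical.
smaller_corpus_size = 5_000_000
larger_corpus_size = 10_000_000
count_in_smaller = 25_000  # invented count of one word in the smaller corpus
count_in_larger = 40_000   # invented count of the same word in the larger corpus

freq_smaller = per_million(count_in_smaller, smaller_corpus_size)
freq_larger = per_million(count_in_larger, larger_corpus_size)

print(freq_smaller, freq_larger)
```

With these invented figures the word has the higher raw count in the larger corpus but the higher normalized frequency in the smaller one, so a writer relying on raw counts and a writer relying on normalized counts could reach different conclusions about the same item.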

This article by (1994), Collocation: Pedagogical implications, and treatment in pedagogical materials, surveys a number of different teaching materials and concludes that there are a number of inconsistencies between them.

  • What is your reaction to this article?

© 2002-2012 CALPER and The Pennsylvania State University. All Rights Reserved.