The Classroom

Non-English corpora

In the previous unit we explored ways that teachers could use corpora to enhance their teaching. The examples that we looked at were all taken from the area of teaching English to speakers of other languages. In this section, we will be looking at the experiences of teachers who teach languages other than English. While it is true that far more work involving corpora has been done in relation to teaching English, a growing number of teachers and researchers all over the world are using corpora covering a wide range of languages.

Parallel corpora

A multilingual corpus consists of texts in two or more languages. Often it pairs an original monolingual text with its translation; this kind of corpus is known as a parallel corpus. Because the original text has been translated, there are issues to take into consideration: the faithfulness and accuracy of the translation, the effect of lingua-cultural differences, and the size of the chunks being translated.
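To make the idea concrete, here is a minimal sketch of how a parallel corpus might be stored and searched, assuming the texts have already been aligned sentence by sentence. The English-Spanish sentence pairs and the names `PARALLEL_PAIRS` and `bilingual_concordance` are invented for illustration; they do not come from any real corpus or tool.

```python
import re

# Hypothetical, hand-made English-Spanish sentence pairs (not from a real corpus).
# Each tuple holds one aligned "chunk": the original sentence and its translation.
PARALLEL_PAIRS = [
    ("The teacher opens the book.", "La profesora abre el libro."),
    ("The students read the book.", "Los estudiantes leen el libro."),
    ("The teacher closes the door.", "La profesora cierra la puerta."),
]


def bilingual_concordance(pairs, word):
    """Return every aligned pair whose source sentence contains `word` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in pairs if pattern.search(src)]


# Show each English sentence containing "book" next to its translation.
for src, tgt in bilingual_concordance(PARALLEL_PAIRS, "book"):
    print(f"{src}  |  {tgt}")
```

Here the chunk size is the sentence. Aligning by paragraph or by phrase would change what a learner sees side by side, which is one reason the size of the translated chunks matters.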

A good review of the use of parallel bilingual texts can be found in John Nerbonne's Parallel Texts in Computer-Assisted Language Learning [direct link to PDF], which is Chapter 18 of Jean Veronis (ed.), Parallel Text Processing, Kluwer, Dordrecht and Boston, 2000, pp. 354-369. Nerbonne, of the University of Groningen (The Netherlands), surveys the challenges of this kind of research.

Comparable corpora

In addition to parallel corpora, there are also comparable corpora. These corpora are also made up of texts in two or more languages. In this case, however, one set of texts is not a translation of the other. Comparable texts are those which, though composed independently in their respective language communities, have the same communicative function (Laffling, 1992). For example, two newspaper stories about the same event written in two different languages would be comparable texts. However, the compilation and handling of comparable corpora is not straightforward, particularly because of political and social differences between the language communities.

Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian [direct link to PDF] by Bekavac, Osenova, Simov and Tadic is an article which investigates some of the difficulties of assembling these kinds of corpora. It is written for those interested in NLP (Natural Language Processing) rather than for teachers, but it still makes some interesting points.

Comparable corpus techniques have also been used by Peters and Picchi at the University of Pisa. Again, this research is connected not directly with teaching but with cross-language querying in digital libraries. However, section four of their article Across Languages, Across Cultures: Issues in Multilinguality and Digital Libraries makes interesting reading for anyone interested in corpus-based strategies and comparable corpora.

© 2002-2012 CALPER and The Pennsylvania State University. All Rights Reserved.