Software and technical resources

AntConc is an easy-to-use, free concordance program that compiles KWIC (key word in context) concordances, word clusters, n-grams, word frequencies, etc.; free to download, and the download website also contains tutorials on using the program.
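To make the terms concrete, here is a minimal sketch of what a KWIC concordance and an n-gram frequency list compute. It is not how AntConc works internally. The function names `tokenize`, `kwic`, and `ngram_frequencies` are hypothetical, and the tokenizer (lowercased runs of letters and apostrophes) is a deliberate simplification of the configurable token definitions real concordancers offer.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercased runs of letters/apostrophes; a simplified, hypothetical token definition.
    return re.findall(r"[a-z']+", text.lower())

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for every hit of `keyword`."""
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

def ngram_frequencies(text, n=2):
    """Frequency of every contiguous n-token sequence (n=1 gives a word frequency list)."""
    tokens = tokenize(text)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog sat on the cat."
    # Print each hit with its left context right-aligned, as in a KWIC display.
    for left, key, right in kwic(sample, "sat", width=2):
        print(f"{left:>20} [{key}] {right}")
    print(ngram_frequencies(sample, 2).most_common(3))
```

Concordancers add sorting by left or right context on top of this, which is what makes recurring patterns around the keyword visible.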

WordSmith Tools is lexical analysis software: an integrated suite of programs that compiles KWIC (key word in context) concordances, word clusters, n-grams, p-frames, word frequencies, etc.; available for PC only; paid, though a demo version is available free of charge

MonoConc Pro is a concordance program that provides KWIC concordance results, word lists, and collocation information; its features include Context Search, Regular Expression search, Part-of-Speech Tag Search, Collocations, and Corpus Comparison; paid; available here
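Collocation information starts from counting which words occur near a node word. The sketch below shows only that first step, raw co-occurrence counts within a fixed window; it is not MonoConc Pro's implementation, and concordancers commonly layer association measures (such as mutual information or t-score) on top of counts like these. The function name `collocates` and the `span` parameter are hypothetical.

```python
import re
from collections import Counter

def collocates(text, node, span=2):
    """Count tokens within `span` positions left or right of each occurrence of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    node = node.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Context window: up to `span` tokens on each side, node itself excluded.
            counts.update(tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span])
    return counts

if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog sat on the cat."
    print(collocates(sample, "sat", span=1).most_common())
```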

MonoConc Easy has many of the features of MonoConc Pro, minus some advanced ones such as Advanced Sort and Corpus Comparison; intuitive interface; best suited to teaching and student use rather than corpus research; paid; available here

ParaConc is a multilingual concordance program for parallel texts (translations); analyzes up to four languages in parallel; includes collocation tables, word frequency lists, collocation lists, and regular expression search, as well as a parallel search option and translation and "hot words" utilities; paid; available here

The Compleat Lexical Tutor website provides a range of tools for text analysis.

Simple Concordance Program (SCP) is a concordance and word-listing program; creates word lists and searches natural-language text files for words, phrases, and patterns; available free of charge here

The Sketch Engine by Adam Kilgarriff and Pavel Rychly is a corpus search engine incorporating word sketches, grammatical relations, and a distributional thesaurus; paid, but a free demo account is available after registration here

Custom List Analyzer (CLA) is a simple but powerful text analysis tool that allows users to analyze texts using their own list dictionaries; list dictionaries can be of unlimited length and can consist of words, words with wildcards, and n-grams; available here
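The idea of a list dictionary mixing plain words, wildcard words, and n-grams can be sketched as follows. This is an illustration under stated assumptions, not CLA's actual matching rules: `*` as the wildcard character is assumed (Python's `fnmatch` also treats `?` and `[...]` as special), each entry is counted independently (so overlapping entries such as `happ*` and `happy` would both count the same token), and the function name `count_list_matches` is hypothetical.

```python
import re
from fnmatch import fnmatchcase

def count_list_matches(text, entries):
    """Return {entry: hit count} for a user-defined list of words, wildcards, and n-grams."""
    tokens = re.findall(r"[a-z']+", text.lower())
    results = {}
    for entry in entries:
        pattern = entry.lower().split()  # "in order to" -> ["in", "order", "to"]
        if not pattern:
            continue  # skip blank entries
        n = len(pattern)
        hits = 0
        for i in range(len(tokens) - n + 1):
            # fnmatchcase treats "*" as any run of characters within a single token.
            if all(fnmatchcase(tok, pat) for tok, pat in zip(tokens[i:i + n], pattern)):
                hits += 1
        results[entry] = hits
    return results

if __name__ == "__main__":
    text = "I am happy, happier than ever, in order to be happy."
    print(count_list_matches(text, ["happ*", "in order to", "sad"]))
```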

Sentiment Analysis and Cognition Engine (SEANCE) is a tool for sentiment analysis; includes 254 core indices and 20 component indices; supports a number of customized indices, including filtering for particular parts of speech and controlling for instances of negation; available here

The Simple Natural Language Processing Tool (SiNLP) is a simple tool that allows users to analyze texts using their own custom dictionaries; for each text processed it reports the text's name, the number of words, the number of types, the type-token ratio (TTR), letters per word, the number of paragraphs, the number of sentences, and the number of words per sentence; available here
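The basic counts listed for SiNLP are simple to define, though every tool makes its own choices about what counts as a word, sentence, or paragraph. The sketch below assumes words are runs of letters and apostrophes, types are case-insensitive, sentences are split at `.`, `!`, or `?`, and paragraphs are separated by blank lines; these are assumptions, not SiNLP's documented rules, and `basic_text_stats` is a hypothetical name. TTR here is the number of types divided by the number of words.

```python
import re

def basic_text_stats(text):
    """Word, type, TTR, letters-per-word, paragraph, and sentence counts under simple assumptions."""
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    n_types = len({w.lower() for w in words})  # case-insensitive types
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    letters = sum(len(w.replace("'", "")) for w in words)  # apostrophes are not letters
    return {
        "words": n_words,
        "types": n_types,
        "ttr": n_types / n_words if n_words else 0.0,
        "letters_per_word": letters / n_words if n_words else 0.0,
        "paragraphs": len(paragraphs),
        "sentences": len(sentences),
        "words_per_sentence": n_words / len(sentences) if sentences else 0.0,
    }

if __name__ == "__main__":
    print(basic_text_stats("The cat sat. The cat ran!\n\nA dog barked."))
```

Note that raw TTR falls as texts get longer, which is why tools such as TAACO report several TTR variants.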

Tool for the Automatic Analysis of Cohesion (TAACO) is a tool that calculates 150 indices of both local and global cohesion, including a number of type-token ratio indices (e.g., for parts of speech, lemmas, bigrams, and trigrams), adjacent overlap indices, and connectives indices; available here
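An adjacent overlap index measures how much vocabulary carries over from one sentence to the next, a basic signal of local cohesion. The sketch below computes one simplified variant: for each pair of adjacent sentences, the share of the first sentence's word types that recur in the second, averaged over all pairs. TAACO's own indices differ in detail (for example, they may use lemmas or particular parts of speech), so treat this as an illustration; the function name `adjacent_sentence_overlap` and the splitting rules are assumptions.

```python
import re

def adjacent_sentence_overlap(text):
    """Mean share of a sentence's word types that recur in the following sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    type_sets = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    pairs = list(zip(type_sets, type_sets[1:]))  # (sentence i, sentence i+1)
    if not pairs:
        return 0.0
    # Guard against sentences that contain punctuation but no word tokens.
    scores = [len(a & b) / len(a) if a else 0.0 for a, b in pairs]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    print(adjacent_sentence_overlap("The cat sat. The cat ran. A dog barked."))
```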

Tool for the Automatic Analysis of Lexical Sophistication (TAALES) is a tool that measures 135 different indices of lexical sophistication, including indices of frequency, range, academic language, and psycholinguistic word information. Included are indices for both single words and n-grams. TAALES indices have been used to inform models of second language (L2) speaking proficiency, first language (L1) and L2 writing proficiency, genre differences, and satirical language.

Lexical Complexity Analyzer is designed to automate lexical complexity analysis of English texts using 25 different measures of lexical density, variation and sophistication proposed in the first and second language development literature; available here

L2 Syntactic Complexity Analyzer is designed to automate syntactic complexity analysis of written English language samples produced by advanced learners of English using fourteen different measures proposed in the second language development literature; available here

IntelliText is a web-based corpus tool run by the Centre for Translation Studies, University of Leeds; allows access to monolingual and bilingual corpora for various languages; includes a “Build Your Own Corpus” function that allows users to create and annotate their own corpora; freely available for download or for use on the server

Constituent Likelihood Automatic Word-tagging System (CLAWS) is a POS tagger developed by UCREL at Lancaster University. The latest version of the tagger, CLAWS4, was used to POS-tag c. 100 million words of the British National Corpus and consistently achieves 96-97% accuracy; CLAWS can be accessed through the web-based Wmatrix interface; paid.


© 2002-2012 CALPER and The Pennsylvania State University. All Rights Reserved.