Now available! At English-Corpora.org, we've introduced a new way to interact with corpus data. Using Large Language Models (LLMs) like GPT, Gemini, and Claude, users can now have collocates, phrases, and frequency data clustered, categorized, and explained automatically. The underlying corpus data remains unchanged, but AI provides an optional layer of analysis to help users spot patterns and connections more quickly.

Just click on any of the green links on a corpus results page to use these features.

The Corpus of Contemporary American English (COCA) was created by Mark Davies, and it is the only large and "balanced" corpus of American English. COCA is probably the most widely used corpus of English, and it is related to other corpora from English-Corpora.org, which offer unparalleled insight into variation in English.

The corpus contains more than one billion words of text (25+ million words each year 1990-2019) from eight genres: spoken, fiction, popular magazines, newspapers, academic texts, TV and movie subtitles, blogs, and other web pages.

Click on any of the links in the search form on the search page for context-sensitive help, and to see the range of queries that the corpus offers. And you might want to check out the new expanded help files.

There are six main ways to search the corpus:

1. Search for phrases and strings. Because the corpus is optimized for speed, searches for substrings (*ism, un*able) and phrases are very fast, e.g.: got VERB-ed, BUY * ADJ NOUN, "gorgeous" NOUN, and even high-frequency phrases like: from ADJ to ADJ, phrasal verbs, or NOUN NOUN.

2. Browse a frequency list of the top 60,000 words in the corpus, including searches by word form, part of speech, ranges in the 60,000-word list, and even by meaning or pronunciation. This should be particularly useful for language learners and teachers.

3. Browse through the Academic Vocabulary List (AVL) (Gardner and Davies, 2013), and then see detailed entries for each of the 3,000 words. This is a great option for those who are interested mainly in academic English.

4. Search by individual word, and see collocates, topics, clusters, websites, concordance lines, and related words for each of these words. Note that some of these searches are unique to COCA and iWeb.

5. Input entire texts and then use data from COCA to get detailed information on the words and phrases in the text.

6. Find random words and also browse through randomly selected "Words of the Day", and then save new words and come back and review them later.

You might pay special attention to the comparisons between genres and years, and to virtual corpora, which allow you to create personalized collections of texts related to a particular area of interest.
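To give a feel for what the substring searches mentioned under option 1 (*ism, un*able) are matching, here is a minimal Python sketch. It is only an illustration and not how the corpus itself is implemented: the word list is made up, the helper name `match_wildcard` is hypothetical, and it assumes that `*` stands for any run of zero or more characters, as in shell-style wildcards.

```python
from fnmatch import fnmatchcase


def match_wildcard(pattern, words):
    """Return the words that match a shell-style '*' wildcard pattern.

    Assumption: '*' matches zero or more characters, and matching is
    case-sensitive. This mimics the idea of a substring search only;
    it is not the corpus's own query engine.
    """
    return [w for w in words if fnmatchcase(w, pattern)]


# A tiny, made-up word list standing in for corpus word forms.
sample_words = ["realism", "tourism", "unbelievable", "unable", "table", "prism"]

# Words ending in -ism.
ism_matches = match_wildcard("*ism", sample_words)

# Words that start with un- and end with -able.
unable_matches = match_wildcard("un*able", sample_words)

print(ism_matches)
print(unable_matches)
```

In the corpus interface the same kind of pattern is typed directly into the search form, and the results come back with frequency data rather than as a plain list.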