Now available! At English-Corpora.org, we're introducing a new way to interact with corpus data. Using Large Language Models (LLMs) like GPT, Gemini, and Claude, users can now have collocates, phrases, and frequency data clustered, categorized, and explained automatically. The underlying corpus data remains unchanged — but AI provides an optional layer of analysis to help users spot patterns and connections more quickly.

Just click on any of the green links on a corpus results page to use these features.

The Corpus of Historical American English (COHA), created by Mark Davies, is the largest structured corpus of historical English. It is part of the family of corpora from English-Corpora.org, which are the most widely used corpora of English and offer unparalleled insight into variation in the language. If you are interested in historical corpora, you might also look at our Google Books (see comparison), Hansard, and TIME corpora.

COHA contains more than 475 million words of text from the 1820s to the 2010s, making it 50-100 times as large as other comparable historical corpora of English, and it is balanced by genre decade by decade. The corpus was created with funding from a grant from the National Endowment for the Humanities (NEH) from 2008 to 2010.

Click on any of the links in the search form on the search page for context-sensitive help, or take a look at the expanded overview of the corpus. Pay special attention to the comparisons between decades and to virtual corpora, which let you create personalized collections of texts on a topic of interest. You might also want to check out the new expanded help files.

Overview (6m 30s)
Five-minute tour
Articles on using COHA for research
Compare to Google Books
Compare to small corpora (ARCHER, Brown family, etc.)