English-Corpora.org

In addition to the regular corpus interface, there is a wide range of other corpus-based resources, some of which allow you to download large amounts of data for offline use (compare the academic license).

iWeb

Nearly all of the resources below are for COCA and other "smaller" corpora (e.g. 100-500 million words in size). In May 2018 we released the 14-billion-word iWeb corpus, which has its own full-text, word frequency, collocate, and n-gram data.

Download full-text data for iWeb (22 million texts), COCA (190,000 texts), COHA (115,000 texts), GloWbE (1,800,000 texts), NOW (4,000,000+ texts), NOW updates (about 250,000 texts each month), Wikipedia (4,400,000 texts), or the Corpus del Español (2,000,000 texts). With this data, you will have the texts from the corpora on your own computer, rather than having to use the web interface. The data comes in three formats: relational database, word/lemma/PoS (vertical format), or text (linear format).
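
To illustrate how the vertical and linear formats relate, here is a minimal sketch that reads word/lemma/PoS lines and rebuilds running text. It assumes a hypothetical layout in which each line holds one token and the last three tab-separated columns are word, lemma, and PoS tag; the actual column layout of the downloaded files may differ, so check the documentation that ships with the data. The sample lines are invented for illustration.

```python
import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class Token(NamedTuple):
    word: str
    lemma: str
    pos: str


def read_vertical(stream: TextIO) -> Iterator[Token]:
    """Yield tokens from a vertical-format stream.

    Assumption (hypothetical layout): one token per line, with the last
    three tab-separated columns being word, lemma, and PoS tag. Any extra
    leading columns (e.g. text or token IDs) are ignored.
    """
    for line in stream:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # skip blank or malformed lines
        word, lemma, pos = cols[-3:]
        yield Token(word, lemma, pos)


def to_linear(tokens: Iterable[Token]) -> str:
    """Rebuild a linear (running-text) version by joining the word column."""
    return " ".join(t.word for t in tokens)


# Invented sample in the assumed layout: word, lemma, PoS.
sample = io.StringIO(
    "The\tthe\tat\n"
    "cats\tcat\tnn2\n"
    "sat\tsit\tvvd\n"
)
tokens = list(read_vertical(sample))
print(to_linear(tokens))
```

Working from the lemma column rather than the word column is what lets you group "cats" and "cat" together when building frequency counts.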

Word and Phrase (analyze texts)

Enter entire texts to see detailed frequency information on the words in them, and create word lists based on your text. Click on any word to see detailed information about it, or highlight phrases in your text to search for related phrases in COCA.

Word and Phrase (frequency lists)

Search and browse the most complete frequency dictionary of English. See detailed information, all on one page: definition, frequency by genre, collocates (nearby words), concordance lines, synonyms, and WordNet-related words, all with useful links from one resource to another.

Word Frequency (100,000 list)

Download free lists, including the top 5,000 lemmas. Other downloadable lists show the frequency of the top 60,000 lemmas by genre (and sub-genre). You can also download an integrated 100,000-word list based on COCA, COHA, BNC, and SOAP: the largest corrected frequency list of English.

Collocates

Download lists with the top 200-300 collocates (nearby words) for 60,000 different lemmas: 4,300,000 node/collocate pairs in all.
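
As a rough illustration of what a node/collocate pair is, the sketch below counts every lemma that appears within a fixed window to the left or right of a node lemma. The window size of 4 and the function name are assumptions for illustration only. It produces raw co-occurrence counts, whereas the downloadable lists may rank collocates with an association measure not shown here. The sample sentence is invented.

```python
from collections import Counter


def collocates(lemmas: list[str], node: str, span: int = 4) -> Counter:
    """Count lemmas occurring within `span` positions left or right of `node`.

    Hypothetical helper: returns raw co-occurrence counts only, with no
    association measure (such as mutual information) applied.
    """
    counts: Counter = Counter()
    for i, lemma in enumerate(lemmas):
        if lemma != node:
            continue
        lo = max(0, i - span)
        hi = min(len(lemmas), i + span + 1)
        for j in range(lo, hi):
            if j != i:  # don't count the node position itself
                counts[lemmas[j]] += 1
    return counts


# Invented, already-lemmatized sample text.
text = "the dog bark at the cat and the dog chase the cat".split()
dog_collocates = collocates(text, "dog", span=2)
print(dog_collocates.most_common(3))
```

Each (node, collocate) key with its count corresponds to one node/collocate pair; repeating this for every node lemma and keeping the top entries per node would yield a list of the kind described above.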

N-grams

Download free lists containing the top 1,000,000 2-grams (two-word sequences), 3-grams, 4-grams, and 5-grams in COCA. Other lists contain the frequency of all 2-, 3-, and 4-grams (up to 155 million rows of data).
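
An n-gram is simply a contiguous sequence of n tokens, so a frequency list of n-grams can be sketched by sliding a window of length n over the text and counting each sequence. This is a generic illustration of the idea, not the procedure used to build the COCA lists; the function name and sample text are invented.

```python
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count every contiguous sequence of n tokens (as tuples)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


tokens = "at the end of the day at the end".split()
bigrams = ngram_counts(tokens, 2)
# Keeping only the most frequent entries gives a "top k" list.
print(bigrams.most_common(2))
```

If n is larger than the text, the range is empty and an empty Counter is returned.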

Academic Vocabulary

Download free lists from the 120 million words of COCA-Academic texts, including academic words grouped by word families, lists of "core" academic English, and "technical" word lists for the nine domains of COCA-Academic (e.g. Law, Medicine, or Business).

Word and Phrase

Similar to the two Word and Phrase resources above, but limited strictly to the 120 million words of COCA-Academic. Get detailed information on words and phrases, frequency by sub-genre, and concordances and collocates in just the academic genre. You can also input and analyze entire academic texts.