English-Corpora.org


The most widely used online corpora. Overview, search types, looking at variation, corpus-based resources.

The links below lead to the online interface, but you can also download the corpora for use on your own computer.

Corpus (online access)                          # words       Dialect       Time period      Genre(s)
iWeb: The Intelligent Web-based Corpus          14 billion    6 countries   2017             Web
News on the Web (NOW)                           8.7 billion+  20 countries  2010-last month  Web: News
Global Web-Based English (GloWbE)               1.9 billion   20 countries  2012-13          Web (incl blogs)
Wikipedia Corpus                                1.9 billion   (Various)     2014             Wikipedia
Corpus of Contemporary American English (COCA)  560 million   American      1990-2017        Balanced
Corpus of Historical American English (COHA)    400 million   American      1810-2009        Balanced
The TV Corpus                                   325 million   6 countries   1950-2018        TV shows
The Movie Corpus                                200 million   6 countries   1930-2018        Movies
Corpus of American Soap Operas                  100 million   American      2001-2012        TV shows

Hansard Corpus                                  1.6 billion   British       1803-2005        Parliament
Early English Books Online                      755 million   British       1470s-1690s      (Various)
Corpus of US Supreme Court Opinions             130 million   American      1790s-present    Legal opinions
TIME Magazine Corpus                            100 million   American      1923-2006        Magazine
British National Corpus (BNC) *                 100 million   British       1980s-1993       Balanced
Strathy Corpus (Canada)                         50 million    Canadian      1970s-2000s      Balanced
CORE Corpus                                     50 million    6 countries   2014             Web

From Google Books n-grams (compare)
American English                                155 billion   American      1500s-2000s      (Various)
British English                                 34 billion    British       1500s-2000       (Various)
* This is our architecture and interface to the BNC, which is distributed by IT Services (formerly OUCS) at Oxford University on behalf of the BNC Consortium.