English-Corpora.org


The most widely used online corpora. Overview, search types, looking at variation, using Virtual Corpora, corpus-based resources.

The links below lead to the online interface, but you can also download the corpora for use on your own computer.

Corpus (online access)                          # words        Dialect       Time period         Genre(s)
iWeb: The Intelligent Web-based Corpus          14 billion     6 countries   2017                Web
News on the Web (NOW)                           11.2 billion+  20 countries  2010-yesterday      Web: News
Global Web-Based English (GloWbE)               1.9 billion    20 countries  2012-13             Web (incl blogs)
Wikipedia Corpus                                1.9 billion    (Various)     2014                Wikipedia
Corpus of Contemporary American English (COCA)  1.0 billion    American      1990-2019           Balanced
Coronavirus Corpus                              666 million+   20 countries  Jan 2020-yesterday  Web: News
Corpus of Historical American English (COHA)    400 million    American      1810-2009           Balanced
The TV Corpus                                   325 million    6 countries   1950-2018           TV shows
The Movie Corpus                                200 million    6 countries   1930-2018           Movies
Corpus of American Soap Operas                  100 million    American      2001-2012           TV shows

Hansard Corpus                                  1.6 billion    British       1803-2005           Parliament
Early English Books Online                      755 million    British       1470s-1690s         (Various)
Corpus of US Supreme Court Opinions             130 million    American      1790s-present       Legal opinions
TIME Magazine Corpus                            100 million    American      1923-2006           Magazine
British National Corpus (BNC) *                 100 million    British       1980s-1993          Balanced
Strathy Corpus (Canada)                         50 million     Canadian      1970s-2000s         Balanced
CORE Corpus                                     50 million     6 countries   2014                Web

From Google Books n-grams (compare)
American English                                155 billion    American      1500s-2000s         (Various)
British English                                 34 billion     British       1500s-2000          (Various)
* This is our architecture and interface to the BNC. The corpus itself is distributed by IT Services (formerly OUCS) at Oxford University on behalf of the BNC Consortium.