English-Corpora.org


The most widely used online corpora.

The links below are for the online interface, but you can also download the corpora for use on your own computer.

Corpus (online access)                          # words        Dialect       Time period         Genre(s)
iWeb: The Intelligent Web-based Corpus          14 billion     6 countries   2017                Web
News on the Web (NOW)                           12.8 billion+  20 countries  2010-yesterday      Web: News
Global Web-Based English (GloWbE)               1.9 billion    20 countries  2012-13             Web (incl. blogs)
Wikipedia Corpus                                1.9 billion    (Various)     2014                Wikipedia
Corpus of Contemporary American English (COCA)  1.0 billion    American      1990-2019           Balanced
Coronavirus Corpus                              1068 million+  20 countries  Jan 2020-yesterday  Web: News
Corpus of Historical American English (COHA)    475 million    American      1820-2019           Balanced
The TV Corpus                                   325 million    6 countries   1950-2018           TV shows
The Movie Corpus                                200 million    6 countries   1930-2018           Movies
Corpus of American Soap Operas                  100 million    American      2001-2012           TV shows

Hansard Corpus                                  1.6 billion    British       1803-2005           Parliament
Early English Books Online                      755 million    British       1470s-1690s         (Various)
Corpus of US Supreme Court Opinions             130 million    American      1790s-present       Legal opinions
TIME Magazine Corpus                            100 million    American      1923-2006           Magazine
British National Corpus (BNC) *                 100 million    British       1980s-1993          Balanced
Strathy Corpus (Canada)                         50 million     Canadian      1970s-2000s         Balanced
CORE Corpus                                     50 million     6 countries   2014                Web

From Google Books n-grams (compare)
American English                                155 billion    American      1500s-2000s         (Various)
British English                                 34 billion     British       1500s-2000          (Various)
* Our architecture and interface to the BNC, which is distributed by IT Services (formerly OUCS) at Oxford University (on behalf of the BNC Consortium)