English-Corpora.org *

The most widely used online corpora. See also: overview, search types, looking at variation, corpus-based resources.

The links below lead to the online interface, but the corpora can also be downloaded for use on your own computer.

| Corpus | Words | Dialect | Time period | Genre(s) |
|---|---|---|---|---|
| iWeb: The Intelligent Web-based Corpus | 14 billion | 6 countries | 2017 | Web |
| News on the Web (NOW) | 8.6 billion+ | 20 countries | 2010-last month | Web: News |
| Global Web-Based English (GloWbE) | 1.9 billion | 20 countries | 2012-13 | Web (incl blogs) |
| Wikipedia Corpus | 1.9 billion | (Various) | 2014 | Wikipedia |
| Corpus of Contemporary American English (COCA) | 560 million | American | 1990-2017 | Balanced |
| Corpus of Historical American English (COHA) | 400 million | American | 1810-2009 | Balanced |
| The TV Corpus | 325 million | 6 countries | 1950-2018 | TV shows |
| The Movie Corpus | 200 million | 6 countries | 1930-2018 | Movies |
| Corpus of American Soap Operas | 100 million | American | 2001-2012 | TV shows |
| Hansard Corpus | 1.6 billion | British | 1803-2005 | Parliament |
| Early English Books Online | 755 million | British | 1470s-1690s | (Various) |
| Corpus of US Supreme Court Opinions | 130 million | American | 1790s-present | Legal opinions |
| TIME Magazine Corpus | 100 million | American | 1923-2006 | Magazine |
| British National Corpus (BNC) * | 100 million | British | 1980s-1993 | Balanced |
| Strathy Corpus (Canada) | 50 million | Canadian | 1970s-2000s | Balanced |
| CORE Corpus | 50 million | 6 countries | 2014 | Web |