English Corpora: most widely used online corpora. Billions of words of data: free online access

Corpus	Size	Countries	Time	Genre
IWEB	13.9b	6	2017	Web
NOW	25.0b+	20	2010-now	Web: News
CORONA	1.5b	20	2020-2023	Web: News
GLOWBE	1.9b	20	2012-13	Web/blogs
WIKI	1.9b	(+)	2014	Wikipedia
COCA	1.0b	Am	1990-2019	Balanced
COHA	475m	Am	1820-2019	Balanced
TV	325m	6	1950-2018	TV shows
MOVIES	200m	6	1930-2018	Movies
SOAP	100m	Am	2001-2012	TV shows
HANSARD	1.6b	Br	1803-2005	Parliament
EEBO	755m	Br	1470s-1690s	Various
SUP CRT	130m	Am	1790s-2010s	Legal
TIME	100m	Am	1923-2006	Magazine
BNC	100m	Br	1980s-1993	Balanced
CAN	50m	Can	1970s-2000s	Balanced
CORE	50m	6	2014	Web


Now available: AI/LLM insights integrated into corpus results

Most of these corpora were created by Mark Davies. They are the most widely used online corpora, serving teachers and researchers at universities throughout the world. The corpus data (e.g. full text, word frequency) has also been used by a wide range of companies in many fields, especially technology and language learning.

The links below are for the free online interface. You can also purchase and download the corpora for use on your own computer.

Corpus	# words	Dialect	Time period	Genre(s)
News on the Web (NOW)	25.0 billion+	20 countries	2010-yesterday	Web: News
iWeb: The Intelligent Web-based Corpus	14 billion	6 countries	2017	Web
Global Web-Based English (GloWbE)	1.9 billion	20 countries	2012-2013	Web (incl blogs)
Wikipedia Corpus	1.9 billion	(Various)	2014	Wikipedia
Coronavirus Corpus	1.5 billion	20 countries	2020-2023	Web: News
Corpus of Contemporary American English (COCA)	1.0 billion	American	1990-2019	Balanced
Corpus of Historical American English (COHA)	475 million	American	1820-2019	Balanced
The TV Corpus	325 million	6 countries	1950-2018	TV shows
The Movie Corpus	200 million	6 countries	1930-2018	Movies
Corpus of American Soap Operas	100 million	American	2001-2012	TV shows
Hansard Corpus	1.6 billion	British	1803-2005	Parliament
Early English Books Online (EEBO)	755 million	British	1470s-1690s	(Various)
Corpus of US Supreme Court Opinions	130 million	American	1790s-2017	Legal opinions
TIME Magazine Corpus	100 million	American	1923-2006	Magazine
British National Corpus (BNC) *	100 million	British	1980s-1993	Balanced
Strathy Corpus (Canada)	50 million	Canadian	1970s-2000s	Balanced
CORE Corpus	50 million	6 countries	2014	Web
From Google Books n-grams
American English	155 billion	American	1500s-2000s	(Various)
British English	34 billion	British	1500s-2000s	(Various)

* This is our architecture and interface for the BNC, which is distributed by IT Services (formerly OUCS) at Oxford University on behalf of the BNC Consortium.

English-Corpora.org