| Corpus | Size (words) | Countries | Time | Genre |
|---|---|---|---|---|
| COCA | 1.0b | Am | 1990-2019 | Balanced |
| COHA | 400m | Am | 1810-2009 | Balanced |
| BNC | 100m | Br | 1980s-1993 | Balanced |
| CAN | 50m | Can | 1970s-2000s | Balanced |
| SUP CRT | 130m | Am | 1790s-2010s | Legal |
| TIME | 100m | Am | 1923-2006 | Magazine |
| MOVIES | 200m | 6 | 1930-2018 | Movies |
| HANSARD | 1.6b | Br | 1803-2005 | Parliament |
| SOAP | 100m | Am | 2001-2012 | TV shows |
| TV | 325m | 6 | 1950-2018 | TV shows |
| EEBO | 755m | Br | 1470s-1690s | Various |
| CORE | 50m | 6 | 2014 | Web |
| IWEB | 13.9b | 6 | 2017 | Web |
| GLOWBE | 1.9b | 20 | 2012-13 | Web/blogs |
| NOW | 15.4b | 20 | 2010-now | Web: News |
| CORONA | 1.48b | 20 | 2020-now | Web: News |
| WIKI | 1.9b | (+) | 2014 | Wikipedia |
These are the most widely used online corpora. They are used extensively at universities throughout the world, and researchers have used them in thousands of articles, especially to study variation in English. In addition, the corpus data (e.g. full text and word frequency lists) has been used by a wide range of companies worldwide, including tech companies such as Amazon, Google, Microsoft, IBM, Sony, Disney, Intel, Adobe, Samsung, and a very large US-based social media company; language-related companies such as Merriam-Webster, Dictionary.com, Grammarly, Duolingo, TurnItIn, Oxford University Press, and Sketch Engine; and many more.

The links below are for the free online interface. You can also purchase and download the corpora for use on your own computer.