| Corpus | Size | Countries | Time | Genre |
|---|---|---|---|---|
| WIKI | 1.9b | (+) | 2014 | Wikipedia |
| NOW | 16.2b | 20 | 2010-now | Web: News |
| CORONA | 1.58b | 20 | 2020-now | Web: News |
| GLOWBE | 1.9b | 20 | 2012-13 | Web/blogs |
| TV | 325m | 6 | 1950-2018 | TV shows |
| MOVIES | 200m | 6 | 1930-2018 | Movies |
| IWEB | 13.9b | 6 | 2017 | Web |
| CORE | 50m | 6 | 2014 | Web |
| SUP CRT | 130m | Am | 1790s-2010s | Legal |
| TIME | 100m | Am | 1923-2006 | Magazine |
| SOAP | 100m | Am | 2001-2012 | TV shows |
| COCA | 1.0b | Am | 1990-2019 | Balanced |
| COHA | 400m | Am | 1810-2009 | Balanced |
| HANSARD | 1.6b | Br | 1803-2005 | Parliament |
| EEBO | 755m | Br | 1470s-1690s | Various |
| BNC | 100m | Br | 1980s-1993 | Balanced |
| CAN | 50m | Can | 1970s-2000s | Balanced |

These are the most widely used online corpora, and they serve many different purposes for teachers and researchers at universities throughout the world. In addition, the corpus data (e.g. full-text, word frequency) has been employed by a wide range of companies in many different fields, especially technology and language learning. (These include tech companies like Amazon, Google, Facebook, Microsoft, IBM, Sony, Disney, Intel, Adobe, and Samsung, as well as language-related companies like Merriam-Webster, Dictionary.com, Grammarly, Duolingo, TurnItIn, Oxford University Press, and Sketch Engine, among many others.)

The links below are for the free online interface. You can also purchase and download the corpora for use on your own computer.