| Corpus | Size (words) | Countries | Time | Genre |
|---|---|---|---|---|
| IWEB | 13.9b | 6 | 2017 | Web |
| NOW | 16.2b | 20 | 2010-now | Web: News |
| CORONA | 1.58b | 20 | 2020-now | Web: News |
| GLOWBE | 1.9b | 20 | 2012-13 | Web/blogs |
| WIKI | 1.9b | (+) | 2014 | Wikipedia |
| COCA | 1.0b | Am | 1990-2019 | Balanced |
| COHA | 400m | Am | 1810-2009 | Balanced |
| TV | 325m | 6 | 1950-2018 | TV shows |
| MOVIES | 200m | 6 | 1930-2018 | Movies |
| SOAP | 100m | Am | 2001-2012 | TV shows |
| HANSARD | 1.6b | Br | 1803-2005 | Parliament |
| EEBO | 755m | Br | 1470s-1690s | Various |
| SUP CRT | 130m | Am | 1790s-2010s | Legal |
| TIME | 100m | Am | 1923-2006 | Magazine |
| BNC | 100m | Br | 1980s-1993 | Balanced |
| CAN | 50m | Can | 1970s-2000s | Balanced |
| CORE | 50m | 6 | 2014 | Web |

Size: b = billion, m = million. Countries: the number of countries represented, or Am = American, Br = British, Can = Canadian.
These are the most widely used online corpora, and teachers and researchers at universities throughout the world use them for many different purposes. In addition, the corpus data (e.g., full text and word frequency lists) has been used by a wide range of companies in many different fields, especially technology and language learning. These include tech companies such as Amazon, Google, Facebook, Microsoft, IBM, Sony, Disney, Intel, Adobe, and Samsung, as well as language-related companies such as Merriam-Webster, Dictionary.com, Grammarly, Duolingo, TurnItIn, Oxford University Press, and Sketch Engine, among many others.
Each corpus can be searched through a free online interface. You can also purchase and download the corpora for use on your own computer.