| Corpus | Size | Countries | Time | Genre |
|---|---|---|---|---|
| IWEB | 13.9b | 6 | 2017 | Web |
| NOW | 16.2b | 20 | 2010-now | Web: News |
| CORONA | 1.58b | 20 | 2020-now | Web: News |
| GLOWBE | 1.9b | 20 | 2012-13 | Web/blogs |
| WIKI | 1.9b | (+) | 2014 | Wikipedia |
| COCA | 1.0b | Am | 1990-2019 | Balanced |
| COHA | 400m | Am | 1810-2009 | Balanced |
| TV | 325m | 6 | 1950-2018 | TV shows |
| MOVIES | 200m | 6 | 1930-2018 | Movies |
| SOAP | 100m | Am | 2001-2012 | TV shows |
| HANSARD | 1.6b | Br | 1803-2005 | Parliament |
| EEBO | 755m | Br | 1470s-1690s | Various |
| SUP CRT | 130m | Am | 1790s-2010s | Legal |
| TIME | 100m | Am | 1923-2006 | Magazine |
| BNC | 100m | Br | 1980s-1993 | Balanced |
| CAN | 50m | Can | 1970s-2000s | Balanced |
| CORE | 50m | 6 | 2014 | Web |
These are the most widely used online corpora, and they are used for many different purposes by teachers and researchers at universities throughout the world. In addition, the corpus data (e.g. full-text, word frequency) has been used by a wide range of companies in many different fields, especially technology and language learning. These include tech companies like Amazon, Google, Microsoft, IBM, Sony, Disney, Intel, Adobe, Samsung, and a very large US-based social media company; language-related companies like Merriam-Webster, Dictionary.com, Grammarly, Duolingo, TurnItIn, Oxford University Press, and Sketch Engine; and many more.
The links below are for the free online interface. You can also purchase and download the corpora for use on your own computer.