2020. Aug |
COCA now allows users to analyze entire texts (e.g. student
compositions or online newspapers), and then see detailed data
from COCA for the words and phrases in their text. |
2020. May |
Released the
Coronavirus Corpus,
which currently contains billion words of data, and which is
growing by 80-100 million words each month. |
2020. Apr |
The frequency-based data from all of the corpora is now linked
to a wide range of external resources, including searches of the
web, images, videos, books, and translations |
2020. Mar |
Released new (and final) version of COCA.
One billion words; nearly twice the size
of the previous version; texts through Dec 2019; new genres (web, blog,
TV/movies), and many new word-oriented features (like iWeb) |
2019. Mar |
Moved the corpora off-campus (from corpus.byu.edu to
www.english-corpora.org) in order to ensure their long-term
survival. The corpora have the same functionality as before, and in fact they are
even a little bit faster at this new website. |
2019. Feb |
TV corpus: 325 million
words in 75,000 very informal episodes (e.g. comedies and
dramas) from 1950-2018.
Movie corpus: 200 million words in 25,000 movies from
1930-2018. By far the most informal of all of the corpora from English-Corpora.org. |
2018. May |
Released the 14-billion-word iWeb
("Intelligent Web") corpus. Unlike other large
corpora of English, this one allows much more intelligent
website-based searches, as well as in-depth information on the
top 60,000 words in the corpus. |
2017. Oct |
Released the
Early English Books
Online (EEBO) corpus, which contains 755 million words in
more than 25,000 texts from the 1470s to the 1690s. |
2017. Sep |
All of the corpora and the corpus
portal (as well as corpus-based resources) are now available over a secure
HTTPS connection |
2017. Feb |
Released the
US Supreme Court corpus,
which contains 130 million words in US Supreme Court opinions
during the last 200 years. |
2016. May |
Released a major update to the corpus
interface, which works great on mobile devices and which
allows the use of "virtual corpora" |
2016. May |
Released the
NOW corpus, which
automatically adds about 180-200 million words of data every
month. |
2016. May |
Released the
CORE corpus, which is
the first corpus of web pages (about 50 million words of data)
that are carefully tagged for register (personal blog, advice,
interviews, etc.) |
2015. Jul |
Released the
Hansard corpus,
which is based on 1.6 billion words in 7.6 million speeches from
the
British Parliament, 1803-2005. |
2015. Jan |
Released the
Wikipedia corpus,
which is based on 1.9 billion words in 4.4 million articles
from Wikipedia. |
2014. Mar |
Released full-text
versions of COCA and GloWbE, which allow users to search the
downloaded texts on their own computer |
2013. Aug |
Released
www.academicvocabulary.info; free downloadable lists for academic
English: word families, core academic, and genre-specific
technical words |
2013. Aug |
Released www.wordandphrase.info/academic:
same interface as the WordAndPhrase resources below, but just for COCA-Academic |
2013. Apr |
Released the
Corpus of Global
Web-Based English (GloWbE) (1.9 billion words,
2012-13) |
2013. Jan |
Released the
Strathy Corpus (Canadian
English) (50 million words, ~1970s-2000s) |
2012. Aug |
Created ability to compare results
from different corpora (side by side) within the web interface,
e.g. COCA and BNC |
2012. Aug |
Updated the
British National Corpus
with the CLAWS 7 tagset; inclusion of speech indicators, XML
World Edition |
2012. Jul |
Released the
Corpus of American Soap
Operas (100 million words, 2001-2012) |
2012. Jul |
Added the following datasets to the
Google Books corpora:
British English (34 billion words), Fiction (91 billion), One
Million Books (89 billion), Spanish (45 billion) |
2012. Jun |
Added about 25 million words to the
Corpus of Contemporary
American English (COCA), for Apr 2011 - Jun 2011. |
2012. Feb |
Modified
www.wordandphrase.info: ability to enter entire texts and
then see detailed information about words and phrases |
2012. Jan |
Released
www.wordandphrase.info: integrated frequency and genre data,
definitions, collocates, concordances, synonyms, and WordNet |
2011. Dec |
Released free n-gram lists
for COCA and COHA; millions of rows of data for 2-grams (two
word sequences), 3-grams, 4-grams, and 5-grams. |
2011. May |
Released beta version of the
Google Books
(American English) Corpus (155 billion words,
1810-2009) |
2011. Apr |
Added about 15 million words to the
Corpus of Contemporary
American English (COCA), for July 2010 - Mar 2011. |
2011. Feb |
Added concordance view |
2010. Oct |
Improved functionality for
interaction with other users (see queries, researchers,
publications) and ability to save and manipulate Keyword in
Context entries. |
2010. Sep |
Released beta version of the
Corpus of Historical
American English (COHA) |
2010. Aug |
Added about 20 million words to the
Corpus of Contemporary
American English (COCA), for July 2009 - June 2010. |
2010. Feb |
Released the
frequency
lists and dictionary that are based on the Corpus of
Contemporary American English. |
2009. Aug |
Added about 15 million words to the
Corpus of Contemporary
American English (COCA), for October 2008 - June 2009. |
2009. May |
Added new tools for collaboration:
links to previous queries (including annotations/notes) and
ability to share them
with others |
2008. Oct |
Added about 15 million words to the
Corpus of Contemporary
American English (COCA), for Jan-Sep 2008. |
2008. Jun |
Applied the new architecture to the
Corpus do Português |
2008. Apr |
Applied the new architecture to the
British National Corpus
and the TIME Corpus |
2008. Mar |
Released the
Corpus of Contemporary
American English |
2007. Oct |
Finished new (current) corpus architecture;
applied it to the
Corpus del Español. Major updates in this corpus as well,
including much-improved tagging and lemmatization for Modern
Spanish. |
2007. May |
Released the
TIME Corpus of American
English |
2006. Aug |
Released the
Corpus do Português |
2005. Apr |
Interface for
Register
Variation in Spanish |
2004. Apr |
Released VIEW, our first version of the
British National Corpus |
2002. Sep |
Released the first version of the
Corpus del Español |