2021-2024 |
Added several detailed instructional videos: overview,
language learning and teaching,
word sketches,
browsing words,
analyze texts,
search history,
customized word lists,
saved words (favorites),
KWIC lines: limiting and sorting,
saved KWIC lines,
analyze KWIC lines,
external resources,
Virtual Corpora,
examining recent change. |
2021-2024 |
Added several detailed PDF help files:
overview / guided tour,
architecture,
association measures,
collocates (cf Sketch Engine),
topics (and collocates),
word sketches,
browsing words,
analyzing texts,
KWIC -> analyze text,
saved words and phrases,
saving KWIC entries,
customized word lists,
search history,
external resources,
monitor corpus,
Virtual Corpora,
Virtual Corpora: quick overview |
2021-2024 |
Added new functionality to the
corpora: the ability to see the number of texts (in addition to
frequency), case-sensitive search, etc. Also made many improvements
in terms of speed (although the corpora were
already the
fastest structured corpora in the world, 5-10 times as
fast as other corpora like Sketch Engine). |
2021-2024 |
Added more than 8 billion words of
new data for the
NOW Corpus |
2021. Jun |
Integrated the
Academic Vocabulary List into COCA |
2020. Aug |
COCA now allows users to analyze entire texts (e.g. student
compositions or online newspapers), and then see detailed data
from COCA for the words and phrases in their text. |
2020. May |
Released the
Coronavirus Corpus,
which currently contains billion words of data, and which is
growing by 80-100 million words each month. |
2020. Apr |
The frequency-based data from all of the corpora is now linked
to a wide range of external resources, including searches of the
web, images, videos, books, and translations |
2020. Mar |
Released the new (and final) version of COCA.
One billion words; nearly twice as large
as before; texts through Dec 2019; new genres (web, blog,
TV/movies); and many new word-oriented features (like iWeb) |
2019. Mar |
Moved the corpora off-campus (from corpus.byu.edu to
www.english-corpora.org) in order to ensure their long-term
survival. The corpora have the same functionality as before, and in fact they are
even a little bit faster at this new website. |
2019. Feb |
TV corpus: 325 million
words in 75,000 very informal episodes (e.g. comedies and
dramas) from 1950-2018.
Movie corpus: 200 million words in 25,000 movies from
1930-2018. By far the most informal of all of the corpora from English-Corpora.org. |
2018. May |
14 billion word iWeb
("Intelligent Web") corpus. Unlike other large
corpora of English, this one allows much more intelligent
website-based searches, as well as in-depth information on the
top 60,000 words in the corpus. |
2017. Oct |
Released the
Early English Books
Online (EEBO) corpus, which contains 755 million words in
more than 25,000 texts from the 1470s to the 1690s. |
2017. Sep |
All of the corpora and the corpus
portal (as well as corpus-based resources) are now available over a secure
HTTPS connection |
2017. Feb |
Released the
US Supreme Court corpus,
which contains 130 million words in US Supreme Court opinions
during the last 200 years. |
2016. May |
Released a major update to the corpus
interface, which works great on mobile devices and which
allows the use of "virtual corpora" |
2016. May |
Released the
NOW corpus, which
automatically adds about 180-200 million words of data every
month. |
2016. May |
Released the
CORE corpus, which is
the first corpus of web pages (about 50 million words of data)
that are carefully tagged for register (personal blog, advice,
interviews, etc) |
2015. Jul |
Released the
Hansard corpus,
which is based on 1.6 billion words in 7.6 million speeches from
the
British Parliament, 1803-2005. |
2015. Jan |
Released the
Wikipedia corpus,
which is based on 1.9 billion words in 4.4 million articles
from Wikipedia. |
2014. Mar |
Released full-text
versions of COCA and GloWbE, which allow users to search the
downloaded texts on their own computer |
2013. Aug |
Released
www.academicvocabulary.info; free downloadable lists for academic
English: word families, core academic, and genre-specific
technical words |
2013. Aug |
Released www.wordandphrase.info/academic:
same interface as the WordAndPhrase resources below, but just for COCA-Academic |
2013. Apr |
Released the
Corpus of Global
Web-Based English (GloWbE) (1.9 billion words,
2012-13) |
2013. Jan |
Released the
Strathy Corpus (Canadian
English) (50 million words, ~1970s-2000s) |
2012. Aug |
Created ability to compare results
from different corpora (side by side) within the web interface,
e.g. COCA and BNC |
2012. Aug |
Updated the
British National Corpus
with the CLAWS 7 tagset; inclusion of speech indicators, XML
World Edition |
2012. Jul |
Released the
Corpus of American Soap
Operas (100 million words, 2001-2012) |
2012. Jul |
Added the following datasets to the
Google Books corpora:
British English (34 billion words), Fiction (91 billion), One
Million Books (89 billion), Spanish (45 billion) |
2012. Jun |
Added about 25 million words to the
Corpus of Contemporary
American English (COCA), for Apr 2011 - Jun 2011. |
2012. Feb |
Modified
www.wordandphrase.info: ability to enter entire texts and
then see detailed information about words and phrases |
2012. Jan |
Released
www.wordandphrase.info: integrated frequency and genre data,
definitions, collocates, concordances, synonyms, and WordNet |
2011. Dec |
Released free n-gram lists
for COCA and COHA; millions of rows of data for 2-grams (two-word
sequences), 3-grams, 4-grams, and 5-grams. |
2011. May |
Released beta version of the
Google Books
(American English) Corpus (155 billion words,
1810-2009) |
2011. Apr |
Added about 15 million words to the
Corpus of Contemporary
American English (COCA), for July 2010 - Mar 2011. |
2011. Feb |
Added concordance view |
2010. Oct |
Improved functionality for
interaction with other users (see queries, researchers,
publications) and ability to save and manipulate Keyword in
Context entries. |
2010. Sep |
Released beta version of the
Corpus of Historical
American English (COHA) |
2010. Aug |
Added about 20 million words to the
Corpus of Contemporary
American English (COCA), for July 2009 - June 2010. |
2010. Feb |
Released the
frequency
lists and dictionary that are based on the Corpus of
Contemporary American English. |
2009. Aug |
Added about 15 million words to the
Corpus of Contemporary
American English (COCA), for October 2008 - June 2009. |
2009. May |
Added new tools for collaboration:
links to previous queries (including annotations/notes) and
ability to share them
with others |
2008. Oct |
Added about 15 million words to the
Corpus of Contemporary
American English (COCA), for Jan-Sep 2008. |
2008. Jun |
Applied the new architecture to the
Corpus do Português |
2008. Apr |
Applied the new architecture to the
British National Corpus
and the TIME Corpus |
2008. Mar |
Released the
Corpus of Contemporary
American English |
2007. Oct |
Finished new (current) corpus architecture;
applied it to the
Corpus del Español. Major updates in this corpus as well,
including much-improved tagging and lemmatization for Modern
Spanish. |
2007. May |
Released the
TIME Corpus of American
English |
2006. Aug |
Released the
Corpus do Português |
2005. Apr |
Interface for
Register
Variation in Spanish |
2004. Apr |
Released VIEW, our first version of the
British National Corpus |
2002. Sep |
Released the first version of the
Corpus del Español |