USING THE CORPORA FOR LANGUAGE LEARNING AND TEACHING
We believe that the corpora from English-Corpora.org provide
learners with much better data and many more learner-oriented
features than any other online corpora. This may be why the
corpora from English-Corpora.org (previously known as the "BYU
corpora") have served as the basis for almost every book on
corpora and language learning during the past 10-15 years (just a
small sampling: A, B, C, D; see also the new 2023 book E).
By far, the most widely used corpus for language learning is COCA
(the Corpus of Contemporary American English). COCA is the only
corpus that is large, recent, and genre-balanced. Genre balance is
extremely important because language learners often do not know
whether a word or phrase sounds overly formal or informal to native
speakers, and if they use it in the wrong context, they may sound
strange when they write or speak in English.
To give two simple examples, a learner might use a phrase like
a lot of NOUN in an academic paper (where it would be better to use
several NOUN), or they might use seldom in conversation with friends,
even though for most native speakers it sounds old-fashioned and
formal. Likewise, the meaning and usage of a word can vary greatly
between genres, as can be seen in the collocates (nearby words) of
chair or chain in fiction and academic English.
A corpus that is composed mainly of web pages or newspapers cannot
show these distinctions. The corpus needs to have a wide range of
genres, from informal (e.g. conversation and TV and movie scripts)
to formal (e.g. academic), as COCA does.
But a corpus is much more than the sentences, paragraphs, and texts
that it contains. A truly useful corpus also provides the end user
with an interface that really allows them to take full advantage of
the underlying data. The following are some of the features of
English-Corpora.org that make these corpora uniquely useful for
language learners:
Feature |
Importance for language learning and teaching |
Alternative phrases |
One of the hardest things for language learners is knowing which
words sound good together. For example, which synonym of potent is
most common with the word argument? One simple, fast search in COCA
provides this information. Searches like this are either not
possible or are very cumbersome and time-consuming with other online
corpora, such as Sketch Engine or CQPweb. |
Word sketches |
Learners want to see rich
information on specific words (not just
collocates). At English-Corpora.org, for every one of
the top 60,000 words in a corpus, you can see the
definition, synonyms, more specific and more general words,
collocates, related topics, clusters, concordance lines,
frequency, and links to external resources like
pronunciation, images, videos, and translations for 100+
languages. |
Browse |
Learners want to search for
words, and they want to find words by frequency (so they can
see where they might have gaps in their vocabulary). At
English-Corpora.org, you can search for words by word form,
part of speech, frequency, meaning (e.g. words in a
definition), synonym, more specific or more general words,
and even pronunciation. |
Find related words |
Words are best learned as part
of a "system" of related words. For example, if learners can
relate telescope to other concepts like Earth,
Sun, star, planet, galaxy, universe, scientist, or
astronomy, they have a better chance of knowing what
telescope means, and of remembering it. Only English-Corpora.org
allows learners to find both collocates and related topics (words
that co-occur anywhere in the text), both of which provide great
insight into the meaning of a word. |
External resources |
Many language learners benefit
from multi-modal information for a given word or phrase,
such as pronunciation, images, videos, and translation to
their native language. English-Corpora.org has the only
corpora that link to so many types of external resources, in
so many useful ways. And when you're looking at Keyword in
Context (KWIC) entries for a word, there is a wide range of
"one click" resources that help you to find information on
words that you might not already know. |
Entire texts (writing) |
As with the "Alternative phrases"
section above, language learners often need help in knowing
whether Phrase 1, Phrase 2, or Phrase 3 sounds the most natural, especially in
a given genre. At English-Corpora.org, you can enter entire
texts that you have written, and then quickly and easily
highlight phrases in your text to find related phrases in
COCA, which will allow you to edit your writing to make it
sound more natural. |
Entire texts (reading) |
It might be overwhelming for a
language learner to look at a text (such as an article from
an online newspaper), when there are so many unknown words and
phrases in the text. Using COCA at
English-Corpora.org, you can not only find the keywords in a text
(to understand better what it's about), but also (perhaps more
importantly) click on any word or phrase in the text to see
a wide range of information, as described in the "Word sketches"
section above. |
Virtual Corpora |
Those who are learning English
for Specific Purposes (e.g. engineering, finance, or
medicine, or even more specific applications like polymers,
mortgages, or endocrinology) want to use a corpus to quickly
and easily find the words and phrases for these fields. With
English-Corpora.org, users can create specialized corpora in
5-10 seconds, and then extract keywords in another 2-3
seconds -- far more quickly and far more easily than can be
done with other approaches like BootCaT. |
Saved words and phrases |
When language learners see a
useful word or phrase, they want to be able to save it,
and perhaps assign it to a particular category. This is
quick and easy at English-Corpora.org. |