English-Corpora.org


USING THE CORPORA FOR LANGUAGE LEARNING AND TEACHING  

We believe that the corpora from English-Corpora.org provide learners with much better data and many more learner-oriented features than any other online corpora. This may be why the corpora from English-Corpora.org (previously known as the "BYU corpora") have served as the basis for almost every book on corpora and language learning during the past 10-15 years (just a small sampling: A B C D; and see also the new 2023 book E).


By far, the most widely used corpus for language learning is COCA (the Corpus of Contemporary American English). COCA is the only corpus that is large, recent, and genre-balanced. Having corpora that are genre-balanced is extremely important. This is because language learners often do not know if a word or phrase sounds overly formal or informal to native speakers, and if they don't use it correctly, they may sound strange when they write or speak in English.

To give two simple examples, a learner might use a word or phrase like a lot of NOUN in an academic paper (where it would be better to use several NOUN), or they might end up using seldom in conversation with friends, even though for most native speakers it sounds old-fashioned and formal. Likewise, the meaning and usage of a word can vary greatly between genres, such as the collocates (nearby words) of chair or chain in fiction and academic English.

A corpus that is composed mainly of web pages or newspapers cannot show these distinctions. The corpus needs to have a wide range of genres, from informal (e.g. conversation and TV and movie scripts) to formal (e.g. academic), as does COCA.
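To make the genre point concrete, here is a minimal Python sketch. It is not a description of how English-Corpora.org works internally. It computes how often a word occurs per million words in each genre of a tiny two-genre sample. The sample texts, the tokenizer, and the `per_million` function are all assumptions invented for illustration; they are not COCA data or part of any corpus interface.

```python
import re
from collections import Counter

# Hypothetical toy samples, invented for illustration only (not COCA data).
SAMPLES = {
    "spoken": "we had a lot of fun and a lot of food at the party",
    "academic": "several studies seldom report effects and several authors disagree",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def per_million(word, samples):
    """Return the frequency of `word` per million tokens in each genre."""
    result = {}
    for genre, text in samples.items():
        tokens = tokenize(text)
        count = Counter(tokens)[word.lower()]
        # Normalize by genre size so genres of different lengths are comparable.
        result[genre] = count / len(tokens) * 1_000_000 if tokens else 0.0
    return result

freqs = per_million("seldom", SAMPLES)
for genre, value in freqs.items():
    print(f"{genre}: {value:.0f} per million")
```

Normalizing by the size of each genre is what makes the comparison meaningful: a raw count would favor whichever genre happens to contain more text. With real data, the same calculation would need samples large enough for the rates to be stable.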


But a corpus is much more than the sentences, paragraphs, and texts in the corpus. A truly useful corpus also provides end users with an interface that allows them to take full advantage of the underlying data. The following are some of the features of English-Corpora.org that make these corpora uniquely useful for language learners:

Feature (click for info) Importance for language learning and teaching

Alternative phrases

One of the hardest things for language learners is knowing which words sound good together. For example, which synonym of potent is most common with the word argument? One simple, fast search in COCA provides this information. Searches like this are either not possible or are very cumbersome and time-consuming with other online corpora, such as Sketch Engine or CQPWeb.

Word sketches

Learners want to see rich information on specific words (not just collocates). At English-Corpora.org, for every one of the top 60,000 words in a corpus, you can see the definition, synonyms, more specific and more general words, collocates, related topics, clusters, concordance lines, frequency, and links to external resources like pronunciation, images, videos, and translations for 100+ languages.

Browse

Learners want to search for words, and they want to find words by frequency (so they can see where they might have gaps in their vocabulary). At English-Corpora.org, you can search for words by word form, part of speech, frequency, meaning (e.g. words in a definition), synonym, more specific or more general words, and even pronunciation.

Find related words

Words are best learned as part of a "system" of related words. For example, if learners can relate telescope to other concepts like Earth, Sun, star, planet, galaxy, universe, scientist, or astronomy, they have a better chance of knowing what telescope means, and of remembering it. Only English-Corpora.org allows learners to find both collocates (nearby words) and related topics (words that co-occur anywhere in the text), both of which provide great insight into the meaning of a word.

External resources

Many language learners benefit from multi-modal information for a given word or phrase, such as pronunciation, images, videos, and translation to their native language. English-Corpora.org has the only corpora that link to so many types of external resources, in so many useful ways. And when you're looking at Keyword in Context (KWIC) entries for a word, there is a wide range of "one click" resources that help you find information on words that you might not already know.

Entire texts (writing)

As with the "Alternative phrases" section above, language learners often need help in knowing whether Phrase 1, Phrase 2, or Phrase 3 sounds the most natural, especially in a given genre. At English-Corpora.org, you can enter entire texts that you have written, and then quickly and easily highlight phrases in your text to find related phrases in COCA, which will allow you to edit your writing to make it sound more natural.

Entire texts (reading)

It might be overwhelming for a language learner to look at a text (such as an article from an online newspaper) when there are so many unknown words and phrases in the text. Using COCA at English-Corpora.org, you can not only find the keywords in a text (to understand better what it's about), but also (perhaps more importantly) click on any word or phrase in the text to see a wide range of information, such as in the "Word sketches" section above.

Virtual Corpora

Those who are learning English for Specific Purposes (e.g. engineering, finance, or medicine, or even more specific applications like polymers, mortgages, or endocrinology) want to use a corpus to quickly and easily find the words and phrases for these fields. With English-Corpora.org, users can create specialized corpora in 5-10 seconds, and then extract keywords in another 2-3 seconds -- far more quickly and far more easily than can be done with other approaches like BootCat.

Saved words and phrases

When language learners see a useful word or phrase, they want to be able to save that word or phrase, and perhaps assign it to a particular category. This is quick and easy at English-Corpora.org.
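
The keyword extraction mentioned under "Virtual Corpora" above can be illustrated with a minimal Python sketch. This is one simple approach (a smoothed ratio of relative frequencies), offered as an assumption for illustration; it is not necessarily the method used by English-Corpora.org. The `keywords` function, the tokenizer, and both sample texts are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy texts, invented for illustration only.
TARGET = "the mortgage rate rose and the mortgage lender raised the rate"
REFERENCE = "the cat sat on the mat and the dog sat on the rug"

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def keywords(target_text, reference_text, top_n=3):
    """Rank target words by how much more frequent they are than in the reference."""
    target = Counter(tokenize(target_text))
    reference = Counter(tokenize(reference_text))
    t_total = sum(target.values())
    r_total = sum(reference.values())
    scores = {}
    for word, count in target.items():
        t_rel = count / t_total
        # Add-one smoothing so words absent from the reference do not divide by zero.
        r_rel = (reference[word] + 1) / (r_total + 1)
        scores[word] = t_rel / r_rel
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]

print(keywords(TARGET, REFERENCE, top_n=2))
```

Words that are common everywhere, such as the, score low because they are just as frequent in the reference text, while field-specific words rise to the top. A real specialized corpus would be compared against a much larger general corpus in the same way.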