English Corpora: most widely used online corpora. Billions of words of data: free online access

In most cases, the examples in these linked pages comes from the Corpus of Contemporary American English (COCA), since it is the most widely used of the corpora from English-Corpora.org (and probably the most widely-used online corpus anywhere). A number of examples also come from COHA (historical), GloWbE (dialects), and NOW (very large and recent). But all of the information in these help files should be applicable to any of the 17 corpora at English-Corpora.org. (close)

Please note that these pages were recently released (in September 2024), and there are probably still some errors, since English-Corpora.org has been created and is run by just one person. If you find anything that needs to be corrected, please email us. Thanks. (close)

Keyword in Context (KWIC, and also known as concordance) searches allow you to see the patterns in which a word occurs. Word/phrase searches allow you to search for specific strings of words. Collocates searches allow you to see the nearby words in a "cloud of words" around a given word. But KWIC searches allow you to see the patterns in which a word occurs, in ways that can't be done with the other two types of searches.

Well, this is mostly true. Let's take the example of fathom, which means "to understand". Suppose we want to know the patterns in which this word occurs. We could search for * * fathom to see the two word strings that occur to the left of fathom, and we would see that they are negative, or that they at least mention that it is hard or difficult to fathom:

But should we search one word to the left, or two words, or three? Should we sort by words to the left or right of the "node word" (in this case fathom) to find interesting patterns? And how could we see phrases that occur in the context to the right or the left? This is exactly what KWIC searches do the best.

For example, look at the following partial entries from the KWIC lines for fathom. (Note that the saved search linked to above will replicate the search, but not the exact KWIC lines. You will see similar KWIC lines, but probably not the exact same ones as in these images.)

We can easily scan through these lines and see that the preceding words -- whether one, two, three words (or more) -- all express the "negative / questioning" sense accompanies the word fathom. (This is a good example of semantic prosody.)

And look at these KWIC lines for naked eye. We can quickly scan through the entries to find repeated phrases, like visible/invisible to the naked eye or see X with the naked eye. It would be much more difficult to find patterns like this by just looking for exact four or five words strings, or with collocates (which just find single words somewhere nearby another word).

Consider these lines with gone the way of the, which refers to things that have gone extinct or disappeared (and this is just a small portion of the results). We can quickly scan through the entries and see phrases like dinosaurs, dodo (birds), rotary phones, side rule (calculator), and typewriters. Again, it would be difficult to find phrases like this with just phrase searches like gone the way of the * * or by using collocates (which will just find single words, not phrases).

Finally, consider the phrase END up. Because the words (three words to the left and three words to the right of our search word) are color coded, we can quickly scan through the entries to see what parts of speech occur near our search word (in corpus linguistics, this is referred to as colligation). We see that with end up, there are many -ing forms of verbs (in pink), a few adjectives (in green), and prepositional phrases with the word with (in yellow). Patterns of co-occurring words and phrases and parts of speech almost "jump out" at us with these KWIC searches.

As with word/phrase and collocates searches, you can also limit the KWIC search by section in the corpus. For example, the following are entries for END up in COHA (the Corpus of Historical American English) in the 1820s-1890s and then in the 1980s-2010s. Notice how there are no cases of VERB-ing in the 1800s (i.e. no words in pink after END up), but those are the majority of the entries in the 1980s-2010s.