The following links provide a good overview of the corpus's features. Each link enters values into the search form and runs the query against the corpus (i.e., these are live queries, not "canned" results). Note which options are selected in the form, and then modify the values to create your own queries.

For many more examples, and for a description of the different options, click on any element in the search form (e.g., COLLOCATES or SECTIONS).


Using the web interface, you can search for words (mysterious), phrases (I * that or faint + noun), lemmas (all forms of a word, such as sing or tall), and wildcards (un*ly or r?n*). You can also run more complex searches, such as un-X-ed adjectives or verb + any word + a form of ground.
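To make the wildcard notation concrete, here is a minimal Python sketch of how a pattern such as un*ly or r?n* could be matched against a word list. It assumes that * stands for any run of characters (including none) and ? for exactly one character; the function names and the small vocabulary are hypothetical, and the interface's own matching rules may differ.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Compile a wildcard pattern into a regex (assumed semantics).

    '*' matches any run of characters, including none; '?' matches
    exactly one character; every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def match_words(pattern: str, vocabulary: list[str]) -> list[str]:
    """Return the words in `vocabulary` that match `pattern` in full."""
    regex = wildcard_to_regex(pattern)
    return [w for w in vocabulary if regex.fullmatch(w)]


# Hypothetical sample vocabulary, not drawn from the corpus.
vocab = ["unlikely", "unusually", "unkind", "run", "ran", "running", "rent", "rain"]
print(match_words("un*ly", vocab))  # ['unlikely', 'unusually']
print(match_words("r?n*", vocab))   # ['run', 'ran', 'running', 'rent']
```

Note that "rain" fails r?n* because its third letter is i, not n, while "rent" matches because * can absorb the final t.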

As the preceding searches indicate, the first option in the search form lets you display either a list of all matching strings or a chart showing the frequency in the five "macro" registers (spoken, fiction, popular magazines, newspapers, and academic journals). Look up the frequency of I reckon, muffled, validity, or forms of need + to + VERB. The chart display also shows the frequency of the word or phrase in subregisters, such as classroom lectures, children's fiction, finance magazines, or medical journals. The list display can also show the frequency of each matching string in each of the major sections of the corpus (look at deep + noun, with and without the totals for each section).

You can also search for collocates (words near a given word), which often provide insight into that word's meaning. For example, you can search for the most common nouns near thick, adjectives near smile (by frequency, or sorted by relevance), nouns after look into, or words starting with clos* near eyes.
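The idea of "words near a given word" can be sketched as a simple window count. The Python below is an illustration only: the four-word span on each side, the optional `keep` filter, and the sample sentence are all assumptions, and the real interface also draws on part-of-speech tags and relevance measures that this sketch does not attempt.

```python
from collections import Counter


def collocates(tokens, node, span=4, keep=None):
    """Count words within `span` tokens on either side of each `node` hit.

    `span=4` is an assumed window size; `keep` is a hypothetical optional
    predicate that decides which nearby words are counted.
    """
    counts = Counter()
    node = node.lower()
    lowered = [t.lower() for t in tokens]
    for i, tok in enumerate(lowered):
        if tok != node:
            continue
        # Clip the window at the start and end of the text.
        lo = max(0, i - span)
        hi = min(len(lowered), i + span + 1)
        for j in range(lo, hi):
            if j == i:
                continue  # skip the node word itself
            word = lowered[j]
            if keep is None or keep(word):
                counts[word] += 1
    return counts


# Hypothetical sample text; a real corpus would be far larger.
text = "she closed her eyes . he kept his eyes closed and smiled . close your eyes now"
tokens = text.split()

# Rough analogue of "words starting with clos* near eyes".
result = collocates(tokens, "eyes", keep=lambda w: w.startswith("clos"))
print(result.most_common())  # [('closed', 2), ('close', 1)]
```

Here the `keep` predicate plays the role of clos*, so only closed and close are counted near eyes.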

You can also include information about genre or a specific time period directly in the query. This shows how words and phrases vary across speech and many different types of written text. You can easily find which words and phrases occur much more frequently in one register than in another, such as good + [noun] in fiction, or verbs in the slot [we VERB that] in academic writing. This also works for collocates, such as nouns with the verb break in NEWS or adjectives with woman in FICTION. You can also compare one section with another, such as nouns near chain in (ACAD vs FICTION), nouns with passionate (FICTION vs NEWSPAPER), adjectives in tabloid newspapers compared with other magazines, or adjectives in medical journals compared with other journals.
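One simple way to think about comparing two sections is to normalize each raw count to a rate per million words and then take the ratio. The sketch below assumes that approach; the function names, the `min_count` cutoff, and the toy counts are hypothetical, and the scoring used by the interface itself may differ.

```python
def per_million(count: int, section_size: int) -> float:
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / section_size


def compare_sections(counts_a, size_a, counts_b, size_b, min_count=1):
    """Rank words by how much more frequent they are in section A than in B.

    Returns (word, ratio) pairs, highest ratio first. Words absent from B
    get an infinite ratio; `min_count` drops rare words from A first.
    """
    rows = []
    for word, count_a in counts_a.items():
        if count_a < min_count:
            continue
        pm_a = per_million(count_a, size_a)
        pm_b = per_million(counts_b.get(word, 0), size_b)
        ratio = pm_a / pm_b if pm_b else float("inf")
        rows.append((word, ratio))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Invented toy counts and section sizes, NOT real corpus figures.
section_a = {"alpha": 30, "beta": 10, "gamma": 2}
section_b = {"alpha": 6, "beta": 20}
ranking = compare_sections(section_a, 1_000_000, section_b, 2_000_000, min_count=5)
print(ranking)  # [('alpha', 10.0), ('beta', 1.0)]
```

The `min_count` cutoff matters because a word seen only a handful of times in one section and never in the other would otherwise rise to the top with an infinite ratio.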

Finally, you can easily carry out semantically oriented searches. For example, you can compare nouns that appear with small and little, adjectives with men and women, or nouns with utter and sheer. You can also find the frequency and distribution of synonyms of a given word, such as beautiful or the verb clean; see which synonyms are more frequent in competing registers (such as synonyms of strong in FICTION and ACADEMIC); and use synonyms as part of a more complex query (such as synonyms of clean + NOUN). You can even create "customized lists" for any category that interests you and reuse them in later queries (such as colors + clothes, or words related to beautiful + woman).

We hope this short, five-minute overview of the corpus has been helpful. For more examples of the types of searches you can run, click on any of the form elements (e.g., COLLOCATES or SECTIONS) in the search form.