English-Corpora.org




  Sections   (search form, corpora used, corrections)

Note: click on any link on this page to see the corpus data, and then click on the "BACK" image (see left) at the top of the page to come back to this page. Or right click on the link and then "Open link in new tab" (in Chrome; similar in other browsers), and then close that tab after viewing the corpus data.


Before discussing how sections can be used to sort, limit, and compare the frequency of words, phrases, and collocates in different sections of a corpus, we should first mention the use of the checkbox to the left of Sections in the search form, as shown in the image to the left. When it is not selected, then the frequency is the overall frequency in the entire corpus, as in the images to the left below (for soft NOUN in COCA, or the collocates of GAY in COHA). When it is selected, then you will see the frequency in each section of the corpus (such as genres in COCA, decades in COHA, or countries in GloWbE), as with genres for soft NOUN in COCA and decades for the collocates of GAY in COHA.

In the historical corpora, the ability to see collocates by section (decade) is useful for seeing "semantic change". For example, with GAY in COHA, notice how the collocates in the 1820s-1940s are bright, flowers, laugh, color (signifying the older meaning of "cheerful, happy"), whereas from the 1970s on they mainly refer to sexual orientation (lesbian, rights, marriage, etc). Look at similar searches with chip, engine, web, and notice how new collocates appear as the meaning of the word changes (for example, chocolate, potato, and computer with CHIP, search with ENGINE, or site, page, information with WEB). Or the change in collocates might just reflect changes out in the "real world", such as the decrease in steam and the rise of car or diesel with ENGINE.

[Images: SECTIONS is not selected; SECTIONS is selected]



Sections allow you to sort the results by the frequency in a given section, such as a genre, decade, or country. For example, in COCA look at the frequency of soft NOUN in fiction (soft + voice, light, skin), newspapers (soft + drinks, money, landing), or academic (soft + tissue, power, skills) (scroll down to see the selected sections, if they are not at the top of the dropdown list). In COHA, look at the frequency in the 1820s-1890s (eyes, light, hand, air), 1930s-1950s (coal, drinks, spot), or 1990s-2010s (voice, spot, drinks, money). Similar searches, whether for individual words (e.g. *ism words), phrases (soft NOUN, must VERB, ADJ wife), or collocates (of cupboard, scheme, family), could be done in any of the other corpora.

(Note that you might need to scroll down in the Section list in the search form to see the selected section, especially with the searches below)
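The idea of sorting results by one section can be sketched in a few lines of Python. This is only an illustration of the concept, not how the site itself is implemented: the function name sort_by_section is hypothetical, and the counts are invented toy numbers, not corpus data.

```python
# Toy counts per section for three noun collocates of "soft".
# These numbers are invented for illustration; they are NOT real COCA figures.
counts = {
    "voice":  {"FIC": 50, "NEWS": 5,  "ACAD": 2},
    "drinks": {"FIC": 8,  "NEWS": 40, "ACAD": 3},
    "tissue": {"FIC": 1,  "NEWS": 4,  "ACAD": 60},
}

def sort_by_section(counts, section):
    """Return the words ordered by their frequency in one section, highest first.

    A word with no count recorded for the section is treated as frequency 0.
    """
    return sorted(counts, key=lambda word: counts[word].get(section, 0), reverse=True)

# The same set of words, ranked differently depending on the chosen section.
fiction_order = sort_by_section(counts, "FIC")
academic_order = sort_by_section(counts, "ACAD")
print("Fiction: ", fiction_order)
print("Academic:", academic_order)
```

The word list is the same in both cases; only the ranking changes, which is what selecting a different section to sort by does in the search form.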

 

For some of the corpora, you can also search by more "fine-grained" sections. For example, in COCA search for ADJ (all adjectives) in Blogs-Argumentative (great, different, big, real, sure), Movies-Romance (sorry, wrong, golden, beautiful), Newspapers-Money (financial, federal, economic, corporate), or Academic-Medicine (clinical, significant, environmental). In COHA, search for NAME (proper noun) collocates of war in 1942-1945 (Germany, Japan, Pacific, Europe, Hitler), 1969-1972 (Vietnam, Viet, Nam, Indochina, Nixon), or 2003-2006 (Iraq, Afghanistan, Gulf, Bush, Saddam).

Comparing sections (important information about limits and sorting)

The real power of Sections, though, comes when you compare one set of sections to another. For example, the two lists below show the most frequent adjectives (ADJ) in Academic-Medicine and Academic. Notice that both lists have adjectives like other, high, and different. It might be hard to guess that the Academic-Medicine list is really from that part of the corpus. But when you compare the two lists (i.e. what is in Academic-Medicine a lot more than it is in Academic in general), then the differences become very clear, as is shown in the list on the right.

Note that it is usually best to compare a "sub-genre" (like Academic-Medicine or Newspapers-Money) to the genre that it is part of (like Academic or Newspapers). If you compared Academic-Medicine to Blogs, for example, you wouldn't know if the words in Academic-Medicine are there because they are "medical" words, or if they are just "academic" words in general (compared to blogs). Also, on the Sort/Limit page you can get more information in the different columns in the "compare" table.
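One way to picture such a comparison is as a ratio of normalized frequencies. The Python sketch below rests on assumptions: the page does not describe the exact calculation the site uses, so the per-million normalization, the ratio, and the names per_million and compare_sections are hypothetical. The counts and section sizes are invented toy numbers, not corpus data.

```python
def per_million(count, section_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / section_size

def compare_sections(counts_a, size_a, counts_b, size_b):
    """Rank the words of section A by how much more frequent they are than in section B.

    Returns (word, per-million in A, per-million in B, ratio) tuples, highest ratio first.
    A word that never occurs in B gets an infinite ratio (an assumption of this sketch).
    """
    rows = []
    for word, count_a in counts_a.items():
        pm_a = per_million(count_a, size_a)
        pm_b = per_million(counts_b.get(word, 0), size_b)
        ratio = pm_a / pm_b if pm_b > 0 else float("inf")
        rows.append((word, pm_a, pm_b, ratio))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows

# Invented toy data: a sub-genre (A) and the larger genre it belongs to (B).
medicine_counts = {"other": 1000, "clinical": 800, "high": 500}
academic_counts = {"other": 10000, "clinical": 1000, "high": 5000}

rows = compare_sections(medicine_counts, 1_000_000, academic_counts, 10_000_000)
for word, pm_a, pm_b, ratio in rows:
    print(f"{word:10s} {pm_a:8.1f} {pm_b:8.1f} {ratio:6.2f}")
```

In this toy data, "other" has the highest raw count in the sub-genre, but its normalized frequency is the same in both sections, so it drops down the list. "clinical" rises to the top because it is relatively much more frequent in the sub-genre than in the genre as a whole.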

[Lists: Academic-Medicine; Academic; Compare]

The following are other examples of comparisons between two sections, either for individual words or for phrases.

COCA: genres
  Words
    *ment in ACAD (self-management, pretreatment, restatement) vs Fiction (apartment, basement, pavement, amazement)
    ADJ in ACAD-Medical (parotid, epidural, ischaemic, histologic) vs ACAD (general)
    Synonyms of strong in TV/Movies (spicy, beefy, tough, strapping) vs ACAD (effective, ardent, compelling, deep-seated)
  Phrases (including part of speech)
    hard NOUN in MAG (hard + cider, freeze, lines, workout) vs ACAD (hard + drug, palate, sciences, currency)
    Past tense verb + up in TV/Movies (f*ed, screwed, messed, hooked + up) vs ACAD (summed, scaled, followed + up)
    they _vvd in FIC (they + crowded, tumbled, trudged, slid) vs BLOG (they + implemented, endorsed, voted, based)

COHA: decades
  Words
    *heart* in 1820s-1870s (heart-strings, noble-hearted, heartsease, heart-sick) vs 1970s-2010s (heartbeat, heartland, wholeheartedly, halfheartedly)
    ADJ in 1910s-1930s (Bolshevist, rightist, lend-lease, pleasantest, worth-while) vs 1980s-2010s (digital, supportive, upcoming, ongoing, teenage, racist, multinational, stressful)
  Phrases (including part of speech)
    VERB + up in 1910s-1930s (bolster, trump, stamp, plough + up) vs 1990s-2010s (f*k, chat, free, boot + up)
    ADJ + women in 1870s-1920s (clever, noble, abandoned, refined + women) vs 1970s-2000s (battered, pregnant, African-American, divorced + women)

GloWbE: countries
  Words
    *ies in Australia (brumbies, pollies, schoolies, tradies, pokies, yabbies, boaties, luvvies, vinnies, boardies, bikies, lollies, trackies, swannies, streeties) vs US/UK
    *ism in India (casteism, Shaivism, Gandhism, Vaishnavism, Jainism, Hinduism, Brahmanism) vs "core" (US, UK, etc)
  Phrases (including part of speech)
    ADJ person in AU/NZ (aboriginal, sponsored, registered, non-indigenous, protected, bonded, incapable, tested) vs US/UK
    _vvg down in US/CA (smacking, doubling, quieting, barreling, buckling, scarfing, busting + down) vs UK/IE (bucketing, dusting, popping, attacking, abseiling, damping, crying + down)

Finally, we can compare the collocates in two sections, to see how a word has changed meaning over time, or its different meaning or usage in two genres or countries. For example, in COHA (historical) we can directly compare the collocates of gay in the 1830s-1920s and the 1980s-2010s. In addition, you can compare the collocates in different genres, such as the collocates of chair, chain, string in fiction and academic in COCA (or the same searches in the BNC: chair, chain, string). For example, chain in fiction refers to a literal, physical chain (door, wall, head, neck), while in academic it is more metaphorical and refers to a series of things (commodity, supply, production).

We can also use collocates to see differences in the meaning of a word in different dialects. For example, the collocates of scheme are much more negative in the US and Canada (alleged, evil, fraudulent, nefarious) than in the UK, since in the US and Canada scheme refers to an attempt to deceive or swindle someone, whereas in the UK it is more neutral (= "plan"). Likewise, the collocates of cupboards in the US and Canada refer mainly to the kitchen (refrigerator, pantry, plate), whereas in the UK they can refer to other rooms (storage, broom, wardrobe). This would sound strange in the US and Canada, where closet would be used in those contexts. Finally, sometimes the collocates point out interesting differences between cultures, such as the collocates of wife in Asia and Africa compared to the UK (temporary, permanent, senior, junior, favourite // virtuous, chaste, obedient, submissive).