Sections (search
form,
corpora used,
corrections)
|
Note: click on any link
on this page to see the corpus data, and then
click on the "BACK" image (see left) at the top of the page to come back to
this page. Or right click on the link and then "Open link in new tab" (in
Chrome; similar in other browsers), and then close that tab after
viewing the corpus data. |
In most cases, the examples in
these linked pages come from the Corpus of Contemporary American English
(COCA), since it is the most widely used of the corpora from English-Corpora.org
(and probably the most widely used online corpus anywhere).
A number of examples also come from
COHA (historical),
GloWbE (dialects), and
NOW (very large and recent). But all of the information in these help files should
be applicable to any of the 17 corpora at English-Corpora.org.
Please note that these pages were recently released (in September
2024), and there are probably still some errors, since English-Corpora.org has
been created and is run by just one person. If you find anything that needs to be corrected, please
email us. Thanks.
|
Before discussing how sections can be
used to sort, limit, and compare the frequency of words, phrases,
and collocates in different sections of a corpus, we should first
mention the use of the checkbox to the left of
Sections in the
search form, as shown in the image to the left. When it is not
selected, then the frequency is the overall frequency in the entire
corpus, as in the images to the left below (for
soft NOUN in COCA, or
the collocates of GAY in COHA). When it is selected, then
you will see the frequency in each section of the corpus (such as
genres in COCA, decades in COHA, or countries in GloWbE), as with
genres for
soft NOUN in COCA and decades for the
collocates of GAY in COHA.
In the historical corpora, the ability to see collocates by
section (decade) is useful for seeing "semantic change". For example,
with GAY in COHA, notice how the collocates in the 1820s-1940s are bright,
flowers, laugh, color (signifying the older
meaning of "cheerful, happy"), whereas from the 1970s on they mainly
refer to sexual orientation (lesbian, rights, marriage, etc.). Look at similar
searches with
chip,
engine,
web,
and notice how new collocates appear as the meaning of the word changes (for
example, chocolate, potato, and computer with CHIP, search
with ENGINE, or site, page, information with WEB). Or the new collocates might just
reflect changes out in the "real world", such as the decrease in steam
and the rise of car or diesel with ENGINE. |
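The idea of collocates by section can be sketched in a few lines of Python. This is only an illustration of the concept, not how English-Corpora.org computes its results: the sentences in `TOY_CORPUS` are invented (they are not COHA text), and the function name, the stopword list, and the span of four words on either side are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (decade, tokenized sentence). Invented, not real COHA text.
TOY_CORPUS = [
    ("1850s", "the gay flowers were bright in the garden".split()),
    ("1850s", "a gay laugh and bright color filled the room".split()),
    ("1990s", "gay rights and gay marriage were debated".split()),
    ("1990s", "lesbian and gay groups marched for rights".split()),
]

# Assumed stopword list, only to keep the toy output readable.
STOPWORDS = {"the", "a", "and", "in", "were", "for"}


def collocates_by_section(corpus, node, span=4):
    """Count words within `span` tokens of `node`, separately for each section."""
    counts = defaultdict(Counter)
    for section, tokens in corpus:
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            lo = max(0, i - span)
            hi = min(len(tokens), i + span + 1)
            for j in range(lo, hi):
                word = tokens[j]
                # Skip the node word itself and the stopwords.
                if j != i and word != node and word not in STOPWORDS:
                    counts[section][word] += 1
    return counts


result = collocates_by_section(TOY_CORPUS, "gay")
for section in sorted(result):
    print(section, result[section].most_common(3))
```

Grouping the counts by section is what makes the change visible: the same node word yields a different list of neighbors in each decade.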
SECTIONS is not selected |
SECTIONS is selected |
|
Sections allow you to sort the results
by the frequency in a given section, such as a genre, decade, or
country. For example, in COCA look at
the frequency of soft NOUN in
fiction (soft
+ voice, light, skin),
newspapers (soft + drinks,
money, landing), or
academic (soft + tissue, power,
skills) (scroll down to see the selected sections, if they are
not at the top of the dropdown list). In COHA, look at the frequency
in the
1820s-1890s (eyes, light, hand, air),
1930s-1950s (coal, drinks, spot), or
1990s-2010s (voice,
spot, drinks, money). Similar searches, whether for individual
words (e.g. *ism words), phrases (soft NOUN, must VERB,
ADJ wife), or collocates (of cupboard, scheme, family),
could be done in any of the other corpora.
(Note that you might need to
scroll down in the Section list in the search form to see the
selected section, especially with the searches below.)
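Sorting by section amounts to reordering the same results table by one of its frequency columns. The following is a minimal sketch under stated assumptions: the counts in `ROWS` are invented for illustration (they are not real COCA figures), and `sort_by_section` is a hypothetical helper, not part of any English-Corpora.org interface.

```python
# Hypothetical counts, invented for illustration (not real COCA figures).
ROWS = [
    {"phrase": "soft voice", "FIC": 900, "NEWS": 40, "ACAD": 10},
    {"phrase": "soft drinks", "FIC": 60, "NEWS": 700, "ACAD": 90},
    {"phrase": "soft tissue", "FIC": 5, "NEWS": 30, "ACAD": 800},
]


def sort_by_section(rows, section):
    """Return the rows ordered by frequency in one section, highest first."""
    return sorted(rows, key=lambda row: row[section], reverse=True)


for section in ("FIC", "NEWS", "ACAD"):
    ranked = sort_by_section(ROWS, section)
    print(section, [row["phrase"] for row in ranked])
```

The rows themselves never change; only the column used as the sort key does, which is why the same search can surface different phrases at the top for each section.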
|
|
For some of the corpora, you can also search by more
"fine-grained" sections. For example, in COCA search for ADJ
(all adjectives) in
Blogs-Argumentative (great, different, big, real, sure),
Movies-Romance (sorry, wrong, golden, beautiful),
Newspapers-Money (financial, federal, economic, corporate),
or
Academic-Medicine (clinical, significant, environmental).
In COHA, search for NAME (proper noun) collocates of war in
1942-1945 (Germany, Japan, Pacific, Europe, Hitler),
1969-1972 (Vietnam, Viet, Nam, Indochina, Nixon), or
2003-2006 (Iraq, Afghanistan, Gulf, Bush, Saddam). |
Comparing sections (important information about
limits and sorting)
The real power of Sections, though, comes when you compare one set of
sections to another. For example, the two lists below show the most frequent
adjectives (ADJ) in
Academic-Medicine and
Academic.
Notice that both lists have adjectives like other, high, and different.
It might be hard to guess that the Academic-Medicine list is really from that
part of the corpus. But when you compare the two lists (i.e.
what is
much more frequent in Academic-Medicine than in Academic in general), then the
differences become very clear, as is shown in the list on the right.
Note that it is usually best to compare a "sub-genre" (like Academic-Medicine
or Newspapers-Money) to the genre that it is part of (like Academic or
Newspapers). If you compared Academic-Medicine to Blogs, for example, you
wouldn't know if the words in Academic-Medicine are there because they are
"medical" words, or if they are just "academic" words in general (compared to
blogs). Also, on the Sort/Limit page you
can get more information on the different columns in the "compare" table.
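One way to picture a comparison between a sub-genre and its genre is as a ratio of normalized frequencies. The sketch below is an assumption about the general idea, not the formula used by English-Corpora.org: the counts and section sizes are invented, and `per_million` and `compare_sections` are hypothetical helper names.

```python
def per_million(count, section_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / section_size


def compare_sections(counts_a, size_a, counts_b, size_b):
    """Rank the words of section A by how much more frequent they are than in B."""
    rows = []
    for word, count_a in counts_a.items():
        pm_a = per_million(count_a, size_a)
        pm_b = per_million(counts_b.get(word, 0), size_b)
        # A word that never occurs in B gets an infinite ratio.
        ratio = pm_a / pm_b if pm_b > 0 else float("inf")
        rows.append((word, ratio))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical counts and section sizes, invented for illustration.
MEDICINE = {"other": 3000, "clinical": 2000, "different": 1500}
ACADEMIC = {"other": 30000, "clinical": 2500, "different": 16000}

ranked = compare_sections(MEDICINE, 1_000_000, ACADEMIC, 10_000_000)
for word, ratio in ranked:
    print(f"{word:10s} {ratio:.2f}")
```

In this toy data, "other" has the highest raw count in both sections, but its ratio is 1.0, so it drops down the list, while "clinical" (ratio 8.0) rises to the top. Raw frequency lists look similar across sections, whereas a ratio brings out what is distinctive about one of them.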
The following are other examples of comparisons between two sections, either
for individual words or for phrases.
|
Words |
Phrases (including part of
speech) |
COCA: genres |
*ment
in ACAD (self-management, pretreatment, restatement)
vs Fiction (apartment, basement, pavement, amazement)
ADJ in ACAD-Medical (parotid, epidural, ischaemic,
histologic) vs ACAD (general)
Synonyms of
strong in TV/Movies (spicy, beefy, tough,
strapping) vs ACAD (effective, ardent, compelling,
deep-seated) |
hard NOUN in MAG (hard + cider, freeze, lines,
workout) vs ACAD (hard + drug, palate, sciences, currency)
Past tense verb + up in TV/Movies (f*ed,
screwed, messed, hooked + up) vs ACAD (summed, scaled,
followed + up)
they _vvd in FIC (they + crowded, tumbled,
trudged, slid) vs BLOG (they + implemented, endorsed,
voted, based) |
COHA: decades |
*heart* in 1820s-1870s (heart-strings, noble-hearted,
heartsease, heart-sick) vs 1970s-2010s (heartbeat,
heartland, wholeheartedly, halfheartedly)
ADJ in 1910s-1930s (Bolshevist, rightist, lend-lease,
pleasantest, worth-while)
vs 1980s-2010s (digital, supportive, upcoming, ongoing,
teenage, racist, multinational, stressful) |
VERB
+ up in 1910s-1930s (bolster, trump, stamp,
plough + up) vs 1990s-2010s (f*k, chat, free, boot + up)
ADJ + women in 1870s-1920s (clever, noble,
abandoned, refined + women) vs 1970s-2000s (battered, pregnant,
African-American, divorced + women) |
GloWbE: countries |
*ies
in Australia (brumbies, pollies, schoolies, tradies, pokies,
yabbies, boaties, luvvies, vinnies, boardies, bikies, lollies, trackies,
swannies, streeties) vs US/UK
*ism in India (casteism, Shaivism, Gandhism,
Vaishnavism, Jainism, Hinduism,
Brahmanism) vs "core" (US, UK, etc) |
ADJ person in AU/NZ (aboriginal, sponsored,
registered, non-indigenous, protected, bonded, incapable, tested) vs
US/UK
_vvg
down in US/CA (smacking, doubling, quieting,
barreling, buckling, scarfing, busting + down) vs UK/IE (bucketing,
dusting, popping, attacking, abseiling, damping, crying + down) |
Finally, we can compare the collocates in two
sections, to see how a word has changed meaning over time, or its different
meaning or usage in two genres or countries. For example, in COHA (historical) we can directly
compare
the collocates of gay in the 1830s-1920s and the 1980s-2010s.
In addition, you can compare the collocates in different genres, such as the
collocates of
chair,
chain,
string
in fiction and academic in COCA (or the same searches in the BNC:
chair,
chain,
string).
For example, chain in fiction refers to a literal, physical chain
(door, wall, head, neck), while in academic it is more metaphorical and
refers to a series of things (commodity, supply, production).
We can also use collocates to see differences in the meaning of a word in
different dialects. For example, the collocates of
scheme
are much more negative in the US and Canada (alleged, evil, fraudulent,
nefarious) than in the UK, since in the US and Canada scheme refers to an
attempt to deceive or swindle someone, whereas in the UK it is more neutral (= "plan").
Likewise, the collocates of
cupboards
in the US and Canada refer mainly to the kitchen (refrigerator, pantry,
plate), whereas in the UK they can refer to other rooms (storage, broom,
wardrobe). The UK usage would sound strange in the US and Canada, where closet
would
be used in those contexts. Finally, sometimes the collocates point out
interesting differences between cultures, such as the
collocates of wife in Asia and Africa compared to the UK, for example
(temporary, permanent, senior, junior, favourite // virtuous, chaste,
obedient, submissive).
|