HISTORICAL VARIATION (1810s-2000s)
COHA: 400 million words, 1810s-2000s. 100-200 times as large as any other structured historical corpus of English.
- Lexical: the frequency of any word or phrase, e.g. bestow, swell (ADJ), guys, of no little, as though to, freak out
- Lexical: compare all words in different time periods, e.g. *ism words (compare earlier/later), *heart* words (earlier/later)
- Phraseology: so ADJ as to V, BE but, HAVE quite V-ed, a most ADJ NOUN
- Syntax/grammar: e.g. end up V-ing, post-verbal negation with need, need to VERB, sentence-initial hopefully, get passive
- Semantics/meaning: use collocates to see change over time, e.g. gay (compare earlier/later), chip, engine, web
- Discourse/culture: use collocates to see what we're saying about topics over time: women (compare earlier/later), religion (earlier/later)
|
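Search strings such as *ism and *heart* in the list above use * as a wildcard. The following is a rough sketch of that kind of matching, assuming * stands for any run of characters within a single word (an assumption about the search syntax, not something this list states). The word list is hypothetical, and Python's `fnmatch` module only stands in for the corpus search engine.

```python
from fnmatch import fnmatchcase

def match_words(pattern, words):
    """Return the words that match a *-style wildcard pattern, ignoring case."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]

# Hypothetical word list for illustration only (not corpus data).
SAMPLE = ["socialism", "heartily", "sweetheart", "prism", "engine"]

print(match_words("*ism", SAMPLE))     # words ending in -ism
print(match_words("*heart*", SAMPLE))  # words containing "heart"
```

Running each pattern against word lists from an earlier and a later period would give the kind of earlier/later comparison the entries above link to.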
HISTORICAL VARIATION (recent: 1990-2019)
COCA: 1 billion words, 1990-2019. The only large corpus that keeps the same genre balance year to year.
- Lexical: the frequency of any word or phrase, e.g. morph, old-school, FREAK out, (think) outside the box, throw someone under the bus, BE likely a|the
- Lexical: compare all words in different time periods, e.g. increases from 1990-1994 (left) to 2010-2019 (right): *ism words, *gate words (potentially "scandal"), *friendly words (note increase), and phrasal verbs with up. Note that not every entry is relevant, but it's a good starting point.
- Syntax/grammar: e.g. END up V-ing, GET passive (got hired), "quotative like" (he's like, I'm not going), so not ADJ (I'm so not interested in her)
- Semantics/meaning: use collocates to see change over time, e.g. green, web, engine
- Discourse/culture: changes in frequency: blacks, retarded; use collocates to see what we're saying about topics over time: crisis, terror, gay
|
HISTORICAL VARIATION (Google Books)
Google Books (Advanced): 155 billion words, 1810s-2000s. Offers a much more advanced interface and searches than the standard Google Books n-grams.
- Lexical: the frequency of any word or phrase, e.g. BESTOW, a swell NOUN (chart), guys, of no little, as though to, FREAK out
- Lexical: compare all words in different time periods, e.g. *ism words (compare earlier/later), *heart* words (earlier/later)
- Phraseology: so ADJ as to VERB (table), [be] but a NOUN (table), HAVE quite V-ed, a most ADJ NOUN (table)
- Syntax/grammar: e.g. [end] up VERB-ing (chart | table), VERB someone into VERB-ing (chart | table), VERB one's way PREP (e.g. force his way into), and who / whom + did + PRON (e.g. who/whom did you (VERB); see chart showing increase in who). Also, must VERB, should VERB, ought to VERB, has to VERB, or need to VERB.
- Semantics/meaning: synonyms: "beautiful" woman, "clever" person; collocates show change in meaning, e.g. gay (compare earlier/later)
- Discourse/culture: changes in frequency: negro, colored person, blacks, deaf and dumb, retarded, handicapped; use collocates to see what we're saying about topics over time (1800s vs 1970s-2000s): fast, art, women, music, food
|
VARIATION BETWEEN DIALECTS: compare 20 dialects of World English
GloWbE: 1.9 billion words, 20 different countries. 100 times as large as the next-largest corpus of English dialects.
- Lexical: the frequency of any word or phrase, e.g. fortnight, on holiday, banjax*, bikkies, thrice, eve teas*, ACT the maggot, lah!, ackee
- Lexical: compare all words in different dialects, e.g. *ism words by dialect ("core" vs. South Asia), *ies nouns in Australian
- Phraseology: e.g. BE different to, rather more ADJ, take ADJ food, in over ~ head, USE ~ head, MAKE ~ head spin
- Syntax/grammar: VERB likely VERB (e.g. would likely remember), like construction, way construction, try and VERB, go + ADJ, STOP someone V-ing
- Semantics/meaning: use collocates to see differences between dialects, e.g. scheme (US/CA = negative), cupboards (US/CA = mainly kitchen)
- Discourse/culture: frequency of words, e.g. Quran, Buddh*, feminism. With collocates, e.g. ADJ belief (South Asia vs "core"), ADJ wife (+/- "core")
|
VARIATION BETWEEN GENRES: American (COCA)
COCA: 1 billion words, 1990-2019. Currently the largest freely available, genre-balanced corpus.
- Lexical: the frequency of any word or phrase, e.g. (spoken) I guess, , you know , (fiction) muffled, frowned, (academic) validity, correlate
- Lexical: compare all words in different genres (give these 10-15 seconds each to run), e.g. verbs (past tense) in fiction, ADJ in academic, verbs in religion magazines, adjectives in medical academic
- Phraseology: e.g. . In particular , a lot of, kind of NOUN, type of NOUN; phrasal verbs with out (FIC/ACAD)
- Syntax/grammar: (spoken) and I'm like , get passive, end up V-ing, (fiction) had been V-ing, (academic) be passive, appear to VERB, must + VERB
- Semantics/meaning: use collocates to see differences between genres, e.g. FIC (left) vs ACAD (right): chair, chain, string; synonyms of strong, weak
- Discourse/culture: frequency of words and phrases, e.g. global warming, climate change, crippled, people|person of color
|
VARIATION BETWEEN GENRES: British (BNC)
BNC: 100 million words, 1980s-1993. Note: raw counts are somewhat lower than in COCA, since the BNC is a much smaller corpus.
- Lexical: the frequency of any word or phrase, e.g. (spoken) I reckon, , you know , (fiction) muffled, frowned, (academic) validity, correlate
- Lexical: compare all words in different genres, e.g. verbs (past tense) in fiction, ADJ in academic, verbs in sermons, ADJ in tabloid news
- Phraseology: e.g. . In particular , a lot of, kind of NOUN, type of NOUN; phrasal verbs with out (FIC/ACAD)
- Syntax/grammar: (spoken) get passive, BE V-ing, (fiction) had been V-ing, (academic) be passive, appear to V, HAVE to VERB, whom
- Semantics/meaning: use collocates to see differences between genres, e.g. FIC (left) vs ACAD (right): chair, chain, string; synonyms of strong, weak
|
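Because the BNC is about a tenth the size of COCA, raw hit counts from the two corpora are not directly comparable. One common way to compare them is to normalize to occurrences per million words. The following is a minimal sketch, assuming the approximate corpus sizes given in this list; the hit counts are hypothetical and are not real query results.

```python
# Approximate corpus sizes in words, as stated in the descriptions above.
CORPUS_SIZES = {
    "COCA": 1_000_000_000,
    "BNC": 100_000_000,
}

def per_million(raw_count: int, corpus: str) -> float:
    """Convert a raw hit count into a frequency per million words."""
    return raw_count * 1_000_000 / CORPUS_SIZES[corpus]

# Hypothetical raw counts for illustration only.
coca_hits = 5_000
bnc_hits = 800

print(per_million(coca_hits, "COCA"))  # 5.0 per million
print(per_million(bnc_hits, "BNC"))    # 8.0 per million
```

In this made-up case the phrase has fewer raw hits in the BNC but is relatively more frequent there once corpus size is taken into account.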