COLLOCATES AND ASSOCIATION MEASURES
Four help pages discuss the related subjects of association measures,
collocates, the Mutual Information (MI) score, and topics.
Help file |
Other sites |
English-Corpora.org (E-C) |
Association measures |
Some other sites offer many different association measures,
such as MI.log-f, MI, MI3, LogDice, log-likelihood, and T-score. |
E-C has just one association measure, Mutual
Information (MI). But we provide many concrete examples showing that
raw frequency, with MI used only as a filter,
actually gives better results than this wide range of "fancy"
association measures. |
Collocates |
Sketch Engine has very good, pre-calculated "word sketches",
which contain great information for visualizing the relationship between
nearby words. |
E-C also has very useful collocates displays --
grouping results by part of speech, showing position of node word /
collocates, and allowing users to follow "semantic chains" by browsing
from one word/collocate to another. In addition, it lets users
focus on slight differences in collocational frames (e.g. EAT the
NOUN vs EAT NOUN), as well as a wide range of search types involving
synonyms (e.g. =clean, =beautiful) and user-defined word lists (e.g.
@clothes, @colors). |
Topics |
Other sites follow the traditional approach of studying word
meaning and usage through nearby words (collocates) alone. |
E-C shows collocates (nearby words), but it also shows
words that co-occur anywhere in the text / web page. We provide
many examples that show how these topics (related words) flesh out the
meaning of a word in ways that would never be possible if we limited ourselves
to just a small "cloud of words" around the node word. |
Mutual Information |
This page simply gives some examples of how to
calculate Mutual Information, and compares the results from English-Corpora.org
with those from other corpus sites.
But again, we argue that raw frequency (with MI
used only as a filter) actually produces the best results. |
In addition, English-Corpora.org provides "home
pages" for the top 60,000 words in COCA and iWeb, which give insight into
the meaning, usage, and patterns of a word in ways that collocates alone never
could.
|
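The idea of ranking collocates by raw frequency while using MI only as a filter can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual implementation. It assumes the formulation of MI commonly used in corpus linguistics, MI = log2((pair frequency x corpus size) / (node frequency x collocate frequency x span)). The function names, the threshold of 3.0, and all of the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def mutual_information(pair_freq, collocate_freq, node_freq, corpus_size, span=1):
    """Assumed MI formula: log2((pair * corpus) / (node * collocate * span))."""
    if min(pair_freq, collocate_freq, node_freq, corpus_size, span) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(
        (pair_freq * corpus_size) / (node_freq * collocate_freq * span)
    )


def rank_collocates(candidates, node_freq, corpus_size, span=1, min_mi=3.0):
    """Drop collocates below min_mi, then sort the rest by raw pair frequency."""
    kept = []
    for word, (pair_freq, collocate_freq) in candidates.items():
        mi = mutual_information(pair_freq, collocate_freq, node_freq, corpus_size, span)
        if mi >= min_mi:  # MI acts only as a filter, not as the sort key
            kept.append((word, pair_freq, mi))
    return sorted(kept, key=lambda row: row[1], reverse=True)


# Invented toy counts: collocate -> (frequency near the node, overall frequency)
CORPUS_SIZE = 1_048_576
NODE_FREQ = 1_024
candidates = {
    "the": (512, 131_072),  # very frequent near the node, but frequent everywhere
    "tea": (64, 2_048),
    "bush": (8, 16),        # rare word that almost always occurs near the node
}

ranked = rank_collocates(candidates, NODE_FREQ, CORPUS_SIZE)
for word, freq, mi in ranked:
    print(f"{word}\t{freq}\t{mi:.2f}")
```

With these made-up numbers, "the" falls below the threshold and is removed, while "tea" is listed ahead of "bush" because it co-occurs with the node more often. Sorting by MI alone would reverse that order and promote the rare word.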