English-Corpora.org

COLLOCATES AND ASSOCIATION MEASURES

Four related help pages cover association measures, collocates, topics, and the Mutual Information (MI) score.

Help file	Other sites	English-Corpora.org (E-C)
Association measures	Some other sites have many different association measures, such as MI.log-f, MI, MI3, LogDice, log likelihood, T-score, etc.	E-C has just one association measure: Mutual Information (MI). But we provide many concrete examples showing that ranking by raw frequency, with MI used only as a filter, actually gives better results than this wide range of "fancy" association measures.
Collocates	Sketch Engine has very good, pre-calculated "word sketches", which contain great information for visualizing the relationship between nearby words.	E-C also has very useful collocates displays -- grouping results by part of speech, showing position of node word / collocates, and allowing users to follow "semantic chains" by browsing from one word/collocate to another. In addition, it allows users to focus in on slight differences in collocational frames (e.g. EAT the NOUN vs EAT NOUN), as well as a wide range of search types involving synonyms (e.g. =clean, =beautiful) and user-defined word lists (e.g. @clothes, @colors).
Topics	Other sites follow the traditional approach of looking at word meaning and usage by looking just at nearby words (collocates).	E-C shows collocates (nearby words), but it also shows words that co-occur anywhere in the text / web page. We provide many examples that show how these topics (related words) flesh out the meaning of a word, in ways that would not be possible if we limited ourselves to just a small "cloud of words" around the node word.
Mutual Information	This page simply gives some examples of how to calculate Mutual Information, and compares the results from English-Corpora.org with those from other corpus sites. But again, we argue that raw frequency (with MI used only as a filter) actually produces the best results.
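
The idea of "raw frequency, with MI only as a filter" can be illustrated with a minimal Python sketch. It assumes a common pointwise MI definition in which the expected co-occurrence count is scaled by the span width; the exact formula used by English-Corpora.org is described on its Mutual Information help page and may differ. The function names, the cutoff of 3.0, and all counts below are hypothetical and chosen only for illustration.

```python
import math


def mutual_information(pair_freq, node_freq, collocate_freq, corpus_size, span=1):
    """Pointwise MI in bits (assumed formula: log2 of observed / expected)."""
    # Expected co-occurrences if the two words were independent,
    # scaled by the number of slots in the collocation span.
    expected = node_freq * collocate_freq * span / corpus_size
    return math.log2(pair_freq / expected)


def rank_collocates(candidates, node_freq, corpus_size, span=1, min_mi=3.0):
    """Drop collocates below the MI cutoff, then sort the rest by raw frequency.

    candidates maps collocate -> (pair_freq, collocate_freq).
    The default min_mi of 3.0 is an illustrative assumption.
    """
    kept = []
    for word, (pair_freq, collocate_freq) in candidates.items():
        if pair_freq == 0:
            continue  # MI is undefined for pairs that never co-occur
        mi = mutual_information(pair_freq, node_freq, collocate_freq,
                                corpus_size, span)
        if mi >= min_mi:  # MI acts only as a filter
            kept.append((word, pair_freq, round(mi, 2)))
    # Final ordering uses raw co-occurrence frequency, not the MI score.
    kept.sort(key=lambda row: row[1], reverse=True)
    return kept


# Invented counts for a node word with 1,000 tokens in a 1,000,000-word corpus.
example = {
    "the": (300, 60000),    # frequent, but barely above chance
    "strong": (40, 2000),
    "herbal": (8, 50),      # rare word with a very high MI
}
print(rank_collocates(example, node_freq=1000, corpus_size=1_000_000))
# [('strong', 40, 4.32), ('herbal', 8, 7.32)]
```

With these invented counts, "the" is removed by the MI filter, and "strong" ranks above "herbal" because it co-occurs more often, even though "herbal" has the higher MI score. Sorting by MI alone would reverse that order and push the rare word to the top.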

In addition, English-Corpora provides "home pages" for the top 60,000 words in COCA and iWeb, to provide insight into the meaning, usage, and patterns of a word, in ways that collocates alone never could.