Most corpora aim to show what is going on in the informal, more "spoken" variety of a language, as opposed to (or at least in addition to) more formal fiction, newspapers, magazines, or academic writing. This is hard to do, however, because creating a large corpus of the spoken language is very time-consuming and expensive: the texts must be recorded, transcribed, and then annotated. As a result, spoken corpora tend to be quite small. For English, for example, the MICASE, CALLHOME and CALLFRIEND corpora each contain only about 1 to 2 million words. This might be adequate for extremely high-frequency phenomena (e.g. modals and other auxiliary verbs), but it is far too small for a careful look at medium- and lower-frequency words. The British National Corpus (BNC) is perhaps the only corpus that has a large amount of everyday conversation: about 5 million words of text (plus 5 million more in the partially available 2014 update). But the BNC is almost a "once-off" type of corpus, since large institutional funding (e.g. millions of dollars from Oxford University Press) and staffing (a large number of people on the corpus creation team) are not something that most corpus projects can tap into. In addition, even though the conversational portion of the BNC is now 10 million words (with the 2014 update), that is still only about one-twelfth the size of the TV/Movies data in COCA.
Our TV and Movies data is based on texts very similar to those used for SUBTLEXus. The TV and Movies corpora, however, allow you to do much more than just look up the frequency of a specific word in a word list. As with all of the corpora from English-Corpora.org, the TV and Movies corpora allow you to:
In summary, the overall value of the TV and Movies corpora (as a source of very informal language) is supported by previous research (see above). But the TV and Movies corpora provide a full corpus, rather than just a word list (as with SUBTLEXus). And finally, they are much larger than spoken corpora such as the BNC: they contain far more data, yet are just as informal.