The Movies Corpus was created by Mark Davies. It contains 200 million words of data from more than 25,000 movies, from the 1930s to the present. Each of the 25,000+ movies is linked to its IMDb entry, which means that you can create Virtual Corpora using extensive metadata -- year, country, rating, genre, plot summary, etc.

The Movies Corpus (along with the TV Corpus) is a great resource for studying very informal language; it serves this purpose at least as well as corpora of actual spoken English. In addition, the Movies Corpus is much larger than any other corpus of informal English (apart from other corpora from English-Corpora.org). For example, it is about 20x as large as the conversation portion of the BNC (including its 2014 update).

The corpus also allows you to look at variation over time (1930s-1950s vs. 1990s-2010s) and variation between dialects (e.g. American and British English). In this sense, the corpus is related to the other corpora from English-Corpora.org, which are the most widely used corpora of English and which offer unparalleled insight into variation in English.

Click on any of the links in the search form on the search page for context-sensitive help, and to see the range of queries that the corpus offers.