[Davies/BYU] Corpus of Contemporary American English

This new interface for Google Books allows you to search more than 200 billion words (200,000,000,000) of data in both the American and British English datasets, as well as in the One Million Books and Fiction datasets. (If you are interested only in contemporary English, there are still nearly 100 billion words from 1980-2009 alone.)

Although this "corpus" is based on Google Books data, it is not an official product of Google or Google Books. Rather, it was created by Mark Davies, Professor of Linguistics at Brigham Young University, and it is related to other large corpora that we have created.

This interface allows you to search the Google Books data in ways that are much more advanced than the simple Google Books interface allows. You can search by word, phrase, substring, lemma, part of speech, synonym, and collocate (nearby words). Unlike the regular Google Books interface, it also lets you copy the data to other applications for further analysis. And you can quickly and easily compare the data in two different sections of the corpus (for example, adjectives describing women, art, or music in the 1960s-2000s vs. the 1870s-1910s). Note, however, that what you see here is still an early version of the corpus interface, and that new features and corrections will be added over the coming months.
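The corpus itself is queried through a web form, and the code below is not its implementation. As a minimal sketch of what a collocate search and a two-section comparison compute, assuming the text of each section has already been split into a list of word tokens, the following counts the words that appear within a fixed window of a node word and then compares each shared collocate's frequency per million words across the two sections. The function names (`collocates`, `per_million`, `compare_sections`) and the default 4-word window are hypothetical choices for illustration; lemmas, part-of-speech tags, and synonyms are ignored here.

```python
from collections import Counter


def collocates(tokens, node, window=4):
    """Count words occurring within `window` tokens on either side of `node`.

    Hypothetical helper: works on raw word forms, not lemmas or POS tags.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the node word itself
                    counts[tokens[j]] += 1
    return counts


def per_million(count, total_tokens):
    """Normalize a raw count to a frequency per million tokens."""
    return count * 1_000_000 / total_tokens


def compare_sections(tokens_a, tokens_b, node, window=4):
    """Compare collocates of `node` in section A vs. section B.

    Returns (word, count_a, count_b, ratio) tuples sorted by the ratio of
    normalized frequencies, highest first. Only collocates found in both
    sections are kept, which avoids dividing by zero.
    """
    ca = collocates(tokens_a, node, window)
    cb = collocates(tokens_b, node, window)
    rows = []
    for word in set(ca) & set(cb):
        fa = per_million(ca[word], len(tokens_a))
        fb = per_million(cb[word], len(tokens_b))
        rows.append((word, ca[word], cb[word], fa / fb))
    return sorted(rows, key=lambda r: r[3], reverse=True)
```

Normalizing to frequency per million words matters because the sections being compared (for example, the 1870s-1910s and the 1960s-2000s) contain very different amounts of text, so raw counts alone would favor the larger section.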

Please feel free to take a five-minute guided tour (based on the American English dataset), which will show you the major features of the corpus. A single click for each query will automatically fill in the form for you, display the results from the 155 billion words of American English text, and then provide links to the actual books at Google Books.
