The current version of the corpus interface is only a prototype, intended as a "proof of concept" that the Google Books data can be integrated into roughly the same corpus interface as the other corpora from http://corpus.byu.edu. At present, only the basic functionality of the interface for the other corpora is available for the Google Books data. Many other features that are already available with the other corpora will hopefully be added soon, including the following:

  • Frequency by year, in addition to frequency by decade

  • Comparing the collocates of two words, e.g. small (amounts, scale) vs. little (while, sister), rob (bank, store) vs. steal (cars, money), or girl (working, sexy) vs. boy (growing, rude)

  • Collocates for multi-word expressions (e.g. break up or rely on), as well as collocates of lemmas and customized word lists

  • Customized word lists, in which you create your own lists of words (e.g. pieces of clothing, or emotions, or your own synonym set for a given word), and then use these lists directly as part of the query string

  • Grouping results by lemma

  • Working well with all operating systems, browsers, and screen resolutions (e.g. netbooks and iPads)

  • More help files; correcting existing help files

  • Finding the most frequent "strings" for any word. Because the Google data consists only of n-grams, true concordancing (sequences of 10-20 words) is not possible. Nevertheless, we can still find the most frequent "strings" for a given word within a given n-gram. For example, the most common 3-grams with sense are a/the sense of, in the sense, and the sense that, while those with wide are (a) wide range (of), (a) wide variety (of), and far and wide. In the next update, users will be able to find the most frequent strings for any word.
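
The idea behind finding the most frequent "strings" for a word can be sketched in a few lines of Python. This is only an illustration, not the way the corpus interface is actually implemented: the function name top_strings, the (tokens, count) input format, and the counts in SAMPLE are all assumptions made up for the example, and real Google Books n-gram files would first need to be parsed into this shape.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical input format: each item is (tuple of tokens, frequency).
NgramCount = Tuple[Tuple[str, ...], int]


def top_strings(ngrams: Iterable[NgramCount], word: str,
                n: int = 3, k: int = 3) -> List[Tuple[str, int]]:
    """Return the k most frequent n-grams of length n that contain `word`.

    Matching is case-insensitive, and counts for n-grams that differ
    only in capitalization are added together.
    """
    target = word.lower()
    totals: Counter = Counter()
    for tokens, count in ngrams:
        if len(tokens) != n:
            continue  # only look at n-grams of the requested length
        lowered = tuple(t.lower() for t in tokens)
        if target in lowered:
            totals[" ".join(lowered)] += count
    return totals.most_common(k)


# Placeholder counts, invented purely for illustration (not real corpus data).
SAMPLE: List[NgramCount] = [
    (("the", "sense", "of"), 50),
    (("a", "sense", "of"), 40),
    (("in", "the", "sense"), 30),
    (("the", "sense", "that"), 20),
    (("far", "and", "wide"), 10),
    (("The", "sense", "of"), 5),   # merged with the lowercase variant
    (("sense", "of"), 99),         # a 2-gram, ignored when n=3
]

if __name__ == "__main__":
    for string, freq in top_strings(SAMPLE, "sense"):
        print(string, freq)
```

Because each n-gram is counted independently, this gives the most frequent fixed-length strings around a word rather than a true concordance, which matches the limitation described above.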