With the standard Google Books interface, you can only do the most basic type of search -- just the simple frequency of a word or phrase. But such a simple search is just the "tip of the iceberg". So much more could and should be done with such corpora of tens or hundreds of billions of words. With our interface, we can unlock the potential of the Google Books data. The following searches show you some of the things that you can do with our interface. They are all based on just the American English dataset (155 billion words), but you can of course do the same searches in any other dataset -- British, One Million Books, or just Fiction. You can also use the different datasets to compare varieties of English, such as American and British English.


At the most basic level, with our interface to Google Books, you can look for the frequency of individual words like grieved, sublime, and bosom (decrease), steamship, telegraph, or swell as an ADJ (increase then decrease), or teenager, funky, and guys (increase). Or you can look for phrases, like of no little, many a time, or calm down. But notice that our interface gives you actual frequency data that you can copy (not just a "picture" of the data, as in the standard Google Books interface), so you can paste it into another application and compare and compute frequencies.
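
As a rough illustration of what "compare and compute frequencies" can mean once the numbers have been pasted into another application, the sketch below converts raw counts into frequencies per million words, so that decades of different sizes can be compared fairly. The decade labels, the counts, and the function name per_million are all invented placeholders for illustration; they are not real corpus figures and not part of the interface.

```python
def per_million(count, total_words):
    """Convert a raw count into a frequency per million words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count / total_words * 1_000_000


# Invented placeholder numbers, NOT real corpus data:
# decade -> (raw count of the word, total words in that decade)
pasted = {
    "1900s": (1_200, 8_000_000_000),
    "1950s": (300, 10_000_000_000),
}

# Normalize each decade so the values are comparable across decades.
normalized = {decade: per_million(c, n) for decade, (c, n) in pasted.items()}

for decade, freq in normalized.items():
    print(f"{decade}: {freq:.3f} per million")
```

Normalizing this way matters because the decades in a large historical corpus are usually very different in size, so raw counts alone can be misleading.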

Our interface also allows you to use part of speech information, to look for constructions like beautiful NOUN, ADJ woman, walked ADV-ly, or VERB the way. You can search by all forms of a word (e.g. forms of start or tall). And you can search by synonym (e.g. synonyms of beautiful, walk, or silliness). And combining these, you can look for complex constructions like synonyms of beautiful + a form of woman, or ADJ + synonym of silliness.

If you're interested in syntax, you can look for constructions like [start] to VERB (CHART | TABLE), [end] up VERB-ing (CHART | TABLE), VERB someone into VERB-ing (CHART | TABLE), VERB one's way PREP (e.g. force his way into), and who / whom + did + PRON (e.g. who/whom did you (VERB); see chart showing increase in who). If you just have to look at modals or auxiliary verbs (ever-popular with the "small corpora" crowd) you can look for must VERB, should VERB, ought to VERB, has to VERB, or need to VERB. Note that here we are looking at thousands and tens of thousands of related forms with one single search. With the regular Google Books interface -- where you have to enter the individual phrases one by one by one -- searches like these would take days or weeks or months. Here, we do it in 3-4 seconds.

Our version of the corpus allows you to search by collocates (nearby words), which are much more than just exact strings (as in the regular Google Books interface). So not only can you search for [wear] + a NOUN, or VERB + his laughter, but you can search for a word "near" another one (usually 2-3 words); for example, wore "near" a NOUN, or VERB "near" laughter.
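
To make the idea of a word "near" another word concrete, here is a minimal sketch of window-based collocate counting over a plain list of tokens. It only illustrates the general concept; it is not a description of how the interface itself is implemented, and the function name collocates, the window size, and the sample sentence are assumptions made up for the example.

```python
from collections import Counter


def collocates(tokens, node, window=3):
    """Count words occurring within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Look at up to `window` tokens on either side of the node word.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Made-up sample sentence, for illustration only.
text = "she wore a long red dress and he wore an old hat"
result = collocates(text.split(), "wore", window=3)
print(result.most_common())
```

An exact-string search would only find "wore a" or "wore an"; the window approach also picks up "dress" and "hat", which sit two or three words away from the verb.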

Finally, our version of the corpus allows for powerful comparisons between two time periods. For example, you can find a list of all -isms that are more common in the 1920s-1940s than in the 1800s, words with -heart- that were more common in the 1800s than now, synonyms of strong that are more common now than 100 years ago, fast + NOUN (e.g. fast food, fast track) now vs. 100 years ago, or adjectives used with food or art or music or women that are used more now than in the 1800s. These are just a handful from among an unlimited number of searches that you can do.
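
One simple way to think about such a comparison is as a ratio of normalized frequencies between the two periods. The sketch below ranks words that way. It is an assumption about one reasonable method, not the interface's actual algorithm, and the word labels, counts, period sizes, and the floor parameter are all hypothetical.

```python
def compare_periods(period_a, period_b, size_a, size_b, floor=0.01):
    """Rank words by the ratio of per-million frequency in period A vs. period B."""
    rows = []
    for word in set(period_a) | set(period_b):
        pm_a = period_a.get(word, 0) / size_a * 1_000_000
        pm_b = period_b.get(word, 0) / size_b * 1_000_000
        # The floor avoids division by zero when a word is absent from one period.
        ratio = max(pm_a, floor) / max(pm_b, floor)
        rows.append((word, round(ratio, 2)))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Invented placeholder counts, NOT real corpus data.
counts_a = {"alpha-ism": 500, "beta-ism": 50}    # e.g. a later period
counts_b = {"alpha-ism": 100, "beta-ism": 200}   # e.g. an earlier period

ranked = compare_periods(counts_a, counts_b,
                         size_a=1_000_000_000, size_b=2_000_000_000)
for word, ratio in ranked:
    print(word, ratio)
```

A ratio above 1 means the word is relatively more common in the first period, and a ratio below 1 means it is relatively more common in the second.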


To conclude, Google Books is an incredible tool, and is a hundred or even a thousand times as large as other large corpora of English. But most of this power and potential is "trapped" inside the simple Google Books interface. With our interface, users can take advantage of all of this potential.