Only those n-grams that occur 40 times or more in a particular set of books are included in the n-gram sets that were downloaded from Google Books. For example, a three- or four-word string might occur 30 or 35 times in the 155-billion-word corpus (for American English), but since it doesn't occur at least 40 times, Google did not include it in the n-grams, we don't have it, and it will never appear in any search results.
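
As a rough illustration of how such a threshold works (this is only a minimal sketch, not Google's actual pipeline; the two-column "ngram TAB count" file and its name are assumptions made for the example), filtering a frequency list at 40 occurrences simply discards everything below that line:

```python
from collections import Counter

def load_counts(path):
    """Sum the counts for each n-gram in a tab-separated frequency file."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            ngram, count = line.rstrip("\n").split("\t")
            counts[ngram] += int(count)
    return counts

def apply_threshold(counts, minimum=40):
    """Keep only the n-grams that occur at least `minimum` times."""
    return {ngram: c for ngram, c in counts.items() if c >= minimum}

all_counts = load_counts("ngrams.tsv")   # hypothetical input file
visible = apply_threshold(all_counts)    # everything a search could ever return
# A string with, say, 35 occurrences never makes it into `visible`,
# so it is invisible to every later query.
```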

To see an example of this, search for into revealing and then click on the 1950s bar to see the examples from the 1950s at Google Books. The second and fourth entries contain the phrases maneuvered them into revealing and trap the husband into revealing. But when you search for either of these phrases by itself (click on them to see), there are no entries. That's because neither of these four- or five-word strings occurs at least 40 times, and they are therefore "invisible".

The issue of the "40 token threshold" is important, because it means that although Google Books is much larger than a corpus like the 400 million word Corpus of Historical American English (COHA) and although it will almost always have more tokens (total occurrences), it may have just about the same number of unique strings (types) as COHA, or perhaps even fewer. Consider the following examples:

| Construction | Examples | Google Books tokens | Google Books types | COHA tokens | COHA types |
|---|---|---|---|---|---|
| [j*] groan | heavy/hollow/muffled groan | 56,783 | 183 | 869 | 274 |
| sultry [nn*] | sultry heat/weather/voice | 82,419 | 224 | 687 | 236 |
| walked *ly.[r*] | walked quickly/slowly/briskly | 446,414 | 398 | 5,162 | 376 |
| started to [vv*] | started to run/walk/notice | 868,371 | 907 | 10,566 | 1,282 |
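
The tokens/types distinction in the table can be made concrete with a small sketch: tokens are the total occurrences of all strings matching a pattern, while types are the distinct strings. The sample frequencies below are invented purely for illustration and do not come from either corpus.

```python
from collections import Counter

def tokens_and_types(freqs, minimum=1):
    """Count tokens and types among entries occurring at least `minimum` times."""
    kept = {s: c for s, c in freqs.items() if c >= minimum}
    return sum(kept.values()), len(kept)

matches = Counter({
    "heavy groan": 120,     # frequent collocate: clears any threshold easily
    "hollow groan": 65,
    "muffled groan": 41,
    "plaintive groan": 12,  # rare collocate: visible only without a threshold
})

print(tokens_and_types(matches))              # no threshold: (238, 4)
print(tokens_and_types(matches, minimum=40))  # 40-token threshold: (226, 3)
```

As the second call shows, a 40-token threshold barely dents the token count but removes every low-frequency type, which is why a much larger corpus can still end up with fewer types.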

This becomes much more of a problem for Google Books with longer strings -- 4-grams and 5-grams -- where there are more total possibilities for each string, and a smaller chance that any given string will occur the required 40 times. Consider the following table:

| # | Construction | Examples | Google Books tokens | Google Books types | COHA tokens | COHA types |
|---|---|---|---|---|---|---|
| 1 | it [be] quite [j*] that | it is quite strange that | 270,237 | 116 | 1,436 | 120 |
| 2 | [vv*] [p*] into [v?g*] | talked her into staying | 30,194 | 234 | 1,669 | 1,482 |

In #1, only the [j*] slot has a fairly wide range of possibilities, so COHA has only a few more types (unique strings) than the American English dataset from Google Books. But in #2, the first and (especially) fourth slots have many possibilities, and that's why COHA has about six times as many types (1,482 versus 234), even though it's a much smaller corpus.
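
One way to see how hard the threshold bites in #2 is simple arithmetic: since every string that survives must occur at least 40 times, the number of surviving types can never exceed the token count divided by 40.

```python
# Hard ceiling on types implied by the 40-token threshold, using the
# Google Books figures for construction #2 in the table above.
gb_tokens = 30_194          # tokens for [vv*] [p*] into [v?g*]
print(gb_tokens // 40)      # 754 -- the most types that could ever survive
```

So even in the best case, Google Books could show at most 754 types for this construction, roughly half of the 1,482 types that COHA actually has. For #1, by contrast, the corresponding ceiling (270,237 / 40, about 6,755) is nowhere near binding, so the similar type counts there simply reflect the limited range of adjectives that fit the slot.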

Overall, then, Google Books nearly always has many more tokens, and since each type occurs at least 40 times, you can be quite sure that the strings are not typos or other anomalies. On the other hand, the 40-token threshold means that the results sometimes suffer in terms of the number of types.