To see an example of this, search for into revealing and then click on the 1950s bar to see the examples from the 1950s at Google Books. The second and fourth entries contain the phrases maneuvered them into revealing and trap the husband into revealing. But when you search for either of these phrases by itself (click on them to see), there are no entries. That's because neither of these four- or five-word strings occurs at least 40 times, and they are therefore "invisible".
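The effect of this kind of frequency cutoff can be sketched in a few lines of code. This is only an illustrative toy, not the actual Google Books pipeline: a simple n-gram counter that drops any string occurring fewer than `threshold` times, so low-frequency strings become "invisible" in the same way.

```python
# Toy sketch (not the real Google Books pipeline): count n-grams and
# discard any that fall below a frequency threshold.
from collections import Counter

def visible_ngrams(tokens, n, threshold):
    """Return only the n-grams that occur at least `threshold` times."""
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(g) for g in grams)
    return {g: c for g, c in counts.items() if c >= threshold}

# Tiny invented corpus: the bigram "a b" occurs 3 times; every other
# bigram occurs only once and so is filtered out ("invisible").
corpus = "a b c a b a b".split()
print(visible_ngrams(corpus, 2, threshold=2))  # → {'a b': 3}
```

With a real threshold of 40, a phrase like trap the husband into revealing that occurs, say, a dozen times in the corpus would simply never appear in the searchable dataset.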
The "40-token threshold" matters because, although Google Books is much larger than a corpus like the 400-million-word Corpus of Historical American English (COHA) and will almost always have more tokens (total occurrences), it may have about the same number of unique strings (types) as COHA, or even fewer. Consider the following examples:
This becomes much more of a problem for Google Books with longer strings -- 4-grams and 5-grams -- where there are far more possible strings, and less chance that any given string will occur the required 40 times. Consider the following table:
In #1, only the [j*] slot has a fairly wide range of possibilities, and so COHA has only a few more types (unique strings) than the American English dataset from Google Books. But in #2, the first and (especially) fourth slots have lots of possibilities, and that's why COHA has about six times as many types, even though it's a much smaller corpus.
Overall, then, Google Books nearly always has many more tokens, and since each type occurs at least 40 times, you can be quite sure that the types are not typos or other anomalies. On the other hand, the 40-token threshold means that the results sometimes suffer in terms of the number of types.
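The token/type trade-off described above can be simulated with two toy corpora. All the numbers here are invented for illustration: a "large" corpus with ten times as many tokens but a frequency cutoff applied, and a "small" corpus with no cutoff. Despite its size advantage in tokens, the thresholded corpus can end up with far fewer searchable types.

```python
# Toy simulation of the token/type trade-off. The vocabulary size,
# corpus sizes, and threshold are invented for illustration only.
import random
from collections import Counter

random.seed(0)
VOCAB = 5000  # number of distinct possible "strings"

# Large corpus: 100,000 tokens, but only types occurring >= 40 times count.
large = Counter(f"w{random.randrange(VOCAB)}" for _ in range(100_000))
large_types = sum(1 for c in large.values() if c >= 40)

# Small corpus: 10,000 tokens, every observed type counts.
small = Counter(f"w{random.randrange(VOCAB)}" for _ in range(10_000))
small_types = len(small)

print("large corpus:", sum(large.values()), "tokens,", large_types, "types")
print("small corpus:", sum(small.values()), "tokens,", small_types, "types")
```

With these settings the large corpus wins decisively on tokens but the small one wins on types, which is exactly the pattern the table above shows for COHA versus the thresholded Google Books datasets.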