This entire corpus is based on the n-grams provided by Google Books (detailed information; copyright). When you search the "corpus" at our site, you are actually searching these n-grams, not the text of the books themselves (sentences, paragraphs, and pages). However, the frequency lists that you see here contain links to Google Books, where you can see the actual occurrences in the texts. There are five important things to note about these n-grams:

1. Creating the corpus: Overview of the original files and how they were processed for this corpus.
But as you can see from
the links to the actual Google Books and the charts, these don't
correspond well with each other (and some, like #1, don't have
results in the books themselves). Since our n-gram data is the same
as that used for the charts, it may be problematic as well for
strings that contain punctuation. In other words, these n-grams
will appear in our results, but their links won't actually find
anything in the books at Google Books. This is a problem with Google
Books and their data, not our interface.
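The punctuation mismatch described above can be illustrated with a minimal Python sketch. It rests on an assumption that the text here does not confirm: that punctuation marks are split off as separate tokens when n-grams are built. The `tokenize` and `ngrams` helpers and the sample sentence are hypothetical, written only for illustration, and are not the processing actually used for this corpus.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption for illustration: each punctuation mark becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def ngrams(tokens, n):
    # Join every run of n consecutive tokens with single spaces.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical sample text standing in for a page of a book.
text = "Well, it works. Well, it works again."
counts = Counter(ngrams(tokenize(text), 3))

for gram, freq in counts.most_common():
    # An exact-phrase search of the original text fails whenever the
    # n-gram string contains a space that the text itself does not have.
    found = gram in text
    print(f"{gram!r}: frequency {freq}, found in original text: {found}")
```

In this sketch the 3-gram `Well , it` has a frequency of 2 in the n-gram counts, yet searching the original text for that exact string finds nothing, because the text reads `Well, it`. An n-gram without punctuation, such as `it works again`, is found as expected.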