The current version of the corpus interface is only a prototype, meant to
serve as a "proof of concept" that the Google Books data can be integrated
into roughly the same corpus interface as the other corpora from
http://corpus.byu.edu. For now, only the basic functionality of that
interface is available for the Google Books data. Many features that
are already available with the other corpora will hopefully be added soon, including the following:
- Frequency by year, in addition to frequency by decade
- Comparing the collocates of two words, e.g. small (amounts, scale) vs.
  little (while, sister), rob (bank, store) vs. steal (cars, money), or
  girl (working, sexy) vs. boy (growing, rude)
- Collocates for more than single words (e.g. break up or rely on), and
  collocates of lemmas and customized word lists
- Customized word lists, in which you create your own lists of words
  (e.g. pieces of clothing, or emotions, or your own synonym set for a
  given word), and then use these lists directly as part of the query string
- Grouping results by lemma
- Working well with all operating systems, browsers, and screen
  resolutions (e.g. netbooks and iPads)
- More help files; correcting existing help files
- Because the Google data consists only of n-grams, true concordancing
  (sequences of 10-20 words) is not possible. Nevertheless, we can still
  find the most frequent "strings" for a given word within a given n-gram.
  For example, the most common 3-grams with sense are a/the sense of,
  in the sense, and the sense that, while those with wide are
  (a) wide range (of), (a) wide variety (of), and far and wide. In the
  next update, users will be able to find the most frequent strings for
  any word.
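
The idea behind the last item, finding the most frequent strings for a word within a given n-gram, can be sketched roughly as follows. This is only a minimal illustration in Python, not the interface's actual implementation: the function name `most_frequent_strings`, the `TRIGRAM_COUNTS` table, and every count in it are invented for the example and do not reflect real Google Books frequencies.

```python
from collections import Counter

# Hypothetical toy table of 3-gram counts. The numbers are made up for
# illustration; the real Google Books n-gram data is vastly larger.
TRIGRAM_COUNTS = {
    ("a", "sense", "of"): 120,
    ("the", "sense", "of"): 95,
    ("in", "the", "sense"): 80,
    ("the", "sense", "that"): 60,
    ("a", "wide", "range"): 70,
    ("wide", "range", "of"): 65,
    ("far", "and", "wide"): 30,
}


def most_frequent_strings(word, ngram_counts, top=3):
    """Return the `top` most frequent n-grams that contain `word`.

    The word may appear in any position of the n-gram, so the result
    mixes strings such as "in the sense" and "the sense that".
    """
    matches = Counter({
        ngram: count
        for ngram, count in ngram_counts.items()
        if word in ngram
    })
    # most_common() sorts by count, highest first.
    return [(" ".join(ngram), count) for ngram, count in matches.most_common(top)]


if __name__ == "__main__":
    for string, count in most_frequent_strings("sense", TRIGRAM_COUNTS):
        print(string, count)
```

Because only n-grams are available, a lookup like this can never show more context than the n-gram length allows, which is why it is a substitute for true concordancing and not an equivalent.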