Three of the most powerful corpus architectures and interfaces for large
online corpora are the Corpus Workbench approach used by Sketch Engine and
BNCweb / CQPweb, as well as the architecture that we use for the
corpora from English-Corpora.org.
The table below summarizes some of their features. You might also be
interested in a discussion of the speed of the corpus
architecture.
There are a few features in our corpus architecture that are not in the
other two architectures, but the converse is also undoubtedly true. The bottom line,
however, is that any one of these three architectures should work fine for large,
heavily annotated online corpora. We want to be fair to all three architectures,
so if there is incorrect
information for Sketch Engine or CQP/BNCweb, or if you are aware of another
architecture that allows most of these features (at least basic queries,
collocates, and limiting searches by a section of the corpus), please
let us know.
| Category | Feature | E-C | Sketch Engine | CQP/BNCweb |
| --- | --- | --- | --- | --- |
| Basic queries | word | Y | Y | Y |
| Basic queries | phrase | Y | Y | Y |
| Basic queries | wildcard | Y | Y | Y |
| Basic queries | lemma | Y | Y | Y |
| Basic queries | part of speech | Y | Y | Y |
| Basic queries | combine any of above | Y | Y | Y |
| Visualization | frequency of each matching string | Y | Y | Y |
| Visualization | frequency of each matching string, in each of several sections | Y | N | N |
| Visualization | overall frequency for all matching forms, in different sections | Y | N | N |
| Collocates | basic collocates search | Y | Y | Y |
| Collocates | sort by Mutual Information | Y | Y | Y |
| Collocates | limit collocate by part of speech | Y | Y | Y |
| Collocates | find specific collocate(s) near node word(s) | Y | Y | Y |
| Word comparisons | basic (e.g. collocates of small vs. little, or men and women) | Y | Y | N |
| Integrated synonyms | basic: search by synonyms | Y | N | N |
| Integrated synonyms | advanced: include synonyms as part of another query | Y | N | N |
| Integrated synonyms | see frequency of synonyms in different sections (e.g. by genre or over time) | Y | N | N |
| Integrated synonyms | compare frequency of synonyms in different sections | Y | N | N |
| Integrated synonyms | see all collocates for a much larger list of words (e.g. all synonyms of large) | Y | N | N |
| Integrated synonyms | "synonym chains": explore web of related words (click on [S] in the entries) | Y | N | N |
| Customized / personalized lists | create lists of words and re-use them as part of query syntax | Y | N | N |
| Limiting by sections of corpus | basic (e.g. collocates of strong in academic journals) | Y | Y | Y |
| Limiting by sections of corpus | compare frequencies in different sections (e.g. ADJ in ACAD-Medicine vs ACAD) | Y | N | N |
| Limiting by sections of corpus | compare collocates in different sections (e.g. chair in spoken vs. academic) | Y | N | N |
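
To make the "Collocates" rows more concrete, here is a minimal Python sketch of what a basic collocates search sorted by Mutual Information involves. It is not the implementation used by any of the three architectures. The function names (`collocates`, `mutual_information`, `rank_by_mi`), the in-memory token list, and the window size are all assumptions made for illustration, and the MI formula shown is one common formulation from corpus linguistics rather than a documented formula of a particular system.

```python
import math
from collections import Counter


def collocates(tokens, node, window=4):
    """Count words occurring within `window` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def mutual_information(pair_freq, node_freq, coll_freq, corpus_size, span):
    """One common MI formulation: log2((AB * N) / (A * B * span))."""
    return math.log2((pair_freq * corpus_size) / (node_freq * coll_freq * span))


def rank_by_mi(tokens, node, window=4, min_freq=1):
    """Return (collocate, pair frequency, MI) tuples, highest MI first."""
    freq = Counter(tokens)
    span = 2 * window  # positions examined around each node hit
    scored = []
    for word, pair_freq in collocates(tokens, node, window).items():
        if pair_freq < min_freq:
            continue  # drop rare pairs, which tend to inflate MI
        mi = mutual_information(
            pair_freq, freq[node], freq[word], len(tokens), span
        )
        scored.append((word, pair_freq, mi))
    return sorted(scored, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Toy "corpus"; a real corpus would be indexed, not scanned linearly.
    sample = "strong tea and strong coffee but weak tea".split()
    for word, pair_freq, mi in rank_by_mi(sample, "strong", window=1):
        print(f"{word}\t{pair_freq}\t{mi:.2f}")
```

A linear scan like this only works for toy data. The architectures compared above rely on indexes or relational tables so that the same kind of query stays fast on very large corpora. Limiting the collocate by part of speech would amount to one more filter on the candidate words, assuming each token carries a tag.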