English-Corpora.org


Three of the most powerful corpus architectures and interfaces for large online corpora are Sketch Engine, BNCweb / CQPweb (both of which use the Corpus Workbench approach), and the architecture that we use for the corpora from English-Corpora.org. The table below summarizes some of their features. You might also be interested in a discussion of the speed of the corpus architecture.

Our corpus architecture has a few features that the other two lack, but the converse is undoubtedly true as well. The bottom line, however, is that any one of these three architectures should work fine for large, heavily annotated online corpora.

We want to be fair to all three architectures, so if there is incorrect information for Sketch Engine or CQP/BNCweb, or if you are aware of another architecture that allows most of these features (at least basic queries, collocates, and limiting searches by a section of the corpus), please let us know.
 

Feature                                                                             E-C   Sketch Engine   CQP/BNCweb

Basic queries
   word                                                                             Y     Y               Y
   phrase                                                                           Y     Y               Y
   wildcard                                                                         Y     Y               Y
   lemma                                                                            Y     Y               Y
   part of speech                                                                   Y     Y               Y
   combine any of above                                                             Y     Y               Y

Visualization
   frequency of each matching string                                                Y     Y               Y
   frequency of each matching string, in each of several sections                   Y     N               N
   overall frequency for all matching forms, in different sections                  Y     N               N

Collocates
   basic collocates search                                                          Y     Y               Y
   sort by Mutual Information                                                       Y     Y               Y
   limit collocate by part of speech                                                Y     Y               Y
   find specific collocate(s) near node word(s)                                     Y     Y               Y

Word comparisons
   basic (e.g. collocates of small vs. little, or men and women)                    Y     Y               N

Integrated synonyms
   basic: search by synonyms                                                        Y     N               N
   advanced: include synonyms as part of another query                              Y     N               N
   see frequency of synonyms in different sections (e.g. by genre or over time)     Y     N               N
   compare frequency of synonyms in different sections                              Y     N               N
   see all collocates for a much larger list of words (e.g. all synonyms of large)  Y     N               N
   "synonym chains": explore web of related words (click on [S] in the entries)     Y     N               N

Customized / personalized lists
   create lists of words and re-use them as part of query syntax                    Y     N               N

Limiting by sections of corpus
   basic (e.g. collocates of strong in academic journals)                           Y     Y               Y
   compare frequencies in different sections (e.g. ADJ in ACAD-Medicine vs ACAD)    Y     N               N
   compare collocates in different sections (e.g. chair in spoken vs. academic)     Y     N               N
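
To make the "sort by Mutual Information" feature in the Collocates rows more concrete, here is a minimal sketch of how collocates of a node word might be ranked by a pointwise Mutual Information score. It is an illustration only and does not show how any of the three architectures computes the score internally. The function name collocates_by_mi, its parameters, and the toy token list are hypothetical, and the formula is the common textbook one: log2 of observed co-occurrences divided by the number expected if the two words were independent.

```python
import math
from collections import Counter


def collocates_by_mi(tokens, node, span=4, min_freq=1):
    """Rank words found within `span` tokens of `node` by a pointwise MI score.

    Assumption: MI = log2(observed / expected), where expected is
    freq(node) * freq(word) * window_size / corpus_size.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    co_freq = Counter()

    # Count every word that falls inside the window around each node hit.
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(n, i + span + 1)
        for j in range(lo, hi):
            if j != i:
                co_freq[tokens[j]] += 1

    window = 2 * span  # positions considered on both sides of the node
    scores = {}
    for word, observed in co_freq.items():
        if observed < min_freq:
            continue  # skip collocates that are too rare to trust
        expected = word_freq[node] * word_freq[word] * window / n
        scores[word] = math.log2(observed / expected)

    # Highest MI first, i.e. the most strongly associated collocates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    toy = "strong tea strong tea the cat the dog the bird weak tea".split()
    for word, mi in collocates_by_mi(toy, "tea", span=1):
        print(f"{word}\t{mi:.2f}")
```

On real corpora a minimum frequency threshold (the min_freq parameter here) matters, because MI tends to favor very rare words that happen to occur once near the node.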