English-Corpora.org


SPEED

For very large corpora, Sketch Engine is just about the fastest corpus architecture available. Our architecture, however, is even faster: about six times as fast, on average, for "string searches" like those shown below. With GloWbE, for example, a series of searches that takes 5 minutes in total would take about 30 minutes (25 more minutes spent waiting for results) in a similar-sized corpus in Sketch Engine.

The following data is based on the 1.9 billion word GloWbE corpus and a 2.7 billion word corpus in Sketch Engine (enTenTen08, which has 3.3 billion tokens, including punctuation, etc.). Since enTenTen08 is about 50% larger (2.7 vs 1.9 billion words), each search should take about 50% longer there. In fact, it takes much longer than that. For example, the first search shown below, [have] quite [vvn*], takes about 2.6 seconds in GloWbE. Allowing for the 50% larger size of enTenTen08, it should take about 3.9 seconds there. In fact, though, it takes about 25 seconds: 11 seconds for the concordance lines (SE1) plus 14 seconds to find and sort the node words (SE2). That is about 6-7 times as long as the size-adjusted GloWbE time.

GloWbE query                Sketch Engine query (enTenTen08)                                           GloWbE (s)  SE1 (s)  SE2 (s)  Faster (x)
[have] quite [vvn*]         [lemma = "have"] [word = "quite"] [tag = "VVN"]                            2.6         11       14       6.4
several [nn*]               [word = "several"] [tag = "NN."]                                           3.3         12       75       17.6
I [vv*] if                  [word = "I"] [tag = "VV."] [word = "if"]                                   5.7         24       29       6.2
just [vv*] [p*] [vv*] that  [word = "just"] [tag = "VV."] [tag = "PP$"] [tag = "VV."] [word = "that"]  5.5         36       5        5.0
[j*] places                 [tag = "AJ"] [word = "places"]                                             3.6         14       31       8.3
in no [nn*]                 [word = "in"] [word = "no"] [tag = "NN."]                                  4.9         14       7        2.9
to only [v*]                [word = "to"] [word = "only"] [tag = "VV."]                                5.0         21       5        3.5
[vv*] [p*] into [v?g*]      [tag = "VV."] [tag = "PP"] [word = "into"] [tag = "V.G"]                   5.0         30       8        5.1
[r*] [vv*] whether          [tag = "RB"] [tag = "VV."] [word = "whether"]                              3.0         26       14       8.9
[go] [j*]                   [lemma = "go"] [tag = "JJ"]                                                6.8         14       28       4.1

Average: 6.8 x
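The "Faster (x)" figures can be checked with a few lines of Python. This is only a sketch: the formula (total Sketch Engine time divided by 1.5 times the GloWbE time) is inferred from the worked example for the first search, and the names TIMINGS, SIZE_FACTOR, and speedup are illustrative rather than anything used by either system.

```python
# Timings in seconds, copied from the table above: (GloWbE, SE1, SE2).
TIMINGS = {
    "[have] quite [vvn*]": (2.6, 11, 14),
    "several [nn*]": (3.3, 12, 75),
    "I [vv*] if": (5.7, 24, 29),
    "just [vv*] [p*] [vv*] that": (5.5, 36, 5),
    "[j*] places": (3.6, 14, 31),
    "in no [nn*]": (4.9, 14, 7),
    "to only [v*]": (5.0, 21, 5),
    "[vv*] [p*] into [v?g*]": (5.0, 30, 8),
    "[r*] [vv*] whether": (3.0, 26, 14),
    "[go] [j*]": (6.8, 14, 28),
}

# Assumption: enTenTen08 is treated as about 50% larger than GloWbE
# (2.7 vs 1.9 billion words), so a search is expected to take 1.5x as long.
SIZE_FACTOR = 1.5


def speedup(glowbe, se1, se2, size_factor=SIZE_FACTOR):
    """Return the size-adjusted ratio of Sketch Engine time to GloWbE time."""
    expected = glowbe * size_factor  # time expected at GloWbE speed
    return (se1 + se2) / expected    # SE1 + SE2 is the total observed time


factors = {query: speedup(*times) for query, times in TIMINGS.items()}
average = sum(factors.values()) / len(factors)

for query, factor in factors.items():
    print(f"{query:<28} {factor:5.1f}")
print(f"Average: {average:.1f} x")
```

Under these assumptions, the computed values round to the same figures as the "Faster (x)" column, including the 6.8 average.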

iWeb word list