Integrated AI features: free with a premium or academic license.


The Wikipedia Corpus was created by Mark Davies. It contains the full text of Wikipedia: 1.9 billion words in more than 4.4 million articles.

This corpus lets you search Wikipedia in much more powerful ways than the standard interface allows. You can search by word, phrase, part of speech, or synonym. You can also find collocates (nearby words) and see re-sortable concordance lines for any word or phrase.
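To make the idea of collocates concrete, here is a minimal Python sketch that counts the words appearing within a fixed window around a node word. It is only an illustration: the function name `collocates`, the `window` parameter, and the sample sentence are assumptions for this example, and the corpus itself may identify and rank collocates differently.

```python
from collections import Counter

def collocates(tokens, node, window=4):
    """Count words found within `window` tokens of each occurrence of `node`.

    Hypothetical helper for illustration; not the corpus's own method.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Made-up sample text, split on whitespace.
text = "the cell wall protects the cell and the cell membrane regulates transport"
tokens = text.split()
result = collocates(tokens, "cell", window=1)
print(result.most_common(3))
```

A wider window picks up looser associations, while a window of one or two words mostly captures fixed phrases such as "cell wall".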

On the search page, click any link in the search form for context-sensitive help and to see the range of queries that the corpus offers.

Most importantly, you can create and use virtual corpora from any of the 4,400,000 articles in the corpus. For example, in less than a minute you could create a corpus with 500-1,000 pages (perhaps 500,000-1,000,000 words) related to microbiology, economics, basketball, Buddhism, or thousands of other topics.

You can then search within that virtual corpus, compare the frequency of a word, phrase or grammatical construction in your different virtual corpora, and also create "keyword lists" based on the texts in your virtual corpus.
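As a rough illustration of what a keyword list involves, the Python sketch below compares normalized word frequencies (per million words) in a small target text against a reference text and ranks words by the ratio. The function names, the `min_count` threshold, the add-one smoothing, and the sample sentences are all assumptions made for this example; the corpus may compute its keyword lists with a different statistic.

```python
from collections import Counter

def per_million(count, total):
    """Normalize a raw count to a frequency per million words."""
    return count * 1_000_000 / total

def keywords(target_tokens, reference_tokens, min_count=2):
    """Rank target words by how much more frequent they are than in the reference.

    Hypothetical helper for illustration; not the corpus's own method.
    """
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    n_target, n_reference = len(target_tokens), len(reference_tokens)
    scored = []
    for word, count in target.items():
        if count < min_count:
            continue
        target_rate = per_million(count, n_target)
        # Add-one smoothing avoids division by zero for words absent from the reference.
        reference_rate = per_million(reference[word] + 1, n_reference)
        scored.append((word, target_rate / reference_rate))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up sample texts standing in for a virtual corpus and a reference corpus.
target_text = "the enzyme binds the substrate and the enzyme releases the product"
reference_text = "the game ended and the crowd left the arena after the final whistle"
ranked = keywords(target_text.split(), reference_text.split())
for word, ratio in ranked:
    print(f"{word}\t{ratio:.2f}")
```

Normalizing per million words matters because virtual corpora differ in size, so raw counts from a 500,000-word corpus and a 1,000,000-word corpus are not directly comparable.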

Rather than spending hours or days building a specialized corpus for a particular topic, you can create and search one within a minute or two using this Wikipedia corpus.

Finally, the corpus is related to other corpora from English-Corpora.org, which are the most widely used corpora of English and which offer unparalleled insight into variation in English.