 
Note: if you click on a link that replaces this help page with another page, just click "back" in your browser to come back to this page.

Creating corpora. You can quickly and easily create "virtual corpora" from Wikipedia on whatever topics you want. Each of these corpora can contain up to 1,000,000 words in 1,000 articles. Once you have created a corpus, you can search it just as you would any other BYU corpus, including finding keywords. Let's quickly create three corpora.

1. Based on titles of pages. Find pages with the word biology in the title. If you were logged in to your account, you could simply click on [Submit] to save the corpus, which contains 135,000 words in 100 texts. The corpus doesn't have to be "academic", either. If you're interested in basketball, for example, you could easily create an NBA finals corpus (69 texts, 103,000 words).

2. Based on content in pages. Find pages with the word investment somewhere in the text of the article. Again, if you were logged in to your account, you could simply click on [Save List] to save the corpus, which contains 485,000 words in 100 texts. You can also click on the icons to find related pages, such as all pages that link to or are linked from a given page.

3. Combination of title and content. Find pages with Christmas in the title and gift* in the page text, but without the words episode, movie, or film in the page text.
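To make the logic of this combined search concrete, here is a minimal sketch that applies the same three conditions to a few made-up (title, text) pairs. It is purely illustrative and is not how the corpus interface works internally: the function name `matches` and the sample pages are hypothetical, and it assumes case-insensitive, whole-word matching, with gift* taken to mean "any word beginning with gift".

```python
import re

def matches(title: str, text: str) -> bool:
    """Hypothetical filter mirroring the example query (an assumption,
    not the site's implementation): 'christmas' as a word in the title,
    a word starting with 'gift' in the text, and none of the excluded
    words in the text. Matching is case-insensitive and whole-word."""
    title_words = re.findall(r"[a-z]+", title.lower())
    text_words = re.findall(r"[a-z]+", text.lower())
    excluded = {"episode", "movie", "film"}

    has_title_word = "christmas" in title_words
    has_gift = any(w.startswith("gift") for w in text_words)  # gift* wildcard
    has_excluded = any(w in excluded for w in text_words)
    return has_title_word and has_gift and not has_excluded

# Made-up sample pages: (title, page text)
pages = [
    ("Christmas", "Gifts are exchanged on Christmas Day."),
    ("Christmas Carol (film)", "A film about gifting."),  # excluded: 'film' in text
    ("Easter", "Gift baskets are common."),               # no 'Christmas' in title
]
selected = [title for title, body in pages if matches(title, body)]
print(selected)
```

Only the first page satisfies all three conditions: the second is rejected because its text contains film, and the third because its title lacks Christmas.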


Editing corpora. Now that you've created the corpora, you can edit them (adding or removing texts, or moving texts from one corpus to another) by clicking on the corpus name. You can also delete a corpus, temporarily hide (or "ignore") it, or sort your corpora into user-defined groups (e.g. "Science" or "Sports") by clicking on the appropriate icon.


Keywords. You can also quickly and easily generate keyword lists based on your corpus. For example, you can see NOUN or NOUN+NOUN from the [Biology] corpus, NOUN, ADJ+NOUN, or NOUN+NOUN from the [Investment] corpus, VERB or NOUN from the [NBA finals] corpus, or NOUN from the [Christmas] corpus. (If you click on any of the words in these lists, remember to click "BACK" in your browser to return to this page.) You can also click on [RELEVANCE] to see words that are more specific to your corpus, such as ADJ in [Biology], NOUN in [Investment], NOUN in [NBA finals], or NOUN in [Christmas]. You can adjust this setting with the [+] and [-] buttons.


Other searches. Once you've created a "virtual corpus", you can also search within it just as though it were a corpus of its own. For example, you can find collocates (nearby words) of cell, see the word cell in context in the [Biology] corpus, or see ADJ/NOUN + fund or market + NOUN in the [Investment] corpus.

So, within just a few seconds, you've created several "virtual corpora" and run some powerful searches on them. Hopefully you can begin to see how many possibilities this corpus opens up.