Note: click RETURN in the upper
right-hand corner to return to this page, after clicking on any of the
links below.
Note: these help files are for the older
interface, which was used before May 2016. If you are using the newer
interface, the layout will be slightly different, but the functionality
is the same.
The
Wikipedia corpus from English-Corpora.org, which was
released in early 2015, contains 1.9 billion words in 4.4 million web pages,
and you can search the entire corpus with the same
type of queries as the other corpora from English-Corpora.org.
More importantly, though, you can
also quickly and easily create "virtual" corpora "on the fly" for any
topic that you want, such as:
biology,
investments,
Buddhism,
psychology,
cars,
basketball.
The topics can be as narrow as you want; a corpus can contain as few as 5-10
Wikipedia pages.
Once you have created these corpora
via the web interface, you can then quickly and easily search in the corpora.
First, you can find keywords, such as nouns in:
- biology, investments, Buddhism, psychology, cars, or basketball (overall frequency)
- biology, investments, Buddhism, psychology, cars, or basketball (more specific words for these corpora)
Of course, you can search for other kinds of words too, such as
verbs in Buddhism, adjectives in biology, or noun+noun in investments.
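The difference between the two kinds of keyword list (overall frequency versus words that are more specific to a corpus) can be sketched in a few lines of Python. This is only an illustration on toy word lists: the function names, the sample data, and the smoothed frequency ratio are all assumptions, not the measure that the corpus interface actually uses.

```python
from collections import Counter

# Toy data (hypothetical): a tiny "investments" corpus and a tiny
# general reference sample standing in for the rest of Wikipedia.
VIRTUAL = "the market the bond the market the yield".split()
REFERENCE = "the cat and the dog saw the house by the tree".split()


def frequency_keywords(corpus_tokens, top_n=3):
    """Rank words by raw frequency in the virtual corpus."""
    return [word for word, _ in Counter(corpus_tokens).most_common(top_n)]


def specific_keywords(corpus_tokens, reference_tokens, top_n=3):
    """Rank words by how much more frequent they are, per token,
    in the virtual corpus than in the reference sample."""
    corpus_counts = Counter(corpus_tokens)
    ref_counts = Counter(reference_tokens)
    n_corpus = len(corpus_tokens)
    n_ref = len(reference_tokens)

    def ratio(word):
        corpus_rate = corpus_counts[word] / n_corpus
        # Add-one smoothing (an assumption) so that words missing from
        # the reference sample do not cause a division by zero.
        ref_rate = (ref_counts[word] + 1) / (n_ref + 1)
        return corpus_rate / ref_rate

    return sorted(corpus_counts, key=ratio, reverse=True)[:top_n]
```

In this toy data, a function word such as "the" tops the overall frequency list, while the ratio pushes topic words such as "market" to the top of the "more specific" list.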
In addition to finding keywords,
you can also search within your virtual corpora for matching words (e.g.
financ*), strings of words (e.g. market + NOUN), collocates (e.g. of
market), and concordance lines (e.g. for market). (All of these examples are
from the investments corpus, but you can run the same searches in any corpus
you create.)
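For readers who want a concrete picture of what these four search types return, here is a minimal Python sketch over a hand-tagged toy sentence. Everything in it (the token list, the tag names, the function names, and the two-word span) is a hypothetical illustration, not the query engine behind the web interface.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hand-tagged toy tokens (hypothetical words and part-of-speech tags).
TOKENS = [
    ("the", "DET"), ("financial", "ADJ"), ("market", "NOUN"),
    ("crash", "NOUN"), ("hurt", "VERB"), ("market", "NOUN"),
    ("share", "NOUN"), ("and", "CONJ"), ("financing", "NOUN"),
]


def match_words(tokens, pattern):
    """Matching words: count word forms that fit a wildcard, e.g. financ*."""
    return Counter(w for w, _ in tokens if fnmatchcase(w, pattern))


def word_plus_pos(tokens, word, pos):
    """Strings of words: count 'word + next word' where the next word has
    the given part of speech, e.g. market + NOUN."""
    return Counter(
        f"{w1} {w2}"
        for (w1, _), (w2, p2) in zip(tokens, tokens[1:])
        if w1 == word and p2 == pos
    )


def collocates(tokens, node, span=2):
    """Collocates: count words within `span` positions of the node word."""
    words = [w for w, _ in tokens]
    counts = Counter()
    for i, w in enumerate(words):
        if w == node:
            counts.update(words[max(0, i - span):i] + words[i + 1:i + 1 + span])
    return counts


def concordance(tokens, node, span=2):
    """Concordance lines: show each hit with its left and right context."""
    words = [w for w, _ in tokens]
    lines = []
    for i, w in enumerate(words):
        if w == node:
            left = " ".join(words[max(0, i - span):i])
            right = " ".join(words[i + 1:i + 1 + span])
            lines.append(f"{left} [{w}] {right}")
    return lines
```

Calling `match_words(TOKENS, "financ*")`, `word_plus_pos(TOKENS, "market", "NOUN")`, `collocates(TOKENS, "market")`, and `concordance(TOKENS, "market")` mirrors the four example searches from the investments corpus.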
Several tutorials for the corpus are available on YouTube
(* = alternate site, if YouTube is not accessible in your country):
General topic | Length | Individual topics

Overview * | 8:59
- Creating virtual corpora
- Finding keywords in your corpora
- Basic searches in your corpus (frequency, strings, collocates, concordances)
- Editing and managing your corpora

Creating virtual corpora: basic * | 2:57
- Creating corpora by word or phrase in the Wikipedia article
- Creating corpora by the title of the Wikipedia article

Finding keywords in your corpus * | 4:55
- Frequency listing of corpus by part of speech (noun, verb, adjective, adverb)
- Frequency listing by multi-word expression (Noun+Noun, Adj+Noun)
- Finding words that are more specific to your corpus

Searching within your corpus * | 5:58
- Frequency listings (substrings)
- String search, e.g. market + NOUN
- Collocates (nearby words); useful insight into meaning and usage of word
- Concordance lines (re-sortable); see the patterns in which a word occurs

Comparing across corpora * | 4:33
- Finding the frequency in the different corpora that you've created
- Example: the frequency of words for "obedience" in different religions
- Example: the frequency of the word gods in different religions
- Comparing concordance lines, e.g. stress in engineering and psychology

Managing your corpora * | 3:04
- Deleting your virtual corpora
- "Hiding" or "ignoring" corpora (without completely deleting them)
- Renaming corpora
- Grouping virtual corpora by topic (e.g. science or finance)

Editing your corpora * | 7:24
- Deleting individual pages from a corpus
- Deleting pages from your corpus from concordance lines
- Moving pages from one corpus to another
- Adding pages from one corpus to another
- Searching for words and then adding multiple pages to an existing corpus

Creating virtual corpora: advanced * | 6:53
- Comparison of searching by words in text and searching by title
- When searching by title is better than searching by words in text
- When searching by title (alone) may not be enough
- By title: adding words that are not in the title
- By title: adding words that are or are not in the text of the page