English-Corpora.org




Word / phrase searches



Basic syntax: see also "flex" (variable length) queries

Single word
     Examples: mysterious; skew
     Results: mysterious, skew
     Notes: If you want to see the frequency by section (genre, historical period, dialect, etc.), a Chart search is probably the better option.

Exact phrase
     Examples: make up; . In particular ,
     Results: make up, on the other hand
     Notes: Some words that are a single "word" in written English are treated as two "words" by the program that tagged the corpus, for example ca n't, they 'll, and Mary 's. Punctuation is also treated as a separate word, as in VERB out . or ADJ ! Finally, unlike online search engines such as Google, you do not need to put quotes around an exact phrase: type San Francisco or good idea, not "San Francisco" or "good idea".

Any word
     Examples: more * than; * bit
     Results: more important than, more money than; a bit, tiny bit
     Notes: Each asterisk matches exactly one word. So put * * into would show four-word strings: put, followed by exactly two words, and then into. If you want a variable number of words (for example, 1 to 3 words), see the new "flex" (variable length) queries.

Wildcard
     Examples: *icity; *break*; b?t?er; ??????*ism
     Results: electricity, multiplicity; break, outbreak, unbreakable; better, butter, butler; criticism, multiculturalism
     Notes: ? matches exactly one letter, and * matches any number of letters (including none). You can combine the two symbols to set a minimum word length. For example, ??????*ism finds words of at least nine letters (the six ? plus ism), so it finds criticism but not tourism or realism, which have fewer than six letters before ism.

Alternant
     Examples: fast|slow; fast|slow rate|progress
     Results: fast, slow; slow rate, fast rate, slow progress
     Notes: You can use either "|" or "/"; both yield the same results.

NOT
     Example: pretty -NOUN
     Results: pretty good, pretty tall
     Notes: The minus sign excludes a part of speech, so this finds pretty followed by a word that is not a noun. Compare pretty NOUN.

Lemma (forms)
     Examples: DECIDE; CURVE_n
     Results: decide, decides, decided, deciding; curve, curves
     Notes: See the section below.

Part of speech
     Examples: rough NOUN; VERB money
     Results: rough time, rough terrain, rough idea; raise money, saving money
     Notes: See the section below.

Synonyms
     Examples: =beautiful; =strong ARGUMENT
     Results: beautiful, lovely, attractive; compelling argument, effective arguments
     Notes: See the section below.

Customized word lists
     Examples: I was @emotion; @colors @CLOTHES
     Results: is happy, was sad, been worried; white shirt, blue pants, pink blouse
     Notes: See the section below.
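The letter-level wildcards behave like shell-style patterns applied to a single word. As a rough illustration (not the site's actual implementation), the Python sketch below translates ? and * into a regular expression and filters a small, made-up word list; the names wildcard_to_regex, matches and vocab are hypothetical.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a letter wildcard into a regex: '?' = one letter, '*' = zero or more."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")      # exactly one character
        elif ch == "*":
            parts.append(".*")     # any number of characters, including none
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def matches(pattern: str, words: list[str]) -> list[str]:
    """Return the words that match the whole pattern (not just a substring)."""
    rx = wildcard_to_regex(pattern)
    return [w for w in words if rx.fullmatch(w)]

# A small, made-up vocabulary for illustration.
vocab = ["electricity", "multiplicity", "break", "outbreak", "unbreakable",
         "better", "butter", "butler", "criticism", "tourism", "realism"]

print(matches("*icity", vocab))      # ['electricity', 'multiplicity']
print(matches("*break*", vocab))     # ['break', 'outbreak', 'unbreakable']
print(matches("b?t?er", vocab))      # ['better', 'butter', 'butler']
print(matches("??????*ism", vocab))  # ['criticism']
```

Note that fullmatch is what makes *icity behave like "ends in icity": the pattern must cover the entire word, not just part of it.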


Lemmas (forms of words)

If you capitalize an entire word, the search finds all forms of that word. For example, DECIDE finds all forms of decide (decide, decides, decided, deciding), whereas decide finds only the single form decide. Note that CURVE_n does not find curved or curving, because appending _n limits the search to nouns.

Also note that these are traditional lemmas, which are similar to headwords in a dictionary. Just as happy, unhappy, and happiness are separate entries in a dictionary, HAPPY does not find unhappy or happiness, because those are different lemmas. This differs from the "word families" approach developed by Paul Nation, in which those three words belong to one "word family".
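To make the difference between a form search and a lemma search concrete, here is a toy Python sketch. The LEXICON table and the expand function are hypothetical stand-ins for the corpus's tagger and lemmatizer; the sketch only encodes the rules described above: an all-caps word matches every form of its lemma, a lowercase word matches one form, and a suffix such as _n filters by part of speech.

```python
# Hypothetical mini-lexicon: word form -> (lemma, part of speech).
# The real corpus uses its own tagger and lemmatizer; these entries are illustrative.
LEXICON = {
    "decide": ("decide", "v"), "decides": ("decide", "v"),
    "decided": ("decide", "v"), "deciding": ("decide", "v"),
    "curve": ("curve", "n"), "curves": ("curve", "n"),
    "curved": ("curve", "v"), "curving": ("curve", "v"),
    "happy": ("happy", "j"), "unhappy": ("unhappy", "j"),
    "happiness": ("happiness", "n"),
}

def expand(query: str) -> list[str]:
    """Return the word forms that a single-word query would match."""
    word, _, pos = query.partition("_")   # optional POS suffix, e.g. CURVE_n
    if word.isupper():
        # All caps: every form whose lemma is this word.
        lemma = word.lower()
        return [form for form, (lem, p) in LEXICON.items()
                if lem == lemma and (not pos or p == pos)]
    # Lowercase: only this exact form.
    return [form for form, (lem, p) in LEXICON.items()
            if form == word and (not pos or p == pos)]

print(expand("DECIDE"))   # ['decide', 'decides', 'decided', 'deciding']
print(expand("CURVE_n"))  # ['curve', 'curves']
print(expand("HAPPY"))    # ['happy']
```

Because unhappy and happiness are stored under their own lemmas, HAPPY never reaches them, which mirrors the dictionary-headword behavior described above.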

Synonyms

You can search for all of the synonyms of a given word, which allows powerful "semantically based" searches of the corpus. For example, you can find the synonyms of beautiful, nonsense, or clean (v).

Of course, you can also use synonyms as part of phrases, for example =CLEAN the NOUN, =clever =man, or =strong ARGUMENT. As the last example shows, synonyms are especially useful for non-native speakers who want to know which related words are used in a particular context.

As =clean * NOUN shows, not every token will actually be a synonym of a given word in every case. For example, scour may be a synonym of clean in scour the sink, but not in scour the library for good books.

Note that it is often useful to limit the synonyms to a particular part of speech, as in =clean_v. It is also often useful to find all forms of the synonyms by capitalizing the word: =CLEAN. You can combine the two as well, for example to find all forms of all synonyms of clean as a verb (=CLEAN_v).
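The three refinements (synonyms only, synonyms limited by part of speech, and all forms of those synonyms) can be pictured as successive filters and expansions. The Python sketch below is illustrative only: THESAURUS, FORMS and synonyms are hypothetical, and the entries are made up rather than taken from the site's synonym data.

```python
# Hypothetical thesaurus: (headword, part of speech) -> synonyms.
# English-Corpora.org uses its own synonym data; these entries are made up.
THESAURUS = {
    ("clean", "v"): ["clean", "wash", "scrub", "scour"],
    ("clean", "j"): ["clean", "spotless", "tidy"],
}

# Hypothetical inflected forms, used only when the query word is capitalized.
FORMS = {
    "clean": ["clean", "cleans", "cleaned", "cleaning"],
    "wash": ["wash", "washes", "washed", "washing"],
    "scrub": ["scrub", "scrubs", "scrubbed", "scrubbing"],
    "scour": ["scour", "scours", "scoured", "scouring"],
}

def synonyms(query: str) -> list[str]:
    """Expand =clean, =clean_v, =CLEAN or =CLEAN_v into a list of words."""
    if not query.startswith("="):
        raise ValueError("synonym queries start with '='")
    word, _, pos = query[1:].partition("_")
    all_forms = word.isupper()          # =CLEAN: all forms of every synonym
    head = word.lower()
    result: list[str] = []
    for (headword, p), syns in THESAURUS.items():
        if headword != head or (pos and p != pos):
            continue                    # wrong headword or wrong part of speech
        for syn in syns:
            for form in (FORMS.get(syn, [syn]) if all_forms else [syn]):
                if form not in result:  # keep first occurrence, drop duplicates
                    result.append(form)
    return result

print(synonyms("=clean_v"))  # ['clean', 'wash', 'scrub', 'scour']
```

Without _v, the adjective sense also contributes words such as spotless, which is exactly why limiting the part of speech is often useful.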

Finally, note that you can click on the [S] in the results to find synonyms for each word in the results set. This allows you to follow a "synonym chain" from one word to another to another. And of course you can see the frequency of each word, and see it in context as well.    

Customized word lists (detailed help: PDF, video)

"User lists" or "customized lists" are word lists that you create: words related to a certain topic (e.g. sports, clothing, or emotions), words that are grammatically related (e.g. a certain subset of adverbs or pronouns), or any other list that you might want. After you have created a list, you can re-use it in queries at any time in the future, since it remains stored in the database on the server. You can also view the lists that you have created, modify a list (add or delete words), or delete a list entirely.

To include the words in a customized wordlist in your search, the syntax is:

     @listName   (for example, @foods or @emotion)
 

You can also use the list as part of a phrase:

     I was @emotion,    @colors @CLOTHES  (note the capitalized CLOTHES, to find plural forms of these words as well)
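One way to picture a list slot is as shorthand for the alternant syntax described earlier: each @listName stands for word1|word2|.... The Python sketch below assumes that model; the USER_LISTS contents and the expand_lists function are hypothetical, and the site may not rewrite queries this way internally.

```python
# Hypothetical stored lists; on the real site they are saved with your account.
USER_LISTS = {
    "emotion": ["happy", "sad", "worried"],
    "colors": ["white", "blue", "pink"],
    "clothes": ["shirt", "pants", "blouse"],
}

def expand_lists(query: str, lists: dict[str, list[str]]) -> str:
    """Rewrite each @listName slot as an alternation (word1|word2|...)."""
    out = []
    for slot in query.split():
        if not slot.startswith("@"):
            out.append(slot)                    # ordinary slot: leave unchanged
            continue
        name, sep, pos = slot[1:].partition("_")
        words = lists[name.lower()]
        if name.isupper():                      # @CLOTHES: all forms of each word
            words = [w.upper() for w in words]
        if sep:                                 # @CLOTHES_n: keep the POS suffix
            words = [f"{w}_{pos}" for w in words]
        out.append("|".join(words))
    return " ".join(out)

print(expand_lists("@colors @CLOTHES", USER_LISTS))
# white|blue|pink SHIRT|PANTS|BLOUSE
```

Capitalizing the list name is modeled here as capitalizing every word in the list, which turns each one into a lemma search.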


Part of speech

You can use parts of speech as part of your query. For example, ADJ eyes would find a two-word string, composed of an adjective followed by the word eyes. Some other examples are: rough NOUN, Bill NAME, VERB * money, MOVE ADV, NUM ways, LET PRON VERB.

You can type the part of speech tags directly into the search form as part of a search. (A full list of these part of speech tags is available.) You can also insert part of speech tags by selecting them from the drop-down list.

There are several ways of specifying the part of speech, all of which work equally well. For example, all of the following find the same strings: ADJ eyes, J eyes, _j eyes, [j*] eyes.

1. Word 2. Abbreviation 3. CQP-like 4. Older syntax Explanation Example
NOUN N _nn [nn*] Common noun sun, love
NAME NP _np [np*] Proper noun John, Chicago
NOUN+ N+ _n [n*] Common and proper noun sun, Sonny
VERB V _vv [vv*] Lexical verb (no do, be, have) decide, jumped
VERB+ V+ _v [v*] All verbs (including do, be, have) decide, has, is
ADJ J _j [j*] Adjective nice, clean
ADV R _r [r*] Adverb soon, quickly
PRON (none) _p [p*] Pronoun she, everyone
PREP (none) _i [i*] Preposition from, on
ART (none) _a [a*] Article the, his
DET (none) _d [d*] Determiner these, all
CONJ (none) _c [c*] Conjunction that, and, or
NEG (none) _x [x*] Negation not, n't
NUM (none) _m [m*] Number five, 5
POSS (none) _ap [ap*] Possessive my, her, their

All other parts of speech: use Type 3 or Type 4, e.g. [nn2*], _nn2, [cst*], _cst

If you are using Type 3 or Type 4 above, you can use wildcards in the part of speech tag. For example, [nn2*] = plural nouns, [n*] = all nouns, [*n*] = nouns (including ambiguous noun/adj tags), etc. If you are using Type 1 or Type 2, the tag must be in upper case: short NOUN (or short N).
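Because all four notations name the same tag classes, one way to think about them is as aliases for a single tag pattern. The Python sketch below makes that explicit under two assumptions: corpus tags are lowercase CLAWS-style strings such as nn1, nn2, vvd or vbz, and a Type 3 tag such as _nn2 behaves like the Type 4 wildcard [nn2*]. NAMED, tag_pattern and tag_matches are hypothetical names.

```python
from fnmatch import fnmatchcase

# Type 1 (word) and Type 2 (abbreviation) names, mapped to the Type 4
# tag patterns listed in the table.
NAMED = {
    "NOUN": "nn*", "N": "nn*", "NAME": "np*", "NP": "np*",
    "NOUN+": "n*", "N+": "n*", "VERB": "vv*", "V": "vv*",
    "VERB+": "v*", "V+": "v*", "ADJ": "j*", "J": "j*", "ADV": "r*", "R": "r*",
    "PRON": "p*", "PREP": "i*", "ART": "a*", "DET": "d*", "CONJ": "c*",
    "NEG": "x*", "NUM": "m*", "POSS": "ap*",
}

def tag_pattern(slot: str) -> str:
    """Normalize ADJ, J, _j or [j*] (and similar) to one glob pattern."""
    if slot.startswith("[") and slot.endswith("]"):   # Type 4, e.g. [nn2*]
        return slot[1:-1]
    if slot.startswith("_"):                           # Type 3, e.g. _nn2
        return slot[1:] + "*"
    return NAMED[slot]                                 # Type 1 or 2 (upper case)

def tag_matches(slot: str, tag: str) -> bool:
    """True if a (hypothetical) corpus tag falls under the given slot."""
    return fnmatchcase(tag, tag_pattern(slot))

print([tag_pattern(s) for s in ["ADJ", "J", "_j", "[j*]"]])  # ['j*', 'j*', 'j*', 'j*']
```

The difference between VERB and VERB+ then falls out of the patterns: vv* excludes forms of be, do and have (tags such as vbz), while v* includes them.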

You can also add a part of speech tag to the end of any word, but you need to use Type 3 or Type 4 above. For example, end finds end with any part of speech, but end_n or end.[n*] limits it to end as a noun, and end_v or end.[v*] limits it to end as a verb. (Click on the word in the results to see the KWIC (concordance) lines and verify that they are (mostly) nouns or verbs; there will always be some mistagged entries.) Separate the word and the part of speech with an underscore (Type 3: end_v) or with a period / full stop and bracket (Type 4: end.[v*]), and in either case do not put a space between the word and the tag. You can also combine these with lemma searches to find all forms of a word with a given part of speech, e.g. END_v or END.[v*].
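The rule for attaching a tag to a word (no space, an underscore for Type 3, a period and brackets for Type 4, capitals for all forms) is regular enough to parse mechanically. The hypothetical Python helper parse_word_slot below splits such a slot into its word, an all-forms flag, and a tag pattern; it is a sketch of the syntax, not the site's own parser.

```python
import re

# word, then optionally _tag (Type 3) or .[tag] (Type 4), with no spaces.
SLOT = re.compile(r"(?P<word>[^_.\[\]\s]+)(?:_(?P<t3>\w+)|\.\[(?P<t4>[^\]]+)\])?")

def parse_word_slot(slot: str) -> dict:
    """Split a slot such as end_v, end.[v*] or END_v into its parts."""
    m = SLOT.fullmatch(slot)
    if m is None:
        raise ValueError(f"not a word slot: {slot!r}")
    word = m["word"]
    if m["t3"]:
        tag = m["t3"] + "*"          # _v is treated like [v*]
    elif m["t4"]:
        tag = m["t4"]
    else:
        tag = "*"                    # no tag: any part of speech
    return {"word": word.lower(), "all_forms": word.isupper(), "tag": tag}

print(parse_word_slot("END_v"))  # {'word': 'end', 'all_forms': True, 'tag': 'v*'}
```

Since the pattern excludes whitespace from the word, a slot typed with a space (end v) is rejected rather than silently misread.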

Hint: if you don't know the part of speech tag for a given word (or for the words in a phrase), select [OPTIONS] and then [GROUP BY] = [NONE] (SHOW POS). For example, you can see the PoS tags for light, back, everyone, or in light of .

Combining words

Many of the examples shown above are for individual words. But of course you can combine the different types of searches to create fairly complex phrases. For example:
 
Example Explanation
fast|rapid|quick NOUN Any of these three words followed by a noun
GET him|her to VERB Any form of get + him or her + to + verb
CONJ VERB it out .|, A conjunction, a verb, it out, and then a period or comma. Notice that punctuation can be used like any "word"; just make sure that it is separated from words by a space.
BEAT_v * NOUN+ Any form of beat (as a verb) followed by any word, and then a noun
*ly_r =gorgeous NOUN Adverb (_r) ending in -ly, followed by a synonym of gorgeous, followed by a noun.
PUT on POSS @CLOTHES_n Any form of PUT + on + a possessive + any form of any word in the "clothes" list, used as a noun.
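All of these combinations reduce to matching one query slot against one word at a time. As an illustrative sketch only (not the site's implementation), the Python below matches a space-separated query against a made-up tagged sentence. It supports a subset of the syntax on this page: * for any word, a|b alternants, a few part of speech names, all-caps lemmas, a _tag suffix, and letter wildcards inside a word. SENTENCE, POS, slot_matches and find are hypothetical names, and the tags are assumed to be lowercase CLAWS-style tags.

```python
from fnmatch import fnmatchcase

# Hypothetical tagged sentence: (word form, lemma, CLAWS-style tag).
SENTENCE = [
    ("she", "she", "pphs1"), ("got", "get", "vvd"), ("him", "he", "ppho1"),
    ("to", "to", "to"), ("leave", "leave", "vvi"), ("quickly", "quickly", "rr"),
    (".", ".", "y"),
]

# A few Type 1 part of speech names, mapped to tag patterns.
POS = {"NOUN": "nn*", "NOUN+": "n*", "VERB": "vv*", "ADJ": "j*",
       "ADV": "r*", "PRON": "p*", "CONJ": "c*", "POSS": "ap*"}

def slot_matches(slot: str, token: tuple[str, str, str]) -> bool:
    """True if one query slot matches one tagged word."""
    form, lemma, tag = token
    if slot == "*":                               # any single word
        return True
    if "|" in slot:                               # alternants: a|b|c
        return any(slot_matches(alt, token) for alt in slot.split("|"))
    if slot in POS:                               # part of speech, e.g. VERB
        return fnmatchcase(tag, POS[slot])
    word, _, pos = slot.partition("_")            # optional POS suffix, e.g. *ly_r
    if pos and not fnmatchcase(tag, pos + "*"):
        return False
    if word.isupper():                            # all caps: match the lemma
        return fnmatchcase(lemma, word.lower())
    return fnmatchcase(form, word.lower())        # letter wildcards allowed

def find(query: str, tokens: list[tuple[str, str, str]]) -> list[str]:
    """Return every run of words that matches the query slot by slot."""
    slots = query.split()
    hits = []
    for i in range(len(tokens) - len(slots) + 1):
        window = tokens[i:i + len(slots)]
        if all(slot_matches(s, t) for s, t in zip(slots, window)):
            hits.append(" ".join(form for form, _, _ in window))
    return hits

print(find("GET him|her to VERB", SENTENCE))  # ['got him to leave']
```

Each slot consumes exactly one word, which is why a fixed query like this cannot express "1 to 3 words"; that is the job of the "flex" queries mentioned earlier.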