
English-Corpora.org

Word / phrase searches (search form, corpora used, corrections, +/- sections, compare to Chart, Collocates searches)


Basic syntax: see also "flex" (variable length) queries

Type	Examples (separated by ;)	Results	Explanation / notes
Single word	mysterious; skew	mysterious; skew	Remember that if you want to see the frequency by section (genre, historical period, dialect, etc.), the better option is probably a Chart search.
Exact phrase	make up; . In particular ,	make up, on the other hand	Some words that are a single "word" in written English are considered two "words" by the program that tagged the corpus. A few examples: ca n't, they 'll, Mary 's. In addition, punctuation is considered to be a separate word, such as VERB out . or ADJ ! Finally (unlike with online search engines like Google), there is no need to put quotes before and after an exact phrase: ~~"San Francisco"~~, ~~"good idea"~~.
Any word	more * than; * bit	more important than, more money than; a bit, tiny bit	Each asterisk matches one word. So put * * into would show four-word strings, with put followed by exactly two words, and then into. If you want a variable number of words (for example, 1 to 3 words), see the powerful, new "flex" (variable length) queries.
Wildcard	*icity; *break*; b?t?er; ??????*ism	electricity, multiplicity; break, outbreak, unbreakable; better, butter, butler; criticism, multiculturalism	? matches exactly one letter, and * matches any number of letters (including no letters). You can combine these two symbols to find words between N1 and N2 characters long. For example, ??????*ism would find words with at least nine letters (the six ? plus ism), but it wouldn't find tourism or realism (there are not at least six letters before ism).
Alternant	fast|slow; fast|slow rate|progress	fast, slow; slow rate, fast rate, slow progress	You can use either "|" or "/"; they will yield the same results.
NOT	pretty -NOUN	pretty good, pretty tall	Compare pretty NOUN
Lemma (forms)	DECIDE; CURVE_n	decide, decides, decided, deciding; curve, curves	(See below)
Part of speech	rough NOUN; VERB money	rough time, rough terrain, rough idea; raise money, saving money	(See below)
Synonyms	=beautiful; =strong ARGUMENT	beautiful, lovely, attractive; compelling argument, effective arguments	(See below)
Customized word lists	I was @emotion; @colors @CLOTHES	is happy, was sad, been worried; white shirt, blue pants, pink blouse	(See below)
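
The letter-level wildcards can be illustrated with a short Python sketch. This is not the code that the corpus site runs; it is only a rough model of the rule that ? stands for exactly one letter and * for any number of letters. The function names wildcard_to_regex and matches are made up for this example.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a word-level wildcard pattern into a compiled regex.

    '?' becomes one word character and '*' becomes zero or more.
    Simplification: \\w also accepts digits and underscores, not only letters.
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(r"\w")
        elif ch == "*":
            parts.append(r"\w*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern: str, word: str) -> bool:
    """Return True if the whole word fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None

# Illustrative word list (not corpus data).
words = ["break", "outbreak", "unbreakable", "better", "butler",
         "criticism", "tourism", "realism"]
for pattern in ["*break*", "b?t?er", "??????*ism"]:
    print(pattern, [w for w in words if matches(pattern, w)])
```

Because the whole word has to fit the pattern, six ? followed by * and ism can only match words of nine letters or more, which is why tourism and realism are rejected.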

Lemmas (forms of words)

If you capitalize an entire word, it will find all forms of that word. For example, DECIDE would find all forms of decide (decide, decides, decided, deciding), whereas decide would just find the single form decide. Note that CURVE_n wouldn't yield curved or curving, since we have limited it just to nouns by appending _n.

Also, note that these are traditional lemmas, which are similar to headwords in a dictionary. In the same way that happy, unhappy, and happiness are all different entries in a dictionary, HAPPY would not yield unhappy or happiness -- those are different lemmas. In other words, this is different than the "word families" approach developed by Paul Nation, where those three words are part of one "word family".

Synonyms

You can search by all of the synonyms of a given word, which provides powerful "semantically-based" searches of the corpus. For example, you can find the synonyms of beautiful, nonsense, or clean (v).

Of course you can use the synonyms as part of phrases as well. For example, =CLEAN the NOUN, =clever =man, or =strong ARGUMENT. As the last example shows, synonyms can be very useful when you are a non-native speaker, and you want to know which related words are used in a particular context.

As =clean * NOUN shows, not every token will actually be a synonym of a given word in every case. For example, scour may be a synonym of clean in scour the sink, but not in scour the library for good books.

Note that it is often useful to limit the synonyms to those with a particular part of speech, as in =clean_v. It is often also useful to find all forms of the synonyms, by capitalizing the word: =CLEAN. And of course you can combine these as well, for example all forms of all synonyms of clean as a verb (=CLEAN_v).
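
To make the three modifiers easier to see, here is a small Python sketch that takes a single word term such as =CLEAN_v apart. It is only an illustration of the notation described above, not the site's own parser; the Term class and parse_term function are hypothetical names. It handles word terms only, so part of speech keywords such as NOUN or a one-letter word such as I would be misread as "all forms" searches.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Term:
    word: str            # the word itself, lower-cased
    synonyms: bool       # True if the term started with '='
    all_forms: bool      # True if the word was written entirely in capitals
    pos: Optional[str]   # tag after the underscore (e.g. 'v'), or None

def parse_term(term: str) -> Term:
    """Split one word term into its '=', capitalization, and '_pos' parts."""
    synonyms = term.startswith("=")
    body = term[1:] if synonyms else term
    word, sep, pos = body.partition("_")
    return Term(
        word=word.lower(),
        synonyms=synonyms,
        all_forms=word.isupper(),
        pos=pos if sep else None,
    )

for example in ["clean", "=clean", "=clean_v", "=CLEAN", "=CLEAN_v"]:
    print(example, parse_term(example))
```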

Finally, note that you can click on the [S] in the results to find synonyms for each word in the results set. This allows you to follow a "synonym chain" from one word to another to another. And of course you can see the frequency of each word, and see it in context as well.

Customized word lists (detailed help: PDF, video)

"User lists" or "customized lists" are word lists that you create: words related to a certain topic (e.g. sports, clothing, or emotions), words that are grammatically related (e.g. a certain subset of adverbs or pronouns), or any other list that you might want. After you've created a list, you can re-use it in queries at any time in the future; it remains stored in the database on the server. You can also view the lists that you have created, modify a list (add or delete words), or delete a list entirely. To include the words in a customized word list in your search, the syntax is @listName (for example, @foods or @emotion).
You can also use the list as part of a phrase: I was @emotion, @colors @CLOTHES (note the capitalized CLOTHES, to find plural forms of these words as well)

Part of speech

You can use parts of speech as part of your query. For example, ADJ eyes would find a two word string, composed of an adjective followed by the word eyes. Some other examples are: rough NOUN, Bill NAME, VERB * money, MOVE ADV, NUM ways, LET PRON VERB.

You can type the part of speech tags directly into the search form as part of a search. (Click here for a full list of these part of speech tags.) You can also insert part of speech tags by selecting them from the drop-down list.

There are different ways of specifying the part of speech, all of which work equally well. For example, all of the following would find the same strings: ADJ eyes, J eyes, _j eyes, [j*] eyes.

1. Word	2. Abbreviation	3. CQP-like	4. Older syntax	Explanation	Example
NOUN	N	_nn	[nn*]	Common noun	sun, love
NAME	NP	_np	[np*]	Proper noun	John, Chicago
NOUN+	N+	_n	[n*]	Common and proper noun	sun, Sonny
VERB	V	_vv	[vv*]	Lexical verb (no do, be, have)	decide, jumped
VERB+	V+	_v	[v*]	All verbs (including do, be, have)	decide, has, is
ADJ	J	_j	[j*]	Adjective	nice, clean
ADV	R	_r	[r*]	Adverb	soon, quickly
PRON		_p	[p*]	Pronoun	she, everyone
PREP		_i	[i*]	Preposition	from, on
ART		_a	[a*]	Article	the, his
DET		_d	[d*]	Determiner	these, all
CONJ		_c	[c*]	Conjunction	that, and, or
NEG		_x	[x*]	Negation	not, n't
NUM		_m	[m*]	Number	five, 5
POSS		_ap	[ap*]	Possessive	my, her, their
All other parts of speech: use Type 3 or Type 4, e.g. [nn2], _nn2, [cst], _cst

If you are using Type 3 or Type 4 above, you can use wildcards for the part of speech tag. For example, [nn2*] = plural nouns, [n*] = all nouns, [*n*] = nouns (including ambiguous noun/adj tags), etc. If you are using Type 1 or Type 2, it needs to be upper case: short NOUN (or short N).
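
The relationship between the four notations can be sketched in Python. This is a rough model built only from the table above, not the site's implementation. The names LABELS, to_tag_pattern, and tag_matches are invented for the example, and it assumes that a Type 3 tag without a wildcard (such as _j) behaves like the prefix pattern in the Type 4 column ([j*]). The tags used at the end (nn2, np1, jj_nn1) are illustrative.

```python
from fnmatch import fnmatchcase

# Type 1 (word) and Type 2 (abbreviation) labels from the table,
# mapped to the tag prefix given in the Type 3 column.
LABELS = {
    "NOUN": "nn", "N": "nn",
    "NAME": "np", "NP": "np",
    "NOUN+": "n", "N+": "n",
    "VERB": "vv", "V": "vv",
    "VERB+": "v", "V+": "v",
    "ADJ": "j", "J": "j",
    "ADV": "r", "R": "r",
    "PRON": "p", "PREP": "i", "ART": "a", "DET": "d",
    "CONJ": "c", "NEG": "x", "NUM": "m", "POSS": "ap",
}

def to_tag_pattern(spec: str) -> str:
    """Reduce any of the four notations to one wildcard pattern over tags."""
    if spec in LABELS:                      # Type 1 / Type 2 (upper case only)
        return LABELS[spec] + "*"
    if spec.startswith("[") and spec.endswith("]"):   # Type 4
        return spec[1:-1]
    if spec.startswith("_"):                # Type 3
        tag = spec[1:]
        # Assumption: without an explicit wildcard, treat the tag as a prefix.
        return tag if "*" in tag else tag + "*"
    raise ValueError(f"not a part of speech specifier: {spec!r}")

def tag_matches(spec: str, tag: str) -> bool:
    """Return True if a concrete tag fits the specifier."""
    return fnmatchcase(tag, to_tag_pattern(spec))

print([to_tag_pattern(s) for s in ["ADJ", "J", "_j", "[j*]"]])
print(tag_matches("[nn2*]", "nn2"), tag_matches("NOUN", "np1"),
      tag_matches("[*n*]", "jj_nn1"))
```

A lower-case label such as noun is rejected here, mirroring the rule that Type 1 and Type 2 labels need to be upper case.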

You can also add a part of speech tag to the end of any word, but you need to use either Type 3 or Type 4 above. For example, end would find end with any part of speech, but end.[n*] or end_n would limit it to end as a noun, and end.[v*] or end_v would limit it to end as a verb. (Click on the word in the results to see the KWIC (concordance) lines, to verify that they are (mostly) nouns or verbs; there will always be some entries that are mistagged.) Make sure that you separate the word and the part of speech with an underscore (Type 3) (end_v), or a period / full stop and bracket (Type 4) (end.[v*]), and remember that in either case, there is no space between the word and the part of speech tag. Also remember that you can combine these with lemma searches to find all forms of a word with a given part of speech, e.g. END_v or END.[v*].
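
As a rough illustration of the two ways of attaching a tag to a word, the following Python sketch splits a term such as end_v or end.[v*] into its word and tag pattern, and then filters a tiny hand-made list of tagged tokens. It is not the site's query engine. The function names and the token list are invented, it assumes that end_v behaves like end.[v*], and it does not handle lemma searches such as END_v.

```python
import re
from fnmatch import fnmatchcase

# Type 4: word, then a period and a bracketed tag, with no spaces (end.[v*]).
TYPE4 = re.compile(r"^(?P<word>[^\s.\[\]]+)\.\[(?P<tag>[^\[\]\s]+)\]$")
# Type 3: word, then an underscore and a tag, with no spaces (end_v).
TYPE3 = re.compile(r"^(?P<word>[^\s_]+)_(?P<tag>[^\s_]+)$")

def split_word_tag(term: str):
    """Return (word, tag_pattern); tag_pattern is None if no tag is attached."""
    m = TYPE4.match(term)
    if m:
        return m.group("word"), m.group("tag")
    m = TYPE3.match(term)
    if m:
        tag = m.group("tag")
        # Assumption: a Type 3 tag without a wildcard acts as a prefix.
        return m.group("word"), tag if "*" in tag else tag + "*"
    return term, None

def select(term: str, tagged_tokens):
    """Keep the (word, tag) pairs that match the word and, if given, the tag."""
    word, pattern = split_word_tag(term)
    return [(w, t) for w, t in tagged_tokens
            if w == word and (pattern is None or fnmatchcase(t, pattern))]

# Hand-made sample; the tags are illustrative, not taken from the corpus.
tokens = [("end", "nn1"), ("end", "vv0"), ("end", "vvi"), ("ends", "nn2")]
print(select("end", tokens))
print(select("end_n", tokens))
print(select("end.[v*]", tokens))
```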

Hint: if you don't know what the part of speech tag is for a given word (or the words in a phrase), just select [OPTIONS] and then [GROUP BY] = [NONE] (SHOW POS). For example, see the PoS tags for light, back, everyone, or in light of. (More information)

Combining words

Many of the examples shown above are for individual words. But of course you can combine the different types of searches to create fairly complex phrases. For example:

Example	Explanation
fast|rapid|quick NOUN	Any of these three words followed by a noun
GET him|her to VERB	Any form of get + him or her + to + verb
CONJ VERB it out .|,	Notice that punctuation can be used like any "word"; just make sure that it is separated from words by a space.
BEAT_v * NOUN+	Any form of beat (as a verb) followed by any word, and then a noun (common or proper)
*ly_r =gorgeous NOUN	Adverb (_r) ending in ly, followed by a synonym of gorgeous, followed by a noun.
PUT on POSS @CLOTHES_n	Any form of PUT + on + a possessive + any form of any word in the "clothes" list, used as a noun.
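
One way to picture how these combined queries are read is as a sequence of slots separated by spaces, where each slot may list several alternants. The Python sketch below is only an illustration of that reading, not the site's parser, and parse_query and expand are made-up names. It leaves lemmas, part of speech labels, synonyms, and word lists unexpanded.

```python
import re
from itertools import product

def parse_query(query: str):
    """Split a query into slots (by spaces) and each slot into alternants.

    Both '|' and '/' are treated as alternant separators.
    """
    return [re.split(r"[|/]", slot) for slot in query.split()]

def expand(query: str):
    """List every combination of alternants as a separate query string."""
    return [" ".join(combo) for combo in product(*parse_query(query))]

print(parse_query("CONJ VERB it out .|,"))
print(expand("GET him|her to VERB"))
print(expand("fast|rapid|quick NOUN"))
```

Since punctuation counts as a word, the final slot of CONJ VERB it out .|, is simply a slot with two alternants, the period and the comma.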