LIST display

You can use parts of speech as part of your query. For example, ADJ eyes would find a two-word string consisting of an adjective followed by the word eyes. Some other examples are: rough NOUN, NAME Smith, VERB * money, TALK ADV, NUM people, LET PRON VERB.

An easy way to use part of speech tags is by selecting them from the drop-down list (click on [PoS] to show it). You can also type the part of speech tags directly into the search form.

Click here for a list of these part of speech tags.

Previously, you had to put the part of speech tag (from the link above) inside brackets, e.g. [j*]. But that is a bit cumbersome on mobile phones, and there are now several ways of specifying the part of speech, all of which work equally well. For example, all of the following would find the same strings: ADJ eyes, [j*] eyes, J eyes, _j eyes.

Type 1: Original	Type 2: New (word)	Type 3: New (abbrev.)	Type 4: CQP-like	Explanation	Example
[nn*]	NOUN	N	_nn	Common nouns	sun, love
[np*]	NAME	NP	_np	Proper nouns	John, Chicago
[n*]	NOUN+	N+	_n	Common and proper nouns	sun, Sonny
[vv*]	VERB	V	_vv	Lexical verbs (not do, be, have)	decide, jumped
[v*]	VERB+	V+	_v	All verbs (including do, be, have)	decide, has, is
[j*]	ADJ	J	_j	Adjectives	nice, clean
[r*]	ADV	R	_r	Adverbs	soon, quickly
[p*]	PRON		_p	Pronouns	she, everyone
[i*]	PREP		_i	Prepositions	from, on
[a*]	ART		_a	Articles	the, his
[d*]	DET		_d	Determiners	these, all
[c*]	CONJ		_c	Conjunctions	that, and, or
[x*]	NEG		_x	Negation	not, n't
[m*]	NUM		_m	Numbers	five, 5
All other parts of speech: use Type 1 or Type 4, e.g. [nn2], _nn2, [cst], _cst
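Because the four notations are interchangeable, converting any of them to one canonical form is a simple lookup. The sketch below is an illustration, not the site's own code: the dictionary is copied from the table above, the names `TYPE1_BY_NAME`, `normalize_token` and `normalize_query` are hypothetical, and it assumes that a Type 4 tag such as _j behaves as a prefix, like [j*].

```python
# Type 2 (word) and Type 3 (abbrev.) names from the table above,
# mapped to their Type 1 (bracketed) equivalents.
TYPE1_BY_NAME = {
    "NOUN": "[nn*]", "N": "[nn*]",
    "NAME": "[np*]", "NP": "[np*]",
    "NOUN+": "[n*]", "N+": "[n*]",
    "VERB": "[vv*]", "V": "[vv*]",
    "VERB+": "[v*]", "V+": "[v*]",
    "ADJ": "[j*]", "J": "[j*]",
    "ADV": "[r*]", "R": "[r*]",
    "PRON": "[p*]", "PREP": "[i*]", "ART": "[a*]", "DET": "[d*]",
    "CONJ": "[c*]", "NEG": "[x*]", "NUM": "[m*]",
}


def normalize_token(token: str) -> str:
    """Rewrite a standalone PoS token in Type 1 form; other tokens are returned unchanged."""
    if token in TYPE1_BY_NAME:          # Types 2 and 3 are upper case only
        return TYPE1_BY_NAME[token]
    if token.startswith("_") and len(token) > 1:
        # Assumption: a Type 4 tag acts as a prefix, so _j behaves like [j*].
        return "[" + token[1:].rstrip("*") + "*]"
    return token                        # an ordinary word, or already Type 1


def normalize_query(query: str) -> str:
    return " ".join(normalize_token(t) for t in query.split())


for q in ["ADJ eyes", "[j*] eyes", "J eyes", "_j eyes"]:
    print(normalize_query(q))           # [j*] eyes (all four times)
```

Upper-case words that are not in the table, such as LET, pass through untouched, so lemma searches are unaffected.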

If you are using Type 1 or Type 4 above, you can use wildcards for the part of speech tag. For example, [nn2*] = plural nouns, [n*] = all nouns, [*n*] = nouns (including ambiguous noun/adj tags), etc. If you are using Type 2 or Type 3, it needs to be upper case: short NOUN.
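The tag wildcards behave much like shell-style globbing, where * stands for any run of characters (and, in the [v?i*] pattern used further down this page, ? appears to stand for a single character). Assuming that is how tags are matched, Python's standard `fnmatch` module reproduces the examples above. `tag_matches` is a hypothetical helper, and the tag list is invented, only meant to show which tags each pattern would select.

```python
from fnmatch import fnmatchcase


def tag_matches(pattern: str, tag: str) -> bool:
    """Match a Type 1 pattern such as [nn2*] against one tag, ignoring case."""
    glob = pattern.strip("[]")  # "[nn2*]" -> "nn2*"
    return fnmatchcase(tag.lower(), glob.lower())


# Hypothetical tags; jj_nn1 stands for an ambiguous adjective/noun tag.
TAGS = ["nn1", "nn2", "np1", "jj", "jj_nn1", "vvd"]

for pattern in ["[nn2*]", "[n*]", "[*n*]"]:
    print(pattern, [t for t in TAGS if tag_matches(pattern, t)])
# [nn2*] ['nn2']
# [n*] ['nn1', 'nn2', 'np1']
# [*n*] ['nn1', 'nn2', 'np1', 'jj_nn1']
```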

You can also add a part of speech tag to the end of any word, but you need to use either Type 1 or Type 4 above. For example, end would find end with any part of speech, but end.[n*] or end_n would limit it to end as a noun, and end.[v*] or end_v would limit it to end as a verb. Make sure that you separate the word and the part of speech with a period / full stop plus brackets (Type 1) or an underscore (Type 4), and that there is no space. Remember also that you can combine these with lemma searches to find all forms of a word with a given part of speech, e.g. END_v or END.[v*].
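Splitting a word+tag token into its parts shows what the rules above require: no space, and either .[tag] or _tag directly after the word. This is a minimal sketch under stated assumptions, not the site's parser: `parse_word` is a hypothetical name, an all-caps word is treated as a lemma (as in END_v), and a Type 4 tag such as _v is assumed to mean the same as [v*].

```python
import re

# WORD, optionally followed by .[tag] (Type 1) or _tag (Type 4), with no spaces.
WORD_TAG = re.compile(
    r"^(?P<word>[^\s._\[\]]+)(?:\.\[(?P<t1>[^\]\s]+)\]|_(?P<t4>\S+))?$"
)


def parse_word(token: str) -> dict:
    m = WORD_TAG.match(token)
    if m is None:
        raise ValueError(f"not a word or word+tag token: {token!r}")
    if m.group("t1"):
        tag = m.group("t1")                       # end.[n*] -> n*
    elif m.group("t4"):
        tag = m.group("t4").rstrip("*") + "*"     # assumption: end_n -> n*
    else:
        tag = None                                # any part of speech
    word = m.group("word")
    return {"word": word.lower(), "lemma": word.isupper(), "tag": tag}


print(parse_word("end.[n*]"))  # {'word': 'end', 'lemma': False, 'tag': 'n*'}
print(parse_word("END_v"))     # {'word': 'end', 'lemma': True, 'tag': 'v*'}
```

A token with a space before the tag, such as end .[n*], is rejected with a ValueError, which mirrors the "no space" rule.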

If you don't know the part of speech tag for a given word (or for the words in a phrase), just select [OPTIONS] and then [GROUP BY] = [NONE] (SHOW POS). For example, see the PoS tags for light, back, front, or in light of.

You can now do searches with a variable number of "slots". For example, the search:

PUT (NOUN){3} away

would find strings with PUT at the beginning and away at the end, with up to three words in between; if there are any words in between, at least one of them has to be a NOUN. In other words, it would run the following seven searches, one after another, and then display the results for all of them on one page.

#	Search (run one after another)	Matching strings
1	PUT away	put away (no words in between)
2	PUT NOUN away	put toys away
3	PUT * NOUN away	put the toys away
4	PUT NOUN * away	put toys far away
5	PUT * * NOUN away	put the fun toys away
6	PUT * NOUN * away	put the toys far away
7	PUT NOUN * * away	put toys and crayons away
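The expansion in the table follows a simple rule: for each possible number of intervening words k (0 to n), the required slot is placed at each of the k positions and the remaining positions are filled with *. The sketch below reproduces the table under that reading; `expand_flex` is a hypothetical helper, not the site's implementation. It also shows why {3} costs seven searches: 1 + 1 + 2 + 3 = 7, or 1 + n(n+1)/2 in general.

```python
def expand_flex(before: str, slot: str, n: int, after: str) -> list[str]:
    """Expand `before (slot){n} after` into fixed-length searches."""
    searches = [f"{before} {after}"]              # k = 0: nothing in between
    for k in range(1, n + 1):                     # k intervening words
        for pos in range(k - 1, -1, -1):          # slot in the last position first
            middle = ["*"] * k
            middle[pos] = slot
            searches.append(" ".join([before, *middle, after]))
    # With slot "*" every placement is identical, so drop duplicates (order kept).
    return list(dict.fromkeys(searches))


# Prints the seven searches in the same order as the table above.
for i, search in enumerate(expand_flex("PUT", "NOUN", 3, "away"), start=1):
    print(i, search)
```

Passing * as the slot, as in made (*){3} money, yields only four distinct searches, because every placement of * looks the same.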

In terms of search syntax, note that:

1. {n} indicates the maximum number of words (0 to n) in this "variable length" string. Valid values are 1, 2, or 3 (in other words, the longest variable length string is three words).

2. If you don't specify {n}, as in (NOUN), the slot holds at most one word: it matches either that one word or nothing.

3. Any "slot" without parentheses around it is obligatory. For example, put * away would not match put away, since * doesn't have parentheses around it.

4. You can't include more than one "flex" operator in a search. For example, they (VERB+){2} notice (NOUN){3} would not be possible.
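The four rules above are easy to check mechanically before any search runs. The sketch below is one possible validator, not the site's parser; `parse_flex` and its return shape are hypothetical. A slot in parentheses without {n} is treated as {1} (rule 2), n must be 1 to 3 (rule 1), everything outside the parentheses stays obligatory (rule 3), and a second flex operator is rejected (rule 4).

```python
import re

# A flex operator: "(SLOT)" optionally followed by "{n}".
FLEX = re.compile(r"\(([^()\s]+)\)(?:\{(\d+)\})?")


def parse_flex(query: str):
    """Return (before, slot, n, after); n == 0 means there is no flex operator."""
    found = list(FLEX.finditer(query))
    if len(found) > 1:
        raise ValueError("only one flex operator is allowed per search")  # rule 4
    if not found:
        return query.strip(), None, 0, ""         # every slot is obligatory (rule 3)
    m = found[0]
    n = int(m.group(2)) if m.group(2) else 1      # (NOUN) = that word or nothing (rule 2)
    if not 1 <= n <= 3:
        raise ValueError("{n} must be 1, 2, or 3")  # rule 1
    return query[:m.start()].strip(), m.group(1), n, query[m.end():].strip()


print(parse_flex("PUT (NOUN){3} away"))  # ('PUT', 'NOUN', 3, 'away')
```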

The following are some additional searches. They produce interesting results in the one-billion-word COCA corpus, but the results in other corpora may not be as good. In each case, we show a few sample matching strings, and some strings that would not be generated by the search (and why not).

Sample search	What WOULD be matched	What would NOT be matched
might (*) know	might know; might never know	might never really know (without {n}, (*) matches at most one word)
was (really) interesting	was interesting (really is optional); was really interesting	was very interesting (very is not really); was not really interesting (too many words)
BE (NEG) worried	is worried (NEG is optional); are n't worried	is really worried (really is not NEG); is n't so worried (two words; the maximum is one)
made (*){3} money	made more money ({3} means 0-3 words); made a lot more money (3 words, the maximum)	made quite a bit of money (4 words; the maximum is 3)
take * (NOUN){2} away	take it away (it comes from *, which is obligatory; (NOUN){2} adds nothing, since it allows 0-2 words); take the money away (the from *, money (one slot) from (NOUN){2}); take even more money away (even from *, more money (two slots) from (NOUN){2})	take away (* requires one word); take it quickly away (no NOUN); take even more easy money away (more easy money = 3 words; the maximum is 2)
I (VERB+){3} NOTICE_v	I was noticing; I had never even noticed (VERB+ matches any verb, including do, be, have; VERB matches only lexical verbs)	I sometimes notice (no VERB+); I had never even ever noticed (4 words; the maximum is 3)

Some additional notes:

1. Because a "flex search" can involve up to seven different searches (see above), there is a limit on the number of flex searches in a given 24-hour period. Users without a premium or academic license can do five flex searches in 24 hours; those with a license can do up to 50.

2. Again, because a flex search runs several searches, it can take a long time if all of the "slots" are high frequency. This can be a real limitation in very large corpora like NOW (19+ billion words) or iWeb (14 billion words). So a search like HAVE (ADJ){3} time probably won't work in those corpora: HAVE and time are too frequent. In a case like this, you will probably need to run a series of separate searches instead: HAVE time, HAVE ADJ time, HAVE * ADJ time, etc. But again, this should not be a problem with a small corpus like the BNC.

CHART display

If you are interested in a set of words or a grammatical construction, then the LIST option shows the frequency of each matching form (end up being, ended up saying, etc), while the CHART option shows the total frequency in each section.

See additional information (new in September 2024)

COLLOCATES display

You can use collocates to do "variable length" searches, where there might be 0-4 (or more) words between two other sets of words or phrases. For example, you could find all of the following with one simple search.

(were) talked into coming (0 words)
talk them into coming (1 word)
talk the girls into coming (2 words)
talk some other people into coming (3 words)
talk lots of other people into coming (4 words)

In the sample queries below, you would enter the construction in WORD(S) and COLLOCATES, and then set the span: how many words to the left (L) and right (R) of WORD(S) the COLLOCATES may occur (up to nine words in each direction). For example, 0 L | 4 R means the COLLOCATES must occur within the four words to the right of WORD(S), and not to its left.
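Read this way, the span counts word positions outward from the edges of WORD(S). That reading is consistent with sample C in the table below: in what he wants to do is complain, what sits four positions to the left of do, hence 4 L. The sketch below assumes that reading; `collocates_in_span` and `what_or_all` are hypothetical helpers, and tokenization is simplified to whitespace splitting.

```python
def collocates_in_span(tokens, node_start, node_end, left, right, is_collocate):
    """Return (index, word) for collocates within `left` words before WORD(S)
    and `right` words after it. tokens[node_start:node_end] is WORD(S)."""
    lo = max(0, node_start - left)
    hi = min(len(tokens), node_end + right)
    window = list(range(lo, node_start)) + list(range(node_end, hi))
    return [(i, tokens[i]) for i in window if is_collocate(tokens[i])]


def what_or_all(word: str) -> bool:
    return word in {"what", "all"}


tokens = "what he wants to do is complain".split()
# WORD(S) = "do is complain" (do BE VERB) occupies tokens[4:7].
print(collocates_in_span(tokens, 4, 7, 4, 0, what_or_all))  # [(0, 'what')]
print(collocates_in_span(tokens, 4, 7, 3, 0, what_or_all))  # []
```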

Span (L | R)   Construction

A VERB NOUN PHRASE into _vvg

1 L | 0 R VERB her into _vvg e.g. talked her into staying

2 L | 0 R VERB the people into _vvg

4 L | 0 R VERB my best friend into _vvg

B EXPECT [a*]|[d*]|[n*]|[p*] NOUN PHRASE [v?i*]

0 L | 2 R EXPECT them to [v?i*]   ( them = [p*] pronoun )

0 L | 3 R EXPECT Bill Clinton to [v?i*]   ( Bill = [np*] proper noun )

0 L | 4 R EXPECT those six people to [v?i*]   ( those = [d*] demonstrative )

0 L | 5 R EXPECT the people in Florida to [v?i*]   ( the = [a*] article )

C what|all RELATIVE CLAUSE do [be] VERB

4 L | 0 R what|all he wants to do BE VERB e.g. what he wants to do is complain

5 L | 0 R what|all they expected Fred to do BE VERB

7 L | 0 R what|all any of these crazy people can do BE VERB

8 L | 0 R what|all your best friend can possibly hope to do BE VERB

Note

Use [a*]|[d*]|[n*]|[p*] to look for the first word of a noun phrase (you may want to refine this further). You can also use the negator - to indicate NOT, e.g. -VERB|ADV (not verb or adverb) or -to|will|would (none of these three words). Make sure there is no space to the left or right of | when there is a series of elements.
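Each COLLOCATES slot is effectively a small predicate: | joins alternatives, and a leading - negates the whole set. The sketch below models that behavior; `slot_matcher` is hypothetical, the PoS-name table is a small assumed subset of the one at the top of this page, and the lower-case tags in the usage lines are invented for illustration.

```python
from fnmatch import fnmatchcase

# Assumed subset of PoS names, mapped to tag wildcards (see the PoS table).
POS_GLOBS = {"NOUN": "nn*", "VERB": "vv*", "ADJ": "j*", "ADV": "r*"}


def slot_matcher(spec: str):
    """Build a predicate for a slot such as 'what|all' or '-VERB|ADV'."""
    negate = spec.startswith("-")
    alternatives = (spec[1:] if negate else spec).split("|")

    def matches_one(alt: str, word: str, tag: str) -> bool:
        if alt in POS_GLOBS:                              # PoS name
            return fnmatchcase(tag, POS_GLOBS[alt])
        if alt.startswith("[") and alt.endswith("]"):     # tag pattern, e.g. [p*]
            return fnmatchcase(tag, alt[1:-1])
        return word.lower() == alt.lower()                # literal word

    def predicate(word: str, tag: str) -> bool:
        hit = any(matches_one(alt, word, tag) for alt in alternatives)
        return not hit if negate else hit

    return predicate


np_start = slot_matcher("[a*]|[d*]|[n*]|[p*]")
print(np_start("the", "at"), np_start("run", "vvi"))  # True False
```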

Notes:
1. Not all of the KWIC entries will in fact be relevant, because we haven't placed any constraints on what is between the yellow and the green parts of the search. But using the yellow portion as an "anchor" is still far better than searching for just the green portion.
2. The green (collocates) portion can only have one word, not a sequence of two or three words. For this one word, however, there can be any number of alternatives, such as either what or all in [C] above.
3. Another option is to do a variable length phrase/sequence search. The advantage of that approach is that you can see (and limit) the intervening words. The disadvantage is that the "variable length" section is limited to three words.

COMPARE WORDS display

Compare the collocates of two words, to see how they differ in meaning and usage. For example, utter and sheer (note the negative collocates with utter), warm and hot, small and little, or adjectives near boy and girl.

By comparing collocates, you can move far beyond the simplistic entries in a thesaurus and "tease out" subtle differences between words or, as in the case of boy and girl, differences in what is being said about two different things.

Please review the discussion of collocates to see how to select the span for the collocates.


KWIC (Keyword in Context) display

See the patterns in which a word occurs, by sorting the words to the left and/or right. For example: budge (v), matter (n), diametrically, end up, or naked eye.

How to do it:


Select the words that you want to sort with. Select L for 1, 2, and 3 words to the left. Select R for 1, 2, and 3 words to the right. You could also, for example, sort by one word to the left, then one and two words to the right. Click * to clear the entries and start over.
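Sorting by context words amounts to building a sort key from the chosen positions, in the order you select them. The sketch below assumes each concordance line is stored as (left words, node, right words); `kwic_sort` and the three budge lines are hypothetical examples, not corpus data.

```python
def kwic_sort(lines, keys):
    """Sort concordance lines by context positions, e.g. keys=["L1", "R1", "R2"].

    L1 is the word just left of the node, R1 the word just right of it, etc.
    A line that is too short for a position sorts first on that key.
    """
    def word_at(line, key):
        left, _node, right = line
        side, dist = key[0], int(key[1:])
        words = left if side == "L" else right
        if dist > len(words):
            return ""
        return (words[-dist] if side == "L" else words[dist - 1]).lower()

    return sorted(lines, key=lambda line: [word_at(line, k) for k in keys])


lines = [
    (["would", "not"], "budge", ["an", "inch"]),
    (["refused", "to"], "budge", ["from"]),
    (["wo", "n't"], "budge", ["on"]),
]
for left, node, right in kwic_sort(lines, ["L1"]):
    print(" ".join(left), node.upper(), " ".join(right))
# wo n't BUDGE on
# would not BUDGE an inch
# refused to BUDGE from
```

Passing several keys, such as ["R2", "L1"], sorts by R2 first and breaks ties with L1.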


Use the dropdown list to the left (POS or _pos) to input tags for parts of speech (PoS, e.g. nouns or verbs) into your search string.

By default, it will add the PoS as a "full word", as in the searches strong NOUN or ADJ eyes.

You can also have the PoS added as a "tag" on the end of a word, to limit the word to that PoS, as in the searches strike_n or "and FIND_v".

To make it insert PoS tags after words, click on _pos. To change it back to PoS as a separate "word", click on POS.

You can find a wealth of information for the top 60,000 words in the corpus. As the following examples with bread show, you can see:

an overview of all of the information below
related topics (words that co-occur anywhere on the web page)
collocates (automatically grouped by part of speech)
clusters (the most frequent 2, 3, and 4 word strings)
a resortable Keyword in Context (concordance) display
related words (synonyms and WordNet entries), and
websites that use the word the most (you can use these to create Virtual Corpora).

You can find a wealth of information for the top 40,000 words in the corpus, including:

definitions and synonyms (including links to external dictionaries) and links to external images and videos
frequency information, including frequency by genre and country
collocates (nearby words), which provides insight into meaning and usage
topics (co-occurring words anywhere on the webpage), which provide perhaps even better insight into meaning
concordance lines, to see the patterns in which a word occurs

SECTIONS

SHOW determines whether the frequency is shown for each "section" of the corpus. For example, see the synonyms of beautiful in each section and overall.


OTHER OPTIONS

# HITS is the number of results.

# KWIC is the number of results for a KWIC (concordances) search.

GROUP BY determines whether words are grouped by word form (e.g. decide and decided separately) or by lemma (e.g. all forms of decide together), and whether you see the part of speech for each word (e.g. beat as a noun and as a verb displayed separately).

SHOW # TEXTS determines whether you see the number of texts in which a word or phrase occurs, in addition to its frequency. This can be useful in finding words and phrases that are limited to just a few texts in the corpus.

CASE SENSITIVE determines whether She thought and she thought are treated as two different searches, as are The Office, the Office, and the office.

DISPLAY shows raw frequency, occurrences per million words, or a combination of these.
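Frequency per million words is the raw frequency scaled by corpus size, which is what makes figures from corpora of very different sizes comparable. `per_million` is a hypothetical helper, and the counts in this sketch are invented.

```python
def per_million(raw_freq: int, corpus_words: int) -> float:
    """Occurrences per million words: raw frequency * 1,000,000 / corpus size."""
    return raw_freq * 1_000_000 / corpus_words


# Invented counts: a phrase seen 2,500 times in 1 billion words
# is rarer than one seen 500 times in 100 million words.
print(per_million(2_500, 1_000_000_000))  # 2.5
print(per_million(500, 100_000_000))      # 5.0
```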

SAVE LISTS allows you to create a wordlist from the results and then re-use it later in your searches.


SORT / LIMIT

Sort by raw frequency (e.g. hard *) or by "relevance" (hard *). Relevance uses the Mutual Information score.

It is often useful to specify the minimum frequency when you are sorting by "relevance", to eliminate very low frequency strings. For example, compare the collocates of green where minimum frequency = 1 (strange one-off strings) and where minimum frequency = 20.

Note also that when you do a collocates search and you don't specify anything for the collocates field, it automatically sets MINIMUM to MUT INFO = 3 (Mutual Information score). It does this to remove high frequency noise words like the, to, with, etc. If you want to see more of these words, lower the MI score; to see fewer, raise it.
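The MI score compares how often two words actually co-occur with how often they would be expected to co-occur by chance. The sketch below uses one widely used span-adjusted formulation, MI = log2((f(node, collocate) * N) / (f(node) * f(collocate) * span)); the site's exact computation may differ. `mutual_information`, `filter_collocates`, their defaults, and all counts are hypothetical, chosen to show why a frequency floor and an MI threshold remove different kinds of noise.

```python
import math


def mutual_information(f_pair, f_node, f_coll, corpus_size, span):
    """log2(observed co-occurrences / co-occurrences expected by chance)."""
    expected = f_node * f_coll * span / corpus_size
    return math.log2(f_pair / expected)


def filter_collocates(candidates, f_node, corpus_size, span, min_freq=1, min_mi=3.0):
    """Keep (word, f_pair, f_coll) candidates passing both thresholds, highest MI first."""
    kept = []
    for word, f_pair, f_coll in candidates:
        if f_pair < min_freq:
            continue
        mi = mutual_information(f_pair, f_node, f_coll, corpus_size, span)
        if mi >= min_mi:
            kept.append((word, round(mi, 2)))
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Invented counts for a node word seen 1,000 times in a 10-million-word corpus.
candidates = [("the", 50, 50_000), ("beans", 16, 1_000), ("gables", 2, 10)]
print(filter_collocates(candidates, 1_000, 10_000_000, span=10))
# [('gables', 7.64), ('beans', 4.0)]
print(filter_collocates(candidates, 1_000, 10_000_000, span=10, min_freq=5))
# [('beans', 4.0)]
```

With no frequency floor, the rare gables gets the highest score; raising the minimum frequency to 5 removes it, while the MI threshold of 3 is what removes the.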


VIRTUAL CORPORA

Create a "virtual corpus" -- essentially your own personalized corpus within . You can create the corpus either by keywords in the texts (e.g. texts with the words investments, basketball, or biology), or information about the texts (e.g. date, title, or source), or a combination of keyword and text information.
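Conceptually, a virtual corpus is a subset of texts selected by content, by metadata, or by both. The sketch below models that selection. The `Text` record, `build_virtual_corpus`, and the sample texts are hypothetical. It also assumes that a text qualifies if it contains any of the keywords, which may not match how the site combines them.

```python
from dataclasses import dataclass, field


@dataclass
class Text:
    """Hypothetical text record: metadata plus its words."""
    title: str
    source: str
    year: int
    words: list = field(default_factory=list)


def build_virtual_corpus(texts, keywords=(), min_year=None, max_year=None, source=None):
    """Select texts containing ANY keyword (if given) that also pass the metadata filters."""
    wanted = {k.lower() for k in keywords}
    selected = []
    for text in texts:
        if wanted and not wanted & {w.lower() for w in text.words}:
            continue
        if min_year is not None and text.year < min_year:
            continue
        if max_year is not None and text.year > max_year:
            continue
        if source is not None and text.source != source:
            continue
        selected.append(text)
    return selected


texts = [
    Text("Finals recap", "SportsMag", 2015, ["the", "basketball", "game"]),
    Text("Cell biology primer", "SciJournal", 2019, ["biology", "cells"]),
    Text("Draft night", "SportsMag", 2005, ["basketball", "draft"]),
]
recent = build_virtual_corpus(texts, keywords=["basketball"], min_year=2010)
print([t.title for t in recent])  # ['Finals recap']
```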

You can then edit your virtual corpora, search within a particular virtual corpus, compare the frequency of a word, phrase or grammatical construction in your different virtual corpora, and also create "keyword lists" based on the texts in your virtual corpus.
