English-Corpora.org


"FLEX" (VARIABLE LENGTH) QUERIES


You can now do searches with a variable number of "slots". For example, the search:

PUT (NOUN){3} away

would find strings with PUT at the beginning and away at the end, with up to three words in between. If there are any words in between, at least one of them has to be a NOUN. In other words, it runs the following seven searches, one right after another, and then displays the results of all of them on one page.

     Search (run one after another)   Matching string
  1  PUT away                         put away  (no words in between)
  2  PUT NOUN away                    put toys away
  3  PUT * NOUN away                  put the toys away
  4  PUT NOUN * away                  put toys far away
  5  PUT * * NOUN away                put the fun toys away
  6  PUT * NOUN * away                put the toys far away
  7  PUT NOUN * * away                put toys and crayons away
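
The expansion in the table can be reproduced with a short Python sketch. This is only an illustration of the logic described above, not the site's actual implementation; the function name expand_flex and its arguments are hypothetical. It also assumes that when the flex element is itself *, identical patterns are run only once.

```python
def expand_flex(before, element, n, after):
    """Return the fixed-length searches implied by: before (element){n} after.

    Hypothetical helper: `before` and `after` are lists of obligatory slots,
    `element` is the content of the parentheses, `n` is the number in braces.
    """
    if n not in (1, 2, 3):
        raise ValueError("n must be 1, 2, or 3")
    searches = [before + after]  # zero words in the flex slot
    for length in range(1, n + 1):
        # Put the required element in each position (last slot first, as in
        # the table) and fill the remaining slots with the * wildcard.
        for pos in range(length - 1, -1, -1):
            slots = ["*"] * length
            slots[pos] = element
            candidate = before + slots + after
            if candidate not in searches:  # assumption: skip duplicates
                searches.append(candidate)
    return [" ".join(s) for s in searches]


for i, search in enumerate(expand_flex(["PUT"], "NOUN", 3, ["away"]), start=1):
    print(i, search)
```

For PUT (NOUN){3} away this prints the seven searches in the same order as the table.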

In terms of search syntax, note that:

1. {n} sets the maximum length of the "variable length" string, which can contain anywhere from 0 to n words. Valid numbers are 1, 2, or 3 (in other words, the longest variable length string is three words).

2. If you don't indicate {n}, for example (NOUN), then the maximum is one word, meaning that the slot will contain either that one word or nothing.

3. Any "slot" without parentheses around it is obligatory. For example, put * away would not match put away, since * doesn't have parentheses around it.

4. You can't include multiple "flex" operators in a search. For example, they (VERB+){2} notice (NOUN){3} would not be possible.
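
The four rules above can be checked mechanically. The following Python sketch is one possible way to do that; it is not the parser used by the site, and the name parse_flex_query and the returned tuple layout are assumptions made for illustration. It splits the query on whitespace, so it assumes each slot is written without internal spaces.

```python
import re

# A flex slot: parentheses around one element, optionally followed by {n}.
FLEX = re.compile(r"^\(([^()]+)\)(?:\{(\d+)\})?$")


def parse_flex_query(query):
    """Split a query into (obligatory slots before, flex slot, slots after).

    The flex slot is returned as (element, n), or None if the query has none.
    """
    before, after, flex = [], [], None
    for token in query.split():
        m = FLEX.match(token)
        if m is None:
            if any(ch in token for ch in "(){}"):
                raise ValueError(f"malformed flex slot: {token!r}")
            # Rule 3: a slot without parentheses is obligatory.
            (before if flex is None else after).append(token)
            continue
        if flex is not None:
            # Rule 4: only one flex operator per search.
            raise ValueError("only one flex operator is allowed per search")
        # Rule 2: without {n}, the slot holds at most one word.
        n = int(m.group(2)) if m.group(2) else 1
        if n not in (1, 2, 3):
            # Rule 1: valid numbers are 1, 2, or 3.
            raise ValueError("{n} must be 1, 2, or 3")
        flex = (m.group(1), n)
    return before, flex, after


print(parse_flex_query("PUT (NOUN){3} away"))
print(parse_flex_query("might (*) know"))
```

A query such as they (VERB+){2} notice (NOUN){3} raises a ValueError in this sketch, because it contains two flex operators.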

The following are some additional searches (from the one billion word COCA corpus), along with a few sample matching strings and a few strings that would not be generated by the search (and why not).

Each sample search is followed by strings that WOULD be matched and strings that would NOT be matched.

might (*) know
    WOULD match:     might know
                     might never know
    Would NOT match: might never really know (without {}, matches at most one word)

was (really) interesting
    WOULD match:     was interesting (really is optional)
                     was really interesting
    Would NOT match: was very interesting (not really)
                     was not really interesting (too many words)

BE (NEG) worried
    WOULD match:     is worried (NEG is optional)
                     are n't worried
    Would NOT match: is really worried (not NEG)
                     is n't so worried (two words, search is max of 1)

made (*){3} money
    WOULD match:     made more money ({3} means 0-3 words)
                     made a lot more money (max of 3 words)
    Would NOT match: made quite a bit of money (4 words; max of 3)

take * (NOUN){2} away
    WOULD match:     take it away (it from *, which is not optional; no other words from {2}, since 0-2 words)
                     take the money away (the from *, money (one slot) from {2})
                     take even more money away (even from *, more money (two slots) from {2})
    Would NOT match: take away (* forces at least one word)
                     take it quickly away (no NOUN)
                     take even more easy money away (more easy money = 3 words)

I (VERB+){3} NOTICE_v
    WOULD match:     I was noticing
                     I had never even noticed (VERB+ matches any verb, including do, be, have; VERB is only lexical verbs)
    Would NOT match: I sometimes notice (no VERB+)
                     I had never even ever noticed (4 words; max of 3)
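
The reasons given for each non-match come down to two checks on the words that fall in the flex slot: there cannot be more than n of them, and if there are any, at least one has to match the element in parentheses. The Python sketch below illustrates just those two checks. The function name flex_slot_accepts and the hand-assigned tag sets are hypothetical; the corpus uses its own tagging, and lemma searches such as BE are not handled here.

```python
def flex_slot_accepts(middle, element, n=1):
    """Decide whether the words found in the flex slot satisfy (element){n}.

    `middle` is a list of (word, tags) pairs, where `tags` is a set of
    hand-assigned labels such as {"NOUN"} (an assumption for illustration).
    """
    def is_element(word, tags):
        # * matches anything; otherwise match the word form or one of its tags.
        return element == "*" or element == word or element in tags

    if len(middle) > n:
        return False  # too many words for {n}
    if not middle:
        return True   # the flex slot may be empty
    return any(is_element(word, tags) for word, tags in middle)


# was (really) interesting
print(flex_slot_accepts([("really", {"ADV"})], "really"))                   # was really interesting
print(flex_slot_accepts([("very", {"ADV"})], "really"))                     # was very interesting
# take * (NOUN){2} away, looking only at the words after the * slot
print(flex_slot_accepts([("more", {"ADJ"}), ("money", {"NOUN"})], "NOUN", 2))
print(flex_slot_accepts([("quickly", {"ADV"})], "NOUN", 2))
```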

Some additional notes:

1. Because a "flex search" can involve up to seven different searches (see above), there are some limits on the number of flex searches in a given 24-hour period. For those who do not have a premium or academic license, there is a limit of five flex searches in 24 hours. Those who do have a license can do up to 50 flex searches in a 24-hour period.

2. Again, because of the number of searches that are done in a flex search, these searches take a long time if all of the obligatory "slots" are high frequency. This can be a real limitation in very large corpora like NOW (19+ billion words) or iWeb (14 billion words). So a search like HAVE (ADJ){3} time probably won't work in those corpora, because HAVE and time are both too frequent. In a case like this, you will probably need to run the component searches separately: HAVE time, HAVE ADJ time, HAVE * ADJ time, etc. But again, this should not be a problem with a small corpus like the BNC.