"FLEX" (VARIABLE LENGTH) QUERIES
You can now do searches where there are a variable number of "slots". For
example, the search:
PUT (NOUN){3} away
would find strings with PUT at the beginning and away at the end,
with zero to three words in between. If there are any words in between, at least
one of them has to be a NOUN. In other words, it would do the following seven
searches, one right after another, and would then display the results for all of
the searches on one page.
# | Searches (done one right after another) | Matching strings
1 | PUT away | put away (no words in between)
2 | PUT NOUN away | put toys away
3 | PUT * NOUN away | put the toys away
4 | PUT NOUN * away | put toys far away
5 | PUT * * NOUN away | put the fun toys away
6 | PUT * NOUN * away | put the toys far away
7 | PUT NOUN * * away | put toys and crayons away
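The seven searches listed above can be generated mechanically. The following is a minimal Python sketch of that expansion, not the site's actual implementation. The function name expand_flex and its arguments are hypothetical, and it assumes that the flex element is placed in each possible position, with * in the remaining positions.

```python
def expand_flex(before, element, n, after):
    """List the fixed-length searches implied by `before (element){n} after`.

    Hypothetical helper for illustration only.
    """
    if n not in (1, 2, 3):
        raise ValueError("n must be 1, 2, or 3")
    searches = []
    for length in range(n + 1):  # 0 to n words in the flexible part
        if length == 0:
            middles = [[]]
        else:
            # Put `element` in each possible position, wildcards elsewhere.
            # `pos` is the number of wildcards that follow the element.
            middles = [
                ["*"] * (length - 1 - pos) + [element] + ["*"] * pos
                for pos in range(length)
            ]
        for middle in middles:
            search = " ".join(before + middle + after)
            if search not in searches:  # an element of * produces duplicates
                searches.append(search)
    return searches


for number, search in enumerate(expand_flex(["PUT"], "NOUN", 3, ["away"]), start=1):
    print(number, search)
```

Run as written, this prints the same seven searches, in the same order, as the list above.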
In terms of search syntax, note that:
1. {n} indicates the number of words (0 to n) that
can be in this "variable length" string. Valid numbers are 1, 2, or 3 (in other
words, the longest variable length string is three words).
2. If you don't indicate {n} -- for example (NOUN) -- then the maximum is one word,
meaning that the slot will match either that one word
or nothing.
3. Any "slot" without parentheses around it is
obligatory. For example, put * away would not match put away,
since * doesn't have parentheses around it.
4. You can't include multiple "flex" operators in a
search. For example, they (VERB+){2} notice (NOUN){3} would not be
possible.
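The syntax rules above can also be checked before a search is submitted. The following Python sketch is only an illustration of those rules: the regular expression FLEX_RE and the function parse_flex_query are hypothetical names, and the corpus site may parse queries differently.

```python
import re

# Hypothetical pattern for one flex operator: "(ELEMENT)" with an optional "{n}".
FLEX_RE = re.compile(r"\(([^()\s]+)\)(?:\{(\d+)\})?")


def parse_flex_query(query):
    """Split a flex query into (before, element, n, after), or raise ValueError."""
    operators = list(FLEX_RE.finditer(query))
    if len(operators) != 1:
        # Rule 4: only one flex operator is allowed per search.
        raise ValueError("expected exactly one flex operator")
    match = operators[0]
    element = match.group(1)
    # Rule 2: without {n}, the slot is at most one word.
    n = int(match.group(2)) if match.group(2) else 1
    # Rule 1: valid numbers are 1, 2, or 3.
    if n not in (1, 2, 3):
        raise ValueError("n must be 1, 2, or 3")
    # Rule 3: everything outside the parentheses is an obligatory slot.
    before = query[:match.start()].split()
    after = query[match.end():].split()
    return before, element, n, after


print(parse_flex_query("PUT (NOUN){3} away"))
print(parse_flex_query("might (*) know"))
```

In this sketch, a search such as they (VERB+){2} notice (NOUN){3} raises a ValueError because it contains two flex operators.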
The following are some additional searches (from
the one billion word COCA corpus),
along with a few sample matching strings and a few strings that would not be generated by the search (and why not).
Sample search | What WOULD be matched | What would NOT be matched
might (*) know | might know; might never know | might never really know (without {n}, matches at most one word)
was (really) interesting | was interesting (really is optional); was really interesting | was very interesting (not really); was not really interesting (too many words)
BE (NEG) worried | is worried (NEG is optional); are n't worried | is really worried (not NEG); is n't so worried (two words; the search allows a maximum of one)
made (*){3} money | made more money ({3} means 0-3 words); made a lot more money (max of 3 words) | made quite a bit of money (4 words; max of 3)
take * (NOUN){2} away | take it away (it from *, which is not optional; no other words from {2}, since 0-2 words); take the money away (the from *; money (one slot) from {2}); take even more money away (even from *; more money (two slots) from {2}) | take away (* forces at least one word); take it quickly away (no NOUN); take even more easy money away (more easy money = 3 words; max of 2)
I (VERB+){3} NOTICE_v | I was noticing; I had never even noticed (VERB+ matches any verb, including do, be, have; VERB is only lexical verbs) | I sometimes notice (no VERB+); I had never even ever noticed (4 words; max of 3)
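The matched and unmatched strings for take * (NOUN){2} away can be reproduced with a small Python sketch. It relies on the fact that the series of searches is equivalent to one condition: the flexible part has 0 to n words, and if it has any words, at least one of them matches the flex element. Everything here is hypothetical: the TOY_POS lexicon stands in for real part-of-speech tagging, and the sketch treats an all-caps slot only as a part-of-speech tag, so it does not handle lemma searches such as PUT or BE.

```python
# Hypothetical toy lexicon; a real corpus uses full part-of-speech tagging.
TOY_POS = {
    "money": "NOUN",
    "toys": "NOUN",
    "it": "PRON",
    "quickly": "ADV",
    "easy": "ADJ",
}


def slot_matches(slot, word):
    """Check one word against one slot (*, a POS tag, or a literal word)."""
    if slot == "*":
        return True
    if slot.isupper():  # simplification: all-caps slots are POS tags
        return TOY_POS.get(word) == slot
    return slot == word


def flex_matches(before, element, n, after, text):
    """Return True if `text` fits `before (element){n} after`."""
    words = text.lower().split()
    extra = len(words) - len(before) - len(after)  # words in the flexible part
    if not 0 <= extra <= n:
        return False
    head = words[:len(before)]
    middle = words[len(before):len(before) + extra]
    tail = words[len(before) + extra:]
    fixed_ok = all(
        slot_matches(slot, word)
        for slot, word in zip(before + after, head + tail)
    )
    flex_ok = extra == 0 or any(slot_matches(element, word) for word in middle)
    return fixed_ok and flex_ok


query = (["take", "*"], "NOUN", 2, ["away"])
for text in ["take it away", "take the money away", "take it quickly away"]:
    print(text, "->", flex_matches(*query, text))
```

With the toy lexicon above, the first two strings are reported as matches and the third is not, since quickly is not tagged as a NOUN.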
Some additional notes:
1. Because a "flex search" can involve up to seven
different searches (see above), there are some limits on the number of flex
searches in a given 24-hour period. For those who do not have a
premium or academic
license, there is a limit of five flex searches in 24 hours. Those who do have a
license can do up to 50 flex searches in a 24-hour period.
2. Again, because of the number of searches that
are done in a flex search, it would take a long time to do these searches if all
of the "slots" are high frequency. This can be a real limitation in very large
corpora like NOW (19+ billion
words) or iWeb (14 billion
words). So a search like HAVE (ADJ){3} time probably won't work in those
corpora -- HAVE and time are both too frequent. In a case like
this, you will probably need to do these as a series of separate searches --
HAVE time, HAVE ADJ time, HAVE * ADJ time, etc. But again, this should not be a
problem with a small corpus like the
BNC.