The following is an extended discussion of why we believe that our use of the texts in the corpora is within the bounds of US Fair Use Law.

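The four factors used to determine whether a use of copyrighted material qualifies as fair use are listed below. For each factor, we note what favors fair use status and then describe how the corpora at English-Corpora.org compare: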


What favors Fair Use status

Corpora at English-Corpora.org

The amount and substantiality of the portion taken

Small portions of the original text, rather than full-text access

Under no circumstances do end users have access to entire texts (e.g. newspaper, magazine, or journal articles, or short stories). All access is via the web interface, and the vast majority of what users see is simply frequency charts showing the frequency of words or phrases in different parts of the corpus. Access to small portions of the original text is more of an "afterthought" than a central feature of the interface.

Access to actual portions of the original text is limited to very short "Keyword in Context" displays, where users see just a handful of words to the left and the right of the word(s) searched for. In addition, all access is logged, and users can only perform a limited number of searches per day. As a result, it would be difficult for end users to re-create even one paragraph from the original text, and it would be virtually impossible to re-create an entire page of text, much less the entire article.
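To make the mechanism concrete, here is a minimal sketch of a "Keyword in Context" display combined with logged, rate-limited searches. It is an illustration only, not the code behind the web interface: the function `kwic`, the class `DailySearchLimiter`, the window of five words, and the cap of 200 searches per day are all hypothetical names and values chosen for the example.

```python
from collections import defaultdict
from datetime import date


def kwic(tokens, keyword, window=5):
    """Return (left, match, right) context strings for each hit of keyword.

    `window` is a hypothetical value: only this many words on either side
    of the match are ever returned, never the full text.
    """
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits


class DailySearchLimiter:
    """Log every search attempt and refuse those beyond a per-user daily cap."""

    def __init__(self, max_per_day=200):  # hypothetical cap
        self.max_per_day = max_per_day
        self.log = []                    # (user, day, query, allowed)
        self.counts = defaultdict(int)   # (user, day) -> allowed searches

    def allow(self, user, query, day=None):
        day = day or date.today()
        allowed = self.counts[(user, day)] < self.max_per_day
        if allowed:
            self.counts[(user, day)] += 1
        self.log.append((user, day, query, allowed))
        return allowed


if __name__ == "__main__":
    text = "The quick brown fox jumps over the lazy dog near the river bank"
    limiter = DailySearchLimiter(max_per_day=2)
    if limiter.allow("user1", "fox"):
        for left, match, right in kwic(text.split(), "fox", window=2):
            print(f"{left:>20}  [{match}]  {right}")
```

Because each hit exposes at most `2 * window + 1` words and the number of searches per day is capped, the amount of running text a user can piece together in a day is bounded, which is the point the paragraph above makes.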

This "snippet defense" (which relies on limiting access to the original text to small snippets shown via the web interface) is the same one used by Google Books for its use of millions of copyrighted works. In addition, we have consulted two lawyers who specialize in Internet copyright law (names available upon request). Both have stated that because of the limited access we give end users, as well as our status with regard to the other three factors shown here, we are clearly in accord with the provisions of the Fair Use statute.

The purpose and character of the use

Academic, non-commercial

Our use of the texts is strictly for academic research.

The nature of the copyrighted work

Non-creative works

There are some creative works (e.g. short stories and small sections of novels) in the corpus, but more than 80% of the corpus is composed of transcripts of TV shows, and articles from newspapers, magazines, and academic journals.

The effect of the use upon the potential market

Little or no effect on the copyright holder

Because of the very limited access via our web interface (see the first item above), it is extremely unlikely that anyone would use this corpus as a "substitute" for other access to the original texts. Other sources make these texts available as "complete articles", which are meant to be read in their entirety. Reading a text in its entirety is impossible with our interface.

Access to the texts via our interface, as compared to access via other sources, serves two completely different audiences. Our interface is designed for linguists and language learners who want to see the frequency of words, phrases, synonyms, etc., and it is completely inadequate for anyone who wishes to read the entire text of an article. As a result, there is very little or no "competition" between our service and that provided by others, and therefore virtually no market impact.