English-Corpora.org



Data from Google Analytics (see below for 2025) shows that the corpora from English-Corpora.org are used by almost 500,000 distinct people each year. In other words, if a person uses Corpus X a couple of times each week, as well as Corpus Y two or three times each month, and then three other corpora (A, B, and C) on an occasional basis, then all of this use would count as just "one" person.

We are not aware of any other corpus site that has even half as many users as English-Corpora.org.


The users of English-Corpora.org come from all over the world. For example, the following table shows the 50 countries that had the most users in 2025. So again, if the table shows 5,000 users from a particular country, those are 5,000 distinct researchers / students who have registered to use the corpora, and who have used the corpora during 2025. You can also see more detailed information on researchers by country.

But there are some complications. Although English-Corpora.org offers the most widely used corpora in the world, the number of users has decreased by about 15% during the last three years. For example, compare below the number of users in Sep-Dec in 2023, 2024, and 2025. (We use just those four months of the year, because Google Analytics changed everything in August 2023, and it's almost impossible to compare the data before September 2023 with the data from after that date.)

But notice that although the number of users decreased, the overall usage of the corpora increased from 2023 to 2025. To see this, multiply the number of sessions by the average time spent in each session. In 2023 there were 6,367,440 total minutes spent using the corpora, and in 2025 there were 6,572,363. Not a huge increase, but an increase nonetheless.
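As a rough sketch of that calculation: the two totals below are the figures quoted above, while the function names are just illustrative and are not part of Google Analytics or of any tool used by the site. The first function shows the "sessions times average session length" step, and the second works out how large the change between the two totals is.

```python
def total_minutes(sessions: int, avg_session_minutes: float) -> float:
    """Overall usage: number of sessions times average minutes per session."""
    return sessions * avg_session_minutes


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


# Totals quoted in the text (minutes spent using the corpora).
MINUTES_2023 = 6_367_440
MINUTES_2025 = 6_572_363

change = percent_change(MINUTES_2023, MINUTES_2025)
print(f"Change in total minutes, 2023 to 2025: {change:+.1f}%")  # about +3.2%
```

So the increase in total minutes works out to roughly 3%, which fits the description of a small but real increase.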

[Figures: Google Analytics data for Sep-Dec of 2023, 2024, and 2025]


The fact that the number of users is declining (even though overall usage is increasing slightly) is part of a much larger trend, in which the usage for many websites has decreased markedly in the last three or four years. This is probably because AI companies are "scraping" huge amounts of data from these sites, and then making it available to users via their LLMs (ChatGPT, Gemini, etc.), which then decreases the demand for the "original" websites from which the data was taken. (For a good overview of this trend and how this is killing off many websites, see articles like the following: Brookings Foundation (good overview: "AI is eating the web that enabled it"), Columbia Journalism Review, Wall Street Journal, Washington Post, Forbes, Reuters, Guardian, Times Higher Ed, Quartz, CCI, CyberNews, MediaLeader, SearchEngineLand; and regarding Wikipedia, for example: 1, 2, 3, 4, 5.)

I have spoken with several other people who provide online language resources (including some very large and very well-known publishing companies), and in most cases usage of their products has declined precipitously since ChatGPT was released in late 2022. For example, many online dictionaries have either been abandoned completely by their creators / publishers, or else they are on "life support". So it might be wise to be skeptical of any site that provides language resources (including corpora) and claims that its number of users has increased from 2023 to 2025, unless they can prove this with actual data from a reliable source like Google Analytics.

As far as the specifics of what is happening at English-Corpora.org, my guess is that the "less involved" users have moved on to ChatGPT or Gemini or some other LLM, since the only thing that these users were doing in the first place was running simple queries like "tell me what words occur near tomato", or "give me some uses of the word 'fun' in context". But these less educated and less sophisticated users are probably unaware that for many types of searches, the AI data doesn't really compare very well with corpus data. So although these "low involvement" users have the illusion of getting reliable data from the LLMs, in fact this is quite low-quality data.

For serious users, however, I suspect that there has been very little, if any, decrease in the usage of English-Corpora.org since the release of ChatGPT and other LLMs in late 2022 -- they have figured out that LLMs just don't provide the quality of data that they need. And this is supported by the fact that income from premium licenses and academic licenses has increased since 2023. People recognize and are willing to pay for quality data.

So yes, the number of users at English-Corpora.org has decreased since 2023, but most of this is probably among people who were doing very simplistic searches to begin with. And the data shows that educated and informed researchers, teachers, and learners are probably using the corpora more than ever.

And finally, we should realize that corpora and AI aren't necessarily "enemies". The best approach is probably to have AI and corpora working together, which is exactly what is now possible at English-Corpora.org (and only at English-Corpora.org).