Mark Davies / August 2025
(updated video from 23 August 2025)
English-Corpora.org now offers something
entirely new: the ability to combine the depth and reliability of corpus
data with the analytic power of Large Language Models (LLMs) like GPT,
Gemini, Claude, Perplexity, Llama, Mistral, and DeepSeek.
With just one click, the corpus can send
collocates, frequency patterns, phrase lists, or concordance lines to an
LLM — which will instantly group, explain, and interpret the data. These
AI-powered insights appear directly in the interface, alongside the
original corpus results.
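As a rough illustration of that hand-off, here is a minimal Python sketch of how a list of collocates might be packaged into a single request for an LLM. The function name `build_collocate_prompt`, the prompt wording, the `max_groups` parameter, and the sample collocates are all assumptions made up for this example; they are not the site's actual code, prompts, or corpus output, and the call to the LLM itself is left out.

```python
from typing import Sequence


def build_collocate_prompt(
    node_word: str, collocates: Sequence[str], max_groups: int = 6
) -> str:
    """Format collocates as one prompt asking an LLM to group and explain them.

    Hypothetical helper: the wording below is illustrative only.
    """
    if not collocates:
        raise ValueError("collocates must not be empty")
    # Number the collocates so the model's answer can refer back to them.
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(collocates, start=1))
    return (
        f"The following words are frequent collocates of '{node_word}' in a corpus.\n"
        f"Group them into at most {max_groups} semantic categories, "
        "name each category, and briefly explain what each group shows "
        "about how the word is used.\n\n"
        f"{numbered}"
    )


# Illustrative collocates only; these are NOT actual corpus results.
example_collocates = ["baseball", "salary", "bottle", "gas", "knee"]
prompt = build_collocate_prompt("cap", example_collocates)
print(prompt)
```

The resulting string could then be sent to whichever LLM is selected, with the model's grouped answer displayed next to the original collocates list.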
The result? Faster understanding of
patterns, clearer semantic groupings, and deeper insight into how
language works — for language learners and researchers alike. The corpus
data remains front and center, but now with the option of an intelligent
assistant working behind the scenes to guide your analysis.
Soon after I finished a comparison between corpora and AI/LLMs,
however, I thought: maybe it's not a question of "either/or".
Maybe it's an issue of "and/with". Why not take the strengths of AI/LLMs
and integrate them right into the corpus interface? As that
comparison indicates, what LLMs are really good at is classifying and
explaining data. And that's why this insight is being directly
integrated into English-Corpora.org.
The following are some of the ways that the AI/LLM
insights have been integrated with corpus data. Pay close attention to the
categorization and especially the analysis from the LLMs. None of the analyses
that you see on any of these pages are human-generated; they all come from the
LLMs. I think you'll agree that this insight from LLMs will completely transform
the way that people interact with the corpora, especially for non-native
speakers and language learners.
(Of course, while the AI analyses offer powerful
insights, users should remember that they represent intelligent suggestions
based on patterns in the data, and they are not 100% accurate linguistic
conclusions. If you demand absolute perfection and accuracy, then LLMs might not
be for you.)
Task / function
Examples / discussion
Video
PDF
Introduction: Integrating LLM insights and corpus data
The mechanics of integrating LLM insights into corpus data
Classifying and categorizing collocates
Collocates of cap
Classifying and categorizing collocates (COCA, iWeb)
Collocates of bow
Classifying and categorizing phrases
soft NOUN
Comparing two words (via collocates)
Quandary vs predicament, provoke vs incite
Comparing two genres, time periods, and dialects (lists)