Texts

The Excel spreadsheet shown above contains complete information on the texts used in the corpus. The [textID] column corresponds to the [t] parameter of the URL linked from the [title] column of the Keyword in Context (KWIC) display. For example, if the URL is http://www.english-corpora.org/bnc/x4.asp?t=APK&ID=92380534, then the [textID] is [APK], which corresponds to [Quest for a babe] in the spreadsheet. For more details on the composition of the BNC, see http://www.natcorp.ox.ac.uk/corpus/index.xml.
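
Because the [textID] is just the value of the [t] query parameter, it can be extracted from a KWIC URL programmatically. The following is a minimal sketch using only the Python standard library. The function name `text_id_from_url` and the one-entry `titles` dictionary are hypothetical: the dictionary stands in for a lookup you would build from the spreadsheet, and loading the Excel file itself is not shown.

```python
from urllib.parse import urlparse, parse_qs


def text_id_from_url(url: str) -> str:
    """Return the value of the 't' query parameter (the [textID]) of a KWIC URL."""
    query = parse_qs(urlparse(url).query)  # e.g. {'t': ['APK'], 'ID': ['92380534']}
    values = query.get("t")
    if not values:
        raise ValueError(f"no 't' parameter in URL: {url}")
    return values[0]


# Hypothetical lookup table standing in for the spreadsheet (textID -> title).
titles = {"APK": "Quest for a babe"}

url = "http://www.english-corpora.org/bnc/x4.asp?t=APK&ID=92380534"
tid = text_id_from_url(url)
print(tid, "->", titles[tid])  # APK -> Quest for a babe
```

Parsing the query string, rather than slicing the URL text, means the lookup still works if the parameters appear in a different order.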

The tables below give the number of words in each section of the BNC. The section names correspond to the column headers in [CHART] view searches and to the [GENRE] column of the spreadsheet.

Overview
Genre                      # words   # texts
Spoken                  10,334,947       909
Fiction                 16,194,885       464
Magazine (W_pop_lore)    7,376,391       211
Newspaper               10,638,034       518
Non-academic            16,634,076       534
Academic                15,429,582       501
Miscellaneous           20,835,159       917
TOTAL                   97,626,093     4,054
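
Because the sections differ considerably in size (Magazine is less than half the size of Fiction), raw hit counts are not directly comparable across them. A common remedy, assumed here rather than taken from this page, is to normalize each count to a frequency per million words using the section sizes in the table above. The following is a minimal sketch; the helper `per_million` and the hit counts are hypothetical and serve only as an illustration.

```python
# Section sizes (# words) copied from the overview table above.
SECTION_WORDS = {
    "Spoken": 10_334_947,
    "Fiction": 16_194_885,
    "Magazine": 7_376_391,
    "Newspaper": 10_638_034,
    "Non-academic": 16_634_076,
    "Academic": 15_429_582,
    "Miscellaneous": 20_835_159,
}


def per_million(hits: int, section: str) -> float:
    """Normalize a raw hit count to a frequency per million words of `section`."""
    return hits / SECTION_WORDS[section] * 1_000_000


# Hypothetical raw hit counts for some search term (illustration only).
hits = {"Spoken": 500, "Fiction": 500}

for section, n in hits.items():
    print(f"{section:10} {n:5d} hits  {per_million(n, section):6.2f} per million")
```

The same 500 hits yield a higher rate in Spoken than in Fiction because Spoken is the smaller section.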

Details
Genre / subgenre       # words
Spoken               9,963,663
  S_brdcst_disc        736,229
  S_brdcst_doc          40,554
  S_brdcst_news        254,370
  S_classroom          412,372
  S_consult            131,354
  S_conv             4,012,457
  S_courtroom          125,438
  S_demonstratn         30,500
  S_interv_oral        798,978
  S_interview          119,117
  S_lect_arts           49,759
  S_lect_com            14,757
  S_lect_law            49,774
  S_lect_natsci         22,168
  S_lect_socsci        154,718
  S_meeting          1,334,382
  S_parliament          95,025
  S_pub_debate         278,458
  S_sermon              80,135
  S_spch+script        196,615
  S_spch-script        448,810
  S_sportslive          32,103
  S_tutorial           138,888
  S_unclass            406,702

Fiction             16,194,885
  W_fict_drama          44,975
  W_fict_poetry        219,409
  W_fict_prose      15,644,928

Magazine             7,376,391
  W_pop_lore         7,261,990

Newspaper           10,638,034
  W_new_arts1          345,860
  W_news_arts2         235,525
  W_news_com           416,345
  W_news_edit          100,659
  W_news_misc        1,019,839
  W_news_o_com         407,277
  W_news_o_rep       2,681,576
  W_news_o_sci          54,327
  W_news_o_soc       1,125,324
  W_news_o_sprt      1,009,878
  W_news_rprt          655,508
  W_news_sci            64,634
  W_news_script      1,262,351
  W_news_soc            80,963
  W_news_sprt          292,832
  W_news_tabld         713,524

Non-academic        16,634,076
  W_non_ac_arts      3,722,655
  W_non_ac_engin     1,186,625
  W_non_ac_law       4,450,696
  W_non_ac_med         495,734
  W_non_ac_nat       2,491,219
  W_non_ac_soc       4,148,256

Academic            15,429,582
  W_ac_engin           678,621
  W_ac_hum_arts      3,296,072
  W_ac_law_edu       4,615,173
  W_ac_medicine      1,412,808
  W_ac_nat_sci       1,104,527
  W_ac_soc_sci       4,224,467

Miscellaneous       20,835,159
  W_admin              218,595
  W_advert             549,856
  W_biography        3,494,374
  W_commerce         3,729,662
  W_email              209,815
  W_essay_schl         145,041
  W_essay_univ          55,477
  W_hansard          1,149,732
  W_inst_doc           542,553
  W_instruction        433,932
  W_let_pers            51,840
  W_let_prof            65,511
  W_misc             9,074,079
  W_religion         1,114,692

TOTAL               96,263,399
