| 
                    
		
Download complete list of all 25,094 texts, with metadata

The Movies Corpus is composed of about 200 million words in
25,094 texts from the 1930s to the 2010s (the last texts are from 2018).
The following table shows the number of words by country and
decade. (Note that MISC means that the first country listed in IMDB was
not one of the six shown below, although in most cases one of these
countries is listed as an "additional country".)
  
			
| Decade | US / CA | UK / IE | AU / NZ | Misc | TOTAL |
|---|---|---|---|---|---|
| 1930s | 6,013,722 | 445,980 | 2,245 | 104,255 | 6,566,202 |
| 1940s | 8,679,722 | 1,077,429 | --- | 51,151 | 9,808,302 |
| 1950s | 8,570,819 | 1,826,174 | 21,777 | 197,173 | 10,615,943 |
| 1960s | 5,851,067 | 2,687,175 | 6,594 | 557,976 | 9,102,812 |
| 1970s | 6,972,688 | 2,060,309 | 112,715 | 958,968 | 10,104,680 |
| 1980s | 10,739,129 | 2,153,349 | 308,640 | 917,461 | 14,118,579 |
| 1990s | 19,259,078 | 2,983,322 | 384,607 | 1,986,577 | 24,613,584 |
| 2000s | 38,572,824 | 6,970,252 | 793,610 | 4,893,749 | 51,230,435 |
| 2010s | 48,649,187 | 8,705,479 | 1,337,876 | 4,626,223 | 63,318,765 |
| TOTAL | 153,308,236 | 28,909,469 | 2,968,064 | 14,293,533 | 199,479,302 |
		 
		      
		       
The texts were taken from the OpenSubtitles collection. In cases where
there were multiple subtitle files for a given movie (which was the norm),
we used the "highest ranked" file in terms of accuracy (based on the
ratings at OpenSubtitles). We then matched each movie with its
corresponding page on IMDB, which provides rich metadata for each movie
(and which can be used to create your own Virtual Corpus).  |
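
The selection and matching steps described above can be illustrated with a short Python sketch. This is not the code used to build the corpus; it only shows the general idea under stated assumptions. The `SubtitleFile` record, its `movie_id` and `rating` fields, and the two helper functions are hypothetical names introduced for illustration, and the sample data is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubtitleFile:
    """Hypothetical record for one subtitle file (illustration only)."""
    movie_id: str    # assumed key that links the file to a movie's metadata page
    file_name: str
    rating: float    # assumed stand-in for the accuracy rating of the file


def pick_highest_rated(files):
    """Keep one file per movie: the one with the highest rating.

    On a tie, the file seen first is kept.
    """
    best = {}
    for f in files:
        current = best.get(f.movie_id)
        if current is None or f.rating > current.rating:
            best[f.movie_id] = f
    return best


def attach_metadata(best, metadata):
    """Join each selected file with the metadata for the same movie.

    Movies without a metadata entry are skipped.
    """
    return {
        movie_id: {"file": f.file_name, **metadata[movie_id]}
        for movie_id, f in best.items()
        if movie_id in metadata
    }


# Invented sample data, for illustration only.
sample_files = [
    SubtitleFile("movie-1", "a.srt", 6.5),
    SubtitleFile("movie-1", "b.srt", 9.0),
    SubtitleFile("movie-2", "c.srt", 7.0),
]
sample_metadata = {
    "movie-1": {"country": "US", "decade": "1990s"},
    "movie-2": {"country": "UK", "decade": "2000s"},
}

selected = pick_highest_rated(sample_files)
texts = attach_metadata(selected, sample_metadata)
```

Keying both steps on a single movie identifier keeps the choice of file separate from the metadata join, so either step could be replaced without touching the other.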