Jun 25, 2017

NOW is not the National Organization of Words...

... but with over 4 billion of them, that interpretation of the News on the Web corpus at Brigham Young University would be plausible. BYU is known for its high-quality research corpora available to the public. The news corpus grows by about 10,000 articles each day, and its content can be searched online or downloaded.

The results are displayed as a keyword-in-context (KWIC) hit list, with the keyword highlighted and the source publications indicated in the "CONTEXT" column:

As a legal translator, I find the BYU corpus of US Supreme Court Opinions more useful. It displays results in a similar manner:

It is difficult or impossible to configure a direct search in these corpora using memoQ Web Search, IntelliWebSearch or similar integrated web search features in translation environments. However, these tools can be used as a shortcut to open the URL, and the search string can be applied once the site has been accessed. Since I only occasionally run searches like this to study context, a standalone shortcut with IWS serves me best. If I were using a corpus to study usage in a language I haven't mastered, like Portuguese (yes, there is a Portuguese corpus at BYU; actually two of them, one historical), I might include the URL in a set of sites which open every time I invoke memoQ Web Search, or in a larger set of terminology-related sites in an IntelliWebSearch group.

One great benefit of using such corpora as a language learner is that context and collocations (words that occur together with a particular word or phrase) can be studied easily, and better than with dictionaries, enabling one to sound a bit less like an idiot in a second, third, fourth or fifth language. Or for many perhaps, even their first language :-)
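For anyone curious what a KWIC list and a collocation count amount to under the hood, here is a minimal sketch in Python. It is only an illustration and not how the BYU corpora are implemented: the sample sentence is invented, and the function names (tokenize, kwic, collocates) and the window width are my own assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return a (left context, keyword, right context) tuple for each hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def collocates(tokens, keyword, width=3):
    """Count the words found within `width` tokens of each keyword hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - width):i])
            counts.update(tokens[i + 1:i + 1 + width])
    return counts


# Invented sample text standing in for a real corpus (assumption).
sample = ("The court granted the motion. The court denied the appeal. "
          "The appeal was heard by the court.")
tokens = tokenize(sample)

# Print each hit with the keyword bracketed, roughly like a KWIC display.
for left, kw, right in kwic(tokens, "court", width=2):
    print(f"{left:>20} [{kw}] {right}")

# Show the most frequent neighbours of the keyword.
print(collocates(tokens, "court", width=2).most_common(3))
```

A real corpus tool would also filter out function words such as "the" and weight collocates statistically, which this sketch leaves out to stay short.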


  1. Thank you for the insight. Sometimes the dictionaries can be very bulky and hectic.

  2. I think you should try WinAutomation one day, as it solves searching issues that other shortcut managers don't.

