Aug 1, 2016

Corpus Linguistics and AntConc in the 2016 US Presidential Contest

Professor Laurence Anthony's AntConc concordancing software remains my favorite tool for analyzing the word content of text collections for my professional translation purposes. Although a free tool, it offers some important functionality beyond what I can get from the integrated term extraction and concordancing features of my translation environment tools, particularly SDL MultiTerm Extract and memoQ. AntConc is my first recommendation to my friends who teach at university and want to introduce their students to practical corpus linguistics, and to my clients in industry who need to produce useful glossaries covering the most frequently discussed things in their range of products and services.

That is not to say that its features are the most wide-ranging, but in addition to dead-simple incorporation of stopword lists (still a problem for most memoQ users), AntConc (like many other academic concordancers) offers excellent facilities for studying collocations, those words which occur together in important contexts. For years I have begged that this useful feature be added to the tools for professional translators, because it is a great aid in studying the proper language of a particular field or subject matter. The memoQ concordance can in fact search for multiple terms at once, so that one forms a visual impression of their co-occurrence in text, but it lacks the simple precision of AntConc for specifying the proximity range of the words found together in a sentence.
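For readers who want to see what a proximity range means in practice, here is a minimal Python sketch of window-based collocate counting. It is not AntConc's implementation; the function name `collocates`, its parameters, the tokenizing pattern and the sample text are all assumptions made up for illustration. It uses raw counts only, and it ignores sentence boundaries.

```python
import re
from collections import Counter


def collocates(text, node, span=5, stopwords=frozenset()):
    """Count words occurring within `span` tokens either side of `node`.

    Hypothetical helper for illustration only: a crude lowercase
    tokenizer, raw counts, no statistical ranking, and windows that
    may cross sentence boundaries.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = tokens[max(0, i - span):i]    # up to `span` words before the node
        right = tokens[i + 1:i + 1 + span]   # up to `span` words after the node
        counts.update(w for w in left + right if w not in stopwords)
    return counts


# Invented sample text, not taken from any real speech.
sample = "We will win. We will build. I will fight for you, and we will win together."
result = collocates(sample, "we", span=1, stopwords={"and"})
print(result.most_common())
```

Widening `span` to 5 gives the "one to five words either way" kind of range; a real study would also want a proper stopword list and a much larger text than this toy sample.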

In one form or another, tools for analyzing the frequency of words and the contexts in which they occur have been a part of my life for a very long time. And yet it did not occur to me to use them as a means of studying the many words that are part of the many political and social debates taking place in the countries that concern me. One can get a quick impression with fun word cloud pictures (such as those in this post, created from the convention speeches of The Orange One and The Infamous HRC using a free online tool). But AntConc lets you go deeper and achieve a greater understanding of how language is used to influence our thoughts and discussions.

Katelyn Guichelaar and Kristin Du Mez have done just that in an article titled "Donald Trump and Hillary Clinton, By Their Words", which offers some interesting insights into the psychology and public postures of the two candidates. No spoilers here – go read the article and enjoy. Then think about the professionally and personally relevant ways in which you might use the practical tools of corpus linguistics.


  1. It would be interesting to see the collocations, from one to five words either way, of all those personal pronouns used by both candidates - that is, if Donald Trump actually does ever string 11 words together coherently.
    Katelyn Guichelaar and Kristin Du Mez do not mention that, relatively speaking, 25,722 and 23,089 words are not particularly large corpora as corpora go, which means that care has to be taken not to make biased extrapolations from the data.

    Thank you for the recent cheeky inclusion of my blog on your blog roll, Kevin. :)

  2. A MOOC on "Corpus Linguistics: Method, Analysis, Interpretation" with Lancaster University et al. has just started here:
    May be of interest.

