Comments on Translation Tribulations: Corpus terminology workshop in the Netherlands
Kevin Lossner

2013-05-17 18:42 +01:00
The dictionaries are at the top of the concordance hitlists.

PDF isn't much cause for excitement I think, given the usual range of difficulties with that format. You've been able to read text-extractable PDFs into memoQ corpora for a long time now, though the word order for a complex layout gets garbled. With many of the texts that would interest me I would have to do a decent OCR to get the words in a proper, usable sequence.

Limited time is a factor that is taken into account very well in the approach promoted by this workshop. I've used it to create corpora for corporate sustainability reports and numerous specialized domains of technical marketing such as fire safety or security, and building and indexing a usable corpus for the area of specific interest for a project takes well under an hour typically. And of course I add to it as new, relevant material is found. The only great divergence for me from the NIFTY approach is that I use memoQ LiveDocs as my repository, and with the improvements in concordance search in the next version (now in beta) the few disadvantages of doing so have been reduced. (I'm thinking particularly of researching collocations, which will now be easier in memoQ if not as nice as in many dedicated concordancing tools.)
Kevin Lossner

2013-05-17 15:22 +01:00
Hi Kevin,
Yes, that is of course always a danger of letting someone else build your corpus. I am currently trying to build a few of my own with tlCorpus (which btw now accepts PDFs!), but my time is limited and these ready-made online ones sure are a lot easier to set up ;)

Incidentally, I couldn't find those links to bilingual dictionaries you mentioned. Where exactly did you see them?

Michael
Michael Beijer

2013-05-16 21:07 +01:00
Hi Kevin,
Yes, having attended Juliette's workshop at the Legal Translators' Conference in Portugal, I can confirm that this method promotes "developing *specific* domain terminologies in an efficient manner".
Christina
Christina (http://www.stridonium.com)

2013-05-16 20:52 +01:00
I poked around a bit in the legal and medical corpora - not bad for examples of general vocabulary - and then I discovered the link to bilingual dictionaries at the top of the concordance hitlist. Dangerous stuff in the hands of the ignorant. The English>German dictionary search for a medical term I was looking at pulled up hits that mostly had to do with traffic :-) It seems ill-advised to link dictionaries with no consideration of context.
Kevin Lossner

2013-05-16 19:37 +01:00
Michael, what I particularly like about the method taught in this workshop is its focus on careful text selection with manageable scope in a specific specialist area. These large "bucket" corpora are more general in scope and likely less suited to making the sort of distinctions we would need.
Kevin Lossner

2013-05-16 18:33 +01:00
Hi Kevin,
Maybe a little off topic, but I just came across an interesting corpus search website which searches 29 different corpora!

http://www.lextutor.ca/concordancers/concord_e.html

It searches the following corpora:

1k Graded Corpus (530,000)
2000 List Corpus (240,000)
2k Graded Corpus (920,000)
AA Academic Abstracts
Academic Abstracts (174,000)
BNC Commerce (3.8 million)
BNC Humanities (3.3 million)
***BNC Law (2.2 million)***
BNC Med (1.4 million)
BNC speech (10 million)
BNC Spoken (1 million)
BNC Written (1 million)
Brown (1 million wds)
Brown + BNC Written (2+ m)
Call of the Wild (24,000)
Focus on Vocab (82,300)
JPU Learner (300,000)
NNS-Ts in Korea (123,000)
NS-Ts in Korea (124,000)
Presidential speeches (1.98 million)
RAC Academic (103,000)
RAC Research Articles Corpus (HK, 132,000 wds)
TC Learner (Student) (150,000)
TC Learner (Teacher) (61,000)
TESL Prog (3,400)
Univ. Word List (550,000)
US TV Talk (2 million)
V - Marlise
Yenny Korean EFL teachers corpus

Michael
Michael Beijer (http://wordbook.nl/)