Jul 26, 2017

Shortcuts to managing bitext corpora and terminologies in free Google Sheets

When I presented various options for using spreadsheets available in the free Google Office tools suite on one's Google Drive, I was asked if there wasn't a "simpler" way to do all this.

What's simple? The answer to that depends a lot on the individual. Great simplicity is already possible using the application programming interface for parameterized URL searches described in my earlier articles on this topic.

So is there a simpler way? The answer is yes. However, there will be some restrictions to accept regarding your data formats and what you can do with them. If that is acceptable, keep reading and you'll find some useful "cookie cutter" options.

When I wrote the aforementioned articles, I assumed that readers unable to cope with creating their own queries would simply ask a nerdy friend for five minutes of help. But another option is to use canned queries that match defined structures of the spreadsheet.

Let's consider the simplest cases. For anything more complicated, post questions in the comments. One can build very complex queries for a very complex glossary spreadsheet, but if that's where you're at, this and other guns are for hire, no checks accepted.

You have bilingual data in Language A and Language B. These can be any two languages, or even the same "language" with some twist (like a glossary matching modern standard English with the thieves' cant of 19th-century London). The data can be a glossary of terms, a translation memory or other bitext corpus, or even a monolingual lexicon (of special terms and their definitions or other relevant information). The fundamental requirements are that these data are placed in an online spreadsheet, which can be created online or uploaded from your local computer, and that Language A is found in Column A of the spreadsheet and Language B (or the definition in a monolingual lexicon) in Column B. And to make things a little more interesting, we'll designate Column C as the place for additional information.


Now let's make a list of basic queries:
  1. Search for the text you want in Column A, return matches for A as well as information in Column B and possibly C too in a table in that order
  2. Search for the text you want in Column B, return matches for B as well as information in Column A and possibly C too in a table in that order
  3. Search for the text you want in Column A or Column B, return matches for A/B and possibly C too in a table in that order

Query 1: searching in Column A

The basic query could be: SELECT A, B WHERE A CONTAINS '<some text>'
Of course, <some text> is replaced by the actual text you are looking for, enclosed in single straight quote marks. If you are configuring a web search program like IntelliWebSearch, the memoQ Web Search tool or the equivalents in SDL Trados Studio, OmegaT or other tools, this is where the placeholder for the search term goes.

If you want the information in the supplemental (Comment) Column C, add it to the SELECT statement: SELECT A, B, C WHERE A CONTAINS '<some text>'

The results table is returned with the columns in the order they are named in the SELECT statement; to change the display order, change the sequence of the column labels A, B and C in the SELECT, for example: SELECT B, A, C WHERE A CONTAINS '<some text>'
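If you maintain several glossaries, a few lines of Python can assemble these canned queries for you. This is only a sketch: `quote_literal` and `contains_query` are my own hypothetical helper names, not part of any Google API. It builds the Query 1 form with the SELECT columns in whatever display order you give it, and it assumes (per my reading of the Query Language documentation) that string literals may also be enclosed in double quotes, which helps when the search text contains an apostrophe.

```python
def quote_literal(text):
    """Wrap the search text as a query-language string literal.

    Assumption: double-quoted literals are accepted too, so they are used
    when the text itself contains a single straight quote (apostrophe).
    Text containing both kinds of quote is not handled here.
    """
    return f'"{text}"' if "'" in text else f"'{text}'"


def contains_query(select_cols, search_col, text):
    """Build SELECT <cols> WHERE <search_col> CONTAINS '<text>'.

    select_cols: column letters in the order they should be displayed
    search_col:  the column letter to search in
    text:        the search expression, or a search tool's placeholder
    """
    columns = ", ".join(select_cols)
    return f"SELECT {columns} WHERE {search_col} CONTAINS {quote_literal(text)}"


# Query 1 in the default column order, then with B displayed first
print(contains_query(["A", "B"], "A", "<some text>"))
print(contains_query(["B", "A", "C"], "A", "<some text>"))
```

Passing "B" as the search column gives the Query 2 form with no other changes.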

Query 2: searching in Column B

Yes, you guessed it: just change the column named after WHERE. For example:
SELECT B, A, C WHERE B CONTAINS '<some text>'
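Before wiring a query into a URL, it can help to check offline which rows it ought to return. The sketch below mimics Query 2 on a few hypothetical Portuguese/English rows in plain Python; it does not contact Google, and it assumes that CONTAINS is a case-sensitive substring match, which is my understanding of the query language. The names `ROWS` and `simulate_contains` are illustrative only.

```python
# Hypothetical sample rows: (Column A, Column B, Column C)
ROWS = [
    ("município", "municipality", "admin. term"),
    ("câmara municipal", "town council", ""),
    ("freguesia", "parish", "smallest local unit"),
]

COLUMN_INDEX = {"A": 0, "B": 1, "C": 2}


def simulate_contains(rows, select_cols, search_col, text):
    """Mimic SELECT <cols> WHERE <search_col> CONTAINS '<text>' on local rows."""
    where = COLUMN_INDEX[search_col]
    picks = [COLUMN_INDEX[c] for c in select_cols]
    return [
        tuple(row[i] for i in picks)   # columns in SELECT order
        for row in rows
        if text in row[where]          # case-sensitive substring test
    ]


# Query 2: search Column B, display B, A, C
for result in simulate_contains(ROWS, ["B", "A", "C"], "B", "council"):
    print(result)
```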

Query 3: searching in Column A or Column B (bidirectional search)

For this, each comparison after the WHERE should be grouped in parentheses: 
SELECT A, B, C WHERE (A CONTAINS '<some text>') OR (B CONTAINS '<some text>')

The statement above will return results where the expression is found in either Column A or Column B. Other logic is possible: substituting AND for the logical OR in the WHERE clause returns a results table in which the expression must be present in both columns of a given record.

And yes, in memoQ Web Search or a similar tool you would use the placeholder for the expression twice. Really.
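The bidirectional form can be generated the same way. In this hypothetical helper, `logic` switches between OR and AND, and the search expression is written into both comparisons, which is exactly why a tool placeholder such as memoQ's {} has to appear twice.

```python
def bidirectional_query(text, logic="OR", select_cols=("A", "B", "C")):
    """Build the Query 3 form: search Column A and Column B in one query.

    logic="OR"  returns records where the text is in A or in B;
    logic="AND" requires it to be present in both columns.
    """
    if logic not in ("OR", "AND"):
        raise ValueError("logic must be 'OR' or 'AND'")
    columns = ", ".join(select_cols)
    # The same expression (or placeholder) goes into both comparisons.
    return (f"SELECT {columns} WHERE (A CONTAINS '{text}') "
            f"{logic} (B CONTAINS '{text}')")


# With the {} placeholder and B displayed before A
print(bidirectional_query("{}", select_cols=("B", "A")))
```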

Putting it all together

To make the search URL for your Google spreadsheet, three parts are needed:

  1. The base URL of the spreadsheet (look in your browser's address bar; in the address https://docs.google.com/spreadsheets/d/1Bm_ssaeF2zkUJR-mG1SaaodNSatGdvYernsE7IJcEDA/edit#gid=1106428424, for example, the base URL is everything before /edit#gid=1106428424),
  2. The string /gviz/tq?tqx=out:html&tq= and
  3. Your query statement created as described above
Just concatenate all three elements:

{base URL of the spreadsheet} + /gviz/tq?tqx=out:html&tq= + {query}
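The three-part recipe is easy to script. This sketch assumes the address-bar URL contains /edit, as in the example address in step 1, and simply keeps everything before it; `base_url` and `search_url` are hypothetical helper names. With the example spreadsheet and a bidirectional query it reproduces the memoQ Web Search configuration example given in this post.

```python
GVIZ_HTML = "/gviz/tq?tqx=out:html&tq="


def base_url(address_bar_url):
    """Return everything before /edit in a Google Sheets address."""
    return address_bar_url.split("/edit", 1)[0]


def search_url(address_bar_url, query):
    """Concatenate base URL + /gviz/tq?tqx=out:html&tq= + query."""
    return base_url(address_bar_url) + GVIZ_HTML + query


sheet = ("https://docs.google.com/spreadsheets/d/"
         "1Bm_ssaeF2zkUJR-mG1SaaodNSatGdvYernsE7IJcEDA/edit#gid=1106428424")
memoq_query = "SELECT B, A WHERE (A CONTAINS '{}') OR (B CONTAINS '{}')"
print(search_url(sheet, memoq_query))
```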

An example of this in a memoQ Web Search configuration might be:

https://docs.google.com/spreadsheets/d/1Bm_ssaeF2zkUJR-mG1SaaodNSatGdvYernsE7IJcEDA/gviz/tq?tqx=out:html&tq=SELECT B, A WHERE (A CONTAINS '{}') OR (B CONTAINS '{}')

and here you can see a search with that configuration for the characters 'muni': https://goo.gl/D5cQmh


Adding custom labels to the results table

If you clicked the short URL given as an example above, you'll notice that the columns are unlabeled. Try this short URL to see the same search with labels: https://goo.gl/3zJQqK

This is accomplished simply by adding LABEL A 'Portuguese', B 'English' to the end of the query string.
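Appending the LABEL clause can be scripted too. Here is a sketch with a hypothetical `with_labels` helper; it assumes the labels contain no single straight quotes and that LABEL is the last clause in the query. As far as I know, the query language fixes the clause order, with LABEL following WHERE (and preceding FORMAT and OPTIONS, which these simple queries don't use).

```python
def with_labels(query, labels):
    """Append a LABEL clause, e.g. LABEL A 'Portuguese', B 'English'.

    labels: mapping of column letter -> display label, in output order.
    """
    clause = ", ".join(f"{col} '{name}'" for col, name in labels.items())
    return f"{query} LABEL {clause}"


q = "SELECT B, A WHERE (A CONTAINS '{}') OR (B CONTAINS '{}')"
print(with_labels(q, {"A": "Portuguese", "B": "English"}))
```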

If you look at the URL in the address bar for any of the live web examples, you'll notice that space characters, quote marks and other special characters are replaced by codes (percent-encoding). No matter. You can type the query in clear text and use what you type; modern browsers can deal with stuff that is ungeeked too.
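If you're curious what those codes are, or you build search URLs in a script rather than typing them into a browser, Python's standard `urllib.parse.quote` produces percent-encoding of the kind you see in the address bar (browsers may leave a few more characters unencoded). A minimal sketch:

```python
from urllib.parse import quote, unquote

query = "SELECT B, A WHERE (A CONTAINS 'muni') OR (B CONTAINS 'muni')"

# safe="" encodes everything except letters, digits and _ . - ~ ;
# spaces become %20, commas %2C, parentheses %28/%29, quotes %27.
encoded = quote(query, safe="")
print(encoded)

# Non-ASCII text is encoded as UTF-8 bytes, e.g. í -> %C3%AD
print(quote("município", safe=""))

# Decoding gives back the clear-text query
print(unquote(encoded) == query)
```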

To do more formatting tricks, RTFM! It's here.



Jul 20, 2017

memoQ Web Search examples for Portuguese

This week I'm in Lisbon teaching a 24-hour Boas Práticas (best practices) evening course on translation technology with David Hardisty and Marco Neves. Tonight we're covering web search with various sites and tools, including memoQ Web Search.

Unfortunately, Kilgray provides examples of web search configurations only for English and German, and many of the site configurations are defective. If your working pairs involve other languages, there isn't much you can do with those examples.

In tonight's class we had students working in the following pairs:
  • Portuguese to English
  • English to Portuguese
  • Portuguese to Russian
  • French to Portuguese
  • Spanish to Portuguese
  • German to Portuguese
So we created some example configurations to do web look-ups in all these pairs. And they are available here.

I was a bit surprised to find that I had never blogged the chapters of my books that deal with configuring the web search (I'll have to get around to that one of these days), but the memoQ Help isn't bad for this if you need a little guidance on how to add more site searches or change these configurations.

Anyone is welcome to do with the configurations provided here as they please; I hope they will help friends, colleagues and students in the Lusophone world to go a little farther with a great tool.


Jul 3, 2017

Something new out of Africa!

Guest contribution by Obi Udeariri
Photographs provided by Sameh Ragab/EAITA

Many years ago, Pliny the Elder declaimed Ex Africa semper aliquid novi  – "(There's) always something new (coming) out of Africa". He was referring to the continent’s diverse natural resources, but the phrase has come true yet again, this time with respect to its diverse human resources: Homo Africanus interpres.

Nairobi is the capital of Kenya and the jewel of East Africa; the stomping ground of the famed Kenyan writers Ngũgĩ wa Thiong'o and Grace Ogot and the Nobel laureate Wangari Muta Maathai. With its temperate climate and lush wildlife, it’s a favorite holiday destination for hundreds of thousands of tourists each year, who come to enjoy its excellent hospitality and numerous attractions. It’s also home to the African headquarters of the United Nations and another emerging international organization – the East African Interpreters and Translators Association.

The EAITA was formed barely a year ago, with a membership comprising language professionals from across East Africa, and in its brief life it’s already held two major events aimed at boosting professional competence, featuring outstanding keynote speakers from abroad. This year’s event was held on Saturday 1st July, was focused on the use of CAT tools to promote productivity, and was deftly and professionally handled by Sameh Ragab, a vastly experienced translation professional, CAT tools trainer, and certified United Nations Vendor, who graciously gave his audience the benefit of this extensive experience at no cost.

Technology guru Sameh Ragab of Egypt, a favorite teacher at conferences around the world!
The uptake and use of CAT tools and other cutting-edge techniques, and the interest in them, are widespread. This was shown by the mini-summit nature of the event, whose attendees came from all across East Africa (Kenya itself, Rwanda, Burundi and Tanzania) and from as far afield as the lush and steamy tropical nation of Nigeria. An accentologist would have had a field day.


The immense expansion of language services occasioned by new communication methods and technology has definitely not passed Africa by, contrary to what some may think. African countries have largely overcome their infrastructural issues, and language professionals are busy tapping away, whispering (chuchotant) in interpreting booths, and leveraging the latest software for transcription, project management and other needs, all in real time, backed up by IT infrastructure to match the best in other countries.

Translation and interpreting have always been a part of life in African countries. Given the continent's ethnically heterogeneous communities and countries, there has always been a need to convey meaning in written or oral form between its peoples, and the average language professional here (who is usually already natively bilingual in one or more of its lingua francas or native languages) is simply taking this inbuilt familiarity with language manipulation to the next level.

In view of the nearly full turnout of EAITA members and the interest generated by this event, international language service providers would do well to screw their monocles firmly in place and divert some of their flighty attention towards the continent’s language professionals. Not as a source of cheap labor, but in search of skilled, competent, thoroughbred professionals whose skills and expertise are on a par with anything obtainable worldwide, and who speak, read, write, translate and interpret with proficiency a range of languages as diverse as their peoples, including lingua francas such as Swahili, English, Arabic, French, Spanish, Hausa, Igbo and many, many more.

Congratulations to the EAITA for the successful event, which was also supported by the International Association of Professional Translators and Interpreters; I’m looking forward to more new, good things coming out of Africa!

Focused on the future.

*******

Obi Udeariri is a specialized legal translator who translates all kinds of legal documents from French, German and Dutch into English. He has a law degree and several translation certifications, and he has been a full-time freelance translator for 14 years.

He is the Head of the Nigerian chapter of the International Association of Professional Translators and Interpreters (IAPTI), and lives in Lagos, Nigeria with his wife and two sons.