Showing posts with label spreadsheets. Show all posts
Showing posts with label spreadsheets. Show all posts

Jul 26, 2017

Shortcuts to managing bitext corpora and terminologies in free Google Sheets

When I presented various options for using spreadsheets available in the free Google Office tools suite on one's Google Drive, I was asked if there wasn't a "simpler" way to do all this.

What's simple? The answer to that depends a lot on the individual. Yes, great simplicity is possible with using the application programming interface for parameterized URL searches described in my earlier articles on this topic:
The answer is yes. However, there will be some restrictions to accept regarding your data formats and what you can do with them. If that is acceptable, keep reading and you'll find some useful "cookie cutter" options.

When I wrote the aforementioned articles, I assumed that readers unable to cope with creating their own queries would simply ask a nerdy friend for five minutes of help. But another option would be to used canned queries which match defined structures of the spreadsheet.

Let's consider the simplest cases. For anything more complicated, post questions in the comments. One can build very complex queries for a very complex glossary spreadsheet, but if that's where your at, this and other guns are for hire, no checks accepted.

You have bilingual data in Language A and Language B. These can be any two languages, even the same "language" with some twist (like a glossary of a modern standard English with 19th century thieves' cant from London). The data can be a glossary of terms, a translation memory or other bitext corpus, or even a monolingual lexicon (of special terms and their definitions or other relevant information. The fundamental requirement is that these data are placed in an online spreadsheet, which can be created online or uploaded from your local computer and that Language A be found in Column A of the spreadsheet and Language B (or the definition in a monolingual lexicon) in Column B of the spreadsheet. And to make things a little more interesting we'll designate Column C as the place for additional information.


Now let's make a list of basic queries:
  1. Search for the text you want in Column A, return matches for A as well as information in Column B and possibly C too in a table in that order
  2. Search for the text you want in Column B, return matches for B as well as information in Column A and possibly C too in a table in that order
  3. Search for the text you want in Column A or Column B, return matches for A/B and possibly C too in a table in that order

Query 1: searching in Column A

The basic query could be: SELECT A, B WHERE A CONTAINS '<some text>'
Of course <some text> is substituted by the actual text to look for enclosed in the single straight quote marks. If you are configuring a web search program like IntelliWebSearch or the memoQ Web Search tool or equivalents in SDL Trados Studio, OmegaT or other tools, the placeholder goes here.

If you want the information in the supplemental (Comment) Column C, add it to the SELECT statement: SELECT A, B, C WHERE CONTAINS '<some text>'

The results table is returned in the order than the columns are named in the SELECT statement; to change the display order, change the sequence of the column labels A, B and C in the SELECT, for example:  SELECT BA, C WHERE CONTAINS '<some text>'

Query 2: searching in Column B

Yes, you guessed it: just change the column named after WHERE. So 
SELECT BA, C WHERE B CONTAINS '<some text>
for example.

Query 3: searching in Column A or Column B (bidirectional search)

For this, each comparison after the WHERE should be grouped in parentheses: 
SELECT A, B, C WHERE (A CONTAINS '<some text>') OR (B CONTAINS '<some text>')

The statement above will return results where the expression is found in either Column A or Column B. Other logic is possible: substituting AND for the logical OR in the WHERE clause returns a results table in which the expression must be present in both columns of a given record.

And yes, in memoQ Web Search or a similar tool you would use the placeholder for the expression twice. Really.

Putting it all together

To make the search URL for your Google spreadsheet three parts are needed:

  1. The base URL of the spreadsheet (look in your browser's address bar; in the address https://docs.google.com/spreadsheets/d/1Bm_ssaeF2zkUJR-mG1SaaodNSatGdvYernsE7IJcEDA/edit#gid=1106428424 for example, the base URL is everything before /edit#gid=1106428424.
  2. The string /gviz/tq?tqx=out:html&tq= and
  3. Your query statement created as described above
Just concatenate all three elements:

{base URL of the spreadsheet} + /gviz/tq?tqx=out:html&tq= + {query}

An example of this in a memoQ Web Search configuration might be:

https://docs.google.com/spreadsheets/d/1Bm_ssaeF2zkUJR-mG1SaaodNSatGdvYernsE7IJcEDA/gviz/tq?tqx=out:html&tq=SELECT B, A WHERE (A CONTAINS '{}') OR (B CONTAINS '{}')

and here you can see a search with that configuration and the characters 'muni' :  https://goo.gl/D5cQmh


Adding custom labels to the results table

If you clicked the short URL given as an example above, you'll notice that the columns are unlabeled. Try this short URL to see the same search with labels: https://goo.gl/3zJQqK

This is accomplished simply by adding LABEL A 'Portuguese', B 'English' to the end of the query string.

If you look at the URL in the address bar for any of the live web examples you'll notice that space characters, quote marks and other stuff are substituted by codes. No matter. You can type in clear text and use what you type; modern browsers can deal with stuff that is ungeeked too.

To do more formatting tricks, RTFM! It's here.



Jun 6, 2017

Build your own online reference TM for a team or anyone!


In the past, I have published several articles describing the use of free Google Sheets as a means of providing searchable glossaries on the Internet. This concept has continued to evolve, with current efforts focused on the use of forms and Google's spreadsheet service API to provide even more free, useful functionality.

On a number of occasions I have also mentioned that the same approaches can be used for translation memories to be shared with people having different translation environments, including those working with no CAT tools at all. However, the path to get there with a TM might not be obvious to everyone, and the effort of finding good tools to handle the necessary data conversions can be frustrating.

I've put up a demonstration TM in Portuguese and English here: https://goo.gl/LXXgmf

Here is a selection from the same data collection, selecting for matches of the Portuguese word 'cachorro':  https://goo.gl/9KJils
This uses the same parameterized URL search technique described in my article on searchable glossaries.

A translation memory in a Google Sheet has a few advantages:
  • It can be made accessible to anyone or to a selected group (using Google's permission scheme)
  • It can be downloaded in many formats for adding to a TM or other reference source on a local computer
  • Hits can also be read in context if the TM content is in the order it occurs in the translated documents. This is an advantage currently offered in commercial translation environment tools only by memoQ LiveDocs corpora.
Web search tools of many kinds can be configured easily to find data in these online Google Sheet "translation memories" - SDL Trados Studio, OmegaT and memoQ are among those tools with such facilities integrated, and IntelliWebSearch can bridge the gap for any environment that lacks such a thing.

But... how do you go from a translation memory in a CAT tool to the same content in a Google Sheet? This can be confusing, because many tools do not offer an option to export a TM to a spreadsheet or delimited text file. Some suggestions are found in an old PrAdZ thread, but I found a more satisfactory way of dealing with the problem.

A few years ago, the Heartsome Translation Studio went free and Open Source. It contains some excellent conversion tools. I downloaded a copy of the Heartsome TMX Editor (the available installers for Windows, Mac and Linux are here) and used it to convert my TMX file.




The result was then uploaded to a public directory on my personal Google Drive, and the URL was noted for building queries. Fairly straightforward.

The Heartsome TMX Editor seems like it might be a useful tool to replace Olifant as my TMX editor. While the TM editor in my tool of choice (memoQ) has improved in recent years, it still does not do many things I require, and some of this functionality is available in Heartsome.

Dec 27, 2016

Free shareable, searchable glossaries for collaboration with anyone

Some years ago I suggested a procedure using Google spreadsheets for glossary collaboration in projects. Many people do this sort of thing now.

What I do not think most are doing, however, is accessing these web-based term lists efficiently as terminology resources in their work. It's hard to compete with the efficiency of integrated termbases, TMs, web search features, etc.

... unless of course you integrate a web search for those online spreadsheets which returns just the few data of interest.

Matches found for German "ladepresse" in a glossary of a few thousand hunting terms
This is fairly straightforward using Google's visualization API with a simple query. A parameterized URL can be built to perform custom searches of your own data or data shared by colleagues or clients. "Canned" queries can be easily incorporated in custom searches from many tools, including memoQ Web Search, IntelliWebSearch and others.


Building a custom search URL for your Google spreadsheet is fairly simple. In the example above it consists of three parts:

{base URL of the spreadsheet} + /gviz/tq?tqx=out:html&tq= + {query}

The red bit invokes the Google visualization API and specifies that the query results be returned as HTML (for display in a browser). The query language is similar to SQL, but if you use a prepared query for a given spreadsheet table structure, you don't need to learn any of that. Queries can be made which also return definitions, images, context examples or anything else that might reside in columns of interest in the online spreadsheet.

Using a tool like IntelliWebSearch or integrated extensions of OmegaT, memoQ and other tools, users working with any sort of tools can share a live glossary. Google Spreadsheets also have some permissions/security features which can be investigated if needed.

Of course other data can be shared this way, including TMs or XLIFF data as well as monolingual information. A little study of the relevant Google documentation reveals many possibilities :-)