Showing posts with label GoogleDocs. Show all posts

Jul 31, 2019

URL-based searches of your Google Drive


Just before a recent short holiday, I ran across an article from 2017 which described how to search Google Drive directly from Chrome's address bar. "Interesting," I thought, and with the possibility of integrating such Google Drive searches with IntelliWebSearch or memoQ's integrated web search feature (or similar features in other environments) in mind, I shared the link with a few friends.

Google Drive and its application suite, which includes Google Docs (the word processor) and Google Sheets (the spreadsheet application), offer many possibilities for helping in language projects, collaborative and otherwise. I have written extensively about these possibilities with terminology (here, for example, and in a number of related articles). But these earlier investigations involved specific documents and viewing these - or selected portions of them - in a web browser window. Searching a number of files of various types on one's Google Drive ("My Drive") or a subfolder thereof is a little different, and possibly more useful in some circumstances, such as in a group project where multiple participants are contributing to a shared reference folder (though this folder will have to be added to the "My Drive" of each collaborator).

Google's Help for the relevant search function explains:
You can find files in Google Drive, Docs, Sheets, and Slides by searching for:
  • File title
  • File contents
  • Items featured in pictures, PDF files, or other files stored on your Drive
You can only search for files stored in My Drive. Files stored in folders shared with you won't appear in your search unless you add the folders to My Drive.
 
You can also sort and filter search results.
It all starts with a basic URL, such as
https://drive.google.com/drive/search?q=SOMETEXT
Execute that in your browser's address bar, replacing SOMETEXT with your desired search expression, and you'll get a hit list of all files on your Google Drive which include that text in the title or contents. In a tool like memoQ Web Search, SOMETEXT is replaced by the placeholder for search text that the application uses ({} in the case of memoQ Web Search). With a little experimentation, you'll soon find the additional arguments to search specific file types or folders.

For example, if I want to do a search in the "Other" subfolder on my Google drive, I can discover the URL arguments by starting a manual search and just reading the address bar:


The parameter to use for a specific folder search is "parent", followed by a colon and the coded ID of that folder.
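For anyone who would rather generate these addresses than type them, here is a minimal Python sketch. The function name drive_search_url and the sample folder ID are hypothetical, and the way the folder restriction and the search text are combined inside q= is an assumption; check it against what your own address bar shows after a manual folder search.

```python
from urllib.parse import quote

DRIVE_SEARCH = "https://drive.google.com/drive/search?q="

def drive_search_url(text, folder_id=None):
    """Build a Drive search URL for `text`, optionally limited to one folder."""
    # Assumption: the folder restriction is written as parent:<ID> in front of
    # the search text, inside the same q= value.
    expression = text if folder_id is None else f"parent:{folder_id} {text}"
    # Percent-encode spaces and other special characters; keep the colon readable.
    return DRIVE_SEARCH + quote(expression, safe=":")

print(drive_search_url("turnip"))
print(drive_search_url("turnip", folder_id="FOLDER_ID_FROM_ADDRESS_BAR"))
```

In a web search tool configuration, the search text would be the tool's own placeholder rather than a fixed word.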


An example of a folder search with a specific text segment is in the screenshot above; this was taken while configuring and testing the search in a memoQ Web Search profile. One document containing the search text "turnip" was found in the folder. To view the document, right-click on it in the hit list and choose Preview.

Search inside the preview of a document found in a Google Drive search with memoQ Web Search

Unfortunately there seems to be a bug in the memoQ Web Search - which now uses Chromium - because double-clicking the document tries to open it in the old search engine based on Internet Explorer, where I was not logged in to Google.

An Internet Explorer window, bizarrely launched by the Chromium-based memoQ Web Search

In fact, you'll have to log in to Google each time you open the memoQ Web Search window (a total nuisance), so it's better to leave it open in the background, even though the current bug in which the web search window is no longer brought to the forefront can make this inconvenient. In other tools this may not be an issue.


The Chromium/IE issue as well as the focus and login hassles with memoQ's web search have been reported to memoQ Support; I look forward to seeing how these are handled. Nonetheless, this Google Drive search seems to have significant potential for individuals and teams to build searchable document collections in the folders of a Google Drive account. Try it in your working environment and share your findings!

Aug 3, 2017

"Coming to Terms" workshop materials for terminology mining



I recently put together a two-hour online workshop to teach some practical aspects of terminology mining and the creation and management of stopword lists to filter out unwanted word "noise" and get to interesting specialist terminology faster.

A recording of the talk as well as the slides and a folder of diverse resources usable with a variety of tools are available at this short URL: https://goo.gl/qvwJbf. The TVS recording file can be opened and played by the free TeamViewer application.

The discussion focuses primarily on Laurence Anthony's AntConc and the terminology extraction module of Kilgray's memoQ.

Jul 26, 2017

Shortcuts to managing bitext corpora and terminologies in free Google Sheets

When I presented various options for using spreadsheets available in the free Google Office tools suite on one's Google Drive, I was asked if there wasn't a "simpler" way to do all this.

What's simple? The answer to that depends a lot on the individual. But yes, great simplicity is possible using the application programming interface for parameterized URL searches described in my earlier articles on this topic. However, there will be some restrictions to accept regarding your data formats and what you can do with them. If that is acceptable, keep reading and you'll find some useful "cookie cutter" options.

When I wrote the aforementioned articles, I assumed that readers unable to cope with creating their own queries would simply ask a nerdy friend for five minutes of help. But another option would be to use canned queries which match defined structures of the spreadsheet.

Let's consider the simplest cases. For anything more complicated, post questions in the comments. One can build very complex queries for a very complex glossary spreadsheet, but if that's where you're at, this and other guns are for hire, no checks accepted.

You have bilingual data in Language A and Language B. These can be any two languages, even the same "language" with some twist (like a glossary of a modern standard English with 19th century thieves' cant from London). The data can be a glossary of terms, a translation memory or other bitext corpus, or even a monolingual lexicon (of special terms and their definitions or other relevant information). The fundamental requirement is that these data are placed in an online spreadsheet, which can be created online or uploaded from your local computer, and that Language A be found in Column A of the spreadsheet and Language B (or the definition in a monolingual lexicon) in Column B of the spreadsheet. And to make things a little more interesting we'll designate Column C as the place for additional information.


Now let's make a list of basic queries:
  1. Search for the text you want in Column A, return matches for A as well as information in Column B and possibly C too in a table in that order
  2. Search for the text you want in Column B, return matches for B as well as information in Column A and possibly C too in a table in that order
  3. Search for the text you want in Column A or Column B, return matches for A/B and possibly C too in a table in that order

Query 1: searching in Column A

The basic query could be: SELECT A, B WHERE A CONTAINS '<some text>'
Of course <some text> is substituted by the actual text to look for enclosed in the single straight quote marks. If you are configuring a web search program like IntelliWebSearch or the memoQ Web Search tool or equivalents in SDL Trados Studio, OmegaT or other tools, the placeholder goes here.

If you want the information in the supplemental (Comment) Column C, add it to the SELECT statement: SELECT A, B, C WHERE A CONTAINS '<some text>'

The results table is returned in the order that the columns are named in the SELECT statement; to change the display order, change the sequence of the column labels A, B and C in the SELECT, for example: SELECT B, A, C WHERE A CONTAINS '<some text>'

Query 2: searching in Column B

Yes, you guessed it: just change the column named after WHERE. So 
SELECT B, A, C WHERE B CONTAINS '<some text>'
for example.

Query 3: searching in Column A or Column B (bidirectional search)

For this, each comparison after the WHERE should be grouped in parentheses: 
SELECT A, B, C WHERE (A CONTAINS '<some text>') OR (B CONTAINS '<some text>')

The statement above will return results where the expression is found in either Column A or Column B. Other logic is possible: substituting AND for the logical OR in the WHERE clause returns a results table in which the expression must be present in both columns of a given record.

And yes, in memoQ Web Search or a similar tool you would use the placeholder for the expression twice. Really.
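The three canned queries can also be generated rather than typed. The following is only a sketch under the column layout described above (Language A in Column A, Language B in Column B, comments in Column C); the function name canned_query and its arguments are hypothetical. It does not handle search text that itself contains a single quote mark.

```python
def canned_query(search_in, text="{}", include_c=False):
    """Return one of the three basic queries.

    search_in: "A", "B" or "AB" (bidirectional search).
    text: the search expression, or a tool placeholder such as memoQ's {}.
    """
    # The searched column is listed first, so matches appear on the left.
    columns = {"A": "A, B", "B": "B, A", "AB": "A, B"}[search_in]
    if include_c:
        columns += ", C"
    # One parenthesized comparison per searched column, joined with OR.
    conditions = [f"({col} CONTAINS '{text}')" for col in search_in]
    return f"SELECT {columns} WHERE " + " OR ".join(conditions)

print(canned_query("A", "muni"))
print(canned_query("B", "muni", include_c=True))
print(canned_query("AB"))  # placeholder appears twice, as noted above
```

Replacing " OR " with " AND " in the join gives the stricter variant in which both columns must contain the expression.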

Putting it all together

To make the search URL for your Google spreadsheet three parts are needed:

  1. The base URL of the spreadsheet (look in your browser's address bar; in the address https://docs.google.com/spreadsheets/d/1Bm_ssaeF2zkUJR-mG1SaaodNSatGdvYernsE7IJcEDA/edit#gid=1106428424 for example, the base URL is everything before /edit#gid=1106428424).
  2. The string /gviz/tq?tqx=out:html&tq= and
  3. Your query statement created as described above
Just concatenate all three elements:

{base URL of the spreadsheet} + /gviz/tq?tqx=out:html&tq= + {query}

An example of this in a memoQ Web Search configuration might be:

https://docs.google.com/spreadsheets/d/1Bm_ssaeF2zkUJR-mG1SaaodNSatGdvYernsE7IJcEDA/gviz/tq?tqx=out:html&tq=SELECT B, A WHERE (A CONTAINS '{}') OR (B CONTAINS '{}')

and here you can see a search with that configuration and the characters 'muni' :  https://goo.gl/D5cQmh
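The concatenation is simple enough to do by hand, but a short Python sketch makes the three parts explicit. The function name search_template is hypothetical; the addresses and the query are the ones from the example above. The query is left in clear text here so that the {} placeholder stays intact for the web search tool.

```python
GVIZ = "/gviz/tq?tqx=out:html&tq="

def search_template(address_bar_url, query):
    """Join base URL, the gviz string and the query into one search URL."""
    # Part 1: everything before /edit in the address copied from the browser.
    base = address_bar_url.split("/edit", 1)[0]
    # Parts 2 and 3 are appended unchanged.
    return base + GVIZ + query

template = search_template(
    "https://docs.google.com/spreadsheets/d/"
    "1Bm_ssaeF2zkUJR-mG1SaaodNSatGdvYernsE7IJcEDA/edit#gid=1106428424",
    "SELECT B, A WHERE (A CONTAINS '{}') OR (B CONTAINS '{}')",
)
print(template)
```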


Adding custom labels to the results table

If you clicked the short URL given as an example above, you'll notice that the columns are unlabeled. Try this short URL to see the same search with labels: https://goo.gl/3zJQqK

This is accomplished simply by adding LABEL A 'Portuguese', B 'English' to the end of the query string.

If you look at the URL in the address bar for any of the live web examples you'll notice that space characters, quote marks and other stuff are substituted by codes. No matter. You can type in clear text and use what you type; modern browsers can deal with stuff that is ungeeked too.
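If you are curious what those codes are, or a tool of yours insists on them, Python's standard library can convert in both directions. This is just an illustration; the query is the labeled search from above with a single WHERE condition.

```python
from urllib.parse import quote, unquote

query = "SELECT A, B WHERE A CONTAINS 'muni' LABEL A 'Portuguese', B 'English'"

# Percent-encode everything that is not a letter, digit or one of _ . - ~
encoded = quote(query, safe="")
print(encoded)           # spaces appear as %20, quote marks as %27, commas as %2C

# Decoding restores the clear text you would type yourself.
print(unquote(encoded))
```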

To do more formatting tricks, RTFM! It's here.



Jun 6, 2017

Build your own online reference TM for a team or anyone!


In the past, I have published several articles describing the use of free Google Sheets as a means of providing searchable glossaries on the Internet. This concept has continued to evolve, with current efforts focused on the use of forms and Google's spreadsheet service API to provide even more free, useful functionality.

On a number of occasions I have also mentioned that the same approaches can be used for translation memories to be shared with people having different translation environments, including those working with no CAT tools at all. However, the path to get there with a TM might not be obvious to everyone, and the effort of finding good tools to handle the necessary data conversions can be frustrating.

I've put up a demonstration TM in Portuguese and English here: https://goo.gl/LXXgmf

Here is a selection from the same data collection, selecting for matches of the Portuguese word 'cachorro':  https://goo.gl/9KJils
This uses the same parameterized URL search technique described in my article on searchable glossaries.

A translation memory in a Google Sheet has a few advantages:
  • It can be made accessible to anyone or to a selected group (using Google's permission scheme)
  • It can be downloaded in many formats for adding to a TM or other reference source on a local computer
  • Hits can also be read in context if the TM content is in the order it occurs in the translated documents. This is an advantage currently offered in commercial translation environment tools only by memoQ LiveDocs corpora.
Web search tools of many kinds can be configured easily to find data in these online Google Sheet "translation memories" - SDL Trados Studio, OmegaT and memoQ are among those tools with such facilities integrated, and IntelliWebSearch can bridge the gap for any environment that lacks such a thing.

But... how do you go from a translation memory in a CAT tool to the same content in a Google Sheet? This can be confusing, because many tools do not offer an option to export a TM to a spreadsheet or delimited text file. Some suggestions are found in an old PrAdZ thread, but I found a more satisfactory way of dealing with the problem.

A few years ago, the Heartsome Translation Studio went free and Open Source. It contains some excellent conversion tools. I downloaded a copy of the Heartsome TMX Editor (the available installers for Windows, Mac and Linux are here) and used it to convert my TMX file.




The result was then uploaded to a public directory on my personal Google Drive, and the URL was noted for building queries. Fairly straightforward.
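For those who prefer a script to a converter with a user interface, the same kind of conversion can be sketched in Python. This is not the Heartsome route described above but an alternative using only the standard library. The function names are hypothetical, the language codes must match those in your TMX file exactly (apart from upper/lower case), and inline formatting tags are flattened to plain text.

```python
import csv
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tmx_to_rows(tmx_path, lang_a, lang_b):
    """Yield (Language A, Language B) text pairs from a TMX file, in file order."""
    for tu in ET.parse(tmx_path).getroot().iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens inline markup into one plain string.
                segments[lang] = "".join(seg.itertext()).strip()
        if lang_a.lower() in segments and lang_b.lower() in segments:
            yield segments[lang_a.lower()], segments[lang_b.lower()]

def tmx_to_tsv(tmx_path, tsv_path, lang_a, lang_b):
    """Write the pairs as tab-delimited text: Column A, then Column B."""
    with open(tsv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerows(tmx_to_rows(tmx_path, lang_a, lang_b))
```

A call such as tmx_to_tsv("my_tm.tmx", "my_tm.tsv", "pt-PT", "en-GB") (file names and language codes are placeholders) would produce a file that can be uploaded to Google Drive and opened as a Google Sheet. Because rows are written in file order, any document order present in the TMX is preserved.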

The Heartsome TMX Editor seems like it might be a useful tool to replace Olifant as my TMX editor. While the TM editor in my tool of choice (memoQ) has improved in recent years, it still does not do many things I require, and some of this functionality is available in Heartsome.

Jun 5, 2017

Technology for Legal Translation

Last April I was a guest at the Buenos Aires University Facultad de Derecho, where I had an opportunity to meet students and staff from the law school's integrated degree program for certified public translators and to speak about my use of various technologies to assist my work in legal translation. This post is based loosely on that presentation and a subsequent workshop at the Universidade de Évora.

Useful ideas seldom develop in isolation, and to the extent that I can claim good practice in the use of assistive technologies for my translation work in legal and other domains it is largely the product of my interactions with many colleagues over the past seventeen years of commercial translation activity. These fine people have served as mentors, giving me my first exposure to the concepts of platform interoperability for translation tools, and as inspirations by sharing the many challenges they face in their work and clearly articulating the desired outcomes they hoped to achieve as professionals. They have also generously and frequently shared with me the solutions that they have found and have often unselfishly shared their ideas on how and why we should do better in our daily practice. And I am grateful that I can continue to learn with them, work better, and help others to do so as well.

A variety of tools for information management and transformation can benefit the work of a legal translator in areas which include but are not limited to:
  • corpus utilization,
  • text conversion,
  • terminology management,
  • diverse information retrieval,
  • assisted drafting,
  • dictated speech to text,
  • quality assurance,
  • version control and comparison, and
  • source and target text review.
Though not exhaustive, the list above can provide a fairly comprehensive basis for education of future colleagues and continued professional development for those already active as legal translators. But with any of the technologies discussed below, it is important to remember that the driving force is not the hardware and software we use in technical devices but rather the human mind and its understanding of subject matter and the needs of the particular task or work process in the legal domain. No matter how great our experience, there is always something more and useful to be learned, and often the best way to do this is to discuss the challenges of technology and workflow with others and keep an open mind for new approaches with promise.


Reference texts of many kinds are important in legal translation work (and in other types of translation too, of course). These may be monolingual or multilingual texts, and they provide a wealth of information on subject matter, terminology and typical usage in particular contexts. These collections of text – or corpora – are most useful when the information found in them can be read in context rather than isolation. Translation memories – used by many in our work – are also corpora of a kind, but they are seriously flawed in their usual implementations, because only short segments of text are displayed in a bilingual format, and the meaning and context of these retrieved snippets are too often obscure.

An excerpt from a parallel corpus showing a treaty text in English, Portuguese and Spanish

The best corpus tools for translation work allow concordance searches in multiple selected corpora and provide access to the full context of the information found. Currently, the best example of integrated document context with information searches in a translation environment tool is found in the LiveDocs module of Kilgray's memoQ.

A memoQ concordance search with a link to an "aligned" translation
A past translation and its preview stored in a memoQ LiveDocs corpus, accessed via concordance search
A memoQ LiveDocs corpus has all the advantages of the familiar "translation memory" but can include other information, such as previews of the translated work as well. It is always clear in which document the information "hit" was found, and corpora can also include any number of monolingual documents in source and target languages, something which is not possible with a traditional translation memory.

In many cases, however, much context can be restored to a traditional translation memory by transforming it into a "document" in a LiveDocs corpus. This is because in most cases, substantial portions of the translation memory will have its individual segment records stored in document order; if the content is exported as a TMX file or tab-delimited text file and then imported as a bilingual document in a LiveDocs corpus, the result will be almost as if the original translations had been aligned and saved, and from a concordance hit one can open the bilingual content directly and read the parts before and after the text found in the concordance search.


Legal translation can involve text conversion in a broad sense in many ways. Legal translators must often deal with hardcopy or faxed material or scanned files created from these. Often documents to translate and reference documents are provided in portable document format (PDF), in which finding and editing information can be difficult. Using special software, these texts can be converted into documents which can be edited, and portions can be copied, pasted and overwritten easily, or they can be imported in translation assistance platforms such as SDL Trados Studio, Wordfast or memoQ. (Some of these environments include integrated facilities for converting PDF texts, but the results are seldom as suitable for work as PDF or scanned files converted with optical character recognition software such as ABBYY FineReader or OmniPage.)


Software tools like ABBYY FineReader can also convert "dead" scanned text images into searchable documents. This will even work with bad contrast or color images in the background, making it easier, for example, to look for information in mountains of scanned documents used in legal discovery. Text-on-image files like the example shown above completely preserve the layout and image context of the text to be read in the best way. I first discovered and used this option while writing a report for a client in which I had to reference sections of a very long, scanned policy document from the European Parliament. It was driving me crazy to page through the scanned document to find information I wanted to cite but where I had failed to make notes during my first reading. Converting that scanned policy to a searchable PDF made it easy to find what I needed in seconds and accurately cite its page number, etc. Where there is text on pictures, difficult contrast and other features this is often far better for reference purposes than converting to an MS Word document, for example, where the layouts are likely to become garbled.


Software tools for translation can also make text in many other original formats accessible to translators in an ergonomically simpler form, also ensuring, where necessary, that no text is overlooked because of a complicated layout or because it is in an easily overlooked footnote or margin note. Text import filters in translation environments make it easy to read and translate the words in a uniform working environment, with many reference tools and other help available, and then render the translated text back into its original format or some more useful bilingual format.

An excerpt of translated patent claims exported as a bilingual table for review

Technology also offers many possibilities for identifying, recording and controlling relevant terminology in legal translation work.


Large quantities of text can be analyzed quickly to find the most frequent special vocabulary likely to be relevant to the translation work and save these in project glossaries, often enabling that work to be organized better with much of the clarification of terms taking place prior to translation.  This is particularly valuable in large projects where it may be advisable to ensure that a team of translators all use the same terms in the target language to avoid possible confusion and misunderstanding.
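The basic idea behind this kind of frequency analysis can be sketched in a few lines of Python. This is a deliberately naive illustration (single words only, a tiny hypothetical stopword list), not a description of how any particular terminology extraction tool works.

```python
import re
from collections import Counter

def candidate_terms(text, stopwords, top=20):
    """Return the most frequent words in `text` that are not stopwords."""
    # Words are runs of letters, optionally joined by hyphens; digits are skipped.
    words = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())
    counts = Counter(word for word in words if word not in stopwords)
    return counts.most_common(top)

sample = "The lessee shall pay the rent. The lessee may sublet the premises."
stopwords = {"the", "shall", "may", "pay"}
print(candidate_terms(sample, stopwords, top=3))
```

The longer and better the stopword list, the less "noise" remains among the candidates to be reviewed.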

Glossaries created in translation assistance tools can provide terminology hints during work and even save keystrokes when linked to predictive, "intelligent" writing features.


Integrated quality checking features in translation environments enable possible deviations of terminology or other issues to be identified and corrected quickly.


Technical features in working software for translation allow not only desirable terms to be identified and elaborated; they also enable undesired terms to be recorded and avoided. Barred terms can be marked as such while translating or automatically identified in a quality check.

A patent glossary exported from memoQ and then made into a PDF dictionary via SDL Trados MultiTerm
Technical tools enable terminology to be shared in many different ways. Glossaries in appropriate formats can be moved easily between different environments to share them with others on a team which uses diverse technologies; they can also be output as spreadsheets, web pages or even formatted dictionaries (as shown in the example above). This can help to ensure consistency over time in the terms used by translators and attorneys involved in a particular case.

There are also many different ways that terminology can be shared dynamically in a team. Various terminology servers available usually suffer from being restricted to particular platforms, but freely available tools like Google Sheets coupled with web look-up interfaces and linked spreadsheets customized for importing into particular environments can be set up quickly and easily, with access restricted to a selected team.


The links in the screenshot above show a simple example using some data from SAP. There is a master spreadsheet where the data is maintained and several "slavesheets" designed for simple importing into particular translation environment tools. Forms can also be used for simplified data entry and maintenance.


If Google Sheets do not meet the confidentiality requirements of a particular situation, similar solutions can be designed using intranets, extranets, VPNs, etc.


Technical tools for translators can help to locate information in a great variety of environments and media in ways that usually integrate smoothly with their workflow. Some available tools enable glossaries and bilingual corpora to be accessed in any application, including word processors, presentation software and web pages.


Corpus information in translation memories, memoQ LiveDocs or external sources can be looked up automatically or in concordance searches based on whole or partial content matches or specified search terms, and then useful parts can be inserted into the target text to assist translation. In some cases, differences between a current source text and archived information are highlighted to assist in identifying and incorporating changes.


Structured information such as dates, currency expressions, legal citations and bibliographical references can also be prepared for simple keystroke insertion in the translated text or automated quality checking. This can save many frustrating hours of typing and copy revision. In this regard, memoQ currently offers the best options for translation with its "auto-translation" rulesets, but many tools offer rules-based QA facilities for checking structured information.
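To give an idea of what such a rule does, here is a small Python sketch that rewrites numeric dates in the German style (day.month.year) as English long-form dates. The language pair, the pattern and the function name are assumptions chosen for illustration only; this is ordinary regular expression code, not memoQ's ruleset syntax.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Day and month with one or two digits, four-digit year, separated by periods.
GERMAN_DATE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")

def convert_dates(text):
    """Replace dates like 31.07.2019 with 31 July 2019."""
    def replace(match):
        day, month, year = int(match.group(1)), int(match.group(2)), match.group(3)
        # Leave implausible values untouched rather than guess.
        if not (1 <= day <= 31 and 1 <= month <= 12):
            return match.group(0)
        return f"{day} {MONTHS[month - 1]} {year}"
    return GERMAN_DATE.sub(replace, text)

print(convert_dates("Vertrag vom 31.07.2019, geändert am 03.08.2017."))
```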


Voice recognition technologies offer ergonomically superior options for transcription in many languages and can often enable heavy translation workloads with short deadlines to be handled with greater ease, maintaining or even improving text quality. Experienced translators with good subject matter knowledge and voice recognition software skills can typically produce more finished text in a day than the best post-editing operations for machine pseudo-translation, with the exception that the text produced by human voice transcription is actually usable in most situations, while the "gloss" added to machine "translations" is at best lipstick on a pig.


Reviewing a text for errors is hard work, and a pressing deadline to file a brief doesn't make the job easier. Technical tools for translation enable tens of thousands of words of text to be scanned for particular errors in seconds or minutes, ensuring that dates and references are correct and consistent, that correct terminology has been used, et cetera.
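One such check, whether every number in a source segment also appears in its translation, can be sketched as follows. The function name is hypothetical and the comparison is deliberately crude: numbers are compared as written, so differing decimal or thousands separators between languages would be reported and need a human look.

```python
import re
from collections import Counter

# A number is a run of digits, optionally continued by . or , plus more digits.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def missing_numbers(source, target):
    """Return numbers that occur in `source` more often than in `target`."""
    source_counts = Counter(NUMBER.findall(source))
    target_counts = Counter(NUMBER.findall(target))
    # Counter subtraction keeps only the positive differences.
    return sorted((source_counts - target_counts).elements())

print(missing_numbers("§ 12 Abs. 3 vom 31. Juli 2019",
                      "Section 12(3) of 31 July 2018"))
```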

The best tools even offer sophisticated features for tracking changes, differences in source and target text versions, even historical revisions to a translation at the sentence level. And tools like SDL Trados Studio or memoQ enable a translation and its reference corpora to be updated quickly and easily by importing a modified (monolingual) target text.

When time is short and new versions of a source text may follow in quick succession, technology offers possibilities to identify differences quickly, automatically process the parts which remain unchanged and keep everything on track and on schedule.


For all its myriad features, good translation technology cannot replace human knowledge of language and subject matter. Those claiming the contrary are either ignorant or often have a Trumpian disregard for the truth and common sense and are all too eager to relieve their victims of the burdens of excess cash without giving the expected value in exchange.

Technologies which do not assist translation experts to work more efficiently or with less stress in the wide range of challenges found in legal translation work are largely useless. This really does include machine pseudo-translation (MpT). The best “parts” of that swindle are essentially the corpus matching for translation memory archives and corpora found in CAT tools like memoQ or SDL Trados Studio, and what is added is often incorrect and dangerously liable to lead to errors and misinterpretations. There are also documented, damaging effects on one’s use of language when exposed to machine pseudo-translation for extended periods.

Legal translation professionals today can benefit in many ways from technology to work better and faster, but the basis for this remains what it was ten, twenty, forty or a hundred years ago: language skill and an understanding of the law and legal procedure. And a good, sound, well-rested mind.

*******

Further references

Speech recognition 

Dragon NaturallySpeaking: https://www.nuance.com/dragon.html
Tiago Neto on applications: https://tiagoneto.com/tag/speech-recognition
Translation Tribulations – free mobile for many languages: http://www.translationtribulations.com/2015/04/free-good-quality-speech-recognition.html
Circuit Magazine - The Speech Recognition Revolution: http://www.circuitmagazine.org/chroniques-128/des-techniques
The Chronicle - Speech Recognition to Go: http://www.atanet.org/chronicle-online/highlights/speech-recognition-to-go/
The Chronicle - Speech Recognition Is in Your Back Pocket (or Wherever You Keep Your Mobile Phone): http://www.atanet.org/chronicle-online/none/speech-recognition-is-in-your-back-pocket-or-wherever-you-keep-your-mobile-phone/

Document indexing, search tools and techniques

Archivarius 3000: http://www.likasoft.com/document-search/
Copernic Desktop Search: https://www.copernic.com/en/products/desktop-search/
AntConc concordance: http://www.laurenceanthony.net/software/antconc/
Multiple, separate concordances with memoQ: http://www.translationtribulations.com/2014/01/multiple-separate-concordances-with.html
memoQ TM Search Tool: http://www.translationtribulations.com/2014/01/the-memoq-tm-search-tool.html
memoQ web search for images: http://www.translationtribulations.com/2016/12/getting-picture-with-automated-web.html
Upgrading translation memories for document context: http://www.translationtribulations.com/2015/08/upgrading-translation-memories-for.html
Free shareable, searchable glossaries with Google Sheets: http://www.translationtribulations.com/2016/12/free-shareable-searchable-glossaries.html

Auto-translation rules for formatted text (dates, citations, etc.)

Translation Tribulations, various articles on specifications, dealing with abbreviations & more:
http://www.translationtribulations.com/search/label/autotranslatables
Marek Pawelec, regular expressions in memoQ: http://wasaty.pl/blog/2012/05/17/regular-expressions-in-memoq/

Authoring original texts in CAT tools

Translation Tribulations: http://www.translationtribulations.com/2015/02/cat-tools-re-imagined-approach-to.html

Autocorrection for typing in memoQ

Translation Tribulations: http://www.translationtribulations.com/2014/01/memoq-autocorrect-update-ms-word-export.html

Dec 27, 2016

Free shareable, searchable glossaries for collaboration with anyone

Some years ago I suggested a procedure using Google spreadsheets for glossary collaboration in projects. Many people do this sort of thing now.

What I do not think most are doing, however, is accessing these web-based term lists efficiently as terminology resources in their work. It's hard to compete with the efficiency of integrated termbases, TMs, web search features, etc.

... unless of course you integrate a web search for those online spreadsheets which returns just the few data of interest.

Matches found for German "ladepresse" in a glossary of a few thousand hunting terms
This is fairly straightforward using Google's visualization API with a simple query. A parameterized URL can be built to perform custom searches of your own data or data shared by colleagues or clients. "Canned" queries can be easily incorporated in custom searches from many tools, including memoQ Web Search, IntelliWebSearch and others.


Building a custom search URL for your Google spreadsheet is fairly simple. In the example above it consists of three parts:

{base URL of the spreadsheet} + /gviz/tq?tqx=out:html&tq= + {query}

The middle part (/gviz/tq?tqx=out:html&tq=) invokes the Google visualization API and specifies that the query results be returned as HTML (for display in a browser). The query language is similar to SQL, but if you use a prepared query for a given spreadsheet table structure, you don't need to learn any of that. Queries can be made which also return definitions, images, context examples or anything else that might reside in columns of interest in the online spreadsheet.
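To see how such a search URL breaks down into its parameters, it can be taken apart with Python's standard library. The spreadsheet ID and the query below are placeholders for illustration, and the function name explain is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def explain(url):
    """Split a search URL into its path and its query parameters."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    return {
        "path": parts.path,          # spreadsheet address plus /gviz/tq
        "tqx": params["tqx"][0],     # output options, e.g. out:html
        "tq": params["tq"][0],       # the query statement itself
    }

url = ("https://docs.google.com/spreadsheets/d/SPREADSHEET_ID"
       "/gviz/tq?tqx=out:html&tq=SELECT A, B WHERE A CONTAINS 'ladepresse'")
for name, value in explain(url).items():
    print(f"{name}: {value}")
```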

Using a tool like IntelliWebSearch or integrated extensions of OmegaT, memoQ and other tools, users working with any sort of tools can share a live glossary. Google Spreadsheets also have some permissions/security features which can be investigated if needed.

Of course other data can be shared this way, including TMs or XLIFF data as well as monolingual information. A little study of the relevant Google documentation reveals many possibilities :-)

Dec 7, 2012

Terminology collaboration with Google Docs: new twists

A few years ago, I put a notice in this blog about a colleague's interesting use of Google Docs to share terminology with faraway colleagues in a project. Earlier this year I enjoyed a similar collaboration with a Google Docs spreadsheet used to exchange and update terminology on a very time-critical annual report with translators using two different versions of Trados, memoQ and no CAT tools at all.

Sharing information via Google Docs was quite easy, and we were able to configure the access rights without a lot of trouble. But at the time I still had a bit of extra, annoying effort to get the data imported into my working environment for frequent updates.

Tonight another colleague contacted me with basically the same problem. Her client manages data in an Excel spreadsheet, which gets updated and sent out frequently. She already had the idea that this might work better in Google Docs, and I agreed.

But I kept thinking about that annoying update problem....

One can, of course, export Google Docs spreadsheet data in various formats:


I've marked a few of the export ("download") formats which are probably useful for a subsequent import into a translation environment too. But the downloaded data still won't be in the "perfect" format in many cases, and there will be extra steps involved in matching it up to the fields in your term base.

One way to simplify this problem is to create another online spreadsheet in Google Docs and link it to the original, shared spreadsheet. In this second spreadsheet, which is your "personal" copy for use in your favorite tool, you reformat the data so they will export in a form that makes your later import to your tool's termbase easier.

In my case, I use memoQ, so I created a Google Docs spreadsheet with the first row containing the default field names of interest from the CSV export of my memoQ termbase:

I linked the columns in my personal online spreadsheet with the shared spreadsheet using the ImportRange command. It has two arguments, both of which have to be enclosed in quotes. The first one (argument #1 above) is the key for the online spreadsheet to be referenced; it is shown in the URL of the online spreadsheet (just look in the address bar of your browser and you will see it). The second one specifies the sheet and the range of cells to copy. I put this formula in one cell and it copied the entire column for me.
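If you need such formulas for several columns, they can be generated rather than typed. The following Python sketch only builds the formula text to paste into a cell; the key, the sheet name "Sheet1" and the ranges are placeholders I made up, and the helper name is hypothetical. It assumes a sheet name without spaces and a spreadsheet locale that separates arguments with a comma (some locales use a semicolon instead).

```python
def import_range_formula(spreadsheet_key, sheet_name, cell_range):
    """Build an ImportRange formula with both arguments enclosed in quotes.

    spreadsheet_key: the key shown in the URL of the shared spreadsheet.
    sheet_name, cell_range: the sheet and the cells to copy, e.g. Sheet1 and A2:A.
    """
    def quoted(text):
        # Spreadsheet string literals escape a double quote by doubling it
        return '"' + text.replace('"', '""') + '"'

    return f"=IMPORTRANGE({quoted(spreadsheet_key)}, {quoted(sheet_name + '!' + cell_range)})"


if __name__ == "__main__":
    # One formula per column of the shared sheet that should be mirrored
    for source_range in ("A2:A", "B2:B", "C2:C"):
        print(import_range_formula("SPREADSHEET_KEY", "Sheet1", source_range))
```

Each printed line goes into the first data cell of the corresponding column in the personal spreadsheet.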

I could, if I wanted to, use conditional (IF) statements and other tricks to transform some data in columns of the other sheet and build the semicolon-delimited term properties list (Term_Info) that memoQ uses to keep track of gender, capitalization enforcement, forbidden status, etc. But none of that is needed for simple sharing of terms, definitions and examples for instance.

I simply export my personal Google Docs spreadsheet as CSV, then import it into my desired termbase in memoQ. If I have IDs set for the term entries in the online spreadsheet, I could even choose ID-based updates of my local termbase when I do the import.
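For frequent updates, the manual download could be scripted. This is only a rough sketch under several assumptions: it uses the CSV output option (tqx=out:csv) of Google's visualization API rather than the export menu, it presumes the personal spreadsheet is readable by anyone with the link (a private sheet would need authentication, which is not shown), and the key, file name and function names are placeholders of mine.

```python
from urllib.request import urlopen


def csv_export_url(base_url):
    """Return a URL that asks the visualization API for the sheet as CSV."""
    return base_url.rstrip("/") + "/gviz/tq?tqx=out:csv"


def download_csv(base_url, target_path):
    """Save the sheet as a local CSV file for a later termbase import."""
    with urlopen(csv_export_url(base_url)) as response, open(target_path, "wb") as out:
        out.write(response.read())


if __name__ == "__main__":
    # SPREADSHEET_KEY is a placeholder; replace it before actually downloading
    print(csv_export_url("https://docs.google.com/spreadsheets/d/SPREADSHEET_KEY"))
```

The saved file can then be imported into the termbase in the usual way.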

Those who use other tools, such as Trados, OmegaT or WordFast can set up their spreadsheets and do exports as best suits their needs.

This approach enables you to take source data in nearly any format in an online spreadsheet and rework it for the greatest convenience in the tool of your choice. Although not a "perfect" solution, it is perhaps a convenient one until better resources are commonly available for dynamic, cross-platform translation collaboration.

So what do I recommend that my friend try as a first step? Maybe take the client's latest spreadsheet, copy and paste it into Google Docs and share it with the client and others on the team. Then it's already "up there" for everyone's convenience (local XLSX copies can be downloaded any time), and she can get on with creating a convenient "view" of this shared data in her personal spreadsheet, which can be exported for local use any time. That personal sheet could also be shared (read-only access recommended) with other team members using the same translation environment tool.

Jul 18, 2009

Real-time terminology sharing with Google Docs (reBlog)

Here's an interesting idea I just encountered on the blog of an esteemed colleague:
We took the opportunity to test the Google Docs spreadsheet as a remote real-time shared terminology tool. We wanted to know if it would allow us to open our glossary at the same time to check terminology and add, change, and delete entries.
Riccardo, About Translation: Real-time terminology sharing with Google Docs, Jul 2009
You should read the whole article!