Mar 4, 2017

Documenting auto-translation rule development for memoQ

In a recent article, I described my simple method of recording examples of structured information like dates, financial expressions or legal references to help developers plan auto-translation rules (or other features using regular expressions, such as Regex Tagger rules) in memoQ and other applications. These records are a sort of simplified performance specification - a table of examples showing how the rules should "perform": which patterned source language expressions are to be transformed into which structured expressions in the target language.
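A table of examples like this can also double as a simple automated check. The sketch below is only an illustration under my own assumptions: it uses Python's `re` module rather than memoQ's rule format, and the date rule, the `apply_rule` and `check_examples` names and the example rows are all hypothetical.

```python
import re

# Hypothetical rule: day.month.year dates such as "4.3.2017"
# are rewritten as year-month-day, e.g. "2017-03-04".
DATE_RULE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")


def apply_rule(text: str) -> str:
    """Rewrite every date matched by DATE_RULE in the text."""
    def reorder(match: re.Match) -> str:
        day, month, year = match.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    return DATE_RULE.sub(reorder, text)


# The "performance specification": source expression, expected target.
EXAMPLES = [
    ("4.3.2017", "2017-03-04"),
    ("04.03.2017", "2017-03-04"),
    ("am 31.12.1999 fällig", "am 1999-12-31 fällig"),
]


def check_examples(examples):
    """Return (source, expected, actual) for every example that fails."""
    failures = []
    for source, expected in examples:
        actual = apply_rule(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures


if __name__ == "__main__":
    print(check_examples(EXAMPLES))
```

An empty list from `check_examples` means the rule does what the table says it should; any row that comes back shows exactly which example no longer "performs".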

The need for proper documentation of such efforts does not end there, however. It is very important, especially for more complex sets of rules, that there be clear documentation of the purpose and logic of the rules developed, and that this documentation be present
  • in the rules themselves (as comments) and
  • in external documents to be used as references for troubleshooting, maintenance and further development.
Auto-translation rules and other resources using regular expressions should not be scripted and maintained for the long run in memoQ itself or in any other environment which does not allow thorough commenting of the regular expressions used. Without comments, it is simply too easy to destroy functioning rules by forgetting why they were written a certain way once upon a time. An environment able to use comments also allows old rules to be "commented out" (disabled, but still available for reference or later re-use) while new versions are tested. That is basically impossible with memoQ's internal resource editors at the present time. And to make matters worse, if auto-translation rules are edited inside memoQ, their order changes, sometimes with dire consequences if functionality depends on the rule order. Try sorting out problems like that in a set of 70 or so rules.

Excerpt from a large set of currency format rules with extensive comments. These comments are stripped when the rules are imported into memoQ, so all maintenance should be done externally in a tool like Notepad++.
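To make the idea of commented, ordered rules concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `pattern => replacement` layout, the `#` comment marker and the `load_rules` and `translate` helpers are hypothetical and are not memoQ's actual file format or behavior. It shows comments and a "commented out" rule being skipped on loading, and why the order of the remaining rules matters when the first matching rule wins.

```python
import re

# Hypothetical external rule source: one "pattern => replacement" per line,
# with "#" lines as comments. The order of the rules is significant.
RULES_SOURCE = r"""
# Rule 1: thousands and decimals, e.g. "EUR 1.234,56" => "EUR 1,234.56"
EUR (\d{1,3})\.(\d{3}),(\d{2}) => EUR \1,\2.\3

# Old version of rule 2, commented out while the new one is tested:
# EUR (\d+),(\d{2}) => EUR \1.\2

# Rule 2: general fallback; must come AFTER rule 1
EUR ([\d.]+),(\d{2}) => EUR \1.\2
"""


def load_rules(source):
    """Parse the rule text, dropping blank lines and comment lines."""
    rules = []
    for line in source.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, replacement = line.split(" => ", 1)
        rules.append((re.compile(pattern), replacement))
    return rules


def translate(text, rules):
    """Apply only the first rule whose pattern matches (an assumption)."""
    for pattern, replacement in rules:
        if pattern.search(text):
            return pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    rules = load_rules(RULES_SOURCE)
    print(translate("EUR 1.234,56", rules))        # rules in intended order
    print(translate("EUR 1.234,56", rules[::-1]))  # same rules, order reversed
```

With the rules reversed, the general fallback catches the expression first and leaves the thousands separator untouched, which is the kind of order-dependent failure that is hard to trace in a large rule set without comments.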
As I began to revise and improve old rules that I created years ago for dates and currency expressions, I found that it was helpful to create a record of what changes I had made - and why I made them - and keep this information in a tabular form for easy reference and re-use.
Click to access a PDF sample of my rule development record (2 pages)
The graphic above is one example of how I maintain my personal records of some work developing regular expressions. I usually include
  • descriptions of all information recorded
  • a specific example on which I will base the general rule
  • a simple ("fragile") version of the rule part (source input and target output) with only the most essential elements; this is not error-tolerant, but it is the easiest to understand and the first place to look if something isn't working as I would like it to
  • more robust variations which take into account differences in spacing, punctuation, etc. or include things like non-breaking spaces that might be desired in the output (this can get cluttered and hard to read)
  • color-marking for easier identification of some elements
  • comments about why things are written as they are or about possible improvements or problems
This record is a template of sorts from which rules can be assembled very quickly or re-purposed for other languages or formats, in a way that is easy to follow and makes mistakes easy to catch. Such records are also helpful if the rules are to be shared with other developers or maintained by someone else.
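The difference between a "fragile" rule part and a more robust variation can be sketched as follows. This is an illustration in Python, not an excerpt from my own records: the currency example, the `FRAGILE`, `ROBUST` and `TARGET` names and the chosen output format are hypothetical, and Python's verbose mode (`re.VERBOSE`) is used only because it allows comments inside the pattern.

```python
import re

# Specific example on which the rule is based: "EUR 1.234,56" -> "€1,234.56"

# Fragile version: only the essential elements, no tolerance for variations.
FRAGILE = re.compile(r"EUR (\d{1,3})\.(\d{3}),(\d{2})")

# Robust variation: tolerates the symbol, optional or non-breaking spaces
# and a space used as the thousands separator.
ROBUST = re.compile(r"""
    (?:EUR|€)        # currency code or symbol
    [ \u00A0]?       # optional space or non-breaking space
    (\d{1,3})        # leading digit group
    [. \u00A0]       # thousands separator: period, space or non-breaking space
    (\d{3})          # thousands group
    ,                # decimal comma in the source
    (\d{2})          # decimal places
""", re.VERBOSE)

# Target output shared by both versions.
TARGET = r"€\1,\2.\3"

if __name__ == "__main__":
    for sample in ["EUR 1.234,56", "EUR\u00A01.234,56", "€1 234,56"]:
        print(repr(sample), bool(FRAGILE.search(sample)), ROBUST.sub(TARGET, sample))
```

The fragile version is the one to read first when something goes wrong; the robust version is the one that survives real text, and its inline comments explain each tolerance that was added.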

My example is certainly not the final word in project documentation for such efforts; it is simply part of a set of personal tools to help me work more efficiently with the limited time I have. Professional development and consulting organizations often have far more extensive and detailed systems of project documentation; when I was part of one such shop nearly 20 years ago, my (downloadable) 2-page example might easily have filled twenty pages of very important-looking professional technobabble. Life's too short for shit like that anymore.

But if you value your time as a developer or your investment as one who hires others to develop such useful rules, it pays big dividends in most cases to demand some sort of clear, systematic and accurate record of how your special rules, filters, etc. were developed so that they can be maintained and improved in the future.

3 comments:

  1. I highly recommend RegexBuddy as an excellent tool to craft regex, using wizard and well commented rules library, featuring color coding. And I fully agree with you on benefit of inserting comments in the rule itself.

    1. I think that is a favorite resource for Paul Filkin too. Paul has written a lot of nice regex tutorials (like this one: https://multifarious.filkin.com/2012/08/24/regex-pt1/) and he recommends it too.
