I've also been looking to optimize the procedure for migrating Microsoft Word AutoCorrect lists to memoQ. There are a number of problems with using the table-generating macro that Kilgray suggests in the knowledge base article on using MS Word 2003 autocorrect data; when I created a 17,000-entry list from a large AutoCorrect file for one language, it was nearly impossible to do anything with it because of memory problems. The following macro, which could be put into the Normal template in MS Word, should be a little easier to work with:
Sub BuildAutoCorrectList()
    Dim ACE As AutoCorrectEntry
    ' Create new document.
    Documents.Add
    ' Iterate through AutoCorrect entries.
    For Each ACE In Application.AutoCorrect.Entries
        ' Insert each entry name and its value on a new line.
        Selection.TypeText ACE.Name & vbTab & ACE.Value & vbCr
    Next ACE
End Sub
Invoke the macros dialog in MS Word with Alt+F8. Select the Normal.dot or Normal.dotm file (depending on your version of MS Office) from the dropdown list, enter the name of the new macro and click the Create button. Then paste in the code above. When the macro is run, it will create a new document with the autocorrection list in tab-delimited text. To bring the list into memoQ, you'll have to:
- Paste in the XML header needed by the "light resource" for AutoCorrect lists in memoQ. You can see what this looks like for the language setting you want by creating a dummy resource, exporting it and opening the file with a text editor. European Spanish might look like this, for example:
<MemoQResource ResourceType="AutoCorrect" Version="1.0">
<FileName>spa-ES#EU Spanish AutoCorrect.mqres</FileName>
- Save the file as plain text with UTF-8 encoding.
- Change the file extension to "*.mqres".
- Import the resource to memoQ.
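For long lists, the header-pasting and saving steps above can also be scripted. What follows is only a minimal sketch in Python, not anything Kilgray provides: the function name build_mqres is made up for illustration, and the header text is assumed to be copied verbatim from a dummy resource exported from memoQ, as described above. Whether memoQ cares about line endings or a byte order mark is something I have not verified, so test the result with a small list first.

```python
from pathlib import Path


def build_mqres(header: str, entries: list[tuple[str, str]], out_path: str) -> Path:
    """Write the header plus tab-delimited entries as a UTF-8 .mqres file.

    `header` is assumed to be the XML header copied unchanged from an
    exported dummy AutoCorrect resource; it is not generated here.
    """
    lines = [header.rstrip("\n")]
    for wrong, right in entries:
        # One entry per line: text to replace, a tab, then the replacement.
        lines.append(f"{wrong}\t{right}")
    # Force the .mqres extension regardless of the extension passed in.
    path = Path(out_path).with_suffix(".mqres")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Calling build_mqres(header_text, [("teh", "the")], "my list.txt") would write "my list.mqres" next to the script, ready to try importing.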
Other sources for autocorrection data
With a bit of searching, one can find other sources of data to add to AutoCorrect resources for various languages. Wikipedia, for example, offers lists of commonly misspelled words, such as this one in English, which includes links to Dutch, Hungarian, Portuguese, Spanish and Turkish lists. The structure of the data lends itself easily to reformatting with the search and replace features of a text editor:
alamanya->almanya
Copy the data from the Wikipedia page to a text file. Then use search and replace to substitute tabs for the "->" structures, add an appropriate XML header for the memoQ resource and save the file as UTF-8 with an MQRES extension. The result is an AutoCorrect list ready for import to memoQ. An example of the Turkish list converted and ready for use in memoQ is available for download here.
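If a text editor's search and replace feels too manual, the same "->" to tab substitution can be sketched in a few lines of Python. This is just an illustration under simple assumptions: the function name arrows_to_tabs is my own, each entry is assumed to sit on its own line in the form shown above, and lines without "->" (blank lines, headings copied along from the page) are simply dropped. Entries that offer several corrections would still need a manual look.

```python
def arrows_to_tabs(text: str) -> str:
    """Turn 'wrong->right' lines into 'wrong<TAB>right' lines."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if "->" not in line:
            continue  # skip blank lines and anything that is not an entry
        # Split on the first arrow only, in case the correction contains one.
        wrong, right = line.split("->", 1)
        out.append(f"{wrong.strip()}\t{right.strip()}")
    return "\n".join(out)
```

The output still needs the XML header and the UTF-8 save described above before it can be imported.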
For German, there is a list of common spelling errors on Wikipedia which can be adapted with very little effort to make this resource.
The English list on the Oxford Dictionaries page can also be adapted without much ado. And there are many others to be found on the Internet.
Merging memoQ AutoCorrect resources
Entries from multiple AutoCorrect lists can be combined in a single tab-delimited file, and duplicates can be removed using Microsoft Excel, for example.
When duplicates are removed, only the column with the misspellings (Column A) should be selected. The reason Column B must not be selected is that it contains the desired text after correction, and there may be more than one error entry for a particular word.
After duplicates have been removed from the list, save the file as Unicode text, then import it to memoQ. A similar procedure with Excel may be followed to maintain other memoQ light resources; I do this rather frequently for segmentation exceptions to ensure that the lists for the different language variants I work with remain synchronized. (It would be nice, of course, if Kilgray would create a reasonable light resource manager with such capabilities. It gets tiring to do this so often with stopword lists and other resources.)
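For anyone who would rather not round-trip through Excel, the merge and de-duplication can be sketched in Python as well. Treat this as a rough equivalent under stated assumptions, not a replacement for checking the result: the function name merge_autocorrect is invented for the example, the input lists are assumed to be header-free tab-delimited text, and the comparison here is case-sensitive. Duplicates are detected on the first column only, and the first occurrence wins, so several misspellings that share one correction all survive.

```python
def merge_autocorrect(*lists: str) -> str:
    """Merge tab-delimited lists, removing duplicates by the first column only."""
    seen: dict[str, str] = {}
    for text in lists:
        for line in text.splitlines():
            if "\t" not in line:
                continue  # ignore blank or malformed lines
            wrong, right = line.split("\t", 1)
            # Keep the first entry seen for each misspelling; the correction
            # (second column) is never used to decide what counts as a duplicate.
            seen.setdefault(wrong, right)
    return "\n".join(f"{wrong}\t{right}" for wrong, right in seen.items())
```

The merged text can then be given its header and saved as UTF-8 in the same way as any other list.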