Aug 4, 2012

Coping with embedded "BIN" objects in MS Office documents


When I published a procedure for getting at embedded objects in Microsoft Office documents, I mentioned that documents in the older MS Office 2003 formats could be saved as Office 2007/2010 equivalents, so that the embedded objects could be accessed via Windows Explorer after changing the file extension to ZIP. What I failed to mention is that older-format embedded objects are stored with a BIN extension rather than the proper extension of the application with which they are associated. The icon above, for example, is for an embedded PowerPoint 2003 slide.
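
For readers who would rather script the inspection than browse in Windows Explorer, here is a minimal Python sketch (standard library only) that lists the embedded objects in a document already saved in the 2007/2010 format and flags the older-format BIN ones. It assumes the objects sit in a folder named `embeddings` inside the package (for example `word/embeddings/` in a DOCX); `list_embeddings` is a hypothetical helper, not part of any Office or CAT tool. Python's `zipfile` module reads the package directly, so the ZIP renaming step is not needed here.

```python
import zipfile
from pathlib import PurePosixPath


def list_embeddings(package_path):
    """Return (part name, size in bytes, is_bin) for every part stored in
    an "embeddings" folder of a DOCX/PPTX/XLSX package.

    Assumption: embedded objects live under <app>/embeddings/, e.g.
    word/embeddings/oleObject1.bin; older-format (2003) objects are the
    ones carrying the .bin extension.
    """
    results = []
    # zipfile ignores the outer file extension, so the document can be
    # opened as-is without renaming it to .zip.
    with zipfile.ZipFile(package_path) as pkg:
        for info in pkg.infolist():
            part = PurePosixPath(info.filename)
            if info.is_dir() or part.parent.name != "embeddings":
                continue
            is_bin = part.suffix.lower() == ".bin"
            results.append((info.filename, info.file_size, is_bin))
    return results
```

Usage, with a hypothetical file name: `for name, size, is_bin in list_embeddings("report.docx"): print(name, size, is_bin)`.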

There are a few ways of dealing with this. If you know what the object should be, just rename the extension to fit (PPT in this case). Or, if you are importing into a CAT tool, specify the proper filter for the *.bin file. Here's an example for memoQ 6:


The number at the end of the file name before the extension indicates the order of the objects in the document, which may be helpful in identifying the new extension to use. If you want to put the translated objects back in the embeddings folder, remember to change the extensions of the older objects back to BIN.
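
The renaming, ordering and restoring steps can also be scripted. The Python sketch below (standard library only, hypothetical function names) is one way to do it under stated assumptions: every BIN object in the document is of the same type, so one new extension fits them all, and each translated file keeps the object's base name, either renamed (oleObject1.ppt) or with the extension appended as suggested in the comments (oleObject1.bin.ppt). Objects are sorted by the number in their names, since plain alphabetical sorting would put oleObject10 before oleObject2. Putting the files back rebuilds the package entry by entry instead of editing it in place; the `folder` default assumes a Word document (use `ppt/embeddings` for PowerPoint). This has not been tested against every Office version, so keep a copy of the original and check that the result opens.

```python
import re
import zipfile
from pathlib import Path, PurePosixPath


def object_number(name):
    """Number before the extension (oleObject12.bin -> 12), or -1 if none."""
    match = re.search(r"(\d+)\.[^./]+$", name)
    return int(match.group(1)) if match else -1


def extract_bin_objects(package_path, out_dir, new_ext):
    """Copy each embeddings/*.bin part to out_dir, renamed to new_ext
    (e.g. ".ppt") so a CAT tool can pick the right filter."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(package_path) as pkg:
        parts = [n for n in pkg.namelist()
                 if PurePosixPath(n).parent.name == "embeddings"
                 and n.lower().endswith(".bin")]
        # Numeric order = order of the objects in the document (assumed).
        for name in sorted(parts, key=object_number):
            target = out / PurePosixPath(name).with_suffix(new_ext).name
            target.write_bytes(pkg.read(name))
            written.append(target)
    return written


def bin_part_name(translated_name, folder):
    """oleObject1.ppt or oleObject1.bin.ppt -> <folder>/oleObject1.bin"""
    stem = PurePosixPath(translated_name).stem  # drops the last extension only
    if not stem.lower().endswith(".bin"):
        stem += ".bin"
    return f"{folder}/{stem}"


def put_back(src_package, dst_package, translated_files, folder="word/embeddings"):
    """Write a copy of src_package in which BIN parts are replaced by the
    translated files, each stored under its original .bin name."""
    replacements = {bin_part_name(Path(p).name, folder): Path(p).read_bytes()
                    for p in translated_files}
    with zipfile.ZipFile(src_package) as src:
        unknown = set(replacements) - set(src.namelist())
        if unknown:
            raise KeyError(f"no such parts in the package: {sorted(unknown)}")
        with zipfile.ZipFile(dst_package, "w") as dst:
            for info in src.infolist():
                data = replacements.get(info.filename)
                if data is None:
                    data = src.read(info)
                # Reusing the original ZipInfo keeps each entry's name,
                # position and compression method.
                dst.writestr(info, data)
```

Checking the part names up front means a typo or a wrong `folder` fails before any output file is created.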

6 comments:

  1. Instead of changing the extension, you can just append the proper one.
    For example, instead of oleObject1.ppt, use oleObject1.bin.ppt.
    This way, the original extension is easy to remember.

    1. Thanks, Maxime, for the additional input. Indeed, a very relevant post for me!

  2. Good point, Maxime - that one occurred to me later & I forgot to add it. There are other little issues with the objects as well. Some CAT tools do not automatically recognize the extension for an embedded PowerPoint slide in the newer formats, so one has to go through a few extra, silly steps to select these files and choose a filter, but I am optimistic that this will be sorted out soon.

  3. I want to preserve a file inside my Word document (DOCX) for documentation purposes, but it always becomes corrupted. I've tried multiple file types (ZIP, PFX, TXT).

    I find that whenever I drag and drop OR insert a file to embed it in a Word document, Word modifies/corrupts the file by prepending some information to it. It never seems to preserve the file as it was.

    So when I find the TXT file I dropped in inside my renamed ZIP (formerly DOCX) file, in the embeddings folder, it has new binary information, such as a path or similar info, at the beginning. If I drop in a ZIP or another type, the same thing happens: it becomes corrupt because of the data added to the beginning of the file.

    Example TXT file:
    ## First line of example file

    Corrupted example, embeddings\oleObject1.bin:
    a few KB of garbled binary/path data, e.g.:
    ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿR o o t E n t r y " yadda yadda
    ##First line of example file

    The "real" data is still there, but buried / corrupted. Any thoughts? I've seen this behavior for years.

  4. I noticed I couldn't amend my post to add "Hey I respect your work and appreciate the details, do you think you could spare a moment to help with a problem I think you have particular insight with? Your help would be much appreciated."

    1. Sorry, Joel. Although I must have approved your other comment, I managed to overlook it in a practical sense and failed to respond.

      I don't follow exactly what you are doing; that's the problem sometimes with mere words on a page or over the phone, and the reason why I insist on TeamViewer for a lot of the questions that come my way now. The means by which you access the content inside the ZIP are very important. The only way I have been able to avoid corruption is to use Windows Explorer. All the file compression tools I have used (like WinZip) result in trouble, though I'm sure there must be some combination of settings that won't.

      Since I published this series of tip articles on dealing with objects in Office 2007 and later, there have been some changes in translation technology which perhaps make it a bit less urgent. memoQ now includes the most advanced facilities available for dealing with embedded objects and graphics, though I know XML chart objects are not handled yet as they are with Rainbow/OmegaT. I haven't thought to test embeddings of older Office objects (BIN) yet, though I really ought to soon.

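
The bytes Joel describes in the comments (long runs of ÿ, which is how a Latin-1 text viewer shows the byte 0xFF, followed by "R o o t E n t r y") look consistent with an OLE Compound File, the same container format used by the older Office files: its root directory entry is literally named "Root Entry", stored as UTF-16, hence the spaced-out letters, and unused slots in its allocation tables are filled with 0xFF bytes. If that is what is happening, the dropped-in file has been wrapped rather than damaged. The minimal Python sketch below only checks for that container (its fixed 8-byte signature D0 CF 11 E0 A1 B1 1A E1) and locates a known start of the original file; it does not parse the container, which would need a proper Compound File reader such as the third-party olefile package. The helper names are hypothetical.

```python
# Every OLE Compound File Binary (CFB) container begins with these 8 bytes.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
# The container's root directory entry is named "Root Entry" in UTF-16LE,
# which a text viewer shows as "R o o t E n t r y".
ROOT_ENTRY = "Root Entry".encode("utf-16-le")


def is_compound_file(data: bytes) -> bool:
    """True if data starts with the CFB signature."""
    return data[:8] == CFB_SIGNATURE


def has_root_entry(data: bytes) -> bool:
    """True if the UTF-16LE name "Root Entry" occurs anywhere in data."""
    return ROOT_ENTRY in data


def find_known_prefix(data: bytes, prefix: bytes) -> int:
    """Offset where a known start of the original file appears, or -1.

    Only a locator: inside a CFB container a stream may be split across
    non-adjacent sectors, so the bytes after this offset are not
    guaranteed to be the rest of the original file.
    """
    return data.find(prefix)
```

For example, read `embeddings\oleObject1.bin` with `open(path, "rb").read()` and pass the bytes to `is_compound_file` and `find_known_prefix(data, b"## First line")`.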

Notice to spammers: your locations are being traced and fed to the recreational target list for my new line of chemical weapon drones :-)