Some years ago, I published a description of how data from these multilingual EUR-LEX displays can be transferred to translation memories or other corpora for reference purposes, and more recently I produced a video showing the same procedure. But some people don't like the paragraph-level alignment format of the EUR-LEX displays, and the alignments are occasionally seriously out of sync, as in this example (or worse):
Now I don't find that much of a nuisance when I use memoQ LiveDocs, because I can simply view the full bilingual document context and see where the corresponding information really is (kind of like leaving alignments in memoQ uncorrected until you actually find a use for the data and determine that the effort is worthwhile). But if you plan to feed that aligned data to a translation memory, it's a bit of a disaster. And many people prefer data aligned at the sentence level anyway.
Well, there is a simple way to get the EU legislation texts you want, aligned at the sentence level, with the individual bitexts ready to import into a translation memory, LiveDocs corpus or other reference tool. See that document number above with the large red arrow pointing to it? That's where you start....
Did you know that much of the information in EUR-LEX can also be found in the publicly available DGT translation memories? These are sentence-level alignments. But most people go about using this data in a rather klutzy and unhelpful way. The "big data" craze some years ago had a lot of people trying to load this information into translation memories and other places, usually with miserable results. The problems include:
- the inability to load such enormous data quantities in a CAT tool's TM without having far more computer RAM than most translators ever think they'll need;
- very slow imports, some apparently proceeding on a geological time scale;
- data overload - so many concordance hits that users simply can't find the focused information they need; and
- system performance degradation, with extremely sluggish responses in a wide variety of tasks.
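
The problems above all come from loading everything at once. As a rough illustration of the alternative, here is a minimal Python sketch that streams through a TMX file and keeps only the translation units tagged with one document number. It is not the procedure described in this post, just one way to picture the idea. The assumptions: each `<tu>` carries its document number in a `<prop type="Txt::Doc. No.">` element, language codes look like `EN-GB` and `DE-DE`, and the sample data, the document numbers `DOC-A` and `DOC-B`, and the function name `extract_pairs` are all made up for the example. Check the property name and language codes against the files you actually download.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: the document number is stored in a <prop> with this type.
DOC_PROP = "Txt::Doc. No."
# TMX files may use either a plain "lang" attribute or "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract_pairs(tmx_source, doc_no, src_lang, tgt_lang):
    """Return (source, target) segment pairs for one document number.

    tmx_source is a file path or a binary file object. The file is read
    incrementally, so it is never held in memory as a whole.
    """
    src_lang, tgt_lang = src_lang.upper(), tgt_lang.upper()
    pairs = []
    for _event, elem in ET.iterparse(tmx_source, events=("end",)):
        if elem.tag != "tu":
            continue
        props = {p.get("type"): (p.text or "").strip()
                 for p in elem.findall("prop")}
        if props.get(DOC_PROP) == doc_no:
            segs = {}
            for tuv in elem.findall("tuv"):
                lang = (tuv.get("lang") or tuv.get(XML_LANG) or "").upper()
                seg = tuv.find("seg")
                if seg is not None:
                    # itertext() also collects text inside inline tags.
                    segs[lang] = "".join(seg.itertext()).strip()
            if src_lang in segs and tgt_lang in segs:
                pairs.append((segs[src_lang], segs[tgt_lang]))
        # Drop the unit's children once it has been examined.
        elem.clear()
    return pairs


# Made-up sample data with hypothetical document numbers.
SAMPLE = b"""<tmx version="1.4"><header/><body>
<tu><prop type="Txt::Doc. No.">DOC-A</prop>
  <tuv lang="EN-GB"><seg>First sentence.</seg></tuv>
  <tuv lang="DE-DE"><seg>Erster Satz.</seg></tuv></tu>
<tu><prop type="Txt::Doc. No.">DOC-B</prop>
  <tuv lang="EN-GB"><seg>Unrelated sentence.</seg></tuv>
  <tuv lang="DE-DE"><seg>Anderer Satz.</seg></tuv></tu>
</body></tmx>"""

pairs = extract_pairs(io.BytesIO(SAMPLE), "DOC-A", "EN-GB", "DE-DE")
for source, target in pairs:
    print(source, "|", target)
```

The resulting pairs are small enough to write out as a compact bitext for a translation memory or reference corpus, which sidesteps the RAM, import-time and concordance-overload problems listed above.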