About file formats for translation

Electronic files are divided into two types: original layouts and non-editable copies of original layouts.

The benefits of computer-aided translation (CAT) systems for customers are obvious. CAT tools support 65 file formats, and the number keeps growing.

To sort source files into these types correctly, you first need to understand how CAT tools divide text into segments.

In addition to the CAT segmentation rules, bear in mind that the translated text is usually 5-20% longer than the source.
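The effect of the 5-20% growth on a layout can be estimated with simple arithmetic. The sketch below is only an illustration: the function names and the idea of a fixed character capacity for a text box are assumptions made for the example, not part of any CAT tool.

```python
def expanded_length_range(source_chars, min_pct=5, max_pct=20):
    """Estimate the translated length for a source text of `source_chars` characters.

    Uses the 5-20% growth range from the article. Integer arithmetic avoids
    floating-point rounding surprises; the upper bound is rounded up.
    """
    low = source_chars * (100 + min_pct) // 100
    high = -(-source_chars * (100 + max_pct) // 100)  # ceiling division
    return low, high


def may_overflow(box_capacity_chars, source_chars):
    """Return True if the worst-case translation would not fit a text box.

    `box_capacity_chars` is a hypothetical capacity chosen for illustration.
    """
    _, high = expanded_length_range(source_chars)
    return high > box_capacity_chars


low, high = expanded_length_range(1000)
print(low, high)                 # 1050 1200
print(may_overflow(1100, 1000))  # True
```

A text box that holds the 1,000-character source with 10% to spare may still overflow once the translation is inserted, which is why the layout has to leave room for growth.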

If the source file has a correct layout, the translation only needs light formatting before the file is ready to be handed over to the client.

If the source file has a wrong layout, fonts, lines, paragraphs, captions, pictures, and tables will not be in their proper places in the translation. Formatting won’t help; the only option is to rebuild the layout correctly. This work is done by DTP specialists, whose services raise the cost and extend the deadline. Visit this page to learn the cost of DTP & Layout.
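To see why a wrong layout causes trouble, here is a minimal sketch of segmentation. It assumes a deliberately simplified rule: a segment ends at a line break or after sentence-ending punctuation followed by whitespace. Real CAT tools use richer, configurable rules, so the `segment` function is a hypothetical illustration rather than any tool's actual algorithm.

```python
import re


def segment(text):
    """Split text into segments using a simplified, assumed rule set:
    every line break ends a segment, and so does '.', '!' or '?'
    followed by whitespace."""
    segments = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        for sentence in re.split(r"(?<=[.!?])\s+", line):
            if sentence:
                segments.append(sentence)
    return segments


# Correct layout: sentences flow within one paragraph.
good = "The pump must be switched off before maintenance. Wear protective gloves."

# Wrong layout: hard line breaks in the middle of sentences,
# as often happens after careless conversion.
bad = "The pump must be switched\noff before maintenance. Wear\nprotective gloves."

print(segment(good))
# ['The pump must be switched off before maintenance.', 'Wear protective gloves.']
print(segment(bad))
# ['The pump must be switched', 'off before maintenance.', 'Wear', 'protective gloves.']
```

With the wrong layout, two sentences turn into four fragments. Fragments like these are harder to translate and cannot be matched against sentences stored in a translation memory.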

With this in mind, file formats can be divided into two broad types: original layouts and non-editable copies of original layouts.

1. Original Layouts of Electronic Files.

By providing such files for translation, the customer saves money and time. They usually have a correct layout and can be uploaded to the CAT tool and translated immediately. The system will segment the text correctly, and formatting the finished translation takes only a few minutes. The most common formats for such files are listed below:

  • Microsoft Office

    DOC/DOCX, XLS/XLSX, PPT/PPTX, PPS/PPSX, POT/POTX

  • OpenOffice

    ODT, ODP

  • Text

    TXT, RTF

  • AutoCAD drawing formats

    DWG (special utilities export the drawing text to MS Office formats and import it back)

  • Hypertext, source code

    HTML, XHTML, PHP

  • Bilingual interchange formats

    XLIFF / XLF / SDLXLIFF / MQXLIFF, PO, TTX

  • Desktop publishing

    MIF, IDML

  • Technical writing

    DITA XML, HELP+MANUAL XML

  • Localization

    XML, Android XML, RESX, DTD, JSON, TJSON, YML, INC, INX, MIF, STRINGS, PROPERTIES

  • Subtitles

    SRT, TTML

  • Script formats

    STORY

  • Packages

    TTX, SDLPPX / SDLRPX, ZIP, WSXZ

2. Non-editable Copies of Original Layouts.

These are often scanned documents and PDF files. Recognition is the first step in preparing such files for translation: it means recreating the original layout with correct formatting. This work is done in FineReader by specially trained DTP specialists. Our team knows the intricacies of CAT tools and takes them into account when recreating the layout.

Recognition (OCR) services are much cheaper than DTP & Layout services, but the client still has to pay for them to get a competent, well-executed translation. The most common formats for such files are listed below:

  • PDF
  • JPG/JPEG
  • TIF/TIFF
  • BMP
  • PNG
  • GIF
  • DJVU/DJV
  • DCX
  • PCX
  • JP2
  • JPC
  • JFIF
  • JB2
  • Any document format in which the text is saved as a picture, photo, or image.
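The division into two types can be sketched as a lookup by file extension. This is a rough first check under stated assumptions: the extension sets are copied from the lists in this article, the `classify` function is a hypothetical helper, and an extension alone cannot detect an editable format whose text is actually stored as an image.

```python
from pathlib import Path

# Extensions taken from the lists in this article.
ORIGINAL_LAYOUTS = {
    "doc", "docx", "xls", "xlsx", "ppt", "pptx", "pps", "ppsx", "pot", "potx",
    "odt", "odp", "txt", "rtf", "dwg", "html", "xhtml", "php",
    "xliff", "xlf", "sdlxliff", "mqxliff", "po", "ttx", "mif", "idml",
    "xml", "resx", "dtd", "json", "tjson", "yml", "inc", "inx",
    "strings", "properties", "srt", "ttml", "story",
    "sdlppx", "sdlrpx", "zip", "wsxz",
}
NON_EDITABLE_COPIES = {
    "pdf", "jpg", "jpeg", "tif", "tiff", "bmp", "png", "gif",
    "djvu", "djv", "dcx", "pcx", "jp2", "jpc", "jfif", "jb2",
}


def classify(filename):
    """Return the article's file type for `filename`, judged by extension only."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in NON_EDITABLE_COPIES:
        return "non-editable copy"
    if ext in ORIGINAL_LAYOUTS:
        return "original layout"
    return "unknown"


for name in ["manual.DOCX", "contract_scan.pdf", "drawing.dwg", "notes.xyz"]:
    print(name, "->", classify(name))
```

A file classified as "original layout" still deserves a quick look before quoting: a DOCX that contains only pasted scans belongs to the second type despite its extension.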

Any non-editable copy has an original layout in electronic format; you just have to find it. It’s often enough to ask a colleague, partner, or supplier for it. Finding it helps reduce the cost of Recognition (OCR) and DTP & Layout services. Sometimes DTP & Layout services cost several times, or even dozens of times, more than the translation itself.

Please don’t send files for translation that have been automatically recognized by web services, FineReader, or other applications. This only makes the work more difficult and expensive.

Yevhen Venherenko
