Language Technology Tools – The Okapi Framework
August 06, 2018 | Yves Savourel
The Okapi Framework - An Emerging Language Technology Tool
Argos Multilingual, committed to improving localization quality and reducing costs, is actively involved in the development of the Okapi Framework, a set of interface specifications, format definitions, components and applications that provide an environment for building interoperable tools for the different steps of the translation and localization process.
The framework is developed as an open-source project driven collaboratively by a community of users. The project's main page is the Okapi Framework website, and its source code is hosted on Bitbucket.
The tool is a set of Java libraries and components you can use with your own programs and scripts. It includes filters for various file formats, an SRX-based segmentation engine, an ITS processor, and many other utilities that can help you build customized, flexible and powerful localization processes. The framework is cross-platform and runs on Windows, Linux, and Mac.
Beyond the Lego-like building blocks, the project provides a few applications that are ready to use out of the box:
Rainbow, a Java application, allows localization professionals to prepare files in various formats (HTML, XHTML, SVG, etc.) for different translation environments, converts files from one encoding to another, and performs a number of additional localization-related tasks. From Rainbow’s main window, the user simply selects an input file, chooses the utility to run, sets the options, and then executes it.
For example, Rainbow features a Regex Filter that extracts and merges text from any file format where the extractable text can be identified using regular expressions. The tool also provides support for .NET resource files, PO files, Java properties, Markdown files, MIF files, Microsoft Office documents, XML, HTML and much more.
Once the translation is done, Rainbow is used to merge the text back into the original file format. In keeping with localization industry standards, Rainbow allows localization professionals to generate TMX (Translation Memory eXchange) documents, as well as other translation memory formats.
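To make the regex-driven extraction and merge-back idea concrete, here is a minimal sketch in plain Java. It does not use the Okapi API or Rainbow's actual Regex Filter. The class name RegexExtractSketch, the key="text" file layout, and the rule pattern are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtractSketch {
    // Hypothetical rule: on lines shaped like key="text", group 1 is the translatable text.
    private static final Pattern RULE =
            Pattern.compile("^\\w+=\"(.*)\"$", Pattern.MULTILINE);

    /** Extraction step: collect every piece of text captured by the rule. */
    public static List<String> extract(String content) {
        List<String> units = new ArrayList<>();
        Matcher m = RULE.matcher(content);
        while (m.find()) {
            units.add(m.group(1));
        }
        return units;
    }

    /** Merge step: rebuild the file, replacing only the captured text. */
    public static String merge(String content, UnaryOperator<String> translate) {
        Matcher m = RULE.matcher(content);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(content, last, m.start(1));    // copy untouched skeleton
            out.append(translate.apply(m.group(1)));  // insert the translation
            last = m.end(1);
        }
        out.append(content.substring(last));          // copy the remainder
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "title=\"Hello\"\n# a comment\nbody=\"Good morning\"\n";
        System.out.println(extract(source));
        // Upper-casing stands in for a real translation step.
        System.out.println(merge(source, String::toUpperCase));
    }
}
```

The point of the sketch is that everything outside the captured group (keys, quotes, comments) is treated as a skeleton and copied through unchanged, which is what lets the translated text be merged back into the original file format.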
CheckMate is a Java application that performs various quality checks on bilingual translation files such as XLIFF, TMX, TTX, PO, TS, Trados-Tagged RTF, and any other bilingual format supported by the framework.
Olifant is a .NET application that allows users to open translation memory files and perform various maintenance tasks on them. Olifant features regular expression search and replace, criterion-based filtering, and many other functions that are useful to any user maintaining translation memories.
Ratel is a Java application used to create and maintain segmentation rules. Such rules are used to break down translatable text into more reusable parts. Ratel uses Okapi’s SRX-based segmentation engine. SRX is the Segmentation Rules eXchange format. The application includes a test feature that allows you to immediately see the effects of your segmentation rules on your own sample text as you edit the rules.
Tikal is a command-line tool that offers many functions, including simple extraction and merging, various file format conversions (TMX, CSV, tab-delimited, PO, etc.), access to translation resources, import and export for the Pensieve TM, and more.
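As a rough illustration of how ordered break and no-break rules split text, here is a small sketch in plain Java. It is not Okapi's segmentation engine and does not read SRX files. It only mimics the SRX idea that each rule has a "before" pattern, an "after" pattern, and a break flag, and that the first rule matching at a position decides. The class name SegmentationSketch and the two sample rules are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SegmentationSketch {
    /** One SRX-style rule: patterns before and after a position, plus a break flag. */
    public static final class Rule {
        final boolean isBreak;
        final Pattern before;
        final Pattern after;

        public Rule(boolean isBreak, String before, String after) {
            this.isBreak = isBreak;
            // "before" must end exactly at the candidate position.
            this.before = Pattern.compile("(?:" + before + ")\\z");
            this.after = Pattern.compile(after);
        }

        boolean matchesAt(String text, int pos) {
            return before.matcher(text.substring(0, pos)).find()
                    && after.matcher(text.substring(pos)).lookingAt();
        }
    }

    /** Splits text at every position where the first matching rule is a break rule. */
    public static List<String> segment(String text, List<Rule> rules) {
        List<String> segments = new ArrayList<>();
        int start = 0;
        for (int pos = 1; pos < text.length(); pos++) {
            for (Rule rule : rules) {
                if (rule.matchesAt(text, pos)) {
                    if (rule.isBreak) {
                        segments.add(text.substring(start, pos));
                        start = pos;
                    }
                    break; // the first matching rule wins, break or no-break
                }
            }
        }
        segments.add(text.substring(start));
        return segments;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                // Exception first: do not break after "Mr." or "Dr." plus whitespace.
                new Rule(false, "\\b(?:Mr|Dr)\\.\\s", ""),
                // General rule: break after sentence-ending punctuation plus whitespace.
                new Rule(true, "[.?!]\\s", ""));
        System.out.println(segment("Dr. Smith arrived. He was late.", rules));
    }
}
```

With these two sample rules, the text is split into "Dr. Smith arrived. " and "He was late."; removing the no-break rule would also split after "Dr. ". Rule order matters, which is why exceptions are listed before the general break rule.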
You can find more details and screenshots on how to use the Okapi tools in this article from the ATA Chronicle.