Shaping the Future of Translations with the Okapi Framework
April 23, 2018 | Yves Savourel
Where did the Okapi Framework idea come from and how did it all start?
For many years ENLASO provided a set of open-source tools in C# for Windows. The reasoning behind using an open-source model was that it allowed more people to contribute to the development. Initially the contributions were mostly on the testing side. The demand for a cross-platform solution was growing, and in early 2008 Jim Hargrave, who was working at the LDS Church in Utah, Asgeir Frimannsson, who was working for Red Hat in Brisbane, and I decided to restart the project in Java. The first code commit of the project was made on April 16, 2008. We quickly had additional developers at work, and the code grew relatively fast. You cannot give enough credit to the companies we were working for: they made the project possible.
If you want to trace the early roots of Okapi, you have to go back to the end of the 90s, when ILE was developing a set of shareware tools called OpenKit. They were the seed of the open-source tools at ENLASO. In fact, Okapi initially stood for OpenKit API.
What was the initial goal for the team of Okapi Framework developers?
There are several main ideas in the framework:
Abstraction: Several types of components do the same thing for different inputs or different contexts. For example, when you extract the translatable text from a file, you should have a single interface through which to plug in your tools. So the framework defines one interface for “filters”, and there are dozens of implementations of that interface, each working on a specific file format. The tools only need to know about the interface to use any of the filters implementing it. This makes it very easy to develop new filters, and any newly supported format can then be processed by every component of the framework.
The principle applies also to the data generated by the filter: the object model is the same regardless of where the data comes from.
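As a rough illustration of that idea, here is a minimal Java sketch. It is not the actual Okapi API: the names Filter, TextUnit, KeyValueFilter, PlainTextFilter, and countWords are all hypothetical and invented for this example. It only shows how a tool written against one interface can work with any format, and how the extracted data has the same shape wherever it comes from.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterSketch {

    /** Format-independent piece of translatable text (hypothetical object model). */
    static final class TextUnit {
        final String id;
        final String source;

        TextUnit(String id, String source) {
            this.id = id;
            this.source = source;
        }
    }

    /** The single interface every format-specific filter implements (hypothetical). */
    interface Filter {
        List<TextUnit> extract(String content);
    }

    /** Extracts the values from "key=value" lines; the key becomes the unit id. */
    static final class KeyValueFilter implements Filter {
        @Override
        public List<TextUnit> extract(String content) {
            List<TextUnit> units = new ArrayList<>();
            for (String line : content.split("\n")) {
                int eq = line.indexOf('=');
                if (eq > 0) {
                    units.add(new TextUnit(line.substring(0, eq).trim(),
                                           line.substring(eq + 1).trim()));
                }
            }
            return units;
        }
    }

    /** Treats each non-empty line of plain text as one text unit. */
    static final class PlainTextFilter implements Filter {
        @Override
        public List<TextUnit> extract(String content) {
            List<TextUnit> units = new ArrayList<>();
            int count = 0;
            for (String line : content.split("\n")) {
                if (!line.trim().isEmpty()) {
                    count++;
                    units.add(new TextUnit("line" + count, line.trim()));
                }
            }
            return units;
        }
    }

    /** A "tool" that knows only the Filter interface, never the file format. */
    static int countWords(Filter filter, String content) {
        int total = 0;
        for (TextUnit unit : filter.extract(content)) {
            if (!unit.source.isEmpty()) {
                total += unit.source.split("\\s+").length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countWords(new KeyValueFilter(), "greeting=Hello world\nbye=Goodbye"));
        System.out.println(countWords(new PlainTextFilter(), "First line here\n\nSecond line"));
    }
}
```

In this sketch, supporting a new format means writing one more class that implements Filter; countWords, and any other tool written against the interface, would not need to change.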
Componentization: Another important idea is to work with components rather than large applications, so you can build the tools you need from those components, like a Lego set. This makes the framework very flexible and usable for many purposes. Some will use just the filters, others just the connectors to the MT engines, and so on. You take only what you need.
None of this is revolutionary, but put together and applied to localization needs, it makes for a powerful toolbox. It also makes the framework interesting for many people with various needs and requirements.
What has been your role over the years?
I’ve been part of the core team, doing some of the development and a little bit of everything. Nowadays a significant part of the core team’s work is spent monitoring contributions and making sure they are integrated properly into the overall framework. We also spend quite a bit of time resolving the issues users report.
The Okapi Framework has made significant advancements over the years and has helped with the integration of tools between different systems that support the localization process. What has been the greatest challenge so far?
The greatest challenge is probably not technical: it’s to find the time to work on the project. Everyone in the team has a day job and while sometimes we can make working on Okapi part of some tasks needed for our day job, it’s not always the case.
On the technical side, the main challenge is probably to adapt some of the core parts of the framework, such as the object model, to be more efficient and better suited to the new requirements we see in the way data is localized.
What are the plans for the next 12 months with the Okapi Framework?
At this point a lot of the work in Okapi is driven by contributions, so it’s not easy to have a roadmap. But there are a few things likely to happen:
There should be an update of the connector for the latest version of the Kantan MT API, as well as new MT connector implementations for the DeepL and Amazon Translate services.
We also want to make the framework components more easily accessible. One way to do this is to publish them as Maven artifacts to the official Maven repository.
We are also working on various fixes in several components. There is always a list of ongoing issues, as users provide bug reports and suggestions for improvements.
And, obviously, there are the new features we would love to have but don’t have time to develop: for example, finishing the work on XLIFF 2.1 in the XLIFF Toolkit library, implementing some support for the nascent JLIFF format, implementing the API that TAPICC is currently working on, and many more.