Since our founding in 1996, we have championed the value of human translation – and we still do. However, the past decade has seen a steady advance in the quality and application of machine learning (ML) translation solutions.
Newly developed machine translation (MT) tools driven by artificial intelligence are already translating tens of millions of messages per day, and proprietary ML translation solutions from Google, Microsoft, and Amazon are in daily use. MT occupies a vital space in our translation services value chain, and it’s a space that we’re confident will continue to grow as the technology matures.
At its most basic, machine learning is the process of teaching a computer system to make accurate predictions when fed data. We use training data to observe patterns, “teach” the machines, and create multiple translation models. When it’s time to translate, we feed input to an algorithm that looks up the appropriate parameters in the model and produces a machine-translated prediction, which a human translator then evaluates for accuracy. If it meets our standards, it’s fed back into the machine translation engine and used to improve the algorithm.
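The cycle described above — machine prediction, human evaluation, and feedback into training data — can be sketched in a few lines. This is a hedged illustration only: the `EchoEngine`, `LengthReviewer`, and `translate_with_review` names are hypothetical stand-ins, not a real Argos API, and a real reviewer is of course a human linguist rather than code.

```python
class EchoEngine:
    """Hypothetical stand-in 'engine' that fakes a translation by uppercasing."""
    def translate(self, text):
        return text.upper()

class LengthReviewer:
    """Hypothetical stand-in for a human reviewer: approves when lengths match."""
    def evaluate(self, source, candidate):
        return 1.0 if len(candidate) == len(source) else 0.0

def translate_with_review(segments, engine, reviewer, threshold=0.9):
    """Translate each segment, score it, and keep approved pairs
    as future training data for the engine."""
    approved = []
    for source in segments:
        candidate = engine.translate(source)           # model prediction
        score = reviewer.evaluate(source, candidate)   # human accuracy check
        if score >= threshold:
            approved.append((source, candidate))       # fed back for retraining
    return approved
```

The key design point is the loop closure: only translations that pass the human quality gate become training data, so the engine improves without absorbing its own mistakes.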
Neural Machine Translation
Neural machine translation (NMT) lets machine translation engines train themselves using a trial-and-error process similar to the way the human brain works. This process, called “deep learning,” builds on principles established in big data analytics. The full potential of NMT is still being measured, but it is already clear that it nearly always improves translation quality and produces more “human-like” output.
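The “trial and error” at the heart of deep learning can be pictured in a drastically simplified form: make a prediction, measure the error, and nudge a parameter in whichever direction shrinks it. This toy sketch is an assumption-laden illustration with a single weight; real NMT models adjust millions of parameters the same basic way.

```python
def learn(target, steps=100, lr=0.1):
    """Toy 'trial and error' learning: repeatedly adjust one weight
    to reduce the error between prediction and target."""
    weight = 0.0
    for _ in range(steps):
        error = weight - target   # how far off the current prediction is
        weight -= lr * error      # small correction in the error-reducing direction
    return weight
```

After enough iterations the weight converges toward the target, which is the essence of how a network “trains itself.”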
At Argos, our approach to NMT is all about making sure that it’s right for a client’s content. If it turns out to be the correct approach, NMT can drive the following positive outcomes:
- Increased productivity and reduced time-to-market. On average, an NMT-based approach can increase daily translation throughput three to four times. Simply put, NMT gets content in front of your customers faster than ever before.
- Lower costs. NMT allows us to charge lower rates, and in the long term, all future projects in the same subject area benefit from the savings it delivers.
- Better consistency. Large projects require multiple human translators, so maintaining a consistent tone and controlled terminology is typically more challenging than with a customized NMT solution.
Machine translation is a form of computational linguistics and language engineering that uses software to translate text or speech from one language to another. The two most common engines we use are rule-based and statistical. These engines differ primarily in the way they process and analyze content:
- A rule-based machine translation (RBMT) engine uses linguistic rules to break down the content. It produces more predictable output for terminology and grammar through the use of customized terminology lists that fine-tune the engine and make it possible to correct every error with a targeted rule.
- Statistical machine translation (SMT) uses statistical models to generate the translation of the source content. This engine does not analyze texts based on language rules; it is instead built by analyzing a bilingual corpus, which requires an appropriate volume of bilingual content.
Because each engine processes and generates data differently, the engine chosen for a project depends on the target languages and the availability of reference materials for the given source files.
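The contrast between the two engine families can be made concrete with a deliberately miniature sketch. These few lines are illustrative assumptions only — the lexicon, the word-by-word alignment, and the tiny corpus are far simpler than any production RBMT or SMT system — but they show the core difference: deterministic rules versus frequencies learned from bilingual data.

```python
from collections import Counter, defaultdict

# Rule-based: a customized terminology list plus deterministic rules.
RBMT_LEXICON = {"the": "el", "cat": "gato", "sleeps": "duerme"}

def rbmt_translate(sentence):
    # Every word is handled by an explicit, predictable rule/lexicon entry.
    return " ".join(RBMT_LEXICON.get(w, w) for w in sentence.split())

# Statistical: learn translation counts from a bilingual corpus.
def train_smt(corpus):
    counts = defaultdict(Counter)
    for src, tgt in corpus:
        # Naive word-by-word alignment, purely for illustration.
        for s, t in zip(src.split(), tgt.split()):
            counts[s][t] += 1
    return counts

def smt_translate(sentence, counts):
    # Pick the most frequently observed translation for each word.
    return " ".join(counts[w].most_common(1)[0][0] if counts[w] else w
                    for w in sentence.split())
```

Note how the SMT side is only as good as its corpus — exactly why an appropriate volume of bilingual content matters — while the RBMT side lets a targeted rule correct any specific error.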
At Argos, we share best practices in content creation with our clients to improve the quality of their source content and in turn increase the probability of developing a more effective Machine Translation engine. We’ve found that machine translation works best with content that is repetitive and simple, which is why we recommend applying controlled English guidelines when authoring content. In addition, authoring in an effective content management system allows you to save high quality translated content and reuse it for future projects, reducing translation costs and increasing consistency across projects.
Post-editing involves humans amending machine-generated translations to achieve an acceptable final product. We use different types of post-editing depending on what our clients require, but the one thing they all have in common is that linguists are an integral part of the process. For texts that are not meant to be published externally, light post-editing can be a solution. For this type of post-editing, only major mistakes that impact the ability to understand the text (mistranslations, omissions, additions, etc.) are corrected. For publishable content, full post-editing is the solution. The final text has to read as though it had been produced from start to finish by a human translator. This involves our post-editors following the client’s terminology, adapting to the source text style and format, keeping consistency, and correcting every single mistake that is detected.
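One practical way to see the difference between light and full post-editing is to measure how much the editor actually changed the raw MT output. The sketch below is a simple word-level similarity measure built on Python's standard `difflib`; the metric and its scale are illustrative assumptions, not an Argos quality formula.

```python
import difflib

def post_edit_effort(mt_output, post_edited):
    """Return edit effort in 0..1: 0.0 means the MT output was accepted
    unchanged, 1.0 means it was completely rewritten."""
    similarity = difflib.SequenceMatcher(
        None, mt_output.split(), post_edited.split()).ratio()
    return round(1.0 - similarity, 3)
```

A light post-edit of understandable MT output should score near zero, while full post-editing of weak output pushes the score higher — a useful signal when deciding whether MT is paying off for a given content type.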
Sometimes used interchangeably, natural language processing (NLP) and natural language understanding (NLU) are actually two different concepts that have some overlap. NLP sits at the cross-section of computer science, artificial intelligence, and data mining. It focuses on how we can program computers to process large amounts of natural language data in a way that is productive and efficient, taking tasks off the hands of humans and allowing a machine to handle processes that once required manual work – the ultimate “artificial intelligence.”
NLP can refer to a range of tools, such as speech recognition, natural language recognition, and natural language generation. Common NLP algorithms appear in real-world applications like online chatbots, text summarizers, auto-generated keyword tabs, and even tools that attempt to identify the sentiment of a text.
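Sentiment identification, the last example above, can be illustrated in its crudest form with keyword matching. This is a toy sketch under stated assumptions — the word lists are invented, and production sentiment tools rely on trained statistical or neural models rather than fixed lists.

```python
# Hypothetical keyword lists; real systems learn these signals from data.
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def sentiment(text):
    """Classify text as positive, negative, or neutral by keyword counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Even this naive version shows the shape of the task: turn free text into a machine-usable signal without a human reading every message.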
When you work with Argos, our team typically takes a short ramp-up period to get familiar with your testing requirements. We clarify the full scope and metrics, evaluate your current testing process with insights on how best to tune your models, run tests on one language to validate the process, and formalize it in a testing script for the linguistic testers. We also define all reporting requirements, and our “train-the-trainer” approach enables us to deliver the highest quality possible.
The ability to evaluate translation quality is crucial. By collecting, categorizing, and analyzing data and combining it with human intelligence, we’re able to build accurate training data sets that can be applied to AI engine training as well as to previously translated content at any point in the localization process. Running a massive amount of content through MT and then painstakingly annotating it according to defined categories makes it possible to train MT engines that produce extremely high-quality output. This can also be done on a smaller scale to prepare for the translation of specific content.
Annotation services make the validation and improvement of linguistic assets a more efficient and valuable process, particularly when applied to either third-party review or the review of existing translations by subject matter experts. The annotation services we offer include data collection (via a QA tool and expert resources), categorization and expanded analysis (including sub-classifications for errors), and comprehensive reporting that makes trends easy to spot.
Our team first conducts a current state analysis of your content in order to understand what your current translation process and environment look like and get to grips with the quality metrics and management that you currently have in place. We then create quality criteria and pass/fail thresholds that meet your specific requirements, which we can use either to track your content or to monitor your third-party translation suppliers. Errors are categorized and classified as either major or minor, with a numeric score attached to each error and severity level. By offering more granular error categories, we can assist in the identification of specific problem areas and help create solutions for eliminating specific errors.
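The scoring scheme described above — categorized errors, major/minor severities with numeric weights, and a pass/fail threshold — can be sketched as follows. The category names, penalty weights, and threshold here are hypothetical placeholders for illustration, not Argos's actual metric.

```python
# Hypothetical severity weights; real schemes are tuned per client.
PENALTY = {"minor": 1, "major": 5}

def qa_score(errors, word_count, max_penalty_per_1000=20):
    """Score a translation from a list of (category, severity) errors.

    Returns the penalty normalized per 1,000 words and whether the
    translation passes the agreed threshold."""
    total = sum(PENALTY[severity] for _category, severity in errors)
    normalized = total * 1000 / word_count   # penalty per 1,000 words
    return {"penalty_per_1000": round(normalized, 1),
            "passed": normalized <= max_penalty_per_1000}
```

Normalizing per 1,000 words keeps scores comparable across files of different sizes, and keeping the category alongside the severity is what makes trend spotting — say, recurring terminology errors from one supplier — possible.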
After a careful look at what our clients needed most, we developed the capability to automatically generate readable and interactive QA reports that summarize the warnings and errors found and make it easy to see at a glance the areas where your translations can be improved. Every detail of our annotation services has the same goal in mind – making sure that your translated content communicates your company’s brand, products, and services as accurately as possible to your global audience.