
Blog ・ 17 min read
Are Fuzzy Matches Dead? The Evolution of Translation Technology
Join Stephanie Harris-Yee and Erik Vogt for an insightful discussion about the future of translation technology and whether fuzzy matches – a long-standing pillar of translation workflows – are becoming obsolete. Learn how AI and machine translation are reshaping the industry’s approach to translation memory and quality assessment.
Stephanie Harris-Yee: Hi, I’m here with Erik Vogt, and we’re going to be diving into a question that seems to be popping up more and more recently: are fuzzy matches dead? This is, of course, about the concept of translation memory. When you’re doing a translation, the system remembers what you’ve translated before, and if a group of words in the new content is similar but not exact, the system flags it as a fuzzy match. Fuzzy matches are supposedly much quicker and easier for a translator to validate, and thus should come with a discount. Now, Erik, why could this whole concept, this whole system, be going away?
Erik Vogt: Steph, it’s really interesting that it even exists in the first place, honestly, because I don’t think it’s ever been validated that fuzzy matches are, by definition, easier to work on than not. One can even make the argument that hundred percent matches aren’t necessarily easier to validate. Just because a segment matches something that already exists doesn’t mean that what you’re leveraging from is appropriate for what you’re leveraging to. But in the industry, we’ve set up this framework, this heuristic, of time savings and discounts based on similarity. And I think AI and MT have both challenged that. Even when MT was first coming out, there was a question: is MT better, or is the fuzzy match better? Comparing apples to apples, which one gets closer to the final outcome? With a fuzzy match, you already know it’s not perfect; you have to work on it. With MT, you may not have to work on it at all; it may be more accurate than the fuzzy match. So then you think maybe it depends on how big the gap is. So we say, for example, everything below a 75% match, let’s do MT on that. Or maybe it’s everything below 85%. With MT getting better and better, you’d presume MT would chew up more and more of these fuzzy match categories, to the point where there’s no longer any question that MT is going to beat fuzzy matches in all cases. And that’s just MT we’re talking about. Now we’re moving into an AI review paradigm.
Like the one we have with MosAIQ (and we’re not the only ones), where an AI is already taking the fuzzy matches and adapting them to what it thinks the correct answer should be. Interestingly, the translator’s task is then no longer to work at a different rate depending on whether a match is a low, medium, or high fuzzy. Remember, that’s the implicit assumption: that it’s easier for a translator to fix something with a high fuzzy score than something with a low fuzzy score, which, for the most part, is an untested hypothesis. But in general, we accept it as a standard in our industry, as a heuristic. Now the reviewers are specializing more in getting the final content correct, and they don’t see any difference based on what the fuzzy match originally was. In fact, you might be looking at totally different signals, such as an LQA score or a QE score, or other tools that you’re using to flag things that might be wrong for different reasons.
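As a purely illustrative aside, and not a description of MosAIQ or any specific tool: the contrast Erik draws could be sketched roughly like this, with the traditional fuzzy-band heuristic on one side and a review decision driven by a quality-estimation signal on the other. All names, bands, and thresholds here are hypothetical.

```python
# A minimal sketch contrasting the two approaches discussed above.
# Bands, thresholds, and names are hypothetical, for illustration only.

def fuzzy_band(fuzzy_score: float) -> str:
    """Traditional heuristic: effort/discount band derived from TM similarity."""
    if fuzzy_score >= 1.00:
        return "exact match"        # often lightly reviewed, heavily discounted
    if fuzzy_score >= 0.85:
        return "high fuzzy"         # assumed quick to fix
    if fuzzy_score >= 0.75:
        return "medium fuzzy"
    return "new / machine translation"  # below the cutoff, MT takes over

def needs_review(qe_score: float, threshold: float = 0.85) -> bool:
    """Newer signal: flag the final candidate by estimated quality,
    regardless of what the original fuzzy match was."""
    return qe_score < threshold

if __name__ == "__main__":
    # Old world: a 92% match lands in a discount band before anyone sees it.
    print(fuzzy_band(0.92))             # -> "high fuzzy"
    # New world: the same segment only reaches a human if the
    # AI-adapted candidate looks doubtful.
    print(needs_review(qe_score=0.97))  # -> False (no review needed)
    print(needs_review(qe_score=0.62))  # -> True  (flag for the reviewer)
```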
Let’s be honest, we want the translator to be successful; we want them to be effective at what they’re doing. As a minor segue on this, which is a related point: a lot of the technology providers in the industry build technology to facilitate the work of the translator and, on the flip side, to drive a hypothetical cost reduction. So you create this technology, you say it’s going to save 20%, and then you put it into the wild and it may or may not deliver 20%. You can pay 20% less, or get paid 20% less, but that doesn’t necessarily mean the work has been reduced by 20%, mainly because we don’t really check effort in our industry; we check the outcome. LILT kind of tackled this, and I think RWS also had an adaptive MT model designed around this principle: you focus on a KPI, which is words per hour, how fast are you working, and you provide that feedback to the translator so they can see how they’re doing.
And you’re basically incentivizing speed, and we know that speed is inversely correlated with accuracy, so these are countervailing trends. But suffice to say, ultimately what really matters is time, and what really matters to the translator is helping them get things right. So I’m not sure anymore that fuzzy matches are doing that much to help translators do their job, especially when you have so many other powerful tools out there, like machine translation, which, by the way, more often than not is now enhanced with AI technology on the backend.
So MT is getting better all the time, we have extra AI quality layers being added on top of it that enhance it even more, and of course we’ve talked about LQA and MTQE and other technologies designed to analyze the input and establish whether or not it’s… so how many of these different tools do we need? I would say maybe fuzzy matches really are on their way out, and we don’t really need them anymore. And this is something I believe in my heart: we should keep reminding ourselves that at the center of this whole story of localization is the enhancement the human is providing, the validation and the authenticity of their knowledge, their confirmation that this segment is true and correct for this output. So anyway, these are just a bunch of thoughts on this. Lastly, I’ll close with one other thing. There’s an argument to be made that hundred percent matches and repetitions should also be threatened in this new world, largely because the segment-based model breaks a project down into words and presumes that this source equals this target. At best, with exact matches, you’re looking above and below the line to see whether the neighbors are also the same, so you can be more confident that this one is exactly right.
Because I believe that what we did in the past is exactly right. But we also now have the capability to summarize an entire object and have AI retranslate that entire object as an entirely new object. It’s no longer a word-based model; it’s an object-based model. And now you can tune that output up or down based on totally different parameters. So you could take an English article about 10 ways you’re screwing up your coffee and turn it into a Chinese article.
That’s 10 ways to delight your family with excellent coffee or tea, right? You can change those things tone-wise, you can change the vocabulary. You could tune it down to a fifth-grade level or up to a PhD level. There’s so much you can do now at a holistic level, and I’ll do a call-out to those in the LangOps space who are, I think, very interested in exploring this object-based model and getting away from the word-count model, in any case.
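As a hedged sketch of what tuning an object rather than editing segments might look like in practice, the snippet below bundles document-level parameters (tone, audience, reading level) into a single instruction for an AI retranslation step. The parameter names and the prompt wording are invented for illustration and do not correspond to any particular product or API.

```python
# Illustrative only: document-level ("object-based") retranslation parameters,
# as opposed to segment-by-segment leverage. Names and wording are invented.

from dataclasses import dataclass

@dataclass
class ObjectTranslationSpec:
    source_lang: str
    target_lang: str
    tone: str             # e.g. "playful", "formal"
    reading_level: str    # e.g. "5th grade", "expert"
    adapt_examples: bool  # allow culturally adapted examples (coffee -> tea)

def build_instruction(spec: ObjectTranslationSpec) -> str:
    """Turn the spec into a single instruction for an AI translation step."""
    return (
        f"Re-create this article in {spec.target_lang} for a "
        f"{spec.reading_level} audience, in a {spec.tone} tone. "
        + ("Adapt examples to the local culture where it helps."
           if spec.adapt_examples else "Keep examples as in the source.")
    )

spec = ObjectTranslationSpec("en", "zh", tone="warm and positive",
                             reading_level="general", adapt_examples=True)
print(build_instruction(spec))
```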
My hypothesis is that this time it might be for real: maybe fuzzy matches really are on their way out, and we’re going to see an evolution towards a model that completely excludes them and focuses on the tools and controls we have that are deeply connected to making the human in the loop as valuable as we possibly can.
Stephanie Harris-Yee: Okay, here’s maybe a tangential question. Do you see this then affecting the value of translation memory in general? Or is it still going to be super important for folks to make sure it’s clean and ready to go, just for things like custom training on the MT side?
Erik Vogt: There are two different use cases for TMs there: one is as a leveraging tool, and one is as a training tool. And I think you’re right, there are already a significant number of deployments in which TMs have only been built for the purpose of training an MT, particularly in high-volume eCommerce contexts where you’re just trying to reach a “good enough to make a buying decision” level of quality. But this does illustrate an excellent point: what you need to train your AI, if you think about completely unpacking the system and starting over again, might be totally different from what we’re used to. We presume the language layer is really just source and target segments, but maybe there are things like knowledge graphs and product tables that define the characteristics of the objects you’re trying to describe. Then of course there are the glossaries, which are mission-critical, and style guides, which are mission-critical, but the TM itself is a byproduct of the application of glossaries and style guides. And if you can properly train an MT, or an AI, with the appropriate guidance to replicate that, then yeah, a TM may not be something you need day to day as a leveraging tool. I suspect it will still be useful as a training tool. So if you’ve bothered to pay a human to review, and you still care about that segment-to-segment parity, then that can still be a training asset for an MT or AI. Which, to be honest, are not a hundred percent different concepts: neural MT and AI are basically very similar core technologies, with very different applications.
And as we attack our localization process of the future, I a hundred percent think we need to question the value of a TM as well, especially big old ones that require a lot of cleanup. We do provide that service, and it is possible, but a lot of the time we see that the reason for that cleanup is to train an MT, not to keep the TM in a leveraging function forever.
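To make the “TM as a training asset, not a leveraging tool” idea concrete, here is a minimal sketch that flattens reviewed source/target pairs into a generic JSONL format of the kind often used for MT or LLM fine-tuning. The file name, record shape, and example pairs are assumptions for illustration, not a description of any real pipeline.

```python
# Illustrative sketch only: treating a TM as training data rather than a
# leveraging tool, by flattening reviewed source/target pairs into JSONL.

import json

tm_pairs = [
    ("10 ways you're screwing up your coffee",
     "10 Wege, wie Sie Ihren Kaffee ruinieren"),
    ("Grind the beans just before brewing.",
     "Mahlen Sie die Bohnen erst kurz vor dem Brühen."),
]

with open("tm_training_data.jsonl", "w", encoding="utf-8") as f:
    for source, target in tm_pairs:
        record = {
            "instruction": "Translate the following English text into German.",
            "input": source,
            "output": target,  # the human-reviewed target is the real asset
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

print(f"Wrote {len(tm_pairs)} training records.")
```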
Stephanie Harris-Yee: Okay. Thank you, Erik, once again. Yeah, interesting thoughts for sure.
Erik Vogt: Yeah, thanks for the time, Steph. I would absolutely drop what I’m doing to talk about this with anybody who’s interested.
Stephanie Harris-Yee: Reach out to Erik on LinkedIn. I’m sure he’ll be there.
Erik Vogt: I will, thanks so much, Steph.