Written by Argos Multilingual
Published on 16 Dec 2025

Join Stephanie Harris-Yee and Erik Vogt for an in-depth exploration of metadata’s critical role in modern localization. Discover why metadata is emerging as one of the industry’s most significant untapped opportunities, and how better data structuring can unlock billions in efficiency gains.

Key topics covered:

  • The $7 billion opportunity hidden in transactional overhead
  • How metadata serves as the foundation for effective AI orchestration
  • Practical approaches to metadata classification and hygiene
  • Risk-based routing and workflow optimization strategies
  • Overcoming organizational barriers to metadata implementation
  • Using AI to identify and enrich missing or inconsistent metadata

More “Field Notes” Episodes

Explore more topics with Stephanie and Erik in our Field Notes series, where we break down complex localization concepts into practical insights for industry professionals. Check out our other discussions on translation technology, localization strategy, and industry best practices.


Stephanie Harris-Yee: Hello, I’m Stephanie, and I’m here with another episode of Field Notes with Erik Vogt. In this episode, we’re going to be talking about one of those more obscure topics: metadata.

Erik, I know you’ve said in the past that metadata is becoming one of the biggest untapped opportunities in localization. So what’s really at stake here?

Erik Vogt: As we’ve talked about before, look at a roughly $70 billion industry, and I’m just rounding here, and assume that about a tenth of it is spent on transactional overhead: project management, project coordination, the transaction layer, as opposed to the value creation itself, which is often the human in the loop or the technology being deployed.

That means there’s something like $7 billion being spent on coordination work of some kind.

Metadata becomes one of the most important pieces of the puzzle when we’re trying to figure out how to make that $7 billion load more efficient. If we could make it, let’s say, 30% more efficient, we’re talking about roughly a $2 billion opportunity for our industry.

So certainly nothing to sneeze at. It’s absolutely worth paying attention to.
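To make the arithmetic concrete, here is a quick back-of-the-envelope sketch using Erik’s rounded figures; the inputs are rough estimates from the conversation, not exact industry data:

```python
# Back-of-the-envelope math behind the "$2 billion" figure.
# All inputs are rough, rounded estimates from the conversation.
industry_size = 70e9          # ~$70B localization industry
overhead_share = 0.10         # ~10% spent on transactional overhead
efficiency_gain = 0.30        # hypothetical 30% efficiency improvement

overhead_spend = industry_size * overhead_share       # ~$7B on coordination
potential_savings = overhead_spend * efficiency_gain  # ~$2.1B opportunity

print(f"Coordination spend: ${overhead_spend / 1e9:.1f}B")
print(f"Potential savings:  ${potential_savings / 1e9:.1f}B")
```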

Stephanie Harris-Yee: So then with that said, where does AI actually fit into that whole picture? How does AI change the way we think about metadata?

Erik Vogt: Well, over my career I’ve watched several ERP systems either get deployed and fail, canceled entirely, or get stripped down so much that the original vision for the data being managed wouldn’t really deliver the value people were looking for.

So they stripped it down and only delivered a fraction of what the systems were designed to handle. You end up with different systems that don’t talk to each other. We’ve talked about connectors before.

We’ve talked about the complexity of all these different pieces of information spread out all over the place. What’s really going on? How can we address this problem space? That’s the complaint driving a lot of our inability to get more done faster. So how does AI play into this?

Well, AI is good at some things and not very good at others. One of the things it’s very good at is summarizing, but it’s not deterministic. It’s good at taking blobs of stuff and distilling them down, or taking a lot of ambiguity and bringing clarity to it, but it’s not very good at directing things. Getting the exact pricing calculated for a particular task is very difficult for AI to do reliably. Routing is also hard. Almost all the discussion about AI in our industry is about how to make translation more efficient or more accurate.

I’m talking about the layers outside of that. So yes, we’re talking about making MT more efficient. Yes, we’re talking about cleaning it up, automatic LQAs; all of that is very important. That’s the essential part of our business. But as the task cost goes down, the transactional friction, as a percentage of the total cost, will go up.

That’s just if we do things the same way: handling a two-hour task with the same project management overhead as a 20-hour task, we end up with a massive mismatch in cost allocation. Things like licensing costs and project management time become a bigger and bigger share of the lift.
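A minimal illustration of that shift, using made-up hours for the fixed coordination overhead:

```python
# Illustrative only: a fixed PM overhead against shrinking task sizes.
pm_overhead_hours = 1.0  # hypothetical fixed coordination time per task

for task_hours in (20, 10, 2):
    total = task_hours + pm_overhead_hours
    share = pm_overhead_hours / total
    print(f"{task_hours:>2}h task -> overhead is {share:.0%} of total effort")
# 20h task -> overhead is 5% of total effort
# 10h task -> overhead is 9% of total effort
#  2h task -> overhead is 33% of total effort
```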

So where does AI fit into this? AI is often highlighted as this sort of magic thing, but we have to break down where exactly it fits into this ecosystem. What it really does well is summarization, as I mentioned earlier. So when we’re thinking about that as a problem space, what are we actually summarizing that we can make more efficient?

One of them is: what workflow should a request follow? Some systems are designed to look at an asset, summarize what it is, and then pick the right domain it belongs to. Being able to correctly assess and then tag what this thing is helps us do the routing more efficiently.

It can also handle things like looking for anomalies and identifying risk patterns. It’s good at that kind of thing, but it’s not necessarily very good at saying which vendor or which MT model to use. It can be a facilitator in the classification step of a process, but you still need to build the logic into the system to make that work.

The orchestration needs access and it needs data. AI can be a component that delivers part of that data, or that helps inform the orchestration model about how to deliver a certain output.
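A minimal sketch of that split, where an AI classifier only tags content and deterministic rules own the routing decision. The classifier here is a keyword stub standing in for whatever model you use, and the workflow names are invented for illustration:

```python
# Sketch: AI tags the asset; deterministic rules own the routing decision.
# The workflow names and the keyword stub are assumptions for illustration.

ROUTING_RULES = {
    "legal": "human_translation",
    "marketing": "mt_plus_human_review",
    "ui_strings": "mt_plus_ai_review",
    "support_docs": "mt_only",
}

def classify_domain(text: str) -> str:
    """Stand-in for an AI classification step (keyword stub for illustration)."""
    lowered = text.lower()
    if "agreement" in lowered or "liability" in lowered:
        return "legal"
    if "click" in lowered or "button" in lowered:
        return "ui_strings"
    return "support_docs"

def route(text: str) -> str:
    domain = classify_domain(text)                     # AI: non-deterministic tagging
    return ROUTING_RULES.get(domain, "human_review")   # rules: deterministic routing

print(route("This agreement limits liability for both parties."))  # -> human_translation
```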

Stephanie Harris-Yee: And so I’d imagine, with all that, cleaner metadata is better. Can you explain a little how cleaner metadata and AI lead to more meaningful orchestration, and not just the automation hype that we hear?

Erik Vogt: Yeah, so let’s break this down. AI orchestration is metadata in motion. The metadata is a container that describes some object: an asset, a segment, whatever. Metadata can also be a set of rules. The AI can be an interpreter of that and can help understand it.

And then the orchestration is the execution of that. So to build this up, you use metadata to build a map, and that’s all the instructions under the hood. AI can help you navigate. And then orchestration is really driving the car through that system.

We could also think about this with a computer vision metaphor, where the data is a bunch of numbers from the sensors about how far away objects are from the vehicle, and the AI synthesizes all that information into a meaningful recommendation. The orchestration is the actual choice to steer the wheel to the left or right to avoid the fire hydrant, or to stop before the car in front of you stops.

If you have bad metadata, then AI has the wrong information on which to base decisions: it can misclassify the content, it can pick the wrong MT model. It creates work, it creates risk, and it basically automates the wrong steps.

It automates based on the wrong information. On the opposite side, good metadata plus AI can drive self-improving workflows. These are places where AI can help enrich the data; in fact, you can use AI to help enrich the metadata itself, with supervision, obviously.

It can help identify missing or inconsistent metadata. That’s an information layer where we’re using AI to find where our systems are lacking the structured data they need. And when I talk about structured data, I’d like to expand the bubble here a little bit.

Because it isn’t just the TM. The TM is structured data, source equals target, and then maybe some product information, who worked on it, and so on; all the stuff we’re used to analyzing. But we also have other data about the thing we’re translating. We could make a knowledge graph that says this product has these characteristics, so that when we’re translating, any reference to this product points back to this set of structured information about that particular product.

There’s so much potential here for tools that are developing data lakes, data that informs the translation. It’s more than just segment A equals B; it’s all this other information that could enrich that translation layer. And AI can also help with the actual meta layer: the layer about the workflow and how the different tools talk to each other.
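One hedged sketch of what that richer layer could look like: a TM segment carrying structured context beyond source equals target, plus a simple check for the missing fields an AI enrichment step might then fill in. The field names and schema are invented for illustration:

```python
from dataclasses import dataclass, field

# Sketch only: a segment enriched with structured context beyond
# source = target. Field names are assumptions, not an existing schema.

@dataclass
class EnrichedSegment:
    source: str
    target: str
    source_locale: str
    target_locale: str
    domain: str | None = None                               # e.g. "legal", "ui_strings"
    product_attributes: dict = field(default_factory=dict)  # knowledge-graph facts

REQUIRED_FIELDS = ("source", "target", "source_locale", "target_locale", "domain")

def missing_metadata(segment: EnrichedSegment) -> list[str]:
    """Flag empty required fields; candidates for supervised AI enrichment."""
    return [name for name in REQUIRED_FIELDS if not getattr(segment, name)]

seg = EnrichedSegment(source="Power button", target="Bouton d'alimentation",
                      source_locale="en-US", target_locale="fr-FR")
print(missing_metadata(seg))  # -> ['domain']
```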

Stephanie Harris-Yee: Okay, this might be kind of a two-part question. First, what does this look like in practice if you actually try to go about it? And then, what should people realistically expect when they focus on metadata hygiene with AI-assisted classification?

Erik Vogt: So step one is to identify your metadata; any taxonomist will tell you that you first need to identify what it is that needs to be structured. Think about a super simple layer: the language ID. Our industry already runs into this problem, a very simple metadata structure where we don’t even know exactly what we mean by the language codes.

Many times localization teams will start off with “we want ES, IT, FR, and DE,” and we sort of intuitively know, okay, those are general. Next question: French for Canada, French for France, French for New Guinea? Spanish for Latin America or just for Puerto Rico? We’ve had requests for very specific variants.

Not all versions of Spanish are the same. So if everybody started off with structured metadata, saying we understand that we need to be precise about what we’re classifying, we could start to have a much better conversation about what this actually means.
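As a small illustration, here is a sketch of normalizing loose language requests into explicit locale tags. The default mappings are assumptions shown only to make the ambiguity visible; the whole point is that the client has to confirm them:

```python
# Sketch: turn loose language requests ("ES", "FR") into explicit locale tags.
# The defaults below are assumptions, not recommendations.
DEFAULT_LOCALES = {
    "ES": "es-419",  # Latin American Spanish? Or es-ES, or es-PR?
    "FR": "fr-FR",   # Or fr-CA?
    "IT": "it-IT",
    "DE": "de-DE",
}

def normalize_locale(requested: str) -> str:
    code = requested.strip()
    if "-" in code:                    # already a precise tag, e.g. "fr-CA"
        return code
    if code.upper() in DEFAULT_LOCALES:
        return DEFAULT_LOCALES[code.upper()]
    raise ValueError(f"Ambiguous or unknown language request: {requested!r}")

print(normalize_locale("ES"))     # -> es-419 (only if the client agrees)
print(normalize_locale("fr-CA"))  # -> fr-CA
```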

But there are also other types of metadata we should be talking about, like where our PM time is being consumed: being able to track the internal overhead of coordinator time, project management time, or localization engineering time. These are transactional tasks that require a lot of human labor to execute certain steps. To measure them, you have to decide that they matter; then you measure them, and then you can start optimizing them.

There are other things too. I just saw a presentation at Slator yesterday that was talking about risk. That’s another layer of metadata I think we as an industry could do a lot with. How risky is it if this is wrong? What are the consequences of failure? In this presentation by SAP, they classified risk in terms of consequence and probability, which is how risk managers think about things.

So what if all of our projects had a risk value associated with them? Could we use that to route things? Low risk goes to MT only, for example. Medium risk might go through MT plus AI review, with maybe some light human monitoring of that system. And high risk, high touch might not be touched by MT at all, but be one hundred percent hand done.

So risk: that’s metadata. We’d expect to make better decisions with this information, with better data about these systems.
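A hedged sketch of that kind of risk-based routing, using the consequence-times-probability framing described above. The scores, thresholds, and workflow names are illustrative assumptions, not SAP’s model:

```python
# Sketch: risk-based routing. Consequence and probability are scored 1-5;
# the thresholds and workflow names below are illustrative assumptions.

def risk_score(consequence: int, probability: int) -> int:
    """Classic risk-matrix score: consequence x probability (1-25)."""
    return consequence * probability

def route_by_risk(consequence: int, probability: int) -> str:
    score = risk_score(consequence, probability)
    if score <= 6:
        return "mt_only"                        # low risk
    if score <= 14:
        return "mt_plus_ai_review_light_human"  # medium risk
    return "full_human_translation"             # high risk, high touch

# Example: high-consequence legal content with moderate error probability.
print(route_by_risk(consequence=5, probability=3))  # -> full_human_translation
```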

Stephanie Harris-Yee: So why don’t more companies focus on this? It seems like the upside is very large. So what’s holding people back?

Erik Vogt: Yeah, I’ve seen several ERP implementations fail, largely because the amount of data we have in our industry is massive, complicated, and difficult. And I think part of that is an ergonomic element: how do people interact with this data?

How do we collect it? How do we validate that it’s real? It’s also tedious, and it’s invisible. There’s ownership fragmentation: lots of organizations don’t really have a core owner for this. Maybe some companies have a chief information officer, but many of us don’t have the luxury of a chief information officer who can say, hey, team A, I need this and this from you.

Team B, I need this and this from you, and all of it fits into an architecture that hangs together. Also, let’s be honest: PMs are overloaded. They’re generally scheduled at 110 to 140% of their capacity, so they’re usually working long hours and dealing with a lot of uncertainty.

They don’t have time to really think about metadata structurally, and they’re also reacting to all the other systems that impose metadata on them: many different TMSs, CMSs, LMSs, all these different systems I’ve talked about before.

The complexity makes it really hard to take a step back and ask: what is the system we can organize this information with? And I think there’s a fear of breaking things. There’s institutional amnesia and institutional inertia, and both of those tend to slow this down. I remember once I inherited two different project teams who had previously been competitors.

It’s funny: the two teams were working for the exact same client, literally the exact same subdivision of the exact same client. They had totally different quoting mechanisms, totally different units of measurement, totally different ways of structuring things. Reconciling that took a couple of years, because we had to slowly unpack whether you could take one system and impose it on the other.

But very often both teams have some influence and resist giving up their preferred methodology. It ends up as an internal war of attrition, with teams struggling to hold onto the system they’re comfortable with at the expense of the bigger picture.

So yeah, there are a lot of things that make metadata harder to deal with, and it’s hard for businesses to grapple with. It’s non-trivial. However, AI is giving us an interesting new way of looking at things. I used AI to create a matrix of risk versus variability, looking across all the different kinds of metadata and the different ways we can think about it.

Then I asked it to classify where the variability is, where the biggest chance of a metadata mismatch lies. Language codes, for example, are catastrophic: if you get the language code mapping wrong, you get nowhere. Project ID is also very critical.

There are others that can be a little sloppier and maybe a little less risky. Anyway, my recommendation for anybody who’s interested is to take a step back and think about what matters. Look at the metadata; maybe do an analysis of how important each of these metadata categories is. Use AI to help find it, or to plan how to collect it or clean it up, and then think about how it could be used structurally to either enhance your operational throughput or improve the product of your output:

the actual translation itself, which I think most of our industry is already talking a lot about. I’m here to raise the flag on a whole bunch of hidden stuff that we tend to fret about in the background but don’t necessarily surface as a real business problem.
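A hedged sketch of what a simple risk-versus-variability matrix for metadata categories might look like; the categories and scores here are invented for illustration, not Erik’s actual analysis:

```python
# Illustrative only: scoring metadata categories by risk (cost of getting it
# wrong) and variability (likelihood of mismatch across systems), 1-5 each.
METADATA_MATRIX = {
    # category:         (risk, variability)
    "language_code":     (5, 4),   # wrong locale is catastrophic
    "project_id":        (5, 2),
    "content_domain":    (3, 4),
    "due_date":          (3, 3),
    "reviewer_name":     (1, 3),
}

def priorities(matrix: dict) -> list[tuple[str, int]]:
    """Rank categories by risk x variability to decide where to clean up first."""
    return sorted(
        ((name, r * v) for name, (r, v) in matrix.items()),
        key=lambda item: item[1],
        reverse=True,
    )

for name, score in priorities(METADATA_MATRIX):
    print(f"{name}: {score}")
```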

Stephanie Harris-Yee: Thank you again, Erik. I think this has been a good episode. We’ll see you next time.

Erik Vogt: Thanks, Steph.

 
