A confusing message is circulating about the economics of AI. One camp insists that hiring back a junior engineer is now cheaper than using AI to build things; another insists it has never been cheaper to dig in and build apps yourself, no external teams required. Both claims can be true at once. Tokens are getting cheaper; that part is not in dispute. Moore’s law and economies of scale are clearly pushing unit prices down. But the unit price of a token is not the same as the cost of an outcome, and for localization that distinction is where most of the confusion lives.
Localization has long thought in terms of the word as a unit, a single bundled price that quietly absorbs many separate costs. GenAI breaks that habit. Different pathways now lead to the same outcome at very different costs, and the cheap per-token price can be deceptive. The hardware-level cost of tokens may keep falling, but enterprise AI bills are climbing anyway, because cheaper tokens get consumed faster and in greater volume. It is Jevons paradox in practice: the easier and cheaper a capability becomes, the more of it you use, until the savings on each unit are swamped by the sheer number of units.
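The arithmetic behind that paradox is simple enough to sketch. The figures below are hypothetical and chosen only to illustrate the effect: a 60% drop in unit price combined with a fourfold rise in consumption still produces a larger bill. The function name `total_spend` is an assumption made for this example.

```python
def total_spend(price_per_million_tokens: float, tokens_used: float) -> float:
    """Total bill in dollars: unit price multiplied by volume."""
    return price_per_million_tokens * tokens_used / 1_000_000


# Hypothetical figures, not taken from any real price list.
year_one = total_spend(price_per_million_tokens=10.0, tokens_used=50_000_000)
year_two = total_spend(price_per_million_tokens=4.0, tokens_used=200_000_000)

print(f"Year one: ${year_one:,.2f}")  # Year one: $500.00
print(f"Year two: ${year_two:,.2f}")  # Year two: $800.00
```

The unit price fell, yet the total rose, because volume grew faster than price dropped.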
What emerges is a clearer way to think about pricing. The generation layer (basic translation, summarization, rewriting, classification, first-draft multilingual content) is becoming a commodity and will keep getting cheaper. The system around it (terminology, translation memory, multiple passes, quality loops, multimodal handling, governance and the human accountability layer) is getting more expensive. The strategic move is to stop betting on whether tokens go up or down, and instead align each job to its risk profile: the cheapest model for the lowest-consequence work, premium models and human experts reserved for where the consequences of error are real. Price should follow the consequences of error, not stack margin on top of cost.
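To see why the surrounding system can cost far more than the generation step, here is a rough sketch of input-token consumption. It rests on a simplifying assumption: shared context such as a style guide or glossary is resubmitted in full with every segment on every pass, with no caching. Real pipelines may behave differently, and the function name and all numbers are hypothetical.

```python
def pipeline_tokens(segments: int, tokens_per_segment: int,
                    context_tokens: int, passes: int) -> int:
    """Input tokens consumed when every pass resubmits the shared
    context (style guide, glossary, TM matches) with each segment."""
    return segments * passes * (context_tokens + tokens_per_segment)


# Hypothetical job: 1,000 segments of about 40 tokens each.
single_pass = pipeline_tokens(1_000, 40, context_tokens=0, passes=1)
full_system = pipeline_tokens(1_000, 40, context_tokens=2_000, passes=3)

print(single_pass)                 # 40000
print(full_system)                 # 6120000
print(full_system // single_pass)  # 153
```

Under these assumptions the same content consumes roughly 150 times more input tokens once context and extra passes are added, which is how a bill can grow even while the per-token price falls.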
Key Insights
- Cheaper tokens and rising AI bills are both real. Unit prices for tokens keep falling, but because cheaper tokens get consumed faster and in higher volume, total spend can climb anyway. The unit price is not the story; the cost of reaching a given outcome is.
- The generation layer is becoming a commodity. Basic translation, summarization, rewriting, extraction and classification are increasingly cheap and no longer a strong basis for premium pricing on their own. The widening price gap between automated and human translation puts pressure on any LSP whose pitch is simply “send us content, we’ll send it back translated.”
- The system around the model is what gets expensive. Style guides, terminology, translation memory, multiple agent passes, RAG and retrieval, quality loops, multimodal audio and video, and enterprise governance all add steps, and each step consumes tokens. Costs scale nonlinearly, so a system can explode in price even as the unit price drops.
- Price the consequences of error, not the word. Match the workflow to the risk: raw AI for low-risk internal use, AI plus automated QA for high-volume low-risk content, human review for customer-facing work, and risk-based premium review for regulated, legal, medical or brand-critical content. Human hours get more expensive as fewer people are qualified to do the work that matters, so spending them where they reduce real risk is what creates value.
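The risk-matching idea in the last point can be expressed as a small routing rule. This is a minimal sketch, not a prescribed method: the names `Tier`, `Job` and `route` are hypothetical, and the three boolean flags are simplified stand-ins for whatever risk assessment a real program would use.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    RAW_AI = "raw AI"
    AI_PLUS_QA = "AI plus automated QA"
    HUMAN_REVIEW = "AI plus human review"
    PREMIUM_REVIEW = "risk-based premium review"


@dataclass(frozen=True)
class Job:
    name: str
    customer_facing: bool
    regulated_or_brand_critical: bool
    high_volume: bool


def route(job: Job) -> Tier:
    """Pick the cheapest workflow that still covers the consequences of error."""
    if job.regulated_or_brand_critical:
        return Tier.PREMIUM_REVIEW
    if job.customer_facing:
        return Tier.HUMAN_REVIEW
    if job.high_volume:
        return Tier.AI_PLUS_QA
    return Tier.RAW_AI


# Hypothetical jobs for illustration.
jobs = [
    Job("internal meeting notes", False, False, False),
    Job("support ticket triage", False, False, True),
    Job("product landing page", True, False, False),
    Job("clinical trial consent form", True, True, False),
]
for job in jobs:
    print(f"{job.name}: {route(job).value}")
```

The checks run from highest risk to lowest, so a job that is both high-volume and regulated still lands in the premium tier rather than the cheapest one.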
Field Notes – Episode 14: Token Pricing and the Effects on Localization
Below is an automated transcript of this episode
Stephanie Harris-Yee: [00:00:00] Hi, I’m Stephanie, and this time I’m back with another episode of Field Notes with Erik Vogt. Now, this time around, we’re going to be talking about this concept of GenAI and especially like token pricing.
So what I have been seeing online is that there’s this kind of dichotomy of, on one hand, people are saying that, you know, all of a sudden hiring back that junior programmer is cheaper than using AI to develop stuff. And then on the other hand, I’ve also been hearing about it being a lot cheaper now for folks to just kind of get in, dig in, develop apps, et cetera, themselves without having to pull in programmers, external departments, et cetera. So we have a bit of a dichotomy here. So Erik, on the localization side of things, how do you think this is playing out?
Erik Vogt: First thing, don’t believe what you hear on LinkedIn, because there’s a lot of people who are pointing in [00:01:00] opposite directions and having a lot of fun with it. And I think what they’re picking up on in this chaos is actually that both are true, right?
That tokens are getting cheaper, and that’s demonstrated. If you look at the actual data, there’s no question about it. Moore’s law and economies of scale are clearly driving unit prices down for tokens. But when you aggregate the different steps required and the volumes required, the volumes of tokens for different outcomes that you’re trying to get to, they’re diverging quite a lot.
So if you’re trying to do something where, if you’re just doing a simple task like building a V1 of a website or V1 apps, it does a pretty good job relatively cheaply getting to that outcome. And for generic use cases, it’s relatively cheap still, and it will continue being cheap.
But the industrialization of it, if you add the different layers, then that’s where things get a little bit more complex.
Stephanie Harris-Yee: So looking at this, do you think that tokens [00:02:00] will continue to go down?
Erik Vogt: Yes. I think for basic text generation there’s gonna be continued downward pressure. There’s no question about that. There’s competition, there’s optimization going on with smaller models. There’s open source out there. They’re, you know, entering this. So whenever you have a competitive environment like this, all of the players are doing what they can to optimize the efficiency, both through more efficient data centers, through more efficient workflows, more efficient token utilization.
Everything is going to continue being cheaper. But premium capabilities are going to be monetized, and that’s when you think about the, you know, this output thing is changing what it costs to get where you wanna go as opposed to the unit price itself. And that’s something that localization has a hard time wrapping their heads around, because we think in terms of word as unit, and we think of like its cost, this cost, it’s all these bundled things, but it all loads into this one [00:03:00] price.
Now, with GenAI and AI capabilities, there’s different pathways that lead to different potential particular outcomes. So the simple token prices are just not the whole story, and it can be very deceptive. Because, like for example, one resource, one analysis, suggested that the per-token cost for large language models is expected to keep falling by sixty percent, six to seven percent annually at the hardware level. So that totally affects enterprise AI bills. But because they’re consuming more, this is like Jevons paradox, you’re able to do a lot more faster.
You can burn through those cheaper and cheaper tokens faster and faster. And I was doing this one exercise where I was putting together a presentation, and I blew through my budget immediately, and it’s just like poof. You have to buy more data, buy more tokens. So yeah, what you’re doing with them is mattering an awful lot.
Stephanie Harris-Yee: So from the localization standpoint then, when we’re [00:04:00] looking at all of the different areas, what do you see as getting cheaper? So what are the ones that the token prices are affecting, so it’ll go down in cost, versus the things that really might be even going up due to just the volumes?
Erik Vogt: Yeah. What’s gonna get cheaper specifically is the generation layer. Basic translation, summarization, rewriting, content extraction, classification, first-draft multilingual content. These are increasingly commodity capabilities, and they’re gonna be driving that price down. That means that they are no longer a strong basis for premium pricing by themselves.
So from a pricing perspective, any LSP who’s delivering that, the LLM, there’s multiple things. It becomes a cheaper component of the delivery itself, right? So it’s cheaper delivery. The expectations therefore from clients are that the price goes down, and the alternative is getting cheaper too.
So when you’re looking at the differential [00:05:00] between the automated translation and the human translation, it’s getting a bigger difference apart. So that, I think, is really for low-risk internal comprehension use cases, which already have been at MT level for decades, for a couple decades now, especially since 2018.
There’s rough translation triage support, draft adaptation. All these things are gonna be getting cheaper and cheaper and cheaper. For LSPs, this puts additional pressure on marketing and kind of the marketing messaging. Like, if your value proposition is just “send us content and we’ll send it back in a different language,” you know, the market keeps on pushing pressure down on that.
Now, the cost of the AI itself is also getting cheaper there too, so you’re gonna expect a smaller and smaller margin, which, as we’ve talked about before, that puts additional pressure on external systems like project management and engineering. Because that kind of core unit price is dropping, those other things become more of a premium value of doing those other things effectively.
So it’s interesting, like the translation part of what an LSP does is not the hardest [00:06:00] part of what they do. And often people think that’s what they are, but they’re actually integrating these different sort of platforms and systems and tools and, really, the accountability layer that we’ve talked about before.
Stephanie Harris-Yee: So is that, say like that accountability layer, or are there other things as well that you see getting more expensive?
Erik Vogt: Yes. And that’s where the model itself or the tokens are getting cheaper, but the system around it is getting more expensive. So when you think from a translation paradigm, you’ve got style guides, terminology, translation memory, product documentation, screenshots. You’ve got multiple passes, and that’s a big one, because you can have sort of an MT pass and you have one agent does one thing and one agent does another thing, and each of those agents is consuming tokens.
And because of the nature of these large language models, you have to resubmit the query multiple times during a larger project. So that just adds up. It’s a nonlinear thing. It’s not like you just load the subject. From an AI’s perspective, you don’t just load the style guide once and you’re [00:07:00] done.
You actually may need to resubmit some parts of that data to the large language model, because in between sort of segment one, segment two, it’s done about a billion other jobs in the meantime. So longer context when you have RAG or search retrieval, knowledge graphs, all these things put together.
When you have quality loops, that adds multiple steps. Each of those kind of adds tokens. They’re multimodal, whenever it’s audio, video or dubbing, that gets very expensive very fast, because those are just a lot of data going through those channels. And then there’s enterprise control and governance, which we’ve also talked about before.
But when you add kind of security layers, data handling, you know, there’s auditability layers, there’s kind of multiple tracks where you’ve got to store things. I think all that is adding a lot of complexity that’s a little bit outside of what most LSPs usually think of. When you’re talking to a technocrat, like they’re thinking in terms of optimizing this ecosystem, but this can very easily explode [00:08:00] into a very expensive system even as the unit price is going down.
Stephanie Harris-Yee: So then let’s step, maybe take a step away from the client-side stuff. As an LSP looking at this, what should they be doing?
Erik Vogt: Well, I think defending the old world order is getting increasingly difficult, and I think we were even doing a webinar back in January where I was hypothesizing that the word unit rate is under attack, and that any of these models that continue to fixate on a unit rate per word, without thinking about how many steps are involved to deliver a particular output quality, are gonna start becoming more and more irrelevant.
So it’s not that the word isn’t a meaningful part of that equation, ’cause you can use it to extrapolate things. But I think when you think about this, it’s really helpful to think about this in different tiers. So from a raw AI perspective, just without a lot of customization, just kind of pure in and out, that’s great for low-risk [00:09:00] internal use.
It’s usage-based. There’s no publishability guarantee. There’s nothing solid there that you need to worry about consequences. That’s the cheapest, and I think all of us are using that internally. Automatically it’s almost free, and it’s gonna keep getting closer and closer to free.
Then when you go up to AI plus an automated QA, there’s high volume but still low-risk content. Then there’s gonna be usage or kind of a workflow fee that starts kinda aggregating. Then if you add human review, that’s gonna get more and more expensive, and I think that’s where an economist would suggest that the human component of this is actually gonna get more expensive, ’cause fewer resources are actually gonna be qualified to do the part that we need.
So that’s gonna get more expensive. Customer-facing standard content is kind of tier three, I would say. And then if there’s domain assurance involved, and this is where you’re dealing with more regulated, you’re dealing with legal, medical, product-critical, [00:10:00] brand-sensitive content, the risk-based premium review, I think that price is gonna go up.
And I think that’s where LSPs, when you’re thinking about LSPs’ best use cases, to attack that tier three and four with efficiency at the right level. So those kind of internal mechanics that you need to solve resource allocation, you know, efficiency, but the actual human costs per unit per hour are actually going up.
So utilizing those more expensive hours becomes mission-critical. So how much value are you getting per unit time becomes essential for LSPs. And then at the top you also have kind of managed AI localization programs, which is kind of enterprise scale. That’s gonna be the most expensive from a compute standpoint, even if there’s no human in the loop.
So as humans, if you take humans into that tier three, they’re absorbing risk because there are human eyeballs looking at it, and they’re fixing everything that happens. But as you get to [00:11:00] more and more automated quality refinement and quality control, that drives the actual AI costs up. So that’s where you get this paradox happening, right?
Because you say, “Wait a minute, the AI is getting super cheap. Why are my AI bills going through the roof?” And it’s because while you’re now handling a lot more content, if you arbitrarily stick it all in there without thinking about it, you can very easily consume an enormous number of tokens. And that is a tough mental shift for a lot of us in the industry who are usually thinking in terms of human time, to be thinking in terms of AI tokenization costs as a primary driver. And that’s where you get that situation that we started with, where the engineer, it’s cheaper just to have a junior engineer doing it, ’cause the accountable person and the human are all bundled into one block of cost, versus the complexity of that level five, where you need to get those [00:12:00] assurances together.
Stephanie Harris-Yee: The whole package of things.
Erik Vogt: Right. So, I mean, if I had to leave a tagline for this, the price really should follow the consequences of error. It’s not about stacking margin on top of costs. It’s about aligning the process to the right risk profile with the right combination of AI and human components that delivers the output that you want.
And I think that’s the key idea in there.
Stephanie Harris-Yee: Yeah. Okay. So to wrap it up, for the localization leader, what would be the takeaway that you would give?
Erik Vogt: Don’t build your AI strategy around the assumption that tokens will always get cheaper or more expensive. Like, both of those are kind of fallacies if you look at them in isolation. You build it around allocation, and you think about the cheapest model for the lowest-consequence workflow.
You’re optimizing around efficiency, so you have the fewest number of steps and the fewest number of checkpoints, right? Less governance, less overload. And then you use premium models, and this is, we didn’t even get to talk about this, but different [00:13:00] models cost different amounts also, and some of them are a lot cheaper than others per unit, and they have qualitatively and quantitatively different outputs.
So I think choosing your model and which version of those models, like, is it a thinking model or reasoning model versus something that’s a little bit more instant. So the premium models, the grounding, and then experts and where you put experts in there, that’s where you manage risk reduction and where you create value.
There’s opportunities to waste a lot of money by either paying for compute that isn’t adding value, or paying for human time that you don’t need. So it’s all about kinda calibrating your ecosystem to deliver these outcomes that you want.
The waste is using just the wrong level of assurance for a given job, and then putting everything in one pipeline and expecting everything to spit out the same results. Like, this one-size-fits-all mindset is just obsolete. We can’t afford that anymore.
Stephanie Harris-Yee: Yeah. Okay. [00:14:00] Thank you, Erik. And yeah, as always, great insights.
Erik Vogt: Thank you so much, Steph. As always, it’s a pleasure. Have a great day.
Argos Multilingual





