AI Tools & Technology·6 min·4 May 2025

What is token usage and how does it affect AI costs?

Almost all API-based AI services charge based on tokens. But what exactly are tokens? And how do you ensure costs do not unexpectedly escalate when you scale your AI usage? This article explains the basics and offers concrete tips for cost management.

Tokens are the unit of measurement for AI language models. Every word, punctuation mark and space a model processes or produces contributes to the token count. Understanding how tokens work also helps you understand why some AI applications are cheaper than others, and how you can manage costs.

What are tokens?

A token is not a word and not a letter. It is a piece of text that the model processes as a unit. In English, most short words are one token; longer words are split into multiple tokens. Punctuation marks often form tokens of their own, and a leading space is typically absorbed into the token of the word that follows.

As a rule of thumb: 100 words of English text are approximately 130-150 tokens. Dutch or other European texts are generally slightly more expensive per word than English, because the tokeniser most models use is optimised for English.
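The rule of thumb above can be turned into a quick estimator. This is a minimal sketch, not a real tokeniser: the function name and the 1.4 tokens-per-word factor are illustrative assumptions, and for exact counts you should use your provider's own tokeniser tool.

```python
def estimate_tokens(text: str, tokens_per_word: float = 1.4) -> int:
    """Rough token estimate using the ~1.3-1.5 tokens-per-word rule of thumb
    for English text. Not exact: use the provider's tokeniser for real counts."""
    return round(len(text.split()) * tokens_per_word)

# 10 words of English text -> roughly 14 tokens
print(estimate_tokens("Tokens are the unit of measurement for AI language models."))
```

For non-English text, nudge the factor upward, since most tokenisers are optimised for English.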

Input versus output tokens

AI models charge for two streams: input and output. Input is the tokens you send to the model: the system prompt, the conversation history and the user's current message. Output is the tokens the model sends back as a response.

Output tokens typically cost two to five times as much as input tokens. That makes longer answers relatively expensive. If you have a chatbot that gives extensive answers, you pay significantly more than for a chatbot that responds concisely.
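The two price streams combine into a simple per-request calculation. A minimal sketch, assuming illustrative prices of $3 per million input tokens and $15 per million output tokens (check your provider's current price list):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of a single request in dollars, given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# 2,000 input tokens and 500 output tokens at the assumed 5x output premium:
cost = request_cost(2_000, 500, input_price_per_m=3.0, output_price_per_m=15.0)
print(f"${cost:.4f}")  # the 500 output tokens cost more than the 2,000 input tokens
```

Note that in this example the output side ($0.0075) outweighs the input side ($0.0060) despite being a quarter of the token count: that is the output premium at work.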

What makes costs high?

The most costly situations:

  • Long system prompts: If your chatbot receives an extensive set of instructions with each conversation, that counts as input tokens with every request.
  • Extensive conversation history: If the system sends the full conversation history, the input grows with each message.
  • Long documents in context: In RAG systems, relevant passages are sent as context. More passages mean more tokens.
  • High output: Models that generate extensive answers produce many output tokens.
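The second point in the list deserves emphasis: when the full history is resent every turn, the input per request grows linearly, so the cumulative input over a conversation grows quadratically. A sketch of that arithmetic (the function name and the sample numbers are illustrative assumptions):

```python
def cumulative_input_tokens(turns: int, tokens_per_message: int,
                            system_prompt: int) -> int:
    """Total input tokens billed over a conversation when the full
    history is resent with every request."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_message     # user's new message joins the history
        total += system_prompt + history  # entire history + system prompt sent as input
        history += tokens_per_message     # model's reply also joins the history
    return total

# 10 turns, ~100 tokens per message, a 500-token system prompt:
print(cumulative_input_tokens(10, 100, 500))  # 15000 input tokens billed in total
```

Only 2,000 tokens of actual conversation text (20 messages of 100 tokens) produce 15,000 billed input tokens here, which is why the history strategy matters.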

How do you manage costs?

Use smaller models for simple tasks: Claude Haiku, GPT-4o mini and similar compact models are a fraction of the price of the most powerful variants. For FAQ chatbots and simple tasks, that is more than sufficient.

Limit conversation history: Do not send the full conversation history when it is not needed. A summary of earlier messages instead of the literal text saves tokens.

Compress your system prompt: Test whether a shorter, less extensive system prompt works just as well. Every token in the system prompt counts with every request.

Use prompt caching: Anthropic and OpenAI both offer forms of caching where repeated input is processed more cheaply. This is relevant if you have a long system prompt that is the same with every request.

Estimating costs for a project

Before building an AI application, it is wise to make a cost estimate. How many conversations do you expect per day? What is the average length of a conversation? How long are your system prompt and context?

With those inputs you can calculate how many tokens you consume per day and what that costs. Most providers have a tokeniser tool where you can enter text and see the token count.
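Those inputs combine into a back-of-the-envelope daily estimate. A sketch under stated assumptions: the usage figures and the $3/$15 per-million prices below are illustrative, not any provider's actual rates.

```python
def daily_cost(conversations_per_day: int, turns_per_conversation: int,
               input_tokens_per_turn: int, output_tokens_per_turn: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated daily spend in dollars from expected usage and
    per-million-token prices."""
    turns = conversations_per_day * turns_per_conversation
    daily_input = turns * input_tokens_per_turn    # prompt + history + message
    daily_output = turns * output_tokens_per_turn  # model responses
    return (daily_input * input_price_per_m
            + daily_output * output_price_per_m) / 1_000_000

# 500 conversations/day, 5 turns each, ~1,500 input and ~300 output tokens per turn:
print(f"${daily_cost(500, 5, 1500, 300, 3.0, 15.0):.2f} per day")
```

Running the numbers before building also shows which lever matters most: in this example, halving the input tokens per turn (a shorter system prompt, trimmed history) saves as much as halving the output.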

Conclusion

Understanding token usage is essential for anyone building AI applications at scale. Costs are manageable with the right architectural choices. Mach8 helps organisations design AI systems that not only work well, but are also cost-efficient.

Want to build a cost-efficient AI application? Get in touch with Mach8.
