
What Are API Tokens? A Plain-English Guide for Business Owners

Jennifer T.R., Editor in Chief, Stronk Blog · 6 April 2026 · 8 min read

If you have looked into AI agents for your business, you have probably run into the word "tokens" within the first five minutes. It sounds like jargon from a developer conference. In practice, it is one of the simplest concepts in AI, and understanding it will save you money and stop vendors from overcharging you.

This guide breaks down what tokens are, how pricing works across the major AI providers, what real business tasks actually cost, and how token prices have changed dramatically over the past two years.

What is a token?

A token is a chunk of text that an AI model reads or writes. It is not exactly a word and not exactly a character. Most modern language models use a system called byte-pair encoding (BPE) to split text into tokens. The rough rule of thumb: one token is about three-quarters of an English word, or about four characters.

Some concrete examples:

The word "hello" is one token.
The word "uncomfortable" is broken into two tokens: "uncomfort" and "able."
A number like "2026" is typically one token.
Punctuation marks are usually their own tokens.
The sentence "Can you book a meeting for Tuesday at 3pm?" is about 11 tokens.

If you want to see exactly how text gets tokenised, OpenAI has a free Tokenizer tool on their website where you can paste text and watch it split into tokens in real time.
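The four-characters-per-token rule of thumb is easy to turn into a quick estimator. This is a rough sketch only (the function name is ours, not a real library API); actual byte-pair encoders like the one behind OpenAI's Tokenizer tool give exact counts that will differ slightly.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb.

    Real BPE tokenizers give exact counts; this is only for
    back-of-envelope sizing of prompts and documents.
    """
    return max(1, round(len(text) / 4))

print(estimate_tokens("hello"))                                       # 1
print(estimate_tokens("Can you book a meeting for Tuesday at 3pm?"))  # 10
```

The estimate for the example sentence lands within a token of the real count, which is all you need for cost planning.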

Why does tokenisation matter for cost?

Because every AI provider charges per token. When your AI agent reads a customer email, that email is converted into tokens. When the agent writes a reply, those output tokens are also counted. Your bill is a direct function of how many tokens go in and how many come out.

Input tokens vs output tokens

This distinction matters because most providers charge different rates for each.

Input tokens are everything the AI reads: the customer's email, your business context, the instructions you have given the agent, any documents it needs to reference. Think of input tokens as the AI listening.

Output tokens are everything the AI writes back: the reply to that email, a summary of a document, a drafted invoice. Think of output tokens as the AI speaking.

Output tokens are typically more expensive than input tokens, often two to four times the price. This is because generating new text is computationally harder than reading existing text.

How context windows work

Every AI model has a context window, which is the maximum number of tokens it can hold in a single conversation or task. Think of it as the model's working memory.

GPT-4o has a 128,000-token context window, roughly equivalent to a 300-page book. See OpenAI's model documentation for current specifications.
Claude 3.5 Sonnet and Claude 3 Opus offer a 200,000-token context window. Anthropic details this on their model comparison page.
Gemini 1.5 Pro offers up to a 2,000,000-token context window, one of the largest available. Google documents this on their AI pricing page.

The context window includes both the input and the output. If you feed the model a 50,000-token document and ask it to write a 2,000-token summary, you have used 52,000 tokens of the context window.
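Because input and output share one window, a pre-flight check is just an addition and a comparison. A minimal sketch, using the window sizes quoted above (the helper name is ours):

```python
def fits_in_context(input_tokens: int, max_output_tokens: int,
                    context_window: int) -> bool:
    """Input and output share one context window, so their sum must fit."""
    return input_tokens + max_output_tokens <= context_window

# The example above: a 50,000-token document plus a 2,000-token summary
print(fits_in_context(50_000, 2_000, 128_000))   # True: fits GPT-4o's window
print(fits_in_context(130_000, 2_000, 128_000))  # False: the document alone overflows
```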

For most business tasks, you will never come close to hitting these limits. A typical customer email exchange uses a few hundred tokens. Even a lengthy contract review might use 20,000 to 30,000 tokens.

How the major models price tokens

Pricing varies between providers and between models within the same provider. Here are the current rates for the most commonly used business-grade models as of early 2026:

OpenAI (GPT-4o)

Input: US$2.50 per 1 million tokens
Output: US$10.00 per 1 million tokens

Anthropic (Claude 3.5 Sonnet)

Input: US$3.00 per 1 million tokens
Output: US$15.00 per 1 million tokens
Full pricing at anthropic.com/pricing

Google (Gemini 1.5 Pro)

Input: US$1.25 per 1 million tokens (for prompts up to 128k tokens)
Output: US$5.00 per 1 million tokens
Full pricing at ai.google.dev/pricing

What do these numbers actually mean?

One million tokens sounds abstract. In practical terms, one million tokens is roughly 750,000 words, or about ten full-length novels. That US$2.50 to US$3.00 input cost is the price of reading ten novels. For a business processing a few hundred emails a day, the monthly token cost is usually trivial.
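The per-token arithmetic is simple enough to sketch in a few lines. The rates below are the early-2026 figures quoted in this article and will drift; always check the provider's pricing page before relying on them.

```python
# Rates quoted in this article (US$ per 1 million tokens, early 2026).
PRICES = {
    "gpt-4o":            {"input": 2.50, "output": 10.00},
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
    "gemini-1.5-pro":    {"input": 1.25, "output": 5.00},
}

def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call: tokens divided by a million, times the rate."""
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

# The email example from the next section: 1,000 tokens in, 500 out, on GPT-4o
print(round(request_cost_usd("gpt-4o", 1_000, 500), 4))  # 0.0075
```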

Real cost calculations for common business tasks

Let us put actual dollar figures on the tasks a small Australian business might automate.

Email handling

An average business email is about 150 to 300 words, or roughly 200 to 400 tokens. A reply is usually a similar length. Including system instructions (the context your agent needs about your business), a single email read-and-reply cycle uses about 1,000 to 1,500 tokens total.

At GPT-4o rates, that is roughly:

1,000 input tokens: $0.0025
500 output tokens: $0.005
Total per email: about $0.0075 (less than one cent)

If your business handles 50 emails a day, that is about $0.35 per day, or roughly $10 per month.

Document drafting

Drafting a one-page business letter from a brief set of instructions typically uses about 500 input tokens and 800 output tokens.

Cost per document: roughly $0.01 to $0.02
Drafting 20 documents a month: roughly $0.20 to $0.40

Call transcription and summarisation

A 10-minute phone call generates about 1,500 words of transcript, or roughly 2,000 tokens. Summarising that transcript produces another 300 to 500 output tokens.

Input (transcript + instructions): about 2,500 tokens ($0.006)
Output (summary): about 400 tokens ($0.004)
Total per call: roughly $0.01

If your business transcribes and summarises 30 calls a month, that is about $0.30 per month.

Adding it all together

A typical small business running an AI agent for email handling, document drafting, and call summaries might spend:

Email handling (50 emails/day): roughly $10 per month
Document drafting (20 documents/month): roughly $0.40 per month
Call transcription and summaries (30 calls/month): roughly $0.30 per month
Total: roughly $11 per month in USD, or about $17 in AUD

These figures use mid-range GPT-4o pricing converted to AUD at roughly 1.55 AUD per USD. Your actual costs will vary based on the model you use and the complexity of your tasks, but the order of magnitude is right: tens of dollars per month, not hundreds.
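The monthly total can be checked with a few lines of arithmetic. This sketch reuses the per-task figures derived in the sections above and the article's assumed conversion rate of 1.55 AUD per USD:

```python
AUD_PER_USD = 1.55  # conversion rate assumed in this article

# Monthly figures derived in the sections above (US$)
monthly_usd = {
    "email handling (50/day)":   0.007 * 50 * 30,  # ~$10.50
    "document drafting (20/mo)": 0.01 * 20,        # ~$0.20
    "call summaries (30/mo)":    0.01 * 30,        # ~$0.30
}

total_usd = sum(monthly_usd.values())
print(f"~US${total_usd:.2f}/month, ~A${total_usd * AUD_PER_USD:.2f}/month")
```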

How to monitor and control your spend

Every major AI provider gives you tools to manage costs:

Spending caps. OpenAI, Anthropic, and Google all let you set hard monthly limits on your API account. Set it to $30, $50, or whatever your comfort level is. When the cap is hit, API calls stop. No surprise bills.

Usage dashboards. Each provider has a dashboard showing your daily and monthly token usage broken down by model. Check it weekly for the first month or two until you understand your patterns.

Model selection. You do not always need the most powerful model. Many routine tasks (email classification, simple replies, data extraction) work perfectly well on cheaper models like GPT-4o-mini (US$0.15 per million input tokens) or Claude 3 Haiku (US$0.25 per million input tokens). Reserve the bigger models for complex reasoning tasks.
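The savings from model selection are easy to quantify on the email workload from earlier. One caveat: this article quotes only GPT-4o-mini's input rate, so the US$0.60 per million output-token rate below is an assumption for illustration.

```python
def monthly_email_cost_usd(emails_per_day: int, in_tok: int, out_tok: int,
                           in_rate: float, out_rate: float, days: int = 30) -> float:
    """Monthly cost of an email workload at the given per-million-token rates."""
    per_email = (in_tok / 1e6) * in_rate + (out_tok / 1e6) * out_rate
    return per_email * emails_per_day * days

# 50 emails/day, 1,000 tokens in and 500 out per email
big   = monthly_email_cost_usd(50, 1_000, 500, 2.50, 10.00)  # GPT-4o
small = monthly_email_cost_usd(50, 1_000, 500, 0.15, 0.60)   # GPT-4o-mini (output rate assumed)
print(f"GPT-4o: ~US${big:.2f}/mo, GPT-4o-mini: ~US${small:.2f}/mo")
```

On these assumptions, routing routine email to the smaller model cuts the monthly bill by more than an order of magnitude.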

Prompt optimisation. The instructions you give your AI agent (called the system prompt) are sent with every request. A bloated 2,000-word system prompt adds unnecessary tokens to every single call. Keeping prompts concise can reduce costs by 20 to 40 percent.
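Because the system prompt rides along with every request, its cost compounds with call volume. A sketch of that overhead, using the three-quarters-word rule to convert a 2,000-word prompt to roughly 2,700 tokens (the function name and the 1,500 calls/month figure are illustrative assumptions):

```python
def prompt_overhead_usd(prompt_tokens: int, calls_per_month: int,
                        input_rate: float = 2.50) -> float:
    """Monthly input cost of re-sending the system prompt with every call."""
    return (prompt_tokens / 1e6) * input_rate * calls_per_month

bloated = prompt_overhead_usd(2_700, 1_500)  # ~2,000-word prompt, ~50 calls/day
lean    = prompt_overhead_usd(400, 1_500)    # ~300-word prompt
print(f"bloated: ${bloated:.2f}/mo, lean: ${lean:.2f}/mo")
```

On a workload this size, trimming the prompt saves several dollars a month: small in absolute terms, but a large fraction of the total token bill.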

How token costs compare to other business expenses

To put AI token costs in perspective for an Australian small business:

A single commercial-grade espresso machine lease: $150 to $300/month
EFTPOS terminal fees: $30 to $80/month
Xero accounting software: $54 to $78/month
A single day of a contractor at $80/hour: $640
AI agent token costs: typically $10 to $50/month

The token cost of running an AI agent is comparable to a single Xero subscription. For many businesses, it is the cheapest software line item on the books.

Token prices have dropped dramatically

One of the most important trends in AI is that token prices have been falling rapidly, and this trend shows no sign of stopping.

According to data compiled by a16z and reported widely across the industry:

In early 2024, GPT-4 Turbo cost US$10 per million input tokens. By mid-2025, GPT-4o was down to US$2.50 for the same quality tier, a 75% reduction in 18 months.
Anthropic's Claude 3 Opus launched in early 2024 at US$15 per million input tokens. By late 2025, Claude 3.5 Sonnet offered better performance at US$3 per million input tokens, an 80% reduction in effective cost per unit of intelligence.
Google's Gemini models have followed a similar trajectory, with Gemini 1.5 Flash offering competitive performance at US$0.075 per million input tokens.

ARK Invest's Big Ideas 2025 report projected that AI inference costs would continue to fall at roughly 50 to 70 percent per year, driven by hardware improvements, model efficiency gains, and competition between providers.
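A decline of 50 to 70 percent per year compounds quickly. A minimal sketch of what the low end of that projection implies for today's US$2.50-per-million input rate (the projection itself is ARK's, not a guarantee):

```python
def projected_cost(today_usd: float, annual_decline: float, years: int) -> float:
    """Cost after compounding an annual percentage decline."""
    return today_usd * (1 - annual_decline) ** years

# US$2.50/M input tokens today, declining 50% per year (low end of projection)
for year in range(4):
    print(year, round(projected_cost(2.50, 0.50, year), 4))
```

At that rate, today's US$2.50 becomes roughly US$0.31 per million tokens within three years.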

What this means for your business: the AI agent you deploy today will cost less to run next year, and even less the year after. The economics only improve over time.

Who do you actually pay?

You pay the AI provider directly. If your agent runs on GPT-4o, you pay OpenAI. If it runs on Claude, you pay Anthropic. You set up your own API account, you own the billing relationship, and you can switch providers at any time.

This is different from some AI platforms that mark up token costs by 300 to 500 percent and bundle them into a flat monthly fee. When you pay the provider directly, you get wholesale rates and full transparency.

The bottom line

Tokens are just the units AI models use to measure text. The costs are low, they are transparent, and they are falling every year. For most Australian small businesses, running an AI agent costs less per month than a single team lunch. Understanding tokens puts you in control of your AI costs instead of relying on a vendor to tell you what things should cost.


Ready to put this into practice?

Book a free consultation and we will show you exactly how an AI agent applies to your business.