* This blog post is a summary of this video.

Demystifying Tokens: How to Optimize Costs for AI Assistants

What Are Tokens and How Do They Impact Costs

Tokens are how AI models like GPT or Claude price usage. A token is roughly four characters, or part of a word. More important than the definition, though, is how tokens figure into pricing.
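
To see tokenization in action, here is a minimal sketch using OpenAI's tiktoken library (pip install tiktoken). It only applies to OpenAI models; Claude uses its own tokenizer, so counts there will differ.

```python
# Count tokens with tiktoken and check the ~4-characters-per-token rule of thumb.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
text = "Tokens are the unit AI models use for pricing."
tokens = enc.encode(text)

print(f"{len(text)} characters -> {len(tokens)} tokens")
print(f"~{len(text) / len(tokens):.1f} characters per token")
```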

Most people assume the AI's response is the only thing that gets charged, but that's not the whole picture. What goes into the prompt and model to generate the output is just as important, if not more so, in determining cost.

There are two types of cost: input cost for input tokens and output cost for output tokens. The combination of the two determines your total cost, and more often than not, most of the tokens are spent on input.
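
The pricing formula itself is simple: multiply each side by its own rate and add them. A sketch, with placeholder per-1,000-token prices rather than any vendor's current rates:

```python
# total cost = input tokens * input rate + output tokens * output rate
INPUT_PRICE_PER_1K = 0.0015   # assumed example rate, USD per 1K tokens
OUTPUT_PRICE_PER_1K = 0.002   # assumed example rate, USD per 1K tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# A typical knowledge-base query: heavy input, light output.
print(f"${request_cost(input_tokens=2000, output_tokens=300):.4f}")
```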

Input vs Output Tokens

Input is the combination of your question, context, prompt, and any other information sent to the AI. Output is the response the model generates. With complex questions, thousands of tokens may go into the input while the output is only a few hundred. Understanding this split is essential to optimizing costs and getting the best value.
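
To make the split concrete, here is a sketch that assembles an input from its parts and counts each side with tiktoken. The prompt, context, and response strings are made up for illustration:

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

system_prompt = "You are a helpful support rep for Acme Inc."
context = "Refund policy: customers may return items within 30 days. " * 40
question = "Can I return a product I bought five weeks ago?"

# Everything sent to the model counts as input.
input_tokens = sum(len(enc.encode(part))
                   for part in (system_prompt, context, question))
output = "Our policy allows returns within 30 days, so that purchase is outside the window."
output_tokens = len(enc.encode(output))

print(f"input: {input_tokens} tokens, output: {output_tokens} tokens")
```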

Token Costs Vary By AI Model

Different AI models have different token costs; more advanced models cost more per token. Tools like Voiceflow let you switch between models and handle billing accordingly, so you can use cheaper models for simple conversations and more advanced models for complex questions, optimizing both cost and accuracy.
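
In code, that switching logic might look like the sketch below. The heuristic, threshold, and model names are illustrative; a tool like Voiceflow exposes this as a per-step setting instead of code you write yourself.

```python
def pick_model(question: str, uses_knowledge_base: bool) -> str:
    """Route cheap questions to a cheap model, hard ones to a capable one."""
    if uses_knowledge_base or len(question.split()) > 30:
        return "gpt-4"          # pricier, better at reasoning over context
    return "gpt-3.5-turbo"      # cheaper, fine for simple exchanges

print(pick_model("What are your opening hours?", uses_knowledge_base=False))
print(pick_model("Summarize the warranty terms in my contract.", uses_knowledge_base=True))
```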

Simple Prompt Example: Low Token Usage

A simple prompt like "You are a support rep" uses very few tokens. The input is the prompt text, and the output is the AI's response. With the inexpensive GPT-3.5 model, the whole exchange used only about 40 tokens.

The input took 26 tokens to provide the prompt context, and the output took 16 tokens for the response, 42 in all. With the low-cost model's billing, only 30 tokens were actually charged.
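
The arithmetic works out as follows. The billing multiplier here is an assumption introduced to show how a charged total can come in below the raw total; it is not an official rate, so check your platform's current pricing.

```python
input_tokens, output_tokens = 26, 16
multiplier = 0.7  # assumed per-model discount, for illustration only

raw_total = input_tokens + output_tokens        # 42 tokens used
charged = round(raw_total * multiplier)         # ~29, near the ~30 charged above

print(f"used {raw_total} tokens, charged ~{charged}")
```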

Complex Knowledge Base Example: High Token Usage

Optimizing the Chunk Limit

With a knowledge base, many more tokens are used: the question, the prompt, and the injected text chunks all count toward input tokens. Lowering the chunk limit saves tokens but can reduce accuracy, so the limit has to be balanced to optimize both. In one example, reducing the chunk limit from 5 to 1 saved nearly 1,000 tokens but produced a less accurate response.
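
Here is a sketch of how the chunk limit drives input size. retrieve_chunks is a hypothetical stand-in for a vector search; real knowledge-base tools expose a similar top-k setting.

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def retrieve_chunks(query: str, limit: int) -> list[str]:
    # Hypothetical retrieval helper: returns `limit` chunks of a few
    # hundred tokens each, most relevant first.
    return [f"(chunk {i}: relevant documentation...)" + " filler" * 150
            for i in range(limit)]

question = "How do I reset my device to factory settings?"
for limit in (1, 3, 5):
    prompt = "\n".join([question, *retrieve_chunks(question, limit)])
    print(f"chunk limit {limit}: ~{len(enc.encode(prompt))} input tokens")
```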

Choosing the Right AI Model

For a complex knowledge base query, an advanced model like GPT-4 provides more accurate responses but costs more per token. The higher cost pays off if it satisfies users and avoids escalations: the total comes to just a few cents, versus dollars for human support. Choose the right model for each conversation step to optimize cost and accuracy.
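
A back-of-the-envelope comparison makes the trade-off concrete. Every price below is an assumption for illustration, not current GPT-4 or support-desk pricing.

```python
ADVANCED_INPUT_PER_1K = 0.03    # assumed input rate, USD per 1K tokens
ADVANCED_OUTPUT_PER_1K = 0.06   # assumed output rate, USD per 1K tokens
HUMAN_COST_PER_TICKET = 5.00    # assumed cost of one human-handled ticket

# A knowledge-base query: 1,500 input tokens, 200 output tokens.
ai_cost = (1500 / 1000) * ADVANCED_INPUT_PER_1K \
        + (200 / 1000) * ADVANCED_OUTPUT_PER_1K
print(f"AI answer: ${ai_cost:.2f} vs. human ticket: ${HUMAN_COST_PER_TICKET:.2f}")
```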

Balancing Cost, Accuracy and User Experience

There are several techniques to balance cost, accuracy and user experience:

  • Adjust input chunk limits based on query complexity

  • Choose different AI models for different steps

  • Optimize prompts to guide response length and style

  • Monitor token usage and tune as needed (a usage-logging sketch follows this list)
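
For the monitoring point, most APIs report usage on every response. Here is a sketch using the OpenAI Python client; other vendors return similar fields, and client setup (an API key in the environment) is assumed.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What are your opening hours?"}],
)
usage = response.usage  # token counts reported by the API itself
print(f"input={usage.prompt_tokens} output={usage.completion_tokens} "
      f"total={usage.total_tokens}")
```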

Key Takeaways for Optimizing Your AI Assistant

The key takeaways are:

  • Input tokens often outnumber output tokens

  • Limit chunks and choose models carefully to control costs

  • Balance accuracy by tuning chunks, models and prompts

  • Monitor usage and continuously optimize your assistant

FAQ

Q: How are tokens calculated for AI assistants?
A: Tokens are counted for both the input text sent to the AI model and the output text it returns. In practice, input tokens usually account for the majority of tokens used, and therefore a large share of the cost.

Q: How can I reduce costs for my AI assistant?
A: Strategies include optimizing the chunk limit for knowledge bases, choosing less expensive AI models where accuracy permits, and balancing prompt complexity with output length.

Q: What affects the accuracy of my AI assistant?
A: Accuracy is impacted by factors like the complexity of the prompt, choice of underlying AI model, and amount of context provided from knowledge bases.