Simon Willison · Products · Wednesday, May 20, 2026 · 2 min read

How fast is 10 tokens per second really?

AI Article Analysis

Large language models are increasingly compared by their token generation speed, the rate at which a model produces output. However, a raw number like "30 tokens per second" says little about how a response actually feels to the person waiting for it. A new interactive tool by developer Mike Veerman addresses this gap by letting users see what different token speeds look like as text streams onto the screen.
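One way to make a rate concrete is to convert it into waiting time. The sketch below is illustrative arithmetic only: the function name `seconds_to_generate` and the 500-token response length are assumptions chosen for the example, not figures from the article.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Return how long a response of num_tokens takes at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Assumed response length of 500 tokens, compared at three example rates.
for rate in (5, 30, 800):
    duration = seconds_to_generate(500, rate)
    print(f"{rate} tokens/s: {duration:.3f} s for 500 tokens")
```

Under these assumptions, the same response takes 100 seconds at 5 tokens per second and well under a second at 800, which is the kind of difference a bare number tends to hide.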

Veerman developed an HTML-based application that simulates token output speeds ranging from 5 tokens per second to 800 tokens per second. The tool lets users watch text being generated in real time and compare speeds directly, connecting abstract performance metrics to tangible user experience. This helps consumers and developers make more informed decisions when evaluating AI models. The tool's source code is publicly available, encouraging community engagement and adaptation.
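The core idea of such a simulation can be sketched in a few lines. This is not Veerman's implementation, which is an HTML application; it is a minimal Python illustration that assumes a fixed pause of one second divided by the target rate between tokens. The name `stream_tokens` and its `sleep` parameter are hypothetical, introduced only for this example.

```python
import time
from typing import Callable, Iterable, Iterator


def stream_tokens(
    tokens: Iterable[str],
    tokens_per_second: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield tokens one at a time, pausing after each to approximate a rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    delay = 1.0 / tokens_per_second  # seconds to wait after each token
    for token in tokens:
        yield token
        sleep(delay)  # injectable so the pacing can be checked without waiting
```

To try it, iterate over `stream_tokens("some sample text".split(), 10)` and print each item with `end=" "` and `flush=True`. Whitespace-separated words are only a rough stand-in for real model tokens, and real models rarely emit tokens at a perfectly even pace.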

The implications of this tool extend across multiple dimensions of the AI industry:

Informed purchasing decisions: Organizations can now better understand what advertised speeds mean for their specific use cases and end-user satisfaction
Transparent marketing standards: The visualization approach encourages AI companies to be more forthright about performance capabilities rather than relying on opaque metrics
User experience optimization: Teams developing AI applications can use this reference point to understand latency expectations and optimize their systems accordingly
Skill assessment for developers: Engineers can calibrate their understanding of model performance, improving their ability to select appropriate models for different applications
Industry benchmarking clarity: The tool establishes a common reference point for discussing token speeds across the rapidly evolving AI landscape

This matters beyond mere curiosity. As large language models become increasingly central to business operations and consumer applications, understanding actual performance, not just advertised specifications, has become essential. Token speed directly affects user satisfaction, application responsiveness, and overall system design. Veerman's tool makes this understanding widely accessible, helping stakeholders at all levels make better-informed decisions about AI implementation and deployment. This transparency ultimately benefits the broader AI ecosystem by fostering more realistic expectations and supporting better technical choices.

Key Takeaways

Token generation speed is an increasingly common measure of large language models.
A raw number like "30 tokens per second" says little about the actual user experience.
An interactive tool by developer Mike Veerman lets users see what different token speeds feel like.
The HTML-based application simulates output speeds from 5 to 800 tokens per second.

Read the full article on Simon Willison