Initial Checks
- I confirm that I'm using the latest version of Pydantic AI
- I confirm that I searched for my issue in https://github.com/pydantic/pydantic-ai/issues before opening this issue
Description
Usage for Gemini models in streaming mode is calculated incorrectly: request_tokens appears to be multiplied by the number of streamed chunks, which inflates total_tokens as well.
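To make the suspected failure mode concrete, here is a minimal sketch (the per-chunk counts are hypothetical and the names are illustrative, not pydantic-ai internals): if each stream chunk repeats the full prompt token count and the accumulator sums it, request_tokens scales with the chunk count, close to the 147 ≈ 4 × 36 seen in the example output below.

```python
# Hypothetical sketch of the suspected accumulation bug (illustrative
# numbers, not pydantic-ai internals): each stream chunk appears to
# repeat the full prompt token count, and summing it per chunk inflates
# request_tokens by roughly the number of chunks.
per_chunk_prompt_tokens = [36, 36, 36, 36]  # same prompt count in every chunk

buggy_request_tokens = sum(per_chunk_prompt_tokens)   # 144: scales with chunks
correct_request_tokens = per_chunk_prompt_tokens[-1]  # 36: prompt sent only once

print(f"buggy={buggy_request_tokens}, correct={correct_request_tokens}")
```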
Example Code
```python
import asyncio

from pydantic_ai import Agent


async def main():
    agent = Agent("google-gla:gemini-2.0-flash")
    prompt = """return only "word_1 word_2 word_3 word_4 word_5 word_6 word_7 word_8 word_9 word_10" """

    # Non-streaming run: usage is reported correctly.
    results = await agent.run(prompt)
    print(f"Run usage:\n {results.usage()}")

    # Streaming run over the same prompt: request_tokens is inflated.
    async with agent.run_stream(prompt) as results:
        chunks = len([chunk async for chunk in results.stream_text(debounce_by=None)])
        print(f"Stream run usage ({chunks} chunks):\n {results.usage()}")


asyncio.run(main())

# Run usage:
#  Usage(requests=1, request_tokens=36, response_tokens=32, total_tokens=68, details=None)
# Stream run usage (4 chunks):
#  Usage(requests=1, request_tokens=147, response_tokens=32, total_tokens=179, details=None)
```
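For completeness, the discrepancy can be turned into a self-contained check using only the API calls from the example above (response token counts can vary between runs, so only request_tokens is compared):

```python
import asyncio

from pydantic_ai import Agent


async def check_stream_usage_matches_run() -> None:
    agent = Agent("google-gla:gemini-2.0-flash")
    prompt = "return only 'word_1 word_2 word_3'"

    # Baseline: usage from a plain (non-streaming) run.
    run_usage = (await agent.run(prompt)).usage()

    # Streamed run over the same prompt; drain the stream so the
    # final usage is fully populated before reading it.
    async with agent.run_stream(prompt) as stream:
        async for _ in stream.stream_text(debounce_by=None):
            pass
        stream_usage = stream.usage()

    # The prompt is identical, so request_tokens should match exactly.
    # With the bug, the streamed value is a multiple of the plain-run value.
    assert stream_usage.request_tokens == run_usage.request_tokens, (run_usage, stream_usage)


asyncio.run(check_stream_usage_matches_run())
```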
Python, Pydantic AI & LLM client version
python 3.13.2
pydantic-ai 0.2.4