This page displays performance metrics collected from LiteLLM model endpoints. Two test prompts are used: simple-prompt (single-word response testing baseline latency) and complex-prompt (multi-step planning task for reasoning evaluation). Charts show latency, token consumption, and cost data across models.
Total latency: the elapsed time from sending the request to receiving the complete response. Lower is better for overall speed.
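Total latency of this kind can be captured with a wall-clock timer around the request. A minimal sketch, where `call_model` is a hypothetical stand-in for an actual LiteLLM endpoint call:

```python
import time

def measure_latency(call_model, prompt):
    """Time one model call end to end.

    `call_model` is a hypothetical callable wrapping a model
    endpoint; only the timing logic is illustrated here.
    """
    start = time.perf_counter()
    response = call_model(prompt)
    elapsed = time.perf_counter() - start  # total latency in seconds
    return response, elapsed

# Stand-in for a real endpoint, simulating network + generation time:
def fake_model(prompt):
    time.sleep(0.01)
    return "OK"

response, latency = measure_latency(fake_model, "simple-prompt")
print(f"{response} in {latency:.3f}s")
```

`time.perf_counter()` is used rather than `time.time()` because it is monotonic and suited to measuring short intervals.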
Scatter plot comparing latency against token consumption. Models in the bottom-left corner are both the fastest and the most efficient.