llm-bench (live)

updated 2026-05-09T03:00:07Z
Models: 16
Total Probes: 922
Healthy: 922
Errors: 53

[Charts: Time to First Token, Total Latency, Throughput (tok/s)]

Statistics

Model                         TTFT p50  TTFT p95  Latency  Tok/s   Errors  N
anthropic/claude-opus-4.6     1574ms    2055ms    1895ms   70.4    5       60
anthropic/claude-opus-4.7     1576ms    11789ms   1634ms   113.0   5       60
anthropic/claude-sonnet-4.6   1822ms    4493ms    2290ms   45.2    5       60
deepseek/deepseek-v3.2        1737ms    4685ms    2411ms   29.7    0       65
deepseek/deepseek-v4-flash    1246ms    4778ms    2376ms   73.9    5       60
deepseek/deepseek-v4-pro      0ms       4171ms    2331ms   10.3    0       65
google/gemini-2.0-flash-001   428ms     855ms     480ms    217.0   8       57
google/gemini-2.5-flash       428ms     875ms     472ms    224.9   5       60
google/gemini-2.5-flash-lite  314ms     490ms     372ms    195.3   5       60
openai/gpt-4o-mini            1127ms    1190ms    1225ms   102.5   0       3
openai/gpt-5.4                897ms     1375ms    1140ms   48.0    5       57
openai/gpt-5.5                1215ms    3565ms    1568ms   44.3    5       60
openai/gpt-oss-120b           0ms       1564ms    1080ms   20.9    5       60
x-ai/grok-4-fast              2668ms    10318ms   2790ms   1547.1  0       65
x-ai/grok-4.1-fast            2481ms    4200ms    2570ms   2705.8  0       65
x-ai/grok-4.3                 3404ms    6081ms    3610ms   1182.5  0       65
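
The per-model Errors and N columns can be cross-checked against the header counters by summing them. The snippet below is a minimal sketch of that arithmetic, not part of the dashboard itself: `ROWS` and `totals` are hypothetical names, and the values are copied by hand from the Statistics table. With these figures, N sums to 922 and Errors to 53, which match the header's Total Probes and Errors.

```python
# (errors, n) per model, copied by hand from the Statistics table.
# ROWS and totals() are illustrative names, not part of any real tool.
ROWS = {
    "anthropic/claude-opus-4.6": (5, 60),
    "anthropic/claude-opus-4.7": (5, 60),
    "anthropic/claude-sonnet-4.6": (5, 60),
    "deepseek/deepseek-v3.2": (0, 65),
    "deepseek/deepseek-v4-flash": (5, 60),
    "deepseek/deepseek-v4-pro": (0, 65),
    "google/gemini-2.0-flash-001": (8, 57),
    "google/gemini-2.5-flash": (5, 60),
    "google/gemini-2.5-flash-lite": (5, 60),
    "openai/gpt-4o-mini": (0, 3),
    "openai/gpt-5.4": (5, 57),
    "openai/gpt-5.5": (5, 60),
    "openai/gpt-oss-120b": (5, 60),
    "x-ai/grok-4-fast": (0, 65),
    "x-ai/grok-4.1-fast": (0, 65),
    "x-ai/grok-4.3": (0, 65),
}


def totals(rows):
    """Return (total_errors, total_n) summed over all models."""
    total_errors = sum(errors for errors, _ in rows.values())
    total_n = sum(n for _, n in rows.values())
    return total_errors, total_n


if __name__ == "__main__":
    total_errors, total_n = totals(ROWS)
    # prints: models=16 errors=53 n=922
    print(f"models={len(ROWS)} errors={total_errors} n={total_n}")
```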