InferenceBench
Roofline-Grounded LLM Benchmarks
Vendor-neutral inference benchmarks across H100, GB200, MI300X, TPU, and AWS Inferentia — roofline-grounded composite scores with reproducible pricing snapshots.
What it does
InferenceBench publishes apples-to-apples inference benchmarks across GPUs and other accelerators, normalized against a roofline performance model and tied to daily USD pricing snapshots.
Why it's different
- Roofline-grounded — every score is bounded by memory-bandwidth and compute-FLOPS ceilings
- Composite scoring — latency, throughput, and $/token combined into a single comparable number
- Pricing automation — daily snapshots from every major cloud and GPU broker
- Reproducibility — full configs, model-weight references, and run scripts published per result
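To make "roofline-grounded" concrete, here is a minimal sketch of the classic roofline ceiling: attainable throughput is the smaller of the compute peak and memory bandwidth times arithmetic intensity. The function names and the numbers are illustrative assumptions, not InferenceBench's published scoring code or real hardware specifications.

```python
def roofline_ceiling(peak_flops: float, mem_bandwidth: float,
                     arithmetic_intensity: float) -> float:
    """Attainable FLOP/s under the classic roofline model.

    peak_flops: compute ceiling in FLOP/s
    mem_bandwidth: memory bandwidth in bytes/s
    arithmetic_intensity: FLOPs performed per byte moved
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def roofline_efficiency(measured_flops: float, peak_flops: float,
                        mem_bandwidth: float,
                        arithmetic_intensity: float) -> float:
    """Fraction of the roofline ceiling a measured run achieved.

    This normalization is an assumption for illustration only.
    """
    ceiling = roofline_ceiling(peak_flops, mem_bandwidth, arithmetic_intensity)
    return measured_flops / ceiling


if __name__ == "__main__":
    # Hypothetical accelerator: 1e15 FLOP/s peak, 3e12 bytes/s bandwidth.
    peak, bandwidth = 1e15, 3e12

    # Low arithmetic intensity (2 FLOPs/byte): memory-bound.
    print(roofline_ceiling(peak, bandwidth, 2.0))

    # High arithmetic intensity (1000 FLOPs/byte): compute-bound.
    print(roofline_ceiling(peak, bandwidth, 1000.0))

    # A run that sustains 3e12 FLOP/s in the memory-bound case.
    print(roofline_efficiency(3e12, peak, bandwidth, 2.0))
```

Under this model, a low-intensity workload is capped by bandwidth no matter how high the compute peak is, which is why a roofline-normalized score can differ sharply from a raw throughput ranking.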
Who uses it
Inference platform teams choosing accelerators, finance teams modelling token economics, and procurement teams negotiating capacity contracts.
Public + Enterprise tiers
Public results free · API + private benchmarks from $1,499/month
6 regions
us-east-1, us-west-2, eu-west-1 +
Yobitel
AI Application