TL;DR
- Cost-efficient fifth-generation TPU variant, launched in 2023 and pitched at inference and small-to-medium training.
- 197 TFLOPS BF16, 393 TOPS INT8 and 16 GB HBM per chip: less compute and memory per chip than v5p, but better price-performance for inference.
- Pods scale to 256 chips with 2D-torus inter-chip interconnect (smaller than v4's 4,096-chip pods).
- Widely used by Anthropic, Character.AI, AssemblyAI and other Google Cloud TPU customers through 2023-2024.
Overview
TPU v5e is Google's value-tier fifth-generation TPU. Launched in 2023, it trades peak per-chip throughput and pod scale for substantially better cost per inference. The 'e' suffix is commonly read as 'efficiency': v5e belongs to the same generation as v5p, but has less compute and memory per chip and much smaller pods.
Through 2023-2024, v5e was the TPU many Google Cloud customers reached for first: inference of mid-sized models, cost-sensitive fine-tuning and most experimental workloads fit comfortably within its envelope.
Specifications
| Metric | TPU v5e (per chip) |
|---|---|
| BF16 | 197 TFLOPS |
| INT8 | 393 TOPS |
| Memory | 16 GB HBM |
| Memory bandwidth | 819 GB/s |
| Inter-chip interconnect (ICI) | 1,600 Gbps per chip |
| Pod scale | 256 chips |
| Fabric | 2D torus, no optical circuit switching (OCS) |
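The peak figures in the table support a quick roofline calculation. The sketch below divides peak compute by HBM bandwidth to get the arithmetic intensity (operations per byte of HBM traffic) at which a kernel stops being bandwidth-bound, and multiplies per-chip figures by 256 for pod totals. It uses only the peak values listed above and treats GB as 10^9 bytes; real kernels reach lower fractions of both peaks.

```python
# Roofline-style arithmetic from the v5e per-chip peak figures in the table.
BF16_FLOPS_PER_S = 197e12   # peak BF16 throughput per chip
INT8_OPS_PER_S = 393e12     # peak INT8 throughput per chip
HBM_BYTES = 16e9            # HBM capacity per chip, treating GB as 10**9 bytes
HBM_BYTES_PER_S = 819e9     # HBM bandwidth per chip
POD_CHIPS = 256             # chips in a full v5e pod


def ridge_point(peak_ops_per_s: float, mem_bytes_per_s: float) -> float:
    """Ops per byte of HBM traffic above which a kernel is compute-bound
    in a simple roofline model (below it, the kernel is bandwidth-bound)."""
    return peak_ops_per_s / mem_bytes_per_s


bf16_ridge = ridge_point(BF16_FLOPS_PER_S, HBM_BYTES_PER_S)
int8_ridge = ridge_point(INT8_OPS_PER_S, HBM_BYTES_PER_S)

# Pod-level totals are plain multiples of the per-chip peaks.
pod_bf16_pflops = POD_CHIPS * BF16_FLOPS_PER_S / 1e15
pod_int8_pops = POD_CHIPS * INT8_OPS_PER_S / 1e15
pod_hbm_tb = POD_CHIPS * HBM_BYTES / 1e12

print(f"BF16 ridge point: {bf16_ridge:.0f} FLOP/byte")
print(f"INT8 ridge point: {int8_ridge:.0f} op/byte")
print(f"Pod peak: {pod_bf16_pflops:.1f} PFLOPS BF16, {pod_int8_pops:.1f} POPS INT8")
print(f"Pod HBM: {pod_hbm_tb:.3f} TB")
```

The BF16 ridge point works out to roughly 240 FLOP/byte. As a rough illustration under simplifying assumptions, a batch-1 decode step performs about 2 FLOPs per BF16 weight read (2 bytes), around 1 FLOP/byte, far below that ridge, which is why small-batch inference tends to be limited by the 819 GB/s of HBM bandwidth rather than by peak compute.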
When to Pick v5e
- Inference workloads on Google Cloud where TPU pricing is more attractive than GPU.
- Small and mid-sized fine-tuning on JAX or PyTorch/XLA.
- Cost-sensitive batch inference where pod-scale ICI bandwidth is not required.
- Prefer v5p for very large training runs that need bigger pods and higher per-chip throughput.
- Prefer Trillium (TPU v6e), the next generation, where capacity allows.
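Whether a workload falls in the small and mid-sized envelope above mostly comes down to whether its state fits in 16 GB of HBM per chip. The sketch below counts per-parameter bytes and the minimum number of chips needed under perfectly even sharding. The 80% usable-HBM headroom and the 16 bytes/parameter figure for mixed-precision Adam fine-tuning are rule-of-thumb assumptions, not v5e specifications, and KV cache and activations are ignored; `state_gb` and `min_chips` are hypothetical helpers.

```python
import math

HBM_PER_CHIP_GB = 16.0  # v5e per-chip HBM from the spec table (decimal GB)


def state_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for per-parameter state in decimal GB (weights only, or weights
    plus gradients and optimizer state if bytes_per_param includes them)."""
    return num_params * bytes_per_param / 1e9


def min_chips(num_params: float, bytes_per_param: float,
              usable_fraction: float = 0.8) -> int:
    """Fewest v5e chips whose combined usable HBM holds the state, assuming
    perfectly even sharding. usable_fraction is an assumed headroom factor
    for runtime buffers; KV cache and activations are not counted."""
    usable_per_chip = HBM_PER_CHIP_GB * usable_fraction
    return math.ceil(state_gb(num_params, bytes_per_param) / usable_per_chip)


# BF16 = 2 bytes/param, INT8 = 1 byte/param; 16 bytes/param is a common
# rule of thumb for mixed-precision Adam training (an assumption here).
for label, params, bpp in [
    ("7B serve, BF16", 7e9, 2),
    ("7B serve, INT8", 7e9, 1),
    ("70B serve, BF16", 70e9, 2),
    ("7B fine-tune, Adam", 7e9, 16),
]:
    print(f"{label}: {state_gb(params, bpp):.0f} GB -> at least {min_chips(params, bpp)} chip(s)")
```

Under these assumptions a 7B model in BF16 (14 GB of weights) already overflows one chip once headroom is reserved, INT8 quantisation brings it back to a single chip, and full fine-tuning of the same model needs a multi-chip slice.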
Pitfalls
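- 16 GB of HBM per chip is modest; models whose weights plus KV cache or optimizer state exceed it need explicit sharding across chips or quantisation.
- The 2D torus has no optical circuit switching (OCS), so it is less flexible than the OCS-reconfigurable 3D-torus fabrics of v4 and v5p.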
- Google Cloud only — no portability beyond GCP.
- PyTorch/XLA performance lags JAX on TPU v5e.
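To make the torus pitfall concrete, the sketch below computes neighbours and minimum hop counts on a torus of any dimensionality. It assumes a full 256-chip v5e pod is wired as a 16×16 2D torus and compares it with a hypothetical 8×8×4 3D torus of the same size; it models hop distance only, not link bandwidth, routing, or the OCS reconfiguration that v4 and v5p add.

```python
from itertools import product


def torus_neighbors(coord, dims):
    """Direct ICI neighbours of a chip on a torus: +/-1 along each axis, with wraparound."""
    out = set()
    for axis, size in enumerate(dims):
        for step in (1, -1):
            nbr = list(coord)
            nbr[axis] = (nbr[axis] + step) % size
            out.add(tuple(nbr))
    return out


def torus_hops(a, b, dims):
    """Minimum hop count between chips a and b, going the short way round each ring."""
    return sum(min(abs(p - q), size - abs(p - q)) for p, q, size in zip(a, b, dims))


def torus_diameter(dims):
    """Worst-case hop count between any two chips on the torus."""
    return sum(size // 2 for size in dims)


V5E_POD = (16, 16)   # assumed 16x16 layout of a 256-chip v5e pod
ALT_3D = (8, 8, 4)   # hypothetical 3D torus with the same 256 chips

# A torus looks the same from every chip, so distances from (0, 0) suffice;
# the brute-force maximum should match the closed-form diameter.
worst_2d = max(torus_hops((0, 0), c, V5E_POD) for c in product(range(16), repeat=2))
print("2D 16x16 diameter:", worst_2d, torus_diameter(V5E_POD))
print("3D 8x8x4 diameter:", torus_diameter(ALT_3D))
```

The 2D layout's worst case is 16 hops versus 10 for the hypothetical 3D layout, so worst-case point-to-point paths are longer when the same chip count is spread over two dimensions.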
Software Notes
JAX + XLA is the high-performance path. MaxText, Pax and the broader Google open-source JAX ecosystem target v5e directly. Vertex AI inference endpoints support v5e backends for Google's foundation models.
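In JAX, laying a model out across a v5e slice is expressed with `jax.sharding.Mesh`, `NamedSharding` and `PartitionSpec`. The pure-Python sketch below does not call JAX; it only mimics how a partition spec maps array axes onto named mesh axes, so per-chip shard sizes can be checked against the 16 GB budget. The helpers `shard_shape` and `shard_bytes`, the mesh axis names `"data"` and `"model"`, and the 4×4 mesh are hypothetical choices for illustration, and this sketch simply rejects uneven splits.

```python
import math


def shard_shape(global_shape, mesh_shape, spec):
    """Per-device block shape for an array laid out over a named device mesh.

    spec has one entry per array axis (missing trailing entries mean
    replicated): None replicates that axis, a mesh-axis name splits it over
    that axis, and a tuple of names splits it over their combined size.
    This mimics PartitionSpec semantics; it is not the JAX API itself.
    """
    if len(spec) > len(global_shape):
        raise ValueError("spec has more entries than the array has axes")
    spec = tuple(spec) + (None,) * (len(global_shape) - len(spec))
    block = []
    for dim, axes in zip(global_shape, spec):
        if axes is None:
            block.append(dim)  # replicated: every device holds the full axis
            continue
        names = (axes,) if isinstance(axes, str) else tuple(axes)
        ways = math.prod(mesh_shape[name] for name in names)
        if dim % ways:
            raise ValueError(f"axis of size {dim} does not split evenly {ways} ways")
        block.append(dim // ways)
    return tuple(block)


def shard_bytes(global_shape, mesh_shape, spec, bytes_per_elem):
    """Bytes each device stores for its block."""
    return math.prod(shard_shape(global_shape, mesh_shape, spec)) * bytes_per_elem


MESH = {"data": 4, "model": 4}  # hypothetical 16-chip slice viewed as a 4x4 mesh
W = (8192, 8192)                # hypothetical weight matrix

print(shard_shape(W, MESH, (None, "model")))  # (8192, 2048): columns split over "model"
print(shard_bytes(W, MESH, (None, "model"), 2) / 2**20, "MiB per chip in BF16")
```

Splitting the columns over the `"model"` axis leaves each chip with an 8192×2048 BF16 block of 32 MiB; replicating instead, with spec `(None, None)`, would put the full 128 MiB on every chip.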
References
- Google Cloud TPU v5e Documentation · Google Cloud