Google TPU v5e

TL;DR

Cost-efficient TPU generation launched in 2023, pitched at inference and small-to-medium training.
197 TFLOPS BF16, 393 TOPS INT8 and 16 GB HBM per chip; a smaller part than v5p, but with lower cost per unit of inference throughput.
Pods scale to 256 chips with 2D-torus inter-chip interconnect (smaller than v4's 4,096-chip pods).
Widely used by Anthropic, Character.AI, AssemblyAI and other Google Cloud TPU customers through 2023-2024.

Overview

TPU v5e is Google's value-tier fifth-generation TPU. Launched in 2023, it trades peak per-chip throughput and pod scale for a substantially better cost-per-inference profile. The 'e' stands for 'efficiency': it belongs to the same generation as v5p but uses smaller dies and smaller pods.

Through 2023-2024, v5e was the TPU customers reached for first. Inference of mid-sized models, cost-sensitive fine-tuning and most experimental workloads ran well within its envelope.

Specifications

Metric	TPU v5e (per chip)
BF16	197 TFLOPS
INT8	393 TOPS
Memory	16 GB HBM
Memory bandwidth	819 GB/s
Inter-chip link	ICI v3 (~64 GB/s per direction)
Pod scale	256 chips (2D torus)
Fabric	2D torus (no OCS)
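
As a rough illustration of what the table implies, the sketch below derives two quantities from it: the roofline ridge point (peak operations per second divided by memory bandwidth) and the aggregate BF16 throughput and HBM of a full 256-chip pod. The constants are copied from the table. The function names are hypothetical, decimal units (1 GB = 1e9 bytes) are an assumption, and the peak figures are upper bounds rather than measured performance.

```python
# Per-chip figures copied from the specifications table.
# Assumption: decimal units, i.e. 1 GB = 1e9 bytes and 1 TFLOPS = 1e12 FLOP/s.
PEAK_BF16_FLOPS = 197e12       # 197 TFLOPS
PEAK_INT8_OPS = 393e12         # 393 TOPS
HBM_BYTES = 16e9               # 16 GB
HBM_BW_BYTES_PER_S = 819e9     # 819 GB/s
POD_CHIPS = 256


def ridge_point(peak_ops_per_s: float, bw_bytes_per_s: float) -> float:
    """Operations per byte of HBM traffic needed before compute,
    rather than memory bandwidth, becomes the limiting factor."""
    return peak_ops_per_s / bw_bytes_per_s


def pod_totals(chips: int = POD_CHIPS) -> dict:
    """Aggregate peak BF16 throughput (PFLOPS) and HBM capacity (TB)."""
    return {
        "bf16_pflops": chips * PEAK_BF16_FLOPS / 1e15,
        "hbm_tb": chips * HBM_BYTES / 1e12,
    }


if __name__ == "__main__":
    print(f"BF16 ridge point: {ridge_point(PEAK_BF16_FLOPS, HBM_BW_BYTES_PER_S):.1f} FLOPs/byte")
    print(f"INT8 ridge point: {ridge_point(PEAK_INT8_OPS, HBM_BW_BYTES_PER_S):.1f} ops/byte")
    print(pod_totals())
```

With these inputs the BF16 ridge point is roughly 240 FLOPs per byte, and a full pod works out to about 50.4 PFLOPS of peak BF16 compute and about 4.1 TB of HBM. Kernels with lower arithmetic intensity than the ridge point are bounded by the 819 GB/s memory bandwidth rather than by the TFLOPS figure.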

When to Pick v5e

Inference workloads on Google Cloud where TPU pricing is more attractive than GPU.
Small and mid-sized fine-tuning on JAX or PyTorch/XLA.
Cost-sensitive batch inference where pod-scale ICI bandwidth is not required.
Pick v5p for very-large training that needs the larger pod and higher per-chip throughput.
Pick Trillium (v6) for newer-generation parts where supply allows.
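
The selection rules above can be written down as a small decision helper. This is only a sketch of the list's logic, not an official sizing tool: the `Workload` fields, the function name and the rule ordering are all hypothetical, and the 256-chip limit is the v5e pod size from the specifications.

```python
from dataclasses import dataclass

V5E_MAX_POD_CHIPS = 256  # v5e pod scale from the specifications


@dataclass
class Workload:
    kind: str                         # e.g. "inference", "fine-tuning", "training"
    chips_needed: int                 # estimated chip count for the job
    trillium_available: bool = False  # hypothetical flag: is v6 supply on offer?


def suggest_tpu(w: Workload) -> str:
    """Encode the selection guidance as ordered rules (illustrative only)."""
    # Newer-generation parts win where supply allows.
    if w.trillium_available:
        return "Trillium (v6)"
    # Jobs that outgrow a v5e pod need the larger v5p pod.
    if w.chips_needed > V5E_MAX_POD_CHIPS:
        return "v5p"
    # Inference and small-to-mid fine-tuning fit the v5e envelope.
    return "v5e"


if __name__ == "__main__":
    print(suggest_tpu(Workload("inference", chips_needed=8)))
    print(suggest_tpu(Workload("training", chips_needed=1024)))
```

Real decisions also depend on pricing, regional availability and per-chip throughput needs, none of which this sketch models.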

Pitfalls

16 GB HBM per chip is modest; larger models require explicit sharding or quantisation.
The 2D torus has no optical circuit switching (OCS), so it is less flexible than the OCS-based fabrics of v4 and v5p.
Google Cloud only: there is no portability beyond GCP.
PyTorch/XLA performance lags JAX on TPU v5e.
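
To make the 16 GB constraint concrete, the sketch below estimates the weight footprint of a model at a given precision and the minimum number of chips needed just to hold the weights. It is a back-of-the-envelope calculation under stated assumptions: the function names are hypothetical, the 80% usable-HBM fraction is an illustrative guess to leave room for activations and KV cache, and decimal gigabytes are assumed.

```python
import math

HBM_PER_CHIP_GB = 16.0  # from the specifications

# Bytes per parameter for the two precisions quoted in the specifications.
BYTES_PER_PARAM = {"bf16": 2, "int8": 1}


def weight_gb(params_billion: float, dtype: str) -> float:
    """Weight footprint in GB: 1e9 params * bytes per param / 1e9 bytes per GB."""
    return params_billion * BYTES_PER_PARAM[dtype]


def min_chips(params_billion: float, dtype: str, usable_fraction: float = 0.8) -> int:
    """Smallest chip count whose combined usable HBM holds the weights.

    usable_fraction is an assumption, not a documented figure; it reserves
    headroom for activations, KV cache and runtime overhead.
    """
    usable_gb = HBM_PER_CHIP_GB * usable_fraction
    return math.ceil(weight_gb(params_billion, dtype) / usable_gb)


if __name__ == "__main__":
    for size in (7, 70):
        for dtype in ("bf16", "int8"):
            print(f"{size}B {dtype}: {weight_gb(size, dtype):.0f} GB weights, "
                  f"at least {min_chips(size, dtype)} chip(s)")
```

Under these assumptions a 7-billion-parameter model needs 14 GB in BF16 and spills onto a second chip, while its INT8 version fits on one. In practice the result would be rounded up to a slice size that the platform actually offers.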

Software Notes

JAX + XLA is the high-performance path. MaxText, Pax and the broader Google open-source JAX ecosystem target v5e directly. Vertex AI inference endpoints support v5e backends for Google's foundation models.

References

Google Cloud TPU v5e Documentation · Google Cloud
