TL;DR
- Sixth-generation TPU, announced in 2024, that emphasises per-watt efficiency for training and inference.
- Roughly 4.7× the peak per-chip compute of TPU v5e in BF16 / INT8.
- 32 GB HBM per chip at 1.6 TB/s, roughly double the capacity and bandwidth of v5e.
- Used for Gemini training and Vertex AI inference at scale.
Overview
Trillium, formally TPU v6e, is Google's sixth-generation TPU. Announced in 2024 and rolled out through Google Cloud in late 2024 and 2025, it focuses on per-watt efficiency rather than peak single-chip throughput. Like v5e, it sits in the efficiency-oriented 'e' tier rather than the performance-oriented 'p' tier built for very large pods.
Google increasingly relies on TPUs as the substrate of its frontier AI work. Google has stated that Gemini 2.0 was trained on Trillium, and Vertex AI inference endpoints increasingly use Trillium for cost-sensitive workloads.
Specifications
| Metric | Trillium (per chip) |
|---|---|
| BF16 | ~918 TFLOPS |
| INT8 | ~1,836 TOPS |
| Memory | 32 GB HBM |
| Memory bandwidth | 1.6 TB/s |
| Pod scale | 256 chips per pod (multiple pods scale further) |
| Fabric | ICI (inter-chip interconnect) |
These figures come from Google Cloud product documentation, which is revised over time; they reflect the documentation as of 2026 and should be treated as approximate. The claim to rely on is the relative one: roughly 4.7× the peak per-chip compute of v5e.
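As a rough illustration of how the memory figures in the table translate into sizing decisions, the sketch below computes aggregate HBM for a pod, the fewest chips whose HBM could hold a model's weights, and a bandwidth-bound floor on per-token decode time. The helper names and the 70-billion-parameter example are hypothetical, and the arithmetic ignores activations, KV cache, and sharding overhead, so the results are back-of-envelope bounds rather than performance predictions.

```python
import math

# Per-chip figures taken from the specifications table (approximate).
HBM_GB_PER_CHIP = 32       # memory per chip
HBM_TBPS_PER_CHIP = 1.6    # memory bandwidth per chip
CHIPS_PER_POD = 256        # chips per pod


def pod_hbm_gb(chips: int = CHIPS_PER_POD) -> int:
    """Aggregate HBM across `chips` chips, in GB."""
    return chips * HBM_GB_PER_CHIP


def min_chips_for_weights(params_billion: float, bytes_per_param: int = 2) -> int:
    """Fewest chips whose combined HBM holds the weights alone.

    Assumption: activations, KV cache and optimizer state are ignored.
    """
    weight_gb = params_billion * bytes_per_param  # 1e9 params * bytes / 1e9
    return math.ceil(weight_gb / HBM_GB_PER_CHIP)


def decode_ms_lower_bound(params_billion: float, chips: int,
                          bytes_per_param: int = 2) -> float:
    """Bandwidth-bound floor on time per decoded token, in milliseconds.

    Assumption: every weight is read once per token and the weights are
    sharded evenly, so aggregate bandwidth scales linearly with chips.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    bytes_per_second = chips * HBM_TBPS_PER_CHIP * 1e12
    return weight_bytes / bytes_per_second * 1e3


print(pod_hbm_gb())                  # 8192
print(min_chips_for_weights(70))     # 5
print(decode_ms_lower_bound(70, 8))  # about 10.9
```

Real deployments typically need more chips than the weight-only minimum, since the KV cache and activations also live in HBM.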
When to Pick Trillium
- Cost-sensitive inference on Google Cloud where Trillium pricing beats v5p / GPU.
- Mid-scale training on JAX or PyTorch/XLA.
- Workloads where per-watt efficiency dominates total cost of ownership.
- Choose larger-pod successors instead when training needs very large pod scale.
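Whether Trillium pricing actually beats an alternative depends on throughput as well as hourly price, so the comparison is best made in cost per token. The sketch below shows that arithmetic under stated assumptions: the function name is hypothetical, and every price and throughput figure is a placeholder rather than a Google Cloud list price or a measured benchmark.

```python
def cost_per_million_tokens(price_per_chip_hour: float,
                            chips: int,
                            tokens_per_second: float) -> float:
    """Serving cost, in currency units, per one million generated tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    cost_per_second = price_per_chip_hour * chips / 3600.0
    return cost_per_second / tokens_per_second * 1_000_000


# Placeholder inputs only: NOT real prices or measured throughput.
candidates = {
    "trillium (hypothetical)": cost_per_million_tokens(3.0, 8, 12_000),
    "alternative (hypothetical)": cost_per_million_tokens(4.5, 8, 15_000),
}

# Pick the option with the lowest cost per million tokens.
cheapest = min(candidates, key=candidates.get)
print(cheapest, round(candidates[cheapest], 4))
```

Note that the alternative in this made-up example is faster in absolute terms but still costs more per token, which is the pattern the first bullet describes.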
Pitfalls
- Google Cloud only.
- PyTorch/XLA still lags JAX, though the gap is narrowing.
- Inference recipes need to be designed for the TPU memory model; GPU code ported naively rarely runs optimally.
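One common adaptation for XLA-compiled accelerators is to pad variable-length inputs up to a small set of fixed bucket sizes, because XLA compiles a separate program for each distinct input shape. The sketch below is plain Python with hypothetical names; the bucket sizes and the multiple-of-128 alignment are illustrative assumptions, not values tuned for Trillium.

```python
from bisect import bisect_left

# Hypothetical bucket sizes, all multiples of 128 (assumed alignment).
BUCKETS = (128, 256, 512, 1024, 2048)


def pick_bucket(length: int, buckets: tuple = BUCKETS) -> int:
    """Return the smallest bucket that fits `length`; raise if none does."""
    if length < 1:
        raise ValueError("length must be positive")
    i = bisect_left(buckets, length)
    if i == len(buckets):
        raise ValueError(f"length {length} exceeds largest bucket {buckets[-1]}")
    return buckets[i]


def pad_to_bucket(token_ids: list, pad_id: int = 0) -> tuple:
    """Right-pad `token_ids` to its bucket size.

    Also returns the true length so the caller can build an attention mask.
    """
    target = pick_bucket(len(token_ids))
    padded = token_ids + [pad_id] * (target - len(token_ids))
    return padded, len(token_ids)


padded, true_len = pad_to_bucket([101, 2023, 2003, 102])
print(len(padded), true_len)  # 128 4
```

With five buckets, at most five shapes ever reach the compiler, at the cost of some wasted compute on padding tokens.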
Software Notes
JAX + XLA remains the default. MaxText, Pax and the Hugging Face JAX integrations continue as the reference paths. Vertex AI offers managed Trillium endpoints for Google's foundation models.
References
- Google Cloud TPU Trillium Announcement · Google Cloud