Google TPU Trillium (v6e)

TL;DR

Sixth-generation TPU launched 2024 — emphasises per-watt efficiency for training and inference.
Roughly 4.7× the per-chip compute of TPU v5e in BF16 / INT8.
32 GB HBM per chip with substantially higher bandwidth than v5e.
Used for Gemini training and Vertex AI inference at scale.

Overview#

Trillium — formally TPU v6e — is Google's sixth-generation TPU. Announced in 2024 and rolled out through Google Cloud in late 2024 and 2025, it focuses on per-watt efficiency rather than peak single-chip throughput. The 'e' lineage continues: Trillium is the value/efficiency tier of v6 rather than the very-large-pod sibling.

Google has shifted increasingly toward TPUs as the substrate of its frontier AI work. Gemini 1.5 and beyond were trained on combinations of v5p and Trillium; Vertex AI inference endpoints increasingly use Trillium for cost-sensitive workloads.

Specifications#

Metric	Trillium (per chip)
BF16	~926 TFLOPS
INT8	~1,851 TOPS
Memory	32 GB HBM
Memory bandwidth	1.6 TB/s
Pod scale	256 chips per pod (multiple pods scale further)
Fabric	ICI

Trillium specifications are tied to Google Cloud product documentation, which is iterated; the figures here reflect documentation as of 2026 and the qualitative claim of ~4.7× v5e is the load-bearing one.

When to Pick Trillium#

Cost-sensitive inference on Google Cloud where Trillium pricing beats v5p / GPU.
Mid-scale training on JAX or PyTorch/XLA.
Workloads where per-watt efficiency dominates total cost of ownership.
Pick larger-pod successors when very-large training pod scale is needed.

Pitfalls#

Google Cloud only.
PyTorch/XLA still lags JAX, though the gap is narrowing.
Inference recipes need to be designed for the TPU memory model — naive GPU code rarely runs optimally.

Software Notes#

JAX + XLA remains the default. MaxText, Pax and the Hugging Face JAX integrations continue as the reference paths. Vertex AI offers managed Trillium endpoints for Google's foundation models.

References

Google Cloud TPU Trillium Announcement · Google Cloud

Overview#

Specifications#

Metric	Trillium (per chip)
BF16	~926 TFLOPS
INT8	~1,851 TOPS
Memory	32 GB HBM
Memory bandwidth	1.6 TB/s
Pod scale	256 chips per pod (multiple pods scale further)
Fabric	ICI

Google TPU Trillium (v6e)

Overview#

Specifications#

When to Pick Trillium#

Pitfalls#

Software Notes#

References

Browse all entries

Deploy on Yobitel

Google TPU Trillium (v6e)

Overview#

Specifications#

When to Pick Trillium#

Pitfalls#

Software Notes#

References

Browse all entries

Deploy on Yobitel