TL;DR
- Variant of A10 commissioned by AWS for g5 EC2 instances — same GA102 silicon, slightly different binning and clocks.
- 24 GB GDDR6 at 600 GB/s; 250 W passive single-slot card targeted at inference, graphics and light fine-tuning.
- Strong cost/performance for 7B-class inference, video transcoding and Stable Diffusion XL serving.
- Effectively AWS-exclusive in cloud — non-AWS deployments typically standardise on A10 PCIe instead.
Overview
The A10G is the GPU that ships in AWS g5 instances — from g5.xlarge through g5.48xlarge. It is a derivative of the desktop GA102 die (the same silicon as the RTX 3080/3090 family) configured for data centre service: passive cooling, ECC GDDR6, a 250 W envelope and AWS-specific firmware.
It is not strictly the same as the retail A10 PCIe card NVIDIA sells through OEMs. AWS commissioned the A10G binning and BIOS specifically for g5, and the two parts are close but not identical. For practical purposes — workload selection, software stack, performance shape — they are interchangeable.
Specifications
| Metric | A10G |
|---|---|
| Architecture | Ampere (GA102) |
| Process | Samsung 8N |
| FP32 | 31.2 TFLOPS |
| TF32 (Tensor, sparse) | 125 TFLOPS |
| BF16 / FP16 (Tensor, sparse) | 250 TFLOPS |
| INT8 (Tensor, sparse) | 500 TOPS |
| Memory | 24 GB GDDR6 |
| Memory bandwidth | 600 GB/s |
| TDP | 250 W |
| Form factor | PCIe Gen4 x16, single-slot |
| NVLink | Not supported |
A10G lacks NVLink. Multi-GPU communication on g5 instances goes over PCIe Gen4 — fine for inference replicas, limiting for tensor-parallel training.
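The 600 GB/s figure also puts a rough ceiling on single-stream decode speed, because each generated token has to read approximately all model weights from memory once. The sketch below is a back-of-the-envelope estimate under that assumption. The function names are illustrative, and it ignores KV-cache traffic, kernel overheads and the gap between peak and achievable bandwidth, so real throughput will be lower.

```python
# Back-of-the-envelope ceiling on batch-1 decode speed for a memory-bound LLM.
# Assumption: each generated token streams all weights from GPU memory once.

A10G_BANDWIDTH_GB_S = 600.0  # peak memory bandwidth from the table above


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def decode_ceiling_tok_s(params_billion: float, bytes_per_param: float,
                         bandwidth_gb_s: float = A10G_BANDWIDTH_GB_S) -> float:
    """Upper bound on tokens/s for one sequence: bandwidth / bytes read per token."""
    return bandwidth_gb_s / weights_gb(params_billion, bytes_per_param)


if __name__ == "__main__":
    for label, bytes_per_param in [("FP16", 2.0), ("INT8", 1.0)]:
        ceiling = decode_ceiling_tok_s(7, bytes_per_param)
        print(f"7B {label}: at most ~{ceiling:.0f} tokens/s per sequence")
```

For a 7B model in FP16 this works out to a ceiling of roughly 43 tokens/s per sequence. Batching raises aggregate throughput because the weights are read once per decode step for the whole batch.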
Architecture and Differences From A100
The A10/A10G uses GA102, not GA100. The two dies share the Ampere generation but are tuned for different workloads. GA102 has fewer SMs than GA100, each with more FP32 lanes but lower Tensor Core throughput, and it uses GDDR6 instead of HBM2/HBM2e. (The Tensor Memory Accelerator is a Hopper feature; neither Ampere die has it.) The result is a card that excels at modest-batch inference and graphics workloads but underperforms A100 on memory-bound training.
There is no MIG support and no FP8. Multi-tenant isolation on A10G relies on time-slicing or vGPU rather than hardware partitioning.
When to Pick A10G (or A10)
- Inference of 7B-class models in FP16 or quantised INT8, where 24 GB is sufficient.
- Stable Diffusion XL / image generation backends where GDDR6 bandwidth suits the access pattern.
- Video transcoding and computer-vision pipelines (NVENC/NVDEC are present and capable).
- Cost-sensitive batch inference where A100 capacity is over-spec.
- On AWS, A10G is often the right default for any 7B-class deployment under moderate QPS.
- Pick L4 if power and density matter more than peak throughput.
- Pick A100 / H100 if you need 70B+ models or training, not just inference.
Pitfalls
- Confusion with A10: A10G is AWS-specific; documents written for retail A10 generally apply, but driver and CUDA support paths differ subtly.
- GDDR6, not HBM: memory-bandwidth-bound workloads (long context, large KV cache) underperform A100 by a wider margin than FLOPS comparisons suggest.
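- PCIe-only multi-GPU: scaling beyond one card per instance is limited by PCIe bandwidth, so models that fit on a single card (7B-class) are usually the practical sweet spot.
- 24 GB capacity: enough for 7B in FP16 (about 14 GB of weights), but 13B in FP16 needs about 26 GB for weights alone and will not fit, so quantisation or model offload is required.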
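The capacity pitfall can be checked with simple arithmetic. The following is a minimal sketch, assuming weights dominate memory use and reserving a hypothetical 2 GB of headroom for activations, CUDA context and a small KV cache. The helper names and the headroom value are illustrative assumptions, not measured figures.

```python
# Rough check of whether a model's weights fit in A10G memory.
# Assumptions: 1 GB = 1e9 bytes, weights dominate, and a fixed headroom
# (hypothetical 2 GB) covers activations, CUDA context and a small KV cache.

A10G_VRAM_GB = 24.0


def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight footprint in GB for a given precision."""
    return params_billion * bits_per_param / 8


def fits(params_billion: float, bits_per_param: int,
         vram_gb: float = A10G_VRAM_GB, headroom_gb: float = 2.0) -> bool:
    """True if weights plus the assumed headroom fit in device memory."""
    return weights_gb(params_billion, bits_per_param) + headroom_gb <= vram_gb


if __name__ == "__main__":
    for params in (7, 13):
        for bits in (16, 8, 4):
            size = weights_gb(params, bits)
            verdict = "fits" if fits(params, bits) else "does not fit"
            print(f"{params}B @ {bits}-bit: {size:.1f} GB of weights, {verdict}")
```

Real quantised checkpoints carry extra scale and zero-point data, so treat the 8-bit and 4-bit sizes as lower bounds.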
Software Notes
A10G is a first-class CUDA target, with the same driver, CUDA, cuDNN, TensorRT and Triton support as any other GPU derived from consumer Ampere silicon. vLLM, TGI, SGLang and Ollama all run on A10G unchanged. AWQ and GPTQ quantisation fit 13B-class models into 24 GB, which is acceptable for many workloads.
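How much context a quantised 13B-class model can serve on 24 GB depends on what is left for the KV cache after the weights are loaded. The sketch below estimates that budget. The model shape (40 layers, 40 KV heads, head dimension 128, FP16 cache) is an assumption modelled on a Llama-2-13B-style architecture, and the 6.5 GB weight size and 2 GB overhead are illustrative values rather than measurements.

```python
# Estimate how many tokens of KV cache fit beside quantised weights on 24 GB.
# All model-shape and overhead numbers below are illustrative assumptions.

def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache per token: keys and values for every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(vram_gb: float, weights_gb: float, overhead_gb: float,
                      per_token_bytes: int) -> int:
    """Tokens that fit in the memory left after weights and fixed overhead."""
    free_bytes = (vram_gb - weights_gb - overhead_gb) * 1e9
    return max(0, int(free_bytes // per_token_bytes))


if __name__ == "__main__":
    # Assumed Llama-2-13B-style shape with an FP16 KV cache.
    per_token = kv_bytes_per_token(n_layers=40, n_kv_heads=40, head_dim=128)
    # Assumed 4-bit weights (13B * 0.5 bytes) and 2 GB of runtime overhead.
    tokens = max_cached_tokens(vram_gb=24.0, weights_gb=6.5, overhead_gb=2.0,
                               per_token_bytes=per_token)
    print(f"KV cache: {per_token} bytes/token, about {tokens} tokens in total")
```

The token budget is shared across all concurrent sequences, so the same pool covers either a few long contexts or many short ones.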