NVIDIA A10 Tensor Core GPU

TL;DR

Single-slot PCIe Ampere card aimed at enterprise inference, virtual workstations and modest fine-tuning.
24 GB GDDR6 at 600 GB/s in a 150 W passive form factor — drop-in for 1U/2U servers.
Strong fit for 7B-class LLM inference, image generation, and video/CV pipelines.
Supersedes T4 in most fleets; widely deployed before L4 took over the low-TDP slot.

Overview

The A10 is NVIDIA's general-availability counterpart to the A10G (the variant AWS offers in its G5 instances). It uses the same Ampere GA102 die with slightly different binning, and pairs a 150 W TDP with a single-slot, passively cooled form factor designed for standard 1U and 2U servers. From 2022 through 2024 it was the default 'inference card' for enterprise deployments that did not need A100-class memory.

By 2026 A10 is being displaced in new deployments by L4 (lower TDP, similar throughput per watt) and L40S (substantially more capable for a similar slot envelope). It remains heavily deployed in existing fleets.

Specifications

Metric	A10
Architecture	Ampere (GA102)
Process	Samsung 8N
FP32	31.2 TFLOPS
TF32 (Tensor, sparse)	125 TFLOPS
BF16 / FP16 (Tensor, sparse)	250 TFLOPS
INT8 (Tensor, sparse)	500 TOPS
Memory	24 GB GDDR6
Memory bandwidth	600 GB/s
TDP	150 W
Form factor	PCIe Gen4 x16, single-slot
NVLink	Not supported
Display outputs	None
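
The tensor rows in the table are structured-sparsity peaks. As a rough sketch, assuming the usual Ampere 2:4 sparsity factor of 2x, the dense peaks and the roofline ridge point (operations per byte of memory traffic needed before a kernel stops being bandwidth-bound) can be derived from the table values. The function names are illustrative, not part of any NVIDIA tool.

```python
# Peak figures copied from the specification table (sparse tensor numbers).
SPARSE_PEAK_TERA = {"TF32": 125.0, "BF16/FP16": 250.0, "INT8": 500.0}
MEM_BANDWIDTH_GBPS = 600.0  # GB/s, from the table

# Assumption: 2:4 structured sparsity doubles peak tensor throughput,
# so the dense peak is half of the sparse figure.
SPARSITY_SPEEDUP = 2.0


def dense_peak_tera(sparse_tera: float) -> float:
    """Dense peak (TFLOPS or TOPS) implied by a sparse peak."""
    return sparse_tera / SPARSITY_SPEEDUP


def ridge_point(dense_tera: float, bandwidth_gbps: float = MEM_BANDWIDTH_GBPS) -> float:
    """Operations per byte at which compute and bandwidth limits meet."""
    return (dense_tera * 1e12) / (bandwidth_gbps * 1e9)


if __name__ == "__main__":
    for name, sparse in SPARSE_PEAK_TERA.items():
        dense = dense_peak_tera(sparse)
        print(f"{name}: dense peak {dense:g}, ridge point {ridge_point(dense):.0f} ops/byte")
```

Kernels whose arithmetic intensity falls below the ridge point, which includes most small-batch LLM decode steps, are limited by the 600 GB/s memory system rather than by the Tensor Cores.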

Architecture Notes

The A10 belongs to the same Ampere generation as the A100 and has third-generation Tensor Cores with TF32, BF16 and INT8 support, but it has no FP8, no HBM and no NVLink. The GA102 die gives more raster and graphics horsepower than GA100 but less tensor density per watt at the high end.
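
A minimal sketch of the precision support described above, keyed on CUDA compute capability. It assumes the A10 reports capability 8.6 (GA102) and that FP8 Tensor Core support starts at 8.9 (Ada). The function is a hypothetical helper that only covers Ampere and newer parts.

```python
def tensor_core_formats(major: int, minor: int) -> set:
    """Tensor Core input formats for a compute capability (simplified)."""
    cc = (major, minor)
    if cc < (8, 0):
        raise ValueError("this sketch only covers Ampere (8.0) and newer")
    # Ampere baseline: third-generation Tensor Cores.
    formats = {"FP16", "BF16", "TF32", "INT8"}
    # Assumption: FP8 arrives with Ada (8.9) and Hopper (9.0).
    if cc >= (8, 9):
        formats.add("FP8")
    return formats


A10_CAPABILITY = (8, 6)  # GA102

if __name__ == "__main__":
    print(sorted(tensor_core_formats(*A10_CAPABILITY)))
```

On a live system with PyTorch installed, the capability pair can be read with torch.cuda.get_device_capability() and passed to the helper.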

MIG is not supported. Multi-tenant deployments instead share the card through default time-slicing, CUDA MPS, or licensed vGPU profiles.

When to Pick A10

Drop-in inference upgrades over T4 fleets that need more memory or compute.
Virtual workstation hosts (Citrix, VMware Horizon) where NVENC/NVDEC are valuable.
Modest fine-tuning of 7B-class models with LoRA or QLoRA — 24 GB is workable.
Single-card servers where 150 W and single-slot are hard constraints.
Pick L4 instead for new builds under tight power budgets.
Pick L40S instead when 48 GB and Ada Tensor Core throughput justify the TDP increase.
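
To see why 24 GB is workable for LoRA or QLoRA on a 7B-class model, a back-of-envelope footprint helps. The sketch below rests on stated assumptions: 7 billion frozen parameters, a hypothetical 40 million trainable adapter parameters, and 16 bytes per trainable parameter (16-bit weight and gradient, FP32 master copy, two FP32 Adam moments). Activations, CUDA context and fragmentation are left out, so real usage is higher.

```python
GPU_MEMORY_GB = 24.0  # A10 memory, from the specification table


def finetune_footprint_gb(n_params: float, bytes_per_weight: float,
                          n_trainable: float,
                          bytes_per_trainable: float = 16.0) -> float:
    """Frozen weights plus adapter training state, in GB (1 GB = 1e9 bytes)."""
    frozen = n_params * bytes_per_weight
    trainable = n_trainable * bytes_per_trainable
    return (frozen + trainable) / 1e9


if __name__ == "__main__":
    n_params = 7e9       # assumption: 7B-class base model
    n_trainable = 40e6   # assumption: illustrative adapter size
    for label, bytes_per_weight in (("LoRA, 16-bit base", 2.0),
                                    ("QLoRA, 4-bit base", 0.5)):
        used = finetune_footprint_gb(n_params, bytes_per_weight, n_trainable)
        print(f"{label}: {used:.2f} GB used, "
              f"{GPU_MEMORY_GB - used:.2f} GB left for activations")
```

Under these assumptions a 16-bit base leaves under 10 GB for activations, which is why QLoRA or short sequence lengths are the comfortable options on this card.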

Pitfalls

Often confused with A10G; software guidance for one usually applies to the other but they are not identical bins.
GDDR6 bandwidth limits long-context inference and large-batch decode performance.
No NVLink means multi-A10 setups rely on PCIe — tensor parallel scaling is poor.
Power efficiency per token is worse than on L4 in many modern inference workloads.
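
The bandwidth pitfall can be made concrete with a rough upper bound. During decode, each step has to stream the model weights and each sequence's KV cache from memory, so tokens per second cannot exceed bandwidth divided by bytes read per step. The sketch assumes an illustrative 7B-class shape (32 layers, 32 KV heads, head dimension 128, 16-bit values) and ignores compute time and kernel overheads, so measured throughput will be lower.

```python
BANDWIDTH_BYTES_PER_S = 600e9  # 600 GB/s, from the specification table


def kv_cache_bytes(context_tokens: int, n_layers: int = 32, n_kv_heads: int = 32,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """KV cache size for one sequence; the factor 2 covers keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_tokens


def decode_tokens_per_s_bound(weight_bytes: float, context_tokens: int = 0,
                              batch: int = 1) -> float:
    """Bandwidth-only upper bound on aggregate decode throughput."""
    bytes_per_step = weight_bytes + batch * kv_cache_bytes(context_tokens)
    return batch * BANDWIDTH_BYTES_PER_S / bytes_per_step


if __name__ == "__main__":
    fp16_weights = 7e9 * 2  # assumption: 7B parameters at 2 bytes each
    for ctx in (0, 8192, 32768):
        bound = decode_tokens_per_s_bound(fp16_weights, context_tokens=ctx)
        print(f"context {ctx:>6}: at most {bound:.1f} tokens/s at batch 1")
```

The bound drops as context grows because the KV cache read per token grows linearly with context length, and at large batch sizes the per-sequence KV traffic dominates the weight traffic.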

Software Notes

CUDA 11.x and 12.x both supported; the entire mainstream inference stack (vLLM, TensorRT-LLM, Triton, TGI, Ollama) treats A10 as a routine Ampere target. Quantisation paths (AWQ, GPTQ, INT8) are well-tuned. CUDA 13 retains support; treat A10 as stable through 2027.

