Total Cost of Ownership for AI Infrastructure

TL;DR

Total Cost of Ownership (TCO) is the full lifecycle cost of an infrastructure decision over a defined horizon — typically three years for general AI infrastructure, five years for sovereign builds with longer asset-life expectations.
Honest TCO covers six buckets: capital expenditure, operating expenditure, people, opportunity cost of capital, opportunity cost of idle capacity, and decommissioning. Sticker GPU-hour price is typically only 30-50% of the total.
Build wins above roughly 70% sustained utilisation over the horizon; cloud or neocloud wins below. Reserved-versus-on-demand-versus-spot is a separate axis layered on top.
Yobitel Omniscient Compute ranks every indexed provider by normalised TCO — not just headline GPU-hour rate — so customers comparing Yobitel NeoCloud reservations against hyperscaler reservations and on-prem builds see the side-by-side numbers in one view.
This entry helps you build a defensible TCO model for an AI-infrastructure decision, decide between build / hyperscaler / neocloud / hybrid, and read Omniscient Compute and Yobitel NeoCloud reservation pricing in the same lens you use for your existing estate.

Overview

Total Cost of Ownership is the discipline of summing every cost an infrastructure decision creates over its useful life, then comparing decisions on like-for-like totals rather than on advertised unit prices. The temptation, especially when comparing a cloud GPU-hour rate to an on-prem GPU server quote, is to compare only the headline numbers: $11.80/GPU/hr against $350,000 per 8-GPU server. That comparison is always wrong, because the on-prem number excludes most of the cost (power, cooling, networking, facility, people, idle capacity, decommissioning) while the cloud rate already includes nearly all of it.

A defensible TCO model spans six buckets: capital expenditure (the kit), operating expenditure (power, cooling, connectivity, real estate, software), people (fully-loaded SRE, platform, network, security and procurement headcount), opportunity cost of capital (the discount rate against tied-up capex), opportunity cost of idle capacity (the hours you provisioned but did not consume), and decommissioning (removal labour, secure data erasure, disposal). AI infrastructure pushes every bucket harder than general IT because power density is higher, refresh cadence is shorter, and idle GPUs are more expensive per hour than idle CPUs.

Yobitel uses TCO as a first-class ranking dimension across Omniscient Compute (the public-facing index of every relevant GPU provider) and the Yobitel NeoCloud commercial surface. Customers comparing a Yobitel NeoCloud reservation against a hyperscaler reservation see normalised three-year and five-year TCO side-by-side, not just headline rates — because at 70%+ sustained utilisation the headline rate is the least informative number on the page.

This entry helps you build a defensible TCO model for an AI-infrastructure decision (build, hyperscaler, neocloud, hybrid), pick the right horizon, account for the buckets most teams forget, and read Yobitel Omniscient Compute rankings and Yobitel NeoCloud reservation pricing in the same lens you apply to the rest of your portfolio.

How TCO works — the six-bucket model

Every well-formed AI-infrastructure TCO decomposes into the six buckets below. The exact ratios shift with utilisation, horizon and architecture choice — but the buckets themselves rarely change.

Bucket	Typical line items	Share of 3-yr TCO at 70% util
Capex — compute	GPU servers, head nodes, management switches, in-rack PDUs, racks. NVIDIA HGX H100/H200/B200 8-GPU servers run roughly $250,000 — $400,000 per node at 2026 pricing depending on generation and integrator.	~30-40%
Capex — fabric and storage	Spine/leaf switches, NDR InfiniBand or 800GbE Ethernet, optics, cabling, OOB. Parallel filesystem (WEKA, Lustre, VAST), tiered NVMe and HDD, backup targets.	~10-15%
Capex — facility (if owned)	Build-out or fit-out — MV power distribution, transformers, UPS, generator, chiller plant or CDU, fire suppression. Often pushed to colocation opex instead.	~5-15% (if self-built)
Opex — power and cooling	Energy for IT load multiplied by PUE. Cooling-plant water, refrigerant, CDU consumables. A B200 rack drawing 140 kW continuously uses about 1.23 GWh a year, roughly $220,000 to $295,000 per rack per year for IT load alone at $0.18 to $0.24/kWh, before PUE overhead.	~15-25%
Opex — facility and connectivity	Colocation rent, business rates, insurance, transit, peering, dark fibre, DDoS, security.	~5-10%
Opex — software and tooling	Kubernetes distribution, GPU operator, observability (Prometheus + Grafana + Loki + Tempo), vulnerability scanning, identity, FinOps platform, CI/CD.	~3-7%
People	Fully-loaded platform engineering, SRE, network, security, datacentre and procurement headcount. Typically 4-10 FTE for a 1,000-GPU estate at production maturity.	~10-20%
Opportunity cost — capital	Cost of capital tied up in depreciating assets at the firm's discount rate (commonly 8-12% per annum).	~3-8%
Opportunity cost — idle capacity	GPU-hours provisioned but not consumed. At 60% utilisation, 40% of every depreciation dollar produces nothing.	Scales with (100% - utilisation)
Decommissioning	Decommission labour, secure data erasure (often a sovereign-residency requirement), responsible disposal or resale.	~1-3%

Tip: If a TCO model does not include idle-capacity opportunity cost, it is overstating build economics. A 4,096-GPU cluster at 60% utilisation has the equivalent of 1,638 GPUs depreciating, drawing standby power and consuming rack space without producing output. That loss is real money, not an accounting artefact.
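The arithmetic behind the tip can be sketched in a few lines. This is a minimal illustration, not a costing tool: the function name `idle_capacity` and the `capex_per_gpu` figure are assumptions introduced here, and depreciation is taken as straight-line over the horizon.

```python
def idle_capacity(gpus: int, utilisation: float,
                  capex_per_gpu: float, horizon_years: float) -> dict:
    """Return idle GPU-equivalents and the yearly depreciation they absorb."""
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    idle_fraction = 1.0 - utilisation
    # Straight-line depreciation of the whole fleet, per year.
    annual_depreciation = gpus * capex_per_gpu / horizon_years
    return {
        "idle_gpu_equivalents": gpus * idle_fraction,
        "idle_depreciation_per_year": annual_depreciation * idle_fraction,
    }


# 4,096 GPUs at 60% utilisation, as in the tip above.
# capex_per_gpu is a hypothetical placeholder, not a quote.
result = idle_capacity(gpus=4096, utilisation=0.60,
                       capex_per_gpu=40_000.0, horizon_years=3)
print(f"Idle GPU-equivalents: {result['idle_gpu_equivalents']:.0f}")
print(f"Idle depreciation per year: ${result['idle_depreciation_per_year']:,.0f}")
```

The GPU-equivalent count depends only on fleet size and utilisation; the dollar figure moves with whatever per-GPU capex and horizon you plug in.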

Power and cooling — the bucket most models underweight

AI infrastructure shifted power density an order of magnitude in two GPU generations. An H100 SXM module draws up to 700 W under load; an H200 module is similar; a B200 module draws around 1,000 W; a B300 module around 1,200 W. An 8-GPU H100 HGX server with CPUs, NICs and storage draws ~10.2 kW; an 8-GPU B200 HGX equivalent can exceed 14 kW; a GB200 NVL72 rack pulls roughly 120 kW. Most legacy enterprise datacentres were built for 20 kW per rack, so AI racks run at five to seven times the design density.

Power-Usage Effectiveness (PUE) determines how much extra you pay for cooling, lighting and overhead on top of IT load. A modern direct-to-chip liquid-cooled AI facility runs at PUE 1.10 — 1.20; a legacy air-cooled facility can be PUE 1.50 — 1.80. Over a three-year horizon at sustained 70% utilisation, the PUE delta alone can shift TCO by 15-25%. UK industrial electricity prices around $0.18 — $0.24/kWh make the PUE delta worth six figures per rack per year at AI density.
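A short sketch of the PUE arithmetic may help make the per-rack figure concrete. It assumes a 120 kW rack running at full load all year, takes PUE 1.15 and 1.65 as midpoints of the two ranges quoted above, and uses $0.20/kWh; the function name `annual_energy_cost` and the `load_factor` parameter are illustrative choices, not part of any published model.

```python
HOURS_PER_YEAR = 8766  # 365.25 days x 24 hours


def annual_energy_cost(it_load_kw: float, pue: float, usd_per_kwh: float,
                       load_factor: float = 1.0) -> float:
    """Yearly electricity cost for an IT load, including facility overhead."""
    return it_load_kw * load_factor * pue * usd_per_kwh * HOURS_PER_YEAR


rack_kw = 120.0   # roughly a GB200 NVL72 rack
price = 0.20      # USD/kWh, assumed for illustration

modern = annual_energy_cost(rack_kw, pue=1.15, usd_per_kwh=price)
legacy = annual_energy_cost(rack_kw, pue=1.65, usd_per_kwh=price)
pue_delta = legacy - modern

print(f"Liquid-cooled (PUE 1.15): ${modern:,.0f} per rack per year")
print(f"Air-cooled    (PUE 1.65): ${legacy:,.0f} per rack per year")
print(f"PUE delta:                ${pue_delta:,.0f} per rack per year")
```

Lower `load_factor` to model a rack that does not run flat out; the delta scales linearly with it.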

Yobitel NeoCloud's facilities run at PUE 1.15 across the UK, EU and US-east footprint, with direct-to-chip liquid cooling on H100/H200/B200 racks and rear-door heat-exchangers on H100 PCIe nodes. The implication for the customer-facing TCO model is that the per-GPU-hour rate Omniscient Compute publishes for Yobitel NeoCloud already bundles the PUE delta, so there is no separate cooling line to add.

Variants — when to apply each TCO horizon

Three years is the most common modelling horizon for general AI infrastructure; it matches the aggressive depreciation schedule most CFOs use for accelerators. Five years is the standard for sovereign and regulated workloads where asset disposal is heavily constrained and the customer expects the kit to outlive a hardware generation. Seven years is unrealistic for current-generation accelerators; thermal stress alone makes the failure curve unfavourable past five.

3-year horizon — Standard for general-purpose AI infrastructure. Matches typical straight-line depreciation. The horizon hyperscaler Reserved Instances align to.
5-year horizon — Standard for sovereign builds (UK NCSC OFFICIAL workloads under multi-year compliance scope, EU Data Boundary residency contracts). Yobitel UK Sovereign reservations align to this horizon by default.
Per-workload horizon — A foundation-model training run that lasts six weeks should be modelled at six weeks, not three years. Match the horizon to the decision being made.
Match horizons across alternatives — Comparing a 3-year on-prem build to a 1-year cloud spend is the most common TCO error. Build the on-demand and reservation alternatives at the same horizon as the build before comparing.
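The horizon-matching rule can be enforced in code by making every alternative answer the same question: what does it cost over N years? The sketch below is illustrative only. The `Build` and `Rental` classes and all figures are hypothetical, the build is assumed to be written off in full over the horizon, and the rental is assumed to bill only for consumed hours (on-demand style).

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8766


@dataclass(frozen=True)
class Build:
    capex: float          # written off in full over the horizon (assumption)
    annual_opex: float    # power, colo, people, software

    def total(self, years: float) -> float:
        return self.capex + self.annual_opex * years


@dataclass(frozen=True)
class Rental:
    usd_per_gpu_hour: float
    gpus: int
    utilisation: float    # share of hours actually consumed and billed

    def total(self, years: float) -> float:
        return (self.usd_per_gpu_hour * self.gpus * self.utilisation
                * HOURS_PER_YEAR * years)


def compare(build: Build, rental: Rental, years: float) -> dict:
    """Cost both alternatives over one shared horizon."""
    return {"years": years,
            "build": build.total(years),
            "rental": rental.total(years)}


# Hypothetical 64-GPU estate.
build = Build(capex=3_000_000.0, annual_opex=700_000.0)
rental = Rental(usd_per_gpu_hour=2.50, gpus=64, utilisation=0.70)

mismatched_rental = rental.total(1)   # the error: 1-year rental vs 3-year build
like_for_like = compare(build, rental, years=3)
print(f"Mismatched 1-year rental: ${mismatched_rental:,.0f}")
print(f"3-year build:  ${like_for_like['build']:,.0f}")
print(f"3-year rental: ${like_for_like['rental']:,.0f}")
```

Routing every comparison through `compare` means a mismatched horizon cannot slip in unnoticed, because both sides receive the same `years` argument.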

Build vs hyperscaler vs neocloud vs hybrid — when each wins

There is no universal answer to the build-versus-rent question. The answer depends on utilisation, horizon, access to capital, access to power and skilled operations talent, and whether sovereign-residency constraints rule alternatives out. The summary below maps where each option typically wins on a normalised TCO basis.

Mode	Wins when	Typical 3-yr H100 effective USD/hr
On-prem build (owned)	Sustained utilisation > 70%, capex available, 3+ year workload, power and ops skill secured.	$1.80 — $2.40 fully-loaded (excl. idle).
Colo + bought kit	Same as build but power and physical security are outsourced. Common steady state for mid-size AI teams.	$2.10 — $2.80 fully-loaded.
Neocloud reserved (e.g. Yobitel NeoCloud)	Sustained 40-70% utilisation, no appetite for capex, want sovereign-residency by default.	$2.20 — $2.80 effective.
Neocloud on-demand	Variable 20-40% utilisation, want price-per-hour transparency, want no commitment.	$3.20 — $4.50 effective.
Hyperscaler reserved (1yr or 3yr)	Workload tied to broader hyperscaler estate; need managed services adjacency.	$4.20 — $6.50 effective.
Hyperscaler on-demand	Bursty, experimental, short-lived. Provisioning agility outweighs unit price.	$7.20 — $11.80 effective.
Spot / preemptible	Fault-tolerant training with checkpointing, batch inference. Tolerates interruption.	$1.50 — $4.00 effective when available.
Hybrid (the common steady state)	Baseline on reserved / on-prem, middle on neocloud on-demand, top of curve on spot or hyperscaler burst.	Blended; the dominant production pattern.
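The blended rate in the hybrid row is a weighted average of the tiers by share of GPU-hours. A minimal sketch follows, using the midpoints of the table's ranges as rates; the 60/30/10 split across tiers is a hypothetical example, not a recommendation.

```python
def blended_rate(tiers: list) -> float:
    """Weighted average USD/GPU-hour.

    tiers: list of (share_of_gpu_hours, usd_per_gpu_hour) pairs whose
    shares must sum to 1.
    """
    total_share = sum(share for share, _ in tiers)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares of GPU-hours must sum to 1")
    return sum(share * rate for share, rate in tiers)


# Rates are midpoints of the ranges in the table above;
# the split between tiers is assumed for illustration.
hybrid = [
    (0.60, 2.50),   # baseline on neocloud reserved
    (0.30, 3.85),   # middle on neocloud on-demand
    (0.10, 2.75),   # top of curve on spot, when available
]
rate = blended_rate(hybrid)
print(f"Blended effective rate: ${rate:.2f} per GPU-hour")
```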

Warning: Idle capacity is the silent TCO killer. A GPU you bought and did not use still depreciates, still draws standby power, and still occupies rack space that costs rent. The crossover where build beats neocloud is governed by sustained utilisation, not by peak utilisation, and not by hopeful utilisation.
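The crossover can be written down directly. If owned capacity costs a fixed amount per provisioned GPU-hour whether or not it is used, the cost per consumed GPU-hour is that amount divided by utilisation, and build matches rental where the two are equal. The sketch below treats the table's "excl. idle" build figure as the cost per provisioned hour and compares it against a rate paid only for consumed hours; both are simplifying assumptions, and the function names are illustrative.

```python
def effective_rate(cost_per_provisioned_hour: float, utilisation: float) -> float:
    """Cost per consumed GPU-hour for capacity you pay for around the clock."""
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be in (0, 1]")
    return cost_per_provisioned_hour / utilisation


def break_even_utilisation(cost_per_provisioned_hour: float,
                           rental_rate: float) -> float:
    """Sustained utilisation above which owned capacity is cheaper per used hour."""
    return cost_per_provisioned_hour / rental_rate


build_cost = 1.80    # USD per provisioned GPU-hour, low end of the table
rental_rate = 2.50   # USD per consumed GPU-hour, mid neocloud reserved

threshold = break_even_utilisation(build_cost, rental_rate)
print(f"Break-even sustained utilisation: {threshold:.0%}")
for u in (0.50, 0.60, 0.72, 0.80):
    print(f"  at {u:.0%}: build costs ${effective_rate(build_cost, u):.2f} per used GPU-hour")
```

With these inputs the threshold lands near the 70% rule of thumb; a higher build cost or a cheaper rental rate pushes it up.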

Trade-offs and known limitations

TCO modelling has real limits, and serious practitioners are explicit about them. The model is only ever as defensible as the assumptions feeding it; the most common failures are listed below.

Utilisation forecasts are aspirational. The realistic utilisation of a new AI cluster in year one is rarely above 50%; ramp to 70-80% only happens with mature scheduling, MIG slicing, and elastic workloads. Model the ramp explicitly; do not assume steady state from day one.
Hardware generation risk dominates 5-year models. A 3-year H100 build that is forced to refresh to B300 at year 2.5 because the workload mix shifted writes off the undepreciated sixth of its capex under straight-line depreciation; the same refresh in a 5-year model writes off half. Stress-test the model against an early-refresh scenario.
Power availability has become a binding constraint in many UK and US markets. A model that assumes incremental capacity is buyable at $0.20/kWh is wrong in any market where the grid connection itself is the gating factor.
Sovereign-residency carries a TCO premium that is not always quantified. UK NCSC OFFICIAL or EU Data Boundary capacity is typically 15-25% more expensive than comparable hyperscaler regions; that premium is usually defensible against the alternative of failing a compliance audit ([[ncsc-cloud-security-principles]], [[gdpr-article-32]]).
People costs scale non-linearly with cluster size. The first 256 GPUs need roughly the same SRE headcount as the next 1,000. Linear-scaling people assumptions overestimate the per-GPU cost at scale and underestimate it at small scale.
Software opex is rising as commercial Kubernetes distributions, observability platforms and FinOps tools increasingly charge per node or per GPU. Model these as variable-with-fleet, not as flat overhead.
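Two of the limitations above, the utilisation ramp and the early-refresh write-off, are easy to model explicitly. The sketch below assumes a linear ramp followed by a flat steady state, and straight-line depreciation; the function names and every input figure are hypothetical.

```python
def ramp_average(start: float, target: float,
                 ramp_months: int, horizon_months: int) -> float:
    """Average utilisation over the horizon with a linear ramp, then flat."""
    monthly = []
    for month in range(horizon_months):
        if month >= ramp_months:
            monthly.append(target)
        else:
            monthly.append(start + (target - start) * month / ramp_months)
    return sum(monthly) / len(monthly)


def early_refresh_writeoff(capex: float, horizon_years: float,
                           refresh_year: float) -> float:
    """Undepreciated capex written off if the kit is replaced early."""
    remaining_fraction = max(0.0, horizon_years - refresh_year) / horizon_years
    return capex * remaining_fraction


# Starts at 40%, reaches 75% after twelve months, three-year horizon.
average = ramp_average(start=0.40, target=0.75, ramp_months=12, horizon_months=36)
print(f"Horizon-average utilisation: {average:.1%} (steady state would be 75.0%)")

# A 5-year model refreshed at year 2.5 loses half its capex.
writeoff = early_refresh_writeoff(capex=3_000_000.0, horizon_years=5, refresh_year=2.5)
print(f"Early-refresh write-off: ${writeoff:,.0f}")
```

Feeding the horizon-average figure into the model, rather than the steady-state target, is what keeps a year-one ramp from flattering the build case.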

Practical implementation notes — how to actually build the model

A defensible TCO spreadsheet (or Bazel/Python rollup, or warehouse view fed from FOCUS) follows the structure below. The same structure works for one-off business-case analysis and for the standing FinOps dashboard.

Start with utilisation. Pick a sustained-utilisation forecast and a low/expected/high scenario range. Every other number flexes against this.
Sum capex bucket by bucket. Use real integrator quotes for GPU servers, not list price — discounts of 15-30% off NVIDIA list are typical at quantity.
Amortise capex straight-line over the horizon. Pair with an opportunity-cost-of-capital line at the firm's discount rate.
Power and cooling — multiply IT load by PUE by USD/kWh by 8,766 hours per year. Use a sensitivity range on USD/kWh.
People — fully load (salary + employer NI/payroll tax + benefits + tooling + overheads, typically 1.4x — 1.7x salary). Allocate fractional headcount honestly across multiple clusters.
Idle-capacity penalty — express as (1 - utilisation) x amortised capex per period. This is the line most models omit and the line that flips build vs neocloud decisions.
Compare against neocloud and hyperscaler effective rates pulled from FOCUS EffectiveCost ([[focus-spec]]) — not list rates. Yobitel Omniscient Compute publishes normalised TCO rankings already; cross-reference your spreadsheet against the index.
Sensitivity analysis — tornado chart on utilisation, USD/kWh, and discount rate. Decisions that hold across all three sensitivities are robust; decisions that only hold in the optimistic corner are not.
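The steps above can be sketched as a small Python rollup. Treat it as a starting structure under stated assumptions rather than a finished model: every default in `Inputs` is a hypothetical placeholder for a small cluster, cost of capital is approximated as the discount rate on average book value, and the idle-capacity penalty is reported as a memo line because dividing by consumed hours already charges idle time.

```python
from dataclasses import dataclass, replace

HOURS_PER_YEAR = 8766  # 365.25 days x 24 hours


@dataclass(frozen=True)
class Inputs:
    # All defaults are hypothetical placeholders; replace with real quotes.
    gpus: int = 64
    capex: float = 3_200_000.0          # servers, fabric, storage after discount
    horizon_years: int = 3
    utilisation: float = 0.70           # sustained, not peak
    it_load_kw: float = 82.0
    pue: float = 1.15
    usd_per_kwh: float = 0.20
    other_opex_per_year: float = 250_000.0   # colo, connectivity, software
    fte: float = 1.5                    # fractional headcount for this cluster
    salary: float = 120_000.0
    load_multiplier: float = 1.5        # within the 1.4x to 1.7x range above
    discount_rate: float = 0.10


def annual_lines(i: Inputs) -> dict:
    """Yearly cost per bucket."""
    return {
        "amortised_capex": i.capex / i.horizon_years,
        # Simplification: charge the discount rate on the average book value.
        "cost_of_capital": i.discount_rate * i.capex / 2,
        "power_and_cooling": i.it_load_kw * i.pue * i.usd_per_kwh * HOURS_PER_YEAR,
        "other_opex": i.other_opex_per_year,
        "people": i.fte * i.salary * i.load_multiplier,
    }


def idle_penalty(i: Inputs) -> float:
    """(1 - utilisation) x amortised capex per year, shown as a memo line."""
    return (1.0 - i.utilisation) * i.capex / i.horizon_years


def cost_per_used_gpu_hour(i: Inputs) -> float:
    """Total yearly cost divided by consumed GPU-hours.

    Idle time is already charged by the division, so the idle penalty
    is not added on top (that would double count it).
    """
    used_hours = i.gpus * HOURS_PER_YEAR * i.utilisation
    return sum(annual_lines(i).values()) / used_hours


def sensitivity(base: Inputs, field: str, low: float, high: float) -> tuple:
    """Effective rate at the low and high end of one input (one tornado bar)."""
    return (cost_per_used_gpu_hour(replace(base, **{field: low})),
            cost_per_used_gpu_hour(replace(base, **{field: high})))


base = Inputs()
print(f"Base: ${cost_per_used_gpu_hour(base):.2f} per used GPU-hour")
print(f"Idle penalty (memo): ${idle_penalty(base):,.0f} per year")
for field, low, high in [("utilisation", 0.50, 0.85),
                         ("usd_per_kwh", 0.18, 0.24),
                         ("discount_rate", 0.08, 0.12)]:
    at_low, at_high = sensitivity(base, field, low, high)
    print(f"{field:>14}: ${at_low:.2f} to ${at_high:.2f}")
```

The resulting per-used-GPU-hour figure is the number to set against a FOCUS EffectiveCost rate; sorting the three sensitivity ranges by width gives the ordering for a tornado chart.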

Where TCO sits in the Yobitel stack

Yobitel treats TCO as a published, normalised number across both Omniscient Compute (the public index) and the Yobitel NeoCloud commercial surface. Omniscient Compute ranks every indexed provider not only by current GPU-hour rate but by three-year normalised TCO with PUE, fabric, storage, sovereign-region premium and operational support folded in — so customers comparing Yobitel NeoCloud reservations against AWS P5 reservations, Azure ND H100 v5 reservations, or a build at a UK colocation see the same lens applied to each.

Yobitel NeoCloud's reservation instrument behaves like a spend-based commitment discount in FOCUS terms (the Savings Plan model; see [[savings-plans]]): the customer commits to a compute envelope rather than to a specific SKU. Customers commit at the dollar-per-hour level across H100, H200 and B200 capacity; the TCO model rolls up to a single effective rate that already includes power at UK industrial rates, PUE 1.15 cooling, NDR InfiniBand or 800GbE fabric, parallel filesystem storage, 24x7 SRE coverage and sub-processor transparency. There is no separate cooling, network or operations line for the customer to model.
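For modelling purposes, a dollar-per-hour commitment can be sketched generically as follows. This illustrates how spend-based commitments are commonly modelled and is not a description of Yobitel's billing logic: the function `hourly_bill`, the drawdown order, and all rates are assumptions made up for the example.

```python
def hourly_bill(commit_usd: float, usage: dict,
                committed_rates: dict, on_demand_rates: dict) -> dict:
    """Bill one hour against a spend commitment shared across SKUs.

    usage maps SKU -> GPUs running this hour. Usage draws down the
    commitment at committed rates in dict order; anything beyond the
    commitment is billed at on-demand rates. Unused commitment is still paid.
    """
    remaining = commit_usd
    overflow = 0.0
    for sku, gpus in usage.items():
        cost_at_commit = gpus * committed_rates[sku]
        covered = min(cost_at_commit, remaining)
        remaining -= covered
        uncovered_gpus = (cost_at_commit - covered) / committed_rates[sku]
        overflow += uncovered_gpus * on_demand_rates[sku]
    return {
        "billed": commit_usd + overflow,
        "unused_commitment": remaining,
        "overflow": overflow,
    }


# Hypothetical rates in USD per GPU-hour.
committed = {"H100": 2.50, "H200": 3.00}
on_demand = {"H100": 4.00, "H200": 4.80}

busy = hourly_bill(100.0, {"H100": 32, "H200": 10}, committed, on_demand)
quiet = hourly_bill(100.0, {"H100": 20}, committed, on_demand)
print(f"Busy hour:  billed ${busy['billed']:.2f}, overflow ${busy['overflow']:.2f}")
print(f"Quiet hour: billed ${quiet['billed']:.2f}, unused ${quiet['unused_commitment']:.2f}")
```

The unused commitment in a quiet hour is the rental-side equivalent of the idle-capacity penalty, so it belongs in the same sensitivity analysis as utilisation.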

Where customers prefer a build path, Yobitel Professional Services delivers TCO modelling and procurement support; where they want a hybrid, Managed Operations runs the 24x7 NOC over the customer-owned estate while Omniscient Compute keeps the FOCUS-conformant cost layer ([[focus-spec]]) unified across in-house, neocloud and hyperscaler footprints. Together this means the build-vs-buy decision can be revisited continuously against the actual numbers, not against the assumptions that were defensible eighteen months ago.

References

FinOps Foundation — Quantify Business Value capability · FinOps Foundation
Uptime Institute — Global Data Center Survey · Uptime Institute
AWS Pricing — EC2 P5 instances · AWS
Azure ND H100 v5 series · Microsoft Learn
NVIDIA HGX H100 / H200 / B200 reference design · NVIDIA
