Managed Operations

TL;DR

Managed Operations is Yobitel's 24/7 NOC for AI infrastructure — covering Yobitel NeoCloud, customer-built AWS EKS / Azure AKS / GCP GKE clusters, on-premise Kubernetes, bare-metal GPU clusters, and hybrid estates that span more than one of the above.
Three SLA tiers — Standard, Premium, and Mission-critical — with RTO and RPO commitments scaling per tier, P1 response in under 15 minutes on Mission-critical, and named Technical Account Managers from Premium upward.
Onboarding is a structured engagement (discovery -> baseline -> watch -> incident response) with a typical four-to-eight-week ramp before the SLA goes live on the customer's estate.
Yobitel reads the customer's existing Prometheus and Grafana, integrates incident webhooks into the customer's PagerDuty or Opsgenie, and follows ITIL-aligned change management. The customer keeps ownership of every system; Yobitel runs the operations rota.
Pricing is per-node or per-cluster per month in USD; the FOCUS 1.1 billing export carries the same column shape as the rest of the Yobitel stack so Managed Operations spend lands in the customer's FinOps pipeline alongside compute.

Overview

Production AI infrastructure rarely fails dramatically. It drifts. Certificates expire on a Sunday. A GPU node's NVLink throughput collapses to half. A Kubernetes upgrade silently breaks the DCGM exporter. A billing job hangs and a workspace overruns its budget by 4x in the time it takes anyone to notice. None of these failures are interesting individually; each of them is the kind of thing that a 24/7 operations rota with the right runbook catches in minutes and that an under-staffed in-house team catches in days. Managed Operations is Yobitel's service for owning that rota on the customer's behalf, against the customer's own estate.

The service is genuinely cross-estate. A customer might run Yobibyte on NeoCloud for production inference, EKS in eu-west-2 for application services, an on-premise Kubernetes cluster for regulated training, and a bare-metal GPU pod for research. Managed Operations covers all four under one contract, one incident webhook, one monthly review, and one named TAM. The watch is structured around incident severity (P1 through P4), SLA tier, and a runbook library that grows with the customer.

Yobitel runs the rota; the customer keeps every system. The customer's Kubernetes clusters, cloud accounts, on-premise hardware, and data remain customer-owned and customer-controlled. Yobitel reads the customer's existing Prometheus, Grafana, OpenTelemetry, and log-aggregation surfaces; integrates incident webhooks into the customer's PagerDuty, Opsgenie, or ServiceNow; and follows the customer's change-approval workflow alongside Yobitel's ITIL-aligned process. There is no Yobitel-installed agent that has read-write access the customer cannot revoke.

Yobitel Communications, the UK-headquartered AI infrastructure company that delivers Managed Operations, sells the service as a per-node or per-cluster monthly subscription in USD. UK NCSC OFFICIAL alignment is the default posture for the operations team; ISO 27001 and SOC 2 Type II cover the wider control set. Premium and Mission-critical tiers include a named TAM as the single point of contact for both operational matters and capacity planning.

Quick start — onboarding a customer estate

Onboarding follows a structured four-stage flow that takes most customers four to eight weeks from contract signature to the SLA going live on the watched estate. The stages run in sequence, with the customer's TAM and operations lead as joint owners; the customer's existing platform and SRE teams stay in the loop throughout.

Stage one — discovery. The TAM and an operations engineer walk through the customer's estate: which clusters, which clouds, which on-premise hardware, which observability stack, which incident-management tooling, which change-approval workflow, which compliance pin. The output is an estate map and a baseline runbook list pulled from Yobitel's existing library, annotated with what the customer already has covered and what the watch will need.

Stage two — baseline. Yobitel reads the customer's Prometheus and Grafana, integrates the incident webhook into the customer's PagerDuty or Opsgenie, registers a Yobitel pager rotation, and walks the joint team through the standard P1-P4 severity definitions. The baseline stage produces the first month of run-of-rota data — actual on-call load, the noisiest alerts, the most-fired runbooks — without yet pulling the trigger on the SLA.

Stage three — watch. The SLA goes live. Yobitel's NOC takes the on-call rotation; the customer's team stays available for warm hand-off during the first month of live watch. Monthly service reviews start at this stage and continue for the life of the engagement.

Stage four — incident response. The first real incident under SLA is handled jointly; the post-mortem is co-owned. By the end of the second month under SLA, the customer's team is typically able to step out of warm hand-off and rely on the NOC for the watched envelope.

Tip: Onboarding lands faster when the customer's observability surface is already centralised — even if the dashboards are imperfect. Yobitel can extend an existing Grafana setup in days; rebuilding observability from scratch adds weeks to the baseline stage.

Concepts

Managed Operations exposes a small set of concepts that match how an operations leader thinks about a 24/7 watch. The mental model is incident severity at the centre, with watch tier, runbook library, and on-call schedule around it.

Watch tier — the SLA tier the engagement is signed at. Standard (business-hours watch with on-call escalation), Premium (24/7 watch with named TAM), or Mission-critical (24/7 watch with redundant on-call rotation, sub-15-minute P1 response, and quarterly tabletop exercises).
Incident severity — the P1-P4 classification applied at the moment an alert fires. P1 is customer-impacting and time-critical (e.g. production inference down, billing pipeline halted). P2 is degraded but contained (e.g. one node out of a cluster). P3 is non-customer-impacting (e.g. certificate expiry warning). P4 is informational.
Runbook — the documented response to a specific alert or incident class. Yobitel maintains a runbook library that the customer's specific runbooks extend; runbooks are versioned in both the customer's Git and Yobitel's, and are reviewed and approved through the standard change-approval workflow.
On-call schedule — the rota and escalation path. Yobitel's NOC primary, customer's secondary (optional), customer's escalation owner (required), TAM (Premium and Mission-critical only). The schedule is published in the customer's incident-management tool.
Capacity planning cycle — the monthly review that surfaces utilisation, drift, and capacity recommendations. Drives reservation, scale-out, and right-sizing decisions; for Yobibyte and NeoCloud customers, the cycle integrates with the same FOCUS export the customer sees.
Scope envelope — the explicit list of clusters, clouds, on-premise hardware, and software surfaces under SLA. Anything outside the envelope is best-effort or excluded. The envelope changes through the standard change-management process, not in-flight.
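
As an illustration of how the P1-P4 classification could travel through the incident webhook, the sketch below builds a trigger event in the shape accepted by the PagerDuty Events API v2. The mapping from P1-P4 onto PagerDuty's four severity values, the function name, and the sample routing key are assumptions made for this example; the service's actual webhook format is not documented here.

```python
from enum import Enum


class Severity(Enum):
    P1 = "P1"  # customer-impacting and time-critical
    P2 = "P2"  # degraded but contained
    P3 = "P3"  # non-customer-impacting
    P4 = "P4"  # informational


# Assumed mapping onto the severity values PagerDuty Events API v2 accepts.
PAGERDUTY_SEVERITY = {
    Severity.P1: "critical",
    Severity.P2: "error",
    Severity.P3: "warning",
    Severity.P4: "info",
}


def build_trigger_event(routing_key, severity, summary, source, runbook_id=None):
    """Return a dict suitable for POSTing as JSON to the Events API v2 endpoint."""
    details = {"incident_severity": severity.value}
    if runbook_id is not None:
        details["runbook_id"] = runbook_id  # hypothetical identifier scheme
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": f"[{severity.value}] {summary}",
            "source": source,
            "severity": PAGERDUTY_SEVERITY[severity],
            "custom_details": details,
        },
    }


event = build_trigger_event(
    routing_key="EXAMPLE_ROUTING_KEY",  # placeholder, not a real key
    severity=Severity.P1,
    summary="production inference down",
    source="eks-eu-west-2",
    runbook_id="inference-outage",
)
```

The function only assembles the payload; sending it, deduplication, and retries are left out so that the severity mapping stays the focus.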

SLA tiers and commitments

Three SLA tiers cover the bulk of customer needs. The tier defines the watch window, the response commitments, the recovery commitments, and the included surfaces; the customer's actual scope envelope sits inside the tier.

Tier	Watch window	P1 response	P2 response	RTO target	RPO target	Included surfaces
Standard	Business hours plus on-call escalation	< 60 minutes	< 4 hours	4 hours	1 hour	Up to 50 nodes or 4 clusters; single-region; primary observability stack.
Premium	24/7	< 30 minutes	< 2 hours	1 hour	15 minutes	Up to 500 nodes or 20 clusters; multi-region; named TAM; quarterly business review.
Mission-critical	24/7 with redundant rotation	< 15 minutes	< 1 hour	15 minutes	5 minutes	Up to 5,000 nodes or 100 clusters; multi-region with DR; named TAM; quarterly tabletop exercises; executive escalation path.
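
The response commitments in the table can be expressed as data and checked against incident timestamps. The following is a minimal sketch, assuming that "response" is measured from the moment the page fires to the moment the NOC acknowledges it; the names SlaTier, TIERS, and response_within_sla are illustrative and not part of any Yobitel tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SlaTier:
    p1_response: timedelta
    p2_response: timedelta
    rto: timedelta
    rpo: timedelta


# Values copied from the SLA table above.
TIERS = {
    "standard": SlaTier(timedelta(minutes=60), timedelta(hours=4),
                        timedelta(hours=4), timedelta(hours=1)),
    "premium": SlaTier(timedelta(minutes=30), timedelta(hours=2),
                       timedelta(hours=1), timedelta(minutes=15)),
    "mission-critical": SlaTier(timedelta(minutes=15), timedelta(hours=1),
                                timedelta(minutes=15), timedelta(minutes=5)),
}


def response_within_sla(tier, severity, paged_at, acknowledged_at):
    """True if the acknowledgement landed strictly inside the tier's target."""
    targets = {"P1": TIERS[tier].p1_response, "P2": TIERS[tier].p2_response}
    if severity not in targets:
        raise ValueError(f"no response commitment listed for {severity}")
    return (acknowledged_at - paged_at) < targets[severity]


paged = datetime(2025, 1, 5, 3, 0)
met = response_within_sla("mission-critical", "P1",
                          paged, paged + timedelta(minutes=12))
missed = response_within_sla("premium", "P1",
                             paged, paged + timedelta(minutes=45))
```

The comparison is strict because the table states the targets as "less than" values; P3 and P4 raise an error since the table lists no response commitment for them.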

Supported infrastructure

Managed Operations covers a broad envelope of customer-built and Yobitel-built infrastructure. The supported list below is the envelope under SLA today; anything outside the envelope is handled as Professional Services rather than ongoing operations.

Category	Supported	Notes
Yobitel-managed	Yobibyte workspaces, NeoCloud reservations, AI Applications (MediQuery and the wider suite)	Bundled at a reduced rate when Yobibyte or NeoCloud is the primary contract.
Hyperscaler Kubernetes	AWS EKS, Azure AKS, GCP GKE, Oracle OKE	GPU node pools supported across all four hyperscalers.
On-premise Kubernetes	Vanilla Kubernetes, Rancher RKE2, Red Hat OpenShift, SUSE Rancher	Includes air-gapped clusters with no internet egress.
Bare-metal GPU	NVIDIA DGX SuperPOD, HGX systems, AMD MI300X reference designs	Includes facility coordination via NeoCloud Operations on partner-build estates.
Observability	Prometheus, Grafana, OpenTelemetry, Loki, Tempo, customer-managed Mimir or Cortex	Yobitel reads the customer's existing stack rather than installing a parallel one.
Incident management	PagerDuty, Opsgenie, ServiceNow, Atlassian Jira Service Management	Webhook integration; no read-write agent installed.
Change approval	ITIL-aligned process integrating with ServiceNow, Jira, or customer-built tooling	Yobitel's change workflow runs alongside the customer's.
Identity	OIDC (Okta, Microsoft Entra ID, Auth0, Keycloak, Google Workspace) + SCIM 2.0	RBAC for the customer's runbook access and the TAM's read scope.

Engagement size by infrastructure scale

The sizing below comes from production engagements at the small, mid-market, and enterprise scale. Node counts are aggregate across the watched estate; the relevant tier is driven primarily by criticality, not node count.

Estate scale	Watched nodes / clusters	Recommended tier	Yobitel engagement size	Indicative price band
Small (single product)	10 - 50 nodes / 1 - 4 clusters	Standard	Shared NOC pool; on-call escalation	$5K - $15K / month
Mid-market (multi-product)	50 - 250 nodes / 4 - 12 clusters	Premium	Dedicated TAM; quarterly business review	$25K - $80K / month
Enterprise (platform)	250 - 1,000 nodes / 12 - 50 clusters	Premium or Mission-critical	Dedicated TAM; redundant on-call rotation	$80K - $250K / month
Mission-critical (regulated)	1,000 - 5,000 nodes / 50 - 100 clusters	Mission-critical	Dedicated TAM; quarterly tabletop; executive escalation	$250K - $1M+ / month
Multi-tenant operator	5,000+ nodes / 100+ clusters	Mission-critical with custom envelope	Multiple TAMs by region; integrated NOC	Custom

Scope envelope — in and out

Managed Operations is a service contract, so the relevant question is not 'what is the limit?' but 'what is in scope?'. The table below is the standard envelope; customer-specific extensions and exclusions are documented in the signed scope statement and reviewed quarterly.

Area	In scope (default)	Out of scope (default)	Notes
Infrastructure availability	Cluster, node, and pod-level availability	Application-level availability of customer code	Customer code is the customer's; Yobitel monitors the platform it runs on.
Incident response	Infrastructure incidents at P1-P4	Customer-code incidents	Yobitel can extend scope to application code under Premium with custom runbooks.
Patching	Cluster, OS, and platform component patching on change-approval cadence	Customer application image patching	Customer-application patching covered under Professional Services.
Security	Continuous vulnerability scanning, patch coordination, identity drift detection	Application-layer pen testing	Pen testing covered under Professional Services.
Capacity planning	Monthly capacity review with reservation and right-sizing recommendations	Procurement execution	Yobitel makes recommendations; customer executes procurement (often via Omniscient Compute).
Cost optimisation	FOCUS-export analysis with quarterly findings	Implementation of cost-optimisation changes	Changes implemented through standard change-management.
Disaster recovery	DR runbook and testing on Premium and Mission-critical; quarterly exercises on Mission-critical	DR strategy design	Strategy design covered under Professional Services.
Compliance audit	Evidence collection for SOC 2, ISO 27001, NCSC, HIPAA	Audit certification itself	Yobitel provides evidence; the customer's auditor certifies.

Pricing

Managed Operations is priced per-node or per-cluster per month in USD on a tiered base rate. The base rate covers the SLA tier; per-incident surcharges apply only to engagements outside the standard scope envelope. Pricing is delivered as a FOCUS 1.1 line item alongside the rest of the Yobitel stack so spend rolls up into the customer's FinOps pipeline without manual reconciliation.

Tier	Per-node $/month	Per-cluster $/month base	Included incidents per quarter	Notes
Standard	$45 - $80	$1,500 - $3,000	Up to 8 P1/P2	Single-region, business-hours watch.
Premium	$95 - $150	$5,000 - $10,000	Up to 24 P1/P2	24/7 watch, named TAM, quarterly review.
Mission-critical	$180 - $320	$15,000 - $35,000	Unlimited	24/7 redundant watch, sub-15-minute P1, tabletop exercises.
Yobibyte/NeoCloud bundled discount	20 - 30% off the per-node rate	—	Per tier	Applied when Yobibyte or NeoCloud is the primary contract.
On-premise bare-metal surcharge	$10 - $30 per node	—	Per tier	Covers facility coordination and on-site hand-off.

Note: The per-node rate scales down with watched fleet size; large enterprise contracts (1,000+ nodes) typically settle in the lower band per node, while small contracts (under 50 nodes) settle higher per node to cover fixed TAM and rota overhead.
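
To make the arithmetic concrete, the sketch below turns the per-node bands into a low-to-high monthly estimate. It rests on several assumptions that the table does not spell out: the bundled discount is applied to the per-node rate before any surcharge, the bare-metal surcharge is added per bare-metal node per month, and the per-cluster base rate and fleet-size scaling are ignored. Treat the output as a rough band, not a quote.

```python
# Per-node monthly bands in USD, copied from the pricing table.
PER_NODE_BAND = {
    "standard": (45, 80),
    "premium": (95, 150),
    "mission-critical": (180, 320),
}
BUNDLE_DISCOUNT = (0.20, 0.30)     # 20 - 30% off the per-node rate
BARE_METAL_SURCHARGE = (10, 30)    # USD per bare-metal node


def monthly_estimate(tier, nodes, bare_metal_nodes=0, bundled=False):
    """Return (low, high) monthly USD estimate under the stated assumptions."""
    low_rate, high_rate = PER_NODE_BAND[tier]
    if bundled:
        # Largest discount on the low end, smallest on the high end.
        low_rate = low_rate * (1 - BUNDLE_DISCOUNT[1])
        high_rate = high_rate * (1 - BUNDLE_DISCOUNT[0])
    low = nodes * low_rate + bare_metal_nodes * BARE_METAL_SURCHARGE[0]
    high = nodes * high_rate + bare_metal_nodes * BARE_METAL_SURCHARGE[1]
    return round(float(low), 2), round(float(high), 2)


# 200 watched nodes on Premium, 40 of them on-premise bare metal.
list_price = monthly_estimate("premium", 200, bare_metal_nodes=40)
bundled_price = monthly_estimate("premium", 200, bare_metal_nodes=40, bundled=True)
```

For the 200-node example, the unbundled band works out as 200 x $95 + 40 x $10 = $19,400 at the low end and 200 x $150 + 40 x $30 = $31,200 at the high end.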

Security and compliance

Managed Operations is delivered by a UK-headquartered team operating under NCSC Cloud Security Principles. UK NCSC OFFICIAL alignment is the default posture for the watch; EU and US engagements layer GDPR, EU AI Act high-risk-system obligations (where the watched estate is in scope), HIPAA, and SOC 2 Type II on top. The operations team's own controls (background checks, BYOD restrictions, separation of duties, immutable audit logging) sit under ISO 27001 and SOC 2 Type II.

Access to the customer's estate is read-only by default. Any write access required for incident response is requested through the customer's change-approval workflow with the relevant runbook attached; the customer can pre-approve named runbooks for in-incident write access, or require live approval for every write. Every action Yobitel takes on the customer's estate is logged to the customer's audit surface, not a Yobitel-only one.

NCSC Cloud Security Principles — default posture for the operations team and the customer-watched envelope.
G-Cloud — listed under Cloud Support (Lot 3); orderable through the Crown Commercial Service framework.
Cyber Essentials Plus — current certificate for the operations team.
ISO 27001:2022 — current certificate covering the operations team and its tooling.
SOC 2 Type II — annual third-party audit covering security, availability, confidentiality.
ITIL-aligned change management — integrating with the customer's existing approval workflow.
GDPR / UK DPA 2018 — DPA, sub-processor list, EU SCCs available.
EU AI Act — for customers running high-risk AI systems, Managed Operations provides the operational-resilience evidence layer.
HIPAA — BAA available for healthcare-customer engagements.
Read-only-by-default — write access requested through the customer's change-approval workflow; pre-approved runbooks available.
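
The read-only-by-default rule can be pictured as a small decision function. This is a sketch of the policy as described in this section, not Yobitel's implementation: the class name, the decision labels, and the in-memory audit list (standing in for the customer's audit surface) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    # Runbooks the customer has pre-approved for in-incident write access.
    pre_approved_runbooks: set = field(default_factory=set)
    # Stand-in for the customer's audit surface; every decision is recorded.
    audit_log: list = field(default_factory=list)

    def authorise(self, action, runbook_id=None, live_approval=False):
        if action == "read":
            decision = "allow"                # read access is the default
        elif runbook_id is None:
            decision = "deny"                 # writes must reference a runbook
        elif runbook_id in self.pre_approved_runbooks or live_approval:
            decision = "allow"
        else:
            decision = "pending-approval"     # wait for the customer's approver
        self.audit_log.append(
            {"action": action, "runbook": runbook_id, "decision": decision}
        )
        return decision


policy = AccessPolicy(pre_approved_runbooks={"cert-rotation"})
read_decision = policy.authorise("read")
rotation_decision = policy.authorise("write", runbook_id="cert-rotation")
resize_decision = policy.authorise("write", runbook_id="node-pool-resize")
```

A customer who requires live approval for every write would simply leave pre_approved_runbooks empty, so each write stays pending until live_approval is supplied.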

Alternatives

Managed Operations is one option for running a 24/7 watch on AI infrastructure. The honest read: an in-house SRE team gives full control but takes 6-18 months to staff and burns continuously on rota cost; a hyperscaler-managed service covers the cloud-native primitives well but loses depth on GPU, fabric, inference engines, and ML pipelines, and is single-cloud by definition; a general MSP can cover availability but rarely has AI-infrastructure depth or AI-specific runbook libraries. Managed Operations sits in the middle as the contract that covers AI-infrastructure breadth across cloud, on-premise, and hybrid estates, with UK NCSC OFFICIAL as the default sovereignty posture and Yobibyte/NeoCloud bundled-discount economics where the customer already runs on Yobitel surfaces.

Concern	Yobitel Managed Operations	In-house SRE	AWS / Azure / GCP managed services	General MSP
AI infrastructure depth	GPU, fabric, inference engines, ML pipelines covered natively	Whatever you hire	Generic cloud-service depth	Limited
Cross-estate (cloud + on-prem + hybrid)	Yes	Yes if you build it	Single-cloud focus	Yes
Sovereignty posture	UK NCSC OFFICIAL default, EU and US tiers	Whatever you build	Cloud's posture	Variable
Integration with customer Prometheus + PagerDuty	Yes, read-only by default	Customer-owned	Cloud-native tools	Variable
FOCUS-aligned billing export	Yes	DIY	Cloud-native billing	Limited
Yobibyte / NeoCloud bundled discount	Yes	N/A	N/A	N/A
P1 response SLA	Under 15 minutes (Mission-critical)	Whatever you staff	Variable	Variable
Named TAM from Premium tier	Yes	N/A	Enterprise support only	Variable
Read-only-by-default + runbook-approved write	Yes	Customer-owned	Cloud-managed	Variable
Knowledge transfer at exit	Documented runbook library handed back	N/A	Limited	Variable

Common incident classes

Managed Operations is a service contract rather than a product, so 'troubleshooting' is reframed as the most common incident classes the watch handles. The classes below cover the bulk of paged incidents on a typical customer estate; the runbook library covers them with documented response, fix, and post-mortem templates.

Incident class	Typical cause	Runbook response
Capacity exhaustion	Workload growth exceeded reservation; on-demand and spot exhausted in the region.	Engage customer's TAM for emergency reservation expansion; route burst traffic to sibling region if sovereignty allows; page customer's escalation owner on the third occurrence in a quarter.
Drift	Cluster configuration has drifted from the documented baseline (e.g. node pool resized, CNI plugin upgraded outside change management).	Open a P3 ticket against the customer's change-approval workflow; restore baseline through the standard change process; update runbook if drift is recurring.
Certificate expiry	TLS certificate, OIDC signing key, or Kubernetes control-plane cert approaching expiry without rotation.	P2 ticket 14 days before expiry, P1 in the 24 hours before. Yobitel coordinates rotation through the standard change workflow.
Billing overrun	Workspace, reservation, or workload exceeds the configured USD spend cap or trends to exceed it.	P2 page to customer's billing owner with FOCUS export analysis attached; suggest mitigation (reservation, right-sizing, throttle).
Fabric degradation	InfiniBand or RoCEv2 link error rate elevated on a training cluster.	P1 page; coordinate with NeoCloud NOC if the fabric is Yobitel-operated; pause distributed training to avoid NCCL collective corruption.
Identity federation drift	OIDC IdP rotated keys or changed audience; workspace cannot validate tokens.	P1 page; coordinate with customer's identity team for re-federation; runbook covers Okta, Entra ID, Auth0, Keycloak.
Inference cold-start cascade	Scale-to-zero endpoints saw correlated traffic surge; cold-start time exceeded SLO.	P2 page; warm replicas raised; runbook updated if cascade pattern is recurring.
Patch gap	Critical CVE published for a component on the watched estate.	P1 or P2 depending on CVSS; coordinate patch through customer's change-approval workflow.
Quota tripped during incident	Hyperscaler API quota or NeoCloud reservation quota tripped while scaling out under load.	P1 page; engage customer's TAM and the hyperscaler for emergency quota increase; runbook covers AWS, Azure, GCP, NeoCloud.
Audit-trail gap	Audit export pipeline halted (bucket policy drift, KMS key rotation).	P3 ticket; resolve through change workflow; post-mortem covers detection-to-resolution gap.
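
The certificate-expiry row describes a time-based escalation that is easy to sketch. The example below assumes the thresholds are inclusive and that an already-expired certificate is treated as P1; neither detail is stated in the table, and the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone


def certificate_severity(not_after, now):
    """Map time-to-expiry onto the severities in the certificate-expiry row."""
    remaining = not_after - now
    if remaining <= timedelta(hours=24):
        return "P1"   # final 24 hours (assumed to include already expired)
    if remaining <= timedelta(days=14):
        return "P2"   # ticket opened 14 days before expiry
    return None       # outside the alerting window


now = datetime(2025, 3, 1, tzinfo=timezone.utc)
far_off = certificate_severity(now + timedelta(days=30), now)
two_weeks = certificate_severity(now + timedelta(days=10), now)
last_day = certificate_severity(now + timedelta(hours=6), now)
```

Passing the current time in as an argument keeps the function deterministic, which makes the escalation thresholds straightforward to test.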

Where Managed Operations fits in the Yobitel stack

Managed Operations is the day-two operations layer that wraps the rest of the Yobitel stack. Professional Services delivers the day-one build; NeoCloud Operations delivers the partner-build sovereign facility; Yobibyte is the managed inference surface; NeoCloud is the sovereign capacity layer; Customer Excellency owns the strategic relationship. Managed Operations is the contract that keeps the result running.

Most customers buy Managed Operations alongside one or more of the other Yobitel surfaces. A customer running Yobibyte on NeoCloud often adds Managed Operations to cover the customer's own application services and any non-Yobitel infrastructure in the same envelope. A customer running a partner-built NeoCloud often contracts Managed Operations as the day-two operate phase. A customer running entirely on hyperscaler Kubernetes can contract Managed Operations without ever adopting another part of the Yobitel stack — the watch covers their estate as-is.

The boundary with Professional Services is deliberate. Professional Services is the engineering engagement for net-new builds, migrations, and bespoke implementations; Managed Operations is the ongoing watch on what is in production. Most engagements that start as Professional Services convert at least partially into Managed Operations at production hand-off. The engineers on the consulting engagement are not the engineers who run the rota, but the runbook library carries through from one to the other.

References

Managed Operations service page · Yobitel
Yobitel NeoCloud · Yobitel
Yobibyte platform · Yobitel
Professional Services · Yobitel
NCSC Cloud Security Principles · NCSC
ITIL framework · AXELOS

