Run:ai

TL;DR

Run:ai is a Kubernetes-native GPU orchestration platform founded in Tel Aviv in 2018. NVIDIA announced its acquisition, for a reported $700M, in April 2024, and Run:ai is now part of the NVIDIA AI Enterprise stack.
Adds project-level quotas, fair-share queueing, fractional GPU sharing, on-demand workspaces, and a polished UI on top of vanilla Kubernetes.
Supports both MPS-style fractional GPU and MIG, plus its own dynamic time-slicing — useful for research/dev fleets that need fine-grained sharing on mixed hardware.
Commercial product; NVIDIA has signalled intent to contribute Run:ai components upstream but at time of writing the platform remains primarily proprietary.

Background and Acquisition

Run:ai was founded in 2018 by Omri Geller and Ronen Dar with the thesis that GPU clusters needed the same kind of orchestration layer that Kubernetes had brought to general compute. The product gained traction in enterprise research teams who wanted multi-tenant GPU sharing without writing their own scheduling logic on top of Kubernetes.

NVIDIA announced the acquisition in April 2024 for a reported $700 million. The stated rationale was to make Run:ai part of NVIDIA AI Enterprise and to open-source key components — though at time of writing in mid-2026 the bulk of the platform is still proprietary.

Architecture

Run:ai installs into a Kubernetes cluster and adds a control plane (the Run:ai backend) plus a custom scheduler that takes over from the default kube-scheduler for workloads in Run:ai-managed namespaces. Projects map to Kubernetes namespaces, users belong to projects, and workloads (training jobs, notebooks, inference services) inherit their project's quotas.
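The project/namespace/quota relationship described above can be sketched as a small data model. This is a minimal illustration under stated assumptions, not Run:ai's API or scheduler logic: the `Project` class, the `admit` function, the `runai-team-a` namespace name, and the rule that a request must fit entirely inside the guarantee to count as guaranteed are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical model of a Run:ai-style project backed by one namespace."""
    name: str
    namespace: str
    guaranteed_gpus: float
    allow_over_quota: bool = True
    users: set = field(default_factory=set)
    allocated_gpus: float = 0.0


def admit(project: Project, user: str, gpus: float, idle_cluster_gpus: float) -> str:
    """Classify a workload's GPU request against its project's quota.

    Returns "guaranteed", "over-quota" or "queued". Assumes the guaranteed
    share can always be honoured (e.g. by reclaiming GPUs lent to other
    projects), while over-quota requests only succeed if enough GPUs are
    idle right now. A request that does not fit entirely inside the
    guarantee is treated as over-quota as a whole.
    """
    if user not in project.users:
        raise PermissionError(f"{user} is not a member of project {project.name}")
    if project.allocated_gpus + gpus <= project.guaranteed_gpus:
        project.allocated_gpus += gpus
        return "guaranteed"
    if project.allow_over_quota and gpus <= idle_cluster_gpus:
        project.allocated_gpus += gpus
        return "over-quota"
    return "queued"


if __name__ == "__main__":
    team_a = Project("team-a", "runai-team-a", guaranteed_gpus=4, users={"alice"})
    print(admit(team_a, "alice", 2, idle_cluster_gpus=0))   # guaranteed
    print(admit(team_a, "alice", 3, idle_cluster_gpus=8))   # over-quota: 2 + 3 > 4
    print(admit(team_a, "alice", 16, idle_cluster_gpus=8))  # queued
```

Keeping the user check inside `admit` mirrors the rule that workloads are charged to, and gated by, the project they run in.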

Project quotas — guaranteed and over-quota GPU allocations per team.
Fractional GPUs — request `0.5` GPU and Run:ai places multiple pods on one physical GPU using MPS or time-slicing.
Notebook workspaces — one-click Jupyter, VS Code, or PyCharm on GPU-backed pods.
Job dashboards — wall-clock, queue time, GPU utilisation per project for chargeback.
Hyperparameter search — sweep orchestration with quota-aware fan-out.
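Guaranteed and over-quota allocations interact through fair-share: each project first receives its demand up to its guaranteed quota, and GPUs that would otherwise sit idle are lent to projects that want more. The sketch below models that second step as weighted water-filling. It is a generic fair-share model, not Run:ai's published algorithm; `fair_share` and the per-project `weights` are hypothetical.

```python
from typing import Dict, Optional


def fair_share(
    capacity: float,
    guaranteed: Dict[str, float],
    demand: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Split `capacity` GPUs across projects: guarantees first, then idle
    GPUs handed out over-quota by weighted water-filling.

    Illustrative model only; weights must be positive.
    """
    weights = weights or {p: 1.0 for p in demand}
    # Phase 1: every project gets its demand, up to its guaranteed quota.
    alloc = {p: float(min(demand[p], guaranteed.get(p, 0.0))) for p in demand}
    remaining = float(capacity) - sum(alloc.values())
    if remaining < 0:
        raise ValueError("guaranteed allocations exceed cluster capacity")

    # Phase 2: share leftover GPUs among projects that still want more,
    # proportionally to their weights and never beyond their demand.
    # Whatever a capped project cannot use is redistributed next round.
    hungry = {p for p in demand if demand[p] - alloc[p] > 1e-9}
    while remaining > 1e-9 and hungry:
        pool, total_w = remaining, sum(weights[p] for p in hungry)
        for p in sorted(hungry):
            give = min(pool * weights[p] / total_w, demand[p] - alloc[p])
            alloc[p] += give
            remaining -= give
        hungry = {p for p in hungry if demand[p] - alloc[p] > 1e-9}
    return alloc


if __name__ == "__main__":
    print(fair_share(
        capacity=14,
        guaranteed={"a": 4, "b": 4, "c": 4},
        demand={"a": 10, "b": 2, "c": 5},
    ))
    # → {'a': 7.0, 'b': 2.0, 'c': 5.0}
```

In the example, project `b` uses only 2 of its 4 guaranteed GPUs, so the 4 spare GPUs are split between `a` and `c`; `c` is capped at its demand of 5, and the GPU it cannot use flows back to `a`.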

Fractional GPUs

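Run:ai's headline feature is fractional GPU support that works on hardware without MIG. A user requests `gpu: 0.25` and Run:ai can place up to four such pods on one physical GPU, capping each pod's GPU memory in software and time-slicing compute between them. CUDA_VISIBLE_DEVICES alone could not provide this, since it only controls which whole devices a process can see, not how much of a device's memory it may use. This is useful on L4, L40, A10, and consumer hardware, where MIG is not an option.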
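The placement step can be illustrated with classic first-fit-decreasing bin packing: each physical GPU is a bin of size 1.0, each pod's fraction is an item, and the pod's memory cap is taken as fraction × device memory. Run:ai's actual placement logic may differ; `pack_fractions` and the linear fraction-to-memory mapping are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple


def pack_fractions(
    requests: Dict[str, float], num_gpus: int, gpu_mem_gb: float
) -> Dict[str, Optional[Tuple[int, float]]]:
    """First-fit-decreasing placement of fractional GPU requests.

    Each pod maps to (gpu_index, memory_cap_gb), or None if it must wait.
    The memory cap assumes a fraction maps linearly onto device memory.
    """
    free = [1.0] * num_gpus  # unallocated fraction of each physical GPU
    placement: Dict[str, Optional[Tuple[int, float]]] = {}
    # Placing the largest requests first reduces fragmentation.
    for pod, frac in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        if not 0 < frac <= 1:
            raise ValueError(f"{pod}: fraction must be in (0, 1], got {frac}")
        for gpu, cap in enumerate(free):
            if frac <= cap + 1e-9:  # small tolerance for float rounding
                free[gpu] = cap - frac
                placement[pod] = (gpu, frac * gpu_mem_gb)
                break
        else:
            placement[pod] = None  # no GPU has room left: pod stays pending
    return placement


if __name__ == "__main__":
    # Five pods on two 24 GB cards (L4-sized GPUs).
    reqs = {"nb-1": 0.25, "nb-2": 0.25, "nb-3": 0.25, "nb-4": 0.25, "train": 0.5}
    for pod, slot in pack_fractions(reqs, num_gpus=2, gpu_mem_gb=24).items():
        print(pod, slot)
```

Here `train` (12 GB cap) and two notebooks (6 GB each) share GPU 0, and the remaining two notebooks share GPU 1, leaving half of GPU 1 free for further requests.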

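Crucially, fractional placement relies on software isolation, so the same caveats as MPS apply: co-located pods get no hardware partitioning of compute, memory bandwidth, cache or faults the way MIG slices do, so a noisy or crashing neighbour can affect the others. Run:ai positions it for dev/research workloads, not multi-tenant production.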

Fractional GPU on Run:ai is software-isolated. For production multi-tenant serving, prefer MIG on A100/H100/H200/B200 where the isolation is enforced in silicon.

Run:ai vs Open-Source Alternatives

Run:ai sits in roughly the same space as Volcano + Kueue + Kubeflow + JupyterHub stitched together. Its value proposition is integration — one UI, one quota model, one install. The cost is vendor lock-in and a commercial licence priced per GPU.

For teams that want a turnkey GPU platform and value time-to-productive over openness, Run:ai is the established choice. For teams building sovereign or air-gapped infrastructure, or those wanting full source control over the platform, the open-source stack (Volcano + Kueue + KubeRay + KServe + Kubeflow) is the alternative.

Future Direction

Since the acquisition, NVIDIA has integrated Run:ai with the NVIDIA AI Enterprise control plane, Base Command, and DGX Cloud. Pieces of the scheduler have been signalled for upstream contribution but no concrete CNCF donation has materialised at time of writing.

For platform teams making a multi-year bet, the considerations are: vendor strategy alignment (full NVIDIA stack vs heterogeneous), licensing cost trajectory under NVIDIA, and how aggressively the open-source stack closes feature parity.

References

Run:ai Documentation · Run:ai
NVIDIA Acquires Run:ai (press release) · NVIDIA Newsroom
Run:ai on GitHub (open components) · GitHub
