TL;DR
- Argo Workflows is a CNCF Graduated, container-native workflow engine for Kubernetes. It originated at Applatix, which Intuit acquired in 2018, and the Argo project graduated in 2022.
- Models workflows as DAGs (or step lists) where each node is a containerised pod with declared inputs, outputs, and artifact handling.
- The execution engine beneath Kubeflow Pipelines v1, and used directly for batch orchestration by data and ML teams at Adobe, BlackRock, Tesla, NVIDIA, and many other large Kubernetes users.
- Sibling projects under the Argo umbrella: Argo CD (GitOps), Argo Events (event-driven triggers), Argo Rollouts (progressive delivery) — all CNCF Graduated.
What Argo Workflows Is
Argo Workflows is a Kubernetes-native batch orchestrator. A Workflow CRD describes a DAG of tasks; each task is a containerised pod the engine creates, monitors, and tears down. Inputs and outputs flow between tasks as parameters or artifacts (typically S3-compatible blobs). Compared to Airflow, Argo is Kubernetes-first (every task is a pod, not a process on a worker) and YAML-native.
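As a rough illustration of how parameters and artifacts move between tasks, the sketch below wires a producer task to a consumer task. The workflow name, template names, image, and file paths are all hypothetical, and the artifact hand-off assumes an artifact repository (for example an S3-compatible bucket) is already configured for the cluster.

```yaml
# Illustrative sketch only: names, image, and paths are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pass-data-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: produce
            template: produce
          - name: consume
            template: consume
            dependencies: [produce]
            arguments:
              parameters:
                # Small scalar value, passed as a string
                - name: count
                  value: "{{tasks.produce.outputs.parameters.count}}"
              artifacts:
                # File passed through the artifact repository
                - name: data
                  from: "{{tasks.produce.outputs.artifacts.data}}"
    - name: produce
      container:
        image: alpine:3.20
        command: [sh, -c]
        args: ["echo 3 > /tmp/count.txt && echo hello > /tmp/data.txt"]
      outputs:
        parameters:
          - name: count
            valueFrom:
              path: /tmp/count.txt
        artifacts:
          - name: data
            path: /tmp/data.txt
    - name: consume
      inputs:
        parameters:
          - name: count
        artifacts:
          - name: data
            path: /tmp/data.txt
      container:
        image: alpine:3.20
        command: [sh, -c]
        args: ["echo count={{inputs.parameters.count}} && cat /tmp/data.txt"]
```

Parameters suit small values that fit in the workflow object itself; anything file-sized is better passed as an artifact.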
Workflow Anatomy
- Workflow — top-level CRD describing the DAG, entrypoint, and templates.
- Template — reusable task definition (container, script, or DAG sub-graph).
- Step / DAG — two ways to compose templates: sequential steps or arbitrary DAG.
- Artifacts — files passed between tasks via S3, GCS, Azure Blob, or Git.
- Parameters — string-valued scalar inputs/outputs; structured data can be passed as a JSON-encoded string.
- Workflow Templates / Cluster Workflow Templates — reusable, version-controlled workflow definitions.
- Cron Workflows — Cron-scheduled execution.
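To make the last two items concrete, here is a minimal sketch of a WorkflowTemplate and a CronWorkflow that runs it on a schedule. All names, the image, and the schedule are hypothetical. Newer Argo releases also accept a `schedules` list in place of the single `schedule` field, so check which form your installed version expects.

```yaml
# Illustrative sketch only: names, image, and schedule are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: nightly-report
spec:
  entrypoint: main
  templates:
    - name: main
      container:
        image: alpine:3.20
        command: [sh, -c]
        args: ["echo building report"]
---
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: nightly-report-cron
spec:
  schedule: "0 2 * * *"        # standard cron syntax: 02:00 every day
  concurrencyPolicy: Forbid    # skip a run if the previous one is still active
  workflowSpec:
    # Reuse the stored definition instead of inlining templates
    workflowTemplateRef:
      name: nightly-report
```

Keeping the definition in a WorkflowTemplate means the same pipeline can be triggered by the cron schedule, submitted by hand, or referenced from another workflow.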
ML Pipeline Example

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: rag-index-build-
spec:
  entrypoint: pipeline
  templates:
    # DAG: extract -> embed -> index
    - name: pipeline
      dag:
        tasks:
          - name: extract
            template: extract-docs
          - name: embed
            template: embed-chunks
            dependencies: [extract]
          - name: index
            template: build-index
            dependencies: [embed]
    # The extract-docs and build-index templates are not shown; define them the same way.
    - name: embed-chunks
      container:
        image: yobitel/embed-pipeline:0.4
        resources:
          limits: { nvidia.com/gpu: 1 }
```

Argo Workflows vs Kubeflow Pipelines
Kubeflow Pipelines v1 was built on Argo Workflows: pipelines written in KFP's Python DSL compiled to Argo Workflow manifests. KFP v2 abstracted the backend (you can now run KFP on Argo or Tekton), but Argo is still the most widely deployed engine.
For ML-specific workflows that benefit from artifact lineage tracking, Pipelines and KServe integration, KFP on top of Argo is the right pick. For general batch orchestration — data engineering, infra automation, periodic jobs — Argo Workflows directly is simpler and avoids the KFP runtime overhead.
Argo Workflows vs Airflow
| Property | Argo Workflows | Airflow |
|---|---|---|
| Native model | Kubernetes pods | Python processes |
| Task isolation | Container per task | Worker pool (or KubernetesPodOperator) |
| DAG definition | YAML or Python SDK | Python (DAGs) |
| Artifacts | S3/GCS/Git built-in | XCom (small) or external |
| Sweet spot | Containerised batch + ML | Data engineering + ETL |
| Operating model | Kubernetes-first | Database-backed scheduler |
GPU Workflows
Each Argo template can request GPUs like any other pod (`nvidia.com/gpu: 1` under `resources.limits`). For multi-node training inside a workflow step, have the step create a PyTorchJob or MPIJob and wait for it to finish, or use the Resource template type to submit a PodGroup to Volcano. As of mid-2026, the most common ML use of Argo is offline batch pipelines (dataset prep, batch inference, eval harnesses) rather than training itself; training tends to be a single CRD owned by Training Operator or KubeRay.
For deep-learning training jobs, prefer PyTorchJob/MPIJob/RayJob over wrapping training in an Argo template. Argo shines at the pipeline glue around training — dataset prep, eval, registration, deployment.
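The "create a training CRD and wait for it" pattern could look roughly like the sketch below, which uses a Resource template to submit a PyTorchJob and then runs an evaluation step. The service account, images, and replica counts are hypothetical. The `successCondition` and `failureCondition` expressions assume the PyTorchJob status exposes `replicaStatuses.Master.succeeded` and `.failed` counters, so verify them against the status schema of the Training Operator version you run.

```yaml
# Illustrative sketch only: service account, images, and conditions are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: train-and-eval-
spec:
  entrypoint: main
  serviceAccountName: workflow-trainer   # hypothetical; needs RBAC to create PyTorchJobs
  templates:
    - name: main
      steps:
        - - name: train
            template: train
        - - name: evaluate
            template: evaluate
    - name: train
      resource:
        action: create
        setOwnerReference: true   # delete the job when the workflow is deleted
        # Assumed status fields; check your operator's PyTorchJob status
        successCondition: status.replicaStatuses.Master.succeeded > 0
        failureCondition: status.replicaStatuses.Master.failed > 0
        manifest: |
          apiVersion: kubeflow.org/v1
          kind: PyTorchJob
          metadata:
            generateName: train-
          spec:
            pytorchReplicaSpecs:
              Master:
                replicas: 1
                restartPolicy: Never
                template:
                  spec:
                    containers:
                      - name: pytorch
                        image: example.com/trainer:latest
                        resources:
                          limits:
                            nvidia.com/gpu: 1
              Worker:
                replicas: 2
                restartPolicy: Never
                template:
                  spec:
                    containers:
                      - name: pytorch
                        image: example.com/trainer:latest
                        resources:
                          limits:
                            nvidia.com/gpu: 1
    - name: evaluate
      container:
        image: example.com/evaluator:latest
        command: [python, eval.py]
```

Here Argo only submits the job and polls its status; the operator owns scheduling, restarts, and rendezvous between the training pods.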
The Argo Family
Argo Workflows is the original Argo project, but the umbrella now includes Argo CD (GitOps continuous delivery), Argo Events (event-driven workflow triggers), and Argo Rollouts (progressive delivery via canary/blue-green). All four are CNCF Graduated. They share governance but are independently installable — most clusters adopt Argo CD and Argo Workflows, with Events/Rollouts as needed.