TL;DR
- SaaS (and self-hosted) experiment tracker, founded in 2017; logs metrics, gradients, system stats, code, and artifacts to a hosted dashboard.
- Integrates with PyTorch, Lightning, HuggingFace, Keras, scikit-learn — typically two lines of code per script.
- Beyond logging: Sweeps for distributed HPO, Artifacts for dataset/model versioning, Reports for shareable analysis, Weave for LLM trace observability.
Overview
Weights & Biases (W&B) is one of the most widely used experiment-tracking platforms in deep-learning practice. Its core proposition is the hosted dashboard: run a training script and get a live URL with per-step loss curves, gradient histograms, system metrics, and code snapshots. From there it expands into hyperparameter optimisation (HPO), dataset versioning, and LLM observability.
The integration surface is intentionally minimal: `wandb.init()` at the start of a run and `wandb.log({...})` at each step. Framework integrations (Lightning, HuggingFace Trainer, Keras callbacks) wrap this automatically.
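The two-call pattern described above can be sketched as follows. This is a minimal illustration rather than an official quickstart: the project name `demo-project`, the config values, and the toy loss curve in `toy_losses` are placeholders, and `mode="offline"` is assumed so the sketch writes locally without needing a login.

```python
import math


def toy_losses(steps, lr):
    """Stand-in for a real training loop: yields (step, loss) pairs."""
    for step in range(steps):
        yield step, math.exp(-lr * step)


def main():
    import wandb  # requires `pip install wandb`

    # Placeholder project name and config; offline mode avoids network access.
    run = wandb.init(
        project="demo-project",
        config={"lr": 0.1, "steps": 20},
        mode="offline",
    )
    for step, loss in toy_losses(run.config["steps"], run.config["lr"]):
        run.log({"loss": loss}, step=step)  # one call per training step
    run.finish()
```

Calling `main()` creates one run. Dropping `mode="offline"` streams the same metrics to the hosted dashboard instead.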
What W&B Provides
- Experiments: live-streamed metrics, system stats, hyperparameter and config snapshots.
- Sweeps: distributed hyperparameter search using Bayesian, grid, or random samplers, coordinated through the W&B server.
- Artifacts: versioned datasets, model checkpoints, and evaluation outputs, with dependency tracking.
- Reports: shareable narrative documents combining markdown, live charts, and tables.
- Tables: interactive structured data for evaluation results and per-example error analysis.
- Weave: newer observability layer for LLM traces, prompts, and chain-of-thought outputs.
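As an illustration of the dataset versioning and dependency tracking listed above, here is a hedged sketch of publishing and consuming an artifact. The project name, artifact name, and the `artifact_ref` helper are hypothetical; the `wandb` calls (`wandb.Artifact`, `add_file`, `log_artifact`, `use_artifact`, `download`) are the client's public API as I understand it.

```python
def artifact_ref(name, alias="latest"):
    """Hypothetical helper: build the `name:alias` reference string."""
    return f"{name}:{alias}"


def publish_dataset(path, name="demo-dataset"):
    import wandb  # requires `pip install wandb`

    run = wandb.init(project="demo-project", job_type="upload")
    artifact = wandb.Artifact(name=name, type="dataset")
    artifact.add_file(path)        # attach a local file to the artifact
    run.log_artifact(artifact)     # upload and register it with this run
    run.finish()


def consume_dataset(name="demo-dataset"):
    import wandb

    run = wandb.init(project="demo-project", job_type="train")
    # Declaring the input records the artifact as a dependency of this run.
    artifact = run.use_artifact(artifact_ref(name))
    local_dir = artifact.download()
    run.finish()
    return local_dir
```

Pinning a specific alias (for example `artifact_ref("demo-dataset", "v3")`) instead of `latest` keeps a training run reproducible when the dataset is later updated.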
Mechanism
The W&B client buffers metric updates and streams them to the W&B server (cloud or self-hosted). System metrics (GPU utilisation, memory, temperature, network) are sampled in the background, separately from the training loop. Code state is captured by recording the current git commit hash; uncommitted changes are saved as a diff patch.
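The buffer-then-stream idea can be illustrated with a toy logger. This is not W&B's implementation, only a sketch of the general pattern under simple assumptions: `log()` enqueues a record and returns immediately, while a background thread hands records to a `sink` callable in batches. The class name `BufferedLogger` and its parameters are invented for the example.

```python
import queue
import threading


class BufferedLogger:
    """Toy logger: log() enqueues, a background thread flushes in batches."""

    def __init__(self, sink, batch_size=3):
        self._sink = sink              # callable that receives a list of records
        self._batch_size = batch_size
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def log(self, record):
        # No I/O happens on the caller's thread; the record is only queued.
        self._queue.put(record)

    def _drain(self):
        batch = []
        while True:
            item = self._queue.get()
            if item is None:           # sentinel: stop after flushing leftovers
                break
            batch.append(item)
            if len(batch) >= self._batch_size:
                self._sink(batch)
                batch = []
        if batch:
            self._sink(batch)

    def finish(self):
        """Flush remaining records and wait for the background thread."""
        self._queue.put(None)
        self._thread.join()
```

The design point is that the training loop never blocks on network calls: a slow or briefly unavailable server delays flushing, not the next training step.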
For Sweeps, the W&B server runs the search algorithm and dispatches trial configs to agents (worker processes that pull configs and run the user's script). Agents can run anywhere — laptops, on-prem clusters, multiple clouds — coordinating through the W&B API.
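A sweep following this server/agent split might look like the sketch below. The search space, the metric name `val_loss`, the project name, and the `objective` function (a fake stand-in for training) are all assumptions made for illustration; `wandb.sweep` registers the configuration with the server and `wandb.agent` pulls trial configs and calls `train` for each one.

```python
SWEEP_CONFIG = {
    "method": "bayes",  # alternatives: "grid", "random"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "lr": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-1},
        "batch_size": {"values": [16, 32, 64]},
    },
}


def objective(lr, batch_size):
    """Hypothetical stand-in for training; returns a fake validation loss."""
    return (lr - 1e-3) ** 2 + 1.0 / batch_size


def train():
    import wandb  # requires `pip install wandb`

    # When launched by an agent, the run's config holds the trial's values.
    with wandb.init() as run:
        loss = objective(run.config["lr"], run.config["batch_size"])
        run.log({"val_loss": loss})


def main():
    import wandb

    sweep_id = wandb.sweep(sweep=SWEEP_CONFIG, project="demo-project")
    wandb.agent(sweep_id, function=train, count=10)  # run 10 trials locally
```

Additional workers can join the same search by calling `wandb.agent` with the same `sweep_id` on other machines.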
For deployments with data-sovereignty or air-gap requirements, W&B offers a single-tenant Dedicated Cloud and a self-hosted on-prem deployment ('Local'), useful for regulated industries where training metadata cannot leave the customer's network.
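On the client side, targeting a self-hosted server is mostly a matter of configuration. The sketch below assumes the client reads the `WANDB_BASE_URL` and `WANDB_MODE` environment variables; the helper name `point_client_at` and the hostname are hypothetical.

```python
import os


def point_client_at(base_url, offline=False, env=None):
    """Set the environment variables the wandb client reads at startup."""
    env = os.environ if env is None else env
    env["WANDB_BASE_URL"] = base_url   # self-hosted server instead of the SaaS host
    if offline:
        env["WANDB_MODE"] = "offline"  # write runs locally; upload later with `wandb sync`
    return env


# Example with a made-up internal hostname, written to a plain dict
# so the real process environment is left untouched.
settings = point_client_at("https://wandb.internal.example", offline=True, env={})
```

These variables need to be set before `wandb.init()` is called, typically in the job launcher or container image rather than in the training script itself.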
When to Use
A common default for experiment tracking in ML teams as of 2026. Alternatives: MLflow if you want a fully open-source, self-hosted stack; ClearML if you want a more turnkey self-hosted experience; TensorBoard if you only need basic loss curves and want to keep dependencies minimal. The W&B vs MLflow choice usually comes down to whether you prefer a polished SaaS UI or a self-hosted OSS stack.
Pitfalls
- Default plans have per-team storage and metric volume limits — high-frequency logging can hit them.
- Pricing model has changed multiple times; check current tier before committing.
- Some teams find the SaaS dependency unacceptable for sensitive data — use Local / Dedicated Cloud or pick MLflow.
- Heavy use of Artifacts and Tables can balloon storage and slow dashboard loads.
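One way to stay under metric-volume limits is to aggregate before logging. The wrapper below is a sketch, not a W&B feature: `ThrottledLogger` is a hypothetical class that averages metrics over a window and forwards one record every `every` steps to any `log_fn(metrics, step=...)` callable, such as `wandb.log`.

```python
from collections import defaultdict


class ThrottledLogger:
    """Accumulate per-step metrics and emit their mean every `every` steps."""

    def __init__(self, log_fn, every=50):
        self._log_fn = log_fn          # e.g. wandb.log or run.log
        self._every = every
        self._sums = defaultdict(float)
        self._count = 0

    def log(self, metrics, step):
        for key, value in metrics.items():
            self._sums[key] += value
        self._count += 1
        if self._count == self._every:
            means = {k: v / self._count for k, v in self._sums.items()}
            self._log_fn(means, step=step)  # one upstream call per window
            self._sums.clear()
            self._count = 0
```

With `every=50`, a loop that would otherwise log every step sends one-fiftieth as many records, at the cost of losing per-step resolution. Metrics still pending when training ends are not flushed in this sketch.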
Software
- Python client (`pip install wandb`) — the canonical integration.
- Framework integrations for Lightning, HuggingFace, Keras, XGBoost, fastai, scikit-learn.
- W&B Sweeps for distributed HPO.
- W&B Local for self-hosted deployments.
- Weave for LLM observability.