TL;DR
- Hyperparameter-tuning library inside Ray (anyscale.com / github.com/ray-project/ray), Apache 2.0.
- Differentiator vs Optuna: scales naturally across the Ray cluster — Tune handles trial scheduling, resource allocation, and fault tolerance for hundreds of concurrent trials.
- Bundles trial schedulers (ASHA, HyperBand, PBT), BOHB, and pluggable search-algorithm wrappers around Optuna, Hyperopt, and Nevergrad for Bayesian-style search.
Overview
Ray Tune is the hyperparameter optimization (HPO) library that comes with Ray, the distributed Python framework. Its design centre is large-scale parallel hyperparameter search: Tune treats each trial as a Ray task or actor, lets the Ray scheduler pack trials onto available CPUs/GPUs across a cluster, and provides built-in checkpointing for trial migration and restart.
Its built-in trial schedulers include ASHA (Asynchronous Successive Halving Algorithm, an efficient early-stopping method), HyperBand, and PBT (Population-Based Training, which mutates hyperparameters during training rather than between trials). BOHB combines Bayesian optimization with HyperBand. For Bayesian sampling, Tune wraps Optuna, Hyperopt, BayesOpt, and similar libraries.
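The early-stopping idea behind ASHA can be illustrated with a minimal sketch of synchronous successive halving. This is plain Python, not Ray Tune's implementation: `simulated_loss`, the parameter names, and the toy loss surface are assumptions made for illustration, and ASHA itself promotes trials asynchronously as results arrive rather than waiting for a whole rung to finish.

```python
import math
import random


def simulated_loss(config, epochs):
    """Toy validation loss (an assumption): a floor set by lr plus a term that decays with epochs."""
    floor = abs(math.log10(config["lr"]) + 2.0)  # lowest floor at lr = 1e-2
    return floor + 1.0 / (1.0 + epochs)


def successive_halving(configs, grace_period=1, reduction_factor=3, max_epochs=27):
    """Evaluate survivors at each rung, then keep the best 1/reduction_factor of them."""
    survivors = list(configs)
    epochs = grace_period
    while True:
        scored = sorted(survivors, key=lambda c: simulated_loss(c, epochs))
        if epochs >= max_epochs or len(scored) == 1:
            return scored[0], epochs
        keep = max(1, len(scored) // reduction_factor)
        survivors = scored[:keep]  # everything else is stopped early
        epochs = min(epochs * reduction_factor, max_epochs)


rng = random.Random(0)
configs = [{"lr": 10 ** rng.uniform(-5, 0)} for _ in range(27)]
best, epochs_used = successive_halving(configs)
```

With 27 configs and a reduction factor of 3, the survivor counts go 27, 9, 3, 1, so only one config is ever trained for the full 27 epochs.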
Mechanism
A Tune run constructs a search space and a trainable. The trainable is either a function (run once per trial) or a Trainable class (with explicit `step`, `save_checkpoint`, `load_checkpoint`). The search algorithm proposes trial configs; the scheduler (ASHA, PBT, etc.) decides which trials to keep running and which to stop early; Ray's autoscaler can spin up cluster capacity to absorb the workload.
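To make the class-style contract concrete, here is a plain-Python stand-in that mirrors the `step` / `save_checkpoint` / `load_checkpoint` method names. It is a sketch under assumptions, not `ray.tune.Trainable` itself: `ToyTrainable` is a hypothetical class, the real methods may have different signatures (for example, working with checkpoint directories), and the quadratic loss is invented for the example.

```python
class ToyTrainable:
    """Hypothetical stand-in for a class trainable; not the Ray Tune base class."""

    def __init__(self, config):
        self.lr = config["lr"]
        self.weight = 0.0  # the single "model parameter"
        self.iteration = 0

    def step(self):
        # One gradient-descent step on the toy loss (weight - 3)^2.
        grad = 2.0 * (self.weight - 3.0)
        self.weight -= self.lr * grad
        self.iteration += 1
        return {"loss": (self.weight - 3.0) ** 2, "training_iteration": self.iteration}

    def save_checkpoint(self):
        # Return everything needed to resume the trial elsewhere.
        return {"weight": self.weight, "iteration": self.iteration}

    def load_checkpoint(self, state):
        self.weight = state["weight"]
        self.iteration = state["iteration"]


# Run a trial for 5 steps, checkpoint it, then resume in a fresh object,
# as would happen if the trial were restarted on another worker.
trial = ToyTrainable({"lr": 0.1})
for _ in range(5):
    trial.step()
state = trial.save_checkpoint()

resumed = ToyTrainable({"lr": 0.1})
resumed.load_checkpoint(state)
result_resumed = resumed.step()
result_original = trial.step()
```

Because the checkpoint captures the full trial state, the resumed trial's sixth step produces the same result as the original's, which is the property that makes trial migration and restart safe.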
PBT is the unusual one. Rather than running independent trials, PBT maintains a population in which underperforming members periodically copy the weights and hyperparameters of a top performer (exploit) and then perturb those hyperparameters (explore). The result is effectively a single 'super-trial' whose hyperparameter schedule was learned during training, which is useful for RL and any setting where the right learning rate changes over the course of training.
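A single exploit/explore round might be sketched as follows. This is a toy illustration rather than Ray Tune's scheduler: `pbt_step`, the member dictionaries, the 25% quantile, and the 0.8/1.2 perturbation factors are all assumptions chosen for the example.

```python
import random


def pbt_step(population, rng, quantile=0.25, factors=(0.8, 1.2)):
    """One exploit/explore round over members shaped like {'lr', 'weights', 'score'}."""
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    cutoff = max(1, int(len(ranked) * quantile))
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    for member in bottom:
        donor = rng.choice(top)
        # Exploit: copy the donor's weights (and, implicitly, its hyperparameters).
        member["weights"] = list(donor["weights"])
        # Explore: perturb the copied hyperparameter.
        member["lr"] = donor["lr"] * rng.choice(factors)
    return population


rng = random.Random(0)
population = [
    {"lr": 10 ** rng.uniform(-4, -1), "weights": [float(i)], "score": float(i)}
    for i in range(8)
]
best_lr = max(population, key=lambda m: m["score"])["lr"]
pbt_step(population, rng)
```

After the round, the two lowest-scoring members carry a top performer's weights and a perturbed learning rate, while the rest of the population is untouched. In a real run, training would then continue and the scores would be refreshed before the next round.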
Performance Characteristics
- Trial scheduling: scales to hundreds or thousands of concurrent trials on a Ray cluster, limited mainly by the cluster's CPU/GPU capacity.
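- ASHA: reported as 4-10× faster than random or grid search at finding strong configs; the gain comes from stopping weak trials early and depends on the workload. Successive halving, which ASHA builds on, has theoretical guarantees from the Hyperband analysis.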
- PBT: wall-clock time comparable to a single training run when the whole population trains in parallel, with HPO baked in; total compute still scales with the population size.
- Overhead: Ray's actor / task scheduling adds tens to hundreds of milliseconds per trial start — irrelevant for full training jobs.
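Where the early-stopping savings come from can be shown with a small budget calculation. The sketch below assumes geometrically spaced rungs and that exactly 1/`reduction_factor` of trials survive each rung; the function name and parameters are illustrative, and the resulting ratio is a compute ratio for one batch of configs, not a measured speedup.

```python
def successive_halving_budget(num_trials, grace_period, reduction_factor, max_t):
    """Return (rungs, total_epochs) when each rung keeps 1/reduction_factor of trials."""
    rungs = []
    t = grace_period
    while t <= max_t:
        rungs.append(t)
        t *= reduction_factor

    total = 0
    alive = num_trials
    previous = 0
    for rung in rungs:
        # Survivors train only from the previous rung up to this one.
        total += alive * (rung - previous)
        previous = rung
        alive = max(1, alive // reduction_factor)
    return rungs, total


rungs, early_stop_epochs = successive_halving_budget(
    num_trials=64, grace_period=1, reduction_factor=4, max_t=64
)
full_epochs = 64 * 64  # every trial trained to max_t with no early stopping
```

Under these assumptions, 64 trials cost 208 epochs in total instead of 4096, so the same budget can cover many more sampled configs. Real gains are smaller when early rankings are noisy and good configs get stopped too soon.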
When to Use
Pick Ray Tune when your HPO budget spans multiple nodes, when you want PBT specifically, or when you already use Ray for training/serving (RLlib, Ray Train, Ray Serve). For single-node Python-only HPO, Optuna is lighter weight. For large industrial HPO, both Tune and Optuna are credible choices; the deciding factor is usually whether you want Ray as your cluster framework.
Pitfalls
- Ray's resource model (CPU/GPU/memory fractions) needs explicit declaration; trials that request more than they use leave capacity idle, and trials that request less than they use contend for the same hardware.
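- PBT's exploit-then-explore cycle interacts oddly with learning-rate schedulers: scheduler or optimizer state restored from a checkpoint can overwrite the learning rate PBT just perturbed. Pause or reset schedulers around exploits.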
- ASHA early-stops aggressively; set the scheduler's `grace_period` (the minimum training time before a trial can be stopped) high enough to avoid killing slow-to-converge configs.
- Distributed Ray setup adds operational overhead; for a 1-machine setup the value is lower.
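The effect of per-trial resource declarations on concurrency can be estimated with simple arithmetic. This is a back-of-the-envelope sketch, not a Ray API: `concurrent_trials` is a hypothetical helper, the cluster sizes are made up, and it ignores how resources are split across nodes.

```python
def concurrent_trials(cluster, per_trial):
    """How many trials fit at once, given total resources and one trial's request."""
    limits = [
        cluster[name] / amount
        for name, amount in per_trial.items()
        if amount > 0
    ]
    return int(min(limits))


cluster = {"cpu": 32, "gpu": 4}

# 8 CPUs and 1 GPU per trial: both resources are fully used by 4 trials.
balanced = concurrent_trials(cluster, {"cpu": 8, "gpu": 1})

# 2 CPUs and 1 GPU per trial: the GPU count still caps concurrency at 4,
# so most of the CPUs sit idle.
gpu_bound = concurrent_trials(cluster, {"cpu": 2, "gpu": 1})
idle_cpus = cluster["cpu"] - gpu_bound * 2

# Fractional request: 0.5 GPU per trial lets two trials share each GPU.
shared = concurrent_trials(cluster, {"cpu": 2, "gpu": 0.5})
```

Here the second declaration leaves 24 of 32 CPUs idle, while the fractional-GPU declaration doubles concurrency to 8 trials, provided each trial really fits in half a GPU's memory.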
Software
- github.com/ray-project/ray — Tune ships as `ray.tune`.
- Integrations: PyTorch Lightning, HuggingFace Trainer, Keras, XGBoost.
- Schedulers: ASHA, HyperBand, PBT, PB2 (PBT + Bayesian).
- Search algorithms: Optuna, Hyperopt, Ax, Nevergrad, ZOOpt.
- Ray Train + Ray Serve cover the rest of the training/serving stack.
References
- Tune: A Research Platform for Distributed Model Selection and Training · arXiv (Liaw et al., 2018)
- Ray Tune documentation · Anyscale
- Population Based Training of Neural Networks · arXiv (Jaderberg et al., 2017)