Skip to content

Latest commit

 

History

History
470 lines (332 loc) · 39.6 KB

ch10.asciidoc

File metadata and controls

470 lines (332 loc) · 39.6 KB

How Ray powers Machine Learning

You now have a solid grasp of everything in Ray needed to get your data ready to train ML models. In this chapter, you will learn how to use the following popular Ray libraries - Scikit-learn, XGBoost, and Pytorch. This chapter is not intended to introduce these libraries, so if you aren’t familiar with any of them you should pick one (and we suggest Scikit-learn) to read up on first. Even for those familiar with these libraries, refreshing your memory by consulting your favorite tools' documentation will be beneficial. This chapter is about how Ray is used to power machine learning, rather than a tutorial on machine learning.

Note

If you are interested in going deeper into machine learning with Ray, "Learning Ray" is a full-length book focused on machine learning with Ray where you can expand your ML skillset.

Ray has two built-in libraries for machine learning. You will also learn how to use Ray’s Reinforcement learning library - RLlIB with Tensorflow and generic Hyperparameter Tuning - Tune that can be used with any ML library.

Using Scikit-Learn with Ray

Scikit-Learn is one of the most widely used tools in the ML community, offering dozens of easy-to-use machine learning algorithms. It was initially developed by David Cournapeau as a Google Summer of Code project in 2007. It provides a wide range of supervised and unsupervised learning algorithms via a consistent interface.

Scikit-learn’s ML algorithms include:

  • Clustering: for grouping unlabeled data such as KMeans.

  • Supervised Models: a large amount, including generalized linear models, discriminant analysis, naive Bayes, lazy methods, neural networks, support vector machines, and decision trees.

  • Ensemble methods: for combining the predictions of multiple supervised models.

Scikit-Learn also contains important tooling to support machine learning:

  • Cross-Validation: for estimating the performance of supervised models on unseen data.

  • Datasets: for test datasets and for generating datasets with specific properties for investigating model behavior.

  • Dimensionality Reduction: for reducing the number of attributes in data for summarization, visualization, and feature selection such as Principal component analysis.

  • Feature extraction: for defining attributes in image and text data.

  • Feature selection: for identifying meaningful attributes from which to create supervised models.

  • Parameter Tuning: for getting the most out of supervised models.

  • Manifold Learning: For summarizing and depicting complex multi-dimensional data.

Although you can use most of the Scikit-Learn APIs directly with Ray for tuning the model’s hyperparameters, things get a bit more involved when you want to parallelize execution.

If we take the basic code used for the creation of the model in [ch07], and try to optimize parameters for the decision tree, our code will look as in Using Scikit-Learn to build wine quality model:

Example 1. Using Scikit-Learn to build wine quality model
link:examples/ray_examples/ml/sklearn/wine-quality.py[role=include]

Note that here, in the _GridSearchCV_[1], we are using parameter n_jobs=-1 which instructs the implementation to run model evaluation in parallel using all available processors. Running model evaluation in parallel, even on a single machine, can result in an order of magnitude performance improvement.

Unfortunately, this does not work out of the box with Ray clusters. GridSearchCV, and many other Scikit-Learn algorithms, use Joblib for parallel execution. Joblib does not work with Ray out of the box.

Ray implements a backend for joblib with Ray actors pool (see [ch04]) instead of local processes. This allows you to simply change the joblib backend to switch scikit learn from using local processes to Ray.

Concretely, to make the above code Using Scikit-Learn to build wine quality model run using Ray[2] you need to register Ray backend for joblib and use it for the GridSearchCV, execution as in Using Ray joblib backend with Scikit-Learn to build wine quality model.

Example 2. Using Ray joblib backend with Scikit-Learn to build wine quality model
link:examples/ray_examples/ml/sklearn/wine-quality.py[role=include]
Using the Ray joblib backend for Scikit-Learn

This code works, but if we compare the execution time for the default and Ray backends, you can see that using the Ray backend is slower in our example[3]. see that execution with default backend. To understand this difference, you need to return back to ch3, which explains an overhead that is incurred when using Ray’s remote functions. This result basically reemphasizes that Ray remote execution is advantageous only when remote execution takes enough time to offset such an overhead, which is not the case for this toy example.

The advantage of Ray’s implementation starts to grow as the sizes of the model, data, and cluster grow.

Using boosting algorithms with Ray

Boosting algorithms are well suited to parallel computing as they train multiple models. You can train each sub-model independently and then train another model on how to combine the results. The two most popular boosting libraries today are

  • XGBoost - an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides a parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way. The same code runs on many distributed environments (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.

  • LightGBM - a fast, distributed, high-performance gradient boosting framework based on a decision tree algorithm, used for ranking, classification, and many other machine learning tasks.

We will compare how Ray parallelizes training with XGBoost and LightGBM, but comparing the details of the libraries is beyond the scope of this book. If you’re interested in the difference between the libraries, a good comparison is found in this great blog post.

Using XGBoost

Continuing with our wine quality example, we build a model using XGBoost and the code to do so is presented in Using XGBoost to build wine quality model.

Example 3. Using XGBoost to build wine quality model
link:examples/ray_examples/ml/boosting/wine-quality-xgboost.py[role=include]

One of the reasons XGBoost is so performant is that it uses openMP to create tree’s branches independently, which does not directly support Ray. Ray integrates with XGBoost by providing an xgboost-ray library that replaces OpenMP with Ray actor pools. You can use this library either with XGBoost or scikit-learn APIs. In the latter case, the library provides a drop-in replacement for the following estimators:

  • RayXGBClassifier

  • RayXGRegressor

  • RayXGBRFClassifier

  • RayXGBRFRegressor

  • RayXGBRanker

It also provides RayParams that allows you to explicitly define the execution parameters for Ray. Using this library we can modify our code Using XGBoost to build wine quality model to make it work with Ray as follows in Using XGBoost Ray library to build wine quality model.[4]

Example 4. Using XGBoost Ray library to build wine quality model
link:examples/ray_examples/ml/boosting/wine-quality-xgboost_ray.py[role=include]

Here we used rayParams to specify the size of Ray’s actor pool used for parallelization. Alternatively, you can use the n_jobs parameter in RayXGBClassifier to achieve the same.

Using LightGBM

Building wine quality model using LightGBM is presented in Using LightGBM to build wine quality model.

Example 5. Using LightGBM to build wine quality model
link:examples/ray_examples/ml/boosting/wine-quality-lightgbm.py[role=include]

Similar to XGBoost, LightGBM is using OpenMP for parallelization. As a result, Ray offers the Distributed LightGBM on Ray library, which implements parallelization using Ray’s actor pool. Similar to the xgboost-ray library, this library supports both native and scikit-learn APIs. In the latter case the library implements the following estimators:

  • RayLGBMClassifier

  • RayLGBMRegressor

As with XGBoost, RayParams is provided, allowing you to define execution parameters for Ray. Using this library we can modify our code Using LightGBM to build wine quality model to make it work with Ray as in Using LightGBM Ray library to build wine quality model.[5]

Example 6. Using LightGBM Ray library to build wine quality model
link:examples/ray_examples/ml/boosting/wine-quality-lightgbm_ray.py[role=include]

Here we used rayParams to specify the size of Ray’s actor pool used for parallelization. Alternatively, you can use the n_jobs parameter in RayLGBMClassifier to achieve the same.

Using the Ray boosting libraries

This code works, but if we compare the execution time using OpenMP and Ray Actor pool, we will see that usage of OpenMP for our toy example is much faster.[6] Similar to the case of Scikit-learn this is due to the overhead of remoting and changes as data & model sizes grow.

Using PyTorch with Ray

Another very popular machine learning framework is PyTorch, an open-source Python library for deep learning developed and maintained by Facebook.

PyTorch is simple and flexible, making it a favorite for many academics and researchers in the development of new deep learning models and applications. Many extensions for specific applications (such as text, computer vision, and audio data) have been implemented for PyTorch. There are also a lot of pre-trained models that you can use directly. If you are not familiar with PyTorch, take a look at this tutorial for an introduction to its structure, capabilities, and usage for solving different problems.

We will continue with our wine quality problem and show how to use PyTorch to build a Multilayer Perceptrons (MLP) model for predicting wine quality. To do this you need to start from creating a custom PyTorch Dataset class that can be extended and customized to load your dataset. For our wine quality example the custom dataset class looks as follows in PyTorch dataset class for loading wine quality data.

Example 7. PyTorch dataset class for loading wine quality data
link:examples/ray_examples/ml/torch/wine-quality_torch.py[role=include]

Note that here, in addition to the minimum requirements, we have implemented an additional method get_splits, that splits an original dataset into 2: one for training and one for testing.

Once you have your data class defined, you can use PyTorch to make a model. To define a model in PyTorch you extend the base PyTorch Module class. Model class for our purposes is presented in PyTorch model class for wine quality.

Example 8. PyTorch model class for wine quality
link:examples/ray_examples/ml/torch/wine-quality_torch.py[role=include]

This class constructor builds the model - defines the layers of the model and their connectivity and the forward() method defines how to forward propagate input through the model.

With these two classes in place, the overall code[7] looks like PyTorch implementation of wine quality model building.

Example 9. PyTorch implementation of wine quality model building
link:examples/ray_examples/ml/torch/wine-quality_torch.py[role=include]

The code in PyTorch implementation of wine quality model building works, but Ray is integrated with Lightning PyTorch, not PyTorch. Lightning structures your PyTorch code so it can abstract the details of training. This makes AI research scalable and fast to iterate on.

To convert our PyTorch code PyTorch implementation of wine quality model building to lightning PyTorch, we first need to modify the model PyTorch model class for wine quality. In Lightning PyTorch it needs to be derived from lightning_module not module, which means that we need to add 2 methods to our model (Lightning PyTorch model additional functions for wine quality).

Example 10. Lightning PyTorch model additional functions for wine quality
link:examples/ray_examples/ml/torch/wine-quality_ltorch.py[role=include]

Here the training_step method defines a single step, while configure_optimized defines which optimizer to use. When you compare this to code PyTorch model class for wine quality you will notice that some code from there is just moved into these two methods (here instead of BCELoss optimizer we are using Adam optimizer). With this updated model class, the model training looks like Lightning PyTorch implementation of wine quality model building.[8]

Example 11. Lightning PyTorch implementation of wine quality model building
link:examples/ray_examples/ml/torch/wine-quality_ltorch.py[role=include]

Note that unlike PyTorch code PyTorch implementation of wine quality model building, where training is implemented programmatically, Lightning PyTorch introduces a trainer class, which internally implements a trainer loop. This approach allows all required optimization to be in the training loop.

Both PyTorch and PyTorch Lightning are using joblib to distributed training through the built-in ddp_cpu backend or more generally Horovod. As with other libraries, to allow distributed PyTorch Lightning on Ray, Ray has a library Distributed PyTorch Lightning Training that adds new PyTorch Lightning plugins for distributed training using Ray. These PyTorch Lightning Plugins on Ray allow you to quickly and easily parallelize training while still getting all the benefits of PyTorch Lightning and using your desired training protocol, either ddp_cpu or Horovod.

Once you add the plugins to your PyTorch Lightning Trainer, you can configure them to parallelize training to all the cores in your laptop, or across a massive multi-node, multi-GPU cluster with no additional code changes. This library also comes with integration with Ray Tune so you can perform distributed hyperparameter tuning experiments.

The RayPlugin class provides Distributed Data-Parallel (DDP) training on a Ray cluster. PyTorch DDP is used as the distributed training protocol by PyTorch, and Ray is used in this case to launch and manage the training worker processes. The base code using this plugin is shown in Enabling Lightning PyTorch implementation of wine quality model building to run on Ray.[9]

Example 12. Enabling Lightning PyTorch implementation of wine quality model building to run on Ray
link:examples/ray_examples/ml/torch/wine-quality_ltorch_ray.py[role=include]

The two additional plugins included in the library include:

  • HorovodRayPlugin, which integrates with Horovod as the distributed training protocol.

  • RayShardedPlugin, which integrates with FairScale to provide sharded DDP training on a Ray cluster. With sharded training, you can leverage the scalability of data-parallel training while drastically reducing memory usage when training large models.

Usage of the Distributed PyTorch Lightning Training

This code works, but if we compare the execution time using all three implementations, we will see that usage of PyTorch for our toy example takes 16.6 sec, Lightning PyTorch takes 8.2 sec and distributed Lightning Pytorch with Ray takes 25.2 sec. Similar to the previous two cases – Scikit-learn and Boosting algorithms – this is due to the overhead of remoting.

Reinforcement Learning with Ray

Ray was initially created as a platform for Reinforcement Learning(RL), which is one of the hottest research topics in the field of modern Artificial Intelligence and its popularity is only growing. RL is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error using feedback from its own actions and experiences.

spwr 1001
Figure 1. Different types of machine learning

Both supervised and reinforcement learning create the mapping between input and output, but unlike supervised learning which is using a set of known inputs and output for training, reinforcement learning uses rewards and punishments as signals for positive and negative behavior. Both unsupervised and reinforcement learning leverage experiment data, but they have different goals. While in unsupervised learning we are finding similarities and differences between data points, in the case of reinforcement learning we are trying to find a suitable action model that would maximize the total cumulative reward - and improve the model.

spwr 1002
Figure 2. Reinforcement model implementation

The key components of an RL implementation are (Reinforcement model implementation):

  1. Environment - Physical world in which the agent operates

  2. State - Current state of agent

  3. Reward - Feedback to the agent from the environment

  4. Policy - Method to map agent’s state to actions

  5. Value - Future reward that an agent would receive by taking an action in a particular state

Reinforcement learning is a huge topic and its details are beyond the scope of this book (we are just trying to explain how to start using the library with a simple example), but if you are interested in learning more about it, this blog post is an excellent starting point.

Ray’s RLlib is a library for RL, which allows for production-level, highly distributed RL workloads while providing unified and simple APIs for a large variety of applications for different industries.

It supports both model-free and model-based reinforcement learning.

spwr 1003
Figure 3. RLlib components

As shown in RLlib components, RLlib is built on top of Ray and offers off-the-shelf, highly distributed algorithms, policies, loss functions, and default models.

Policy encapsulates the core numerical components of RL algorithms. A policy includes a policy model that determines actions based on environment changes and a loss function defining the result of the action based on the post-processed environment. Depending on the environment, RL can have a single agent and property, a single policy for multiple agents, or multiple policies, each controlling one or more agents.

Everything agents interact with is called an environment. The environment is the outside world and it comprises everything outside the agent.

Environments types

There are different types of environment:

  • Deterministic environment is an environment where the outcome is known based on the current state.

  • A stochastic environment is an environment where the outcome is uncertain based on the current state.

  • A fully observable environment is an environment where an agent can determine the state of the system at all times.

  • A partially observable environment is an environment where an agent cannot determine the state of the system at all times.

  • A discrete environment is an environment where only a finite state of actions is available for moving from one state to another.

  • A continuous environment is an environment where there is an infinite state of actions available for moving from one state to another..

  • Episodic and non-episodic environment. In an episodic environment, an agent’s current action will not affect a future action, whereas, in a non-episodic environment, an agent’s current action will affect future action.

  • Single and multi-agent environment. A single-agent environment has only a single agent and a multi-agent environment has multiple agents.

Given an environment and policy, policy evaluation is done by the worker. RLlib provides a RolloutWorker class that is used in most RLlib algorithms.

At a high level, RLlib provides trainer classes that hold a policy for environment interaction. Through the trainer interface, the policy can be trained, checkpointed, or an action computed. In multi-agent training, the trainer manages the querying and optimization of multiple policies at once. The trainer classes coordinate the distributed workflow of running rollouts and optimizing policies. They do this by leveraging Ray parallel iterators.

Beyond environments defined in Python, Ray supports batch training on offline datasets through input readers. This is an important use case for RL when it’s not possible to run traditional training and rollout in a physical environment (like a chemical plant or assembly line) and a suitable simulator doesn’t exist. In this approach, data for past activity is used to train a policy.

From single processes to large clusters, all data interchange in RLlib uses sample batches. Sample batches encode one or more fragments of data. Typically, RLlib collects batches of size rollout_fragment_length from rollout workers and concatenates one or more of these batches into a batch of size train_batch_size that is the input to SGD (stochastic gradient descent).

The main features of RLlib are:

  • Support for the most popular deep-learning frameworks including PyTorch and Tensorflow

  • Implementation of highly distributed learning, RLlib algorithms (PPO or Impala ) allow you to set the num_workers config parameter, such that your workloads can run on 100s of CPUs/nodes thus parallelizing and speeding up learning.

  • Support for Multi-agent RL allows for training your agents supporting any of the following strategies:

  • Support APIs for an external pluggable simulators environment which comes with a pluggable, off-the-shelve client/ server setup that allows you to run 100s of independent simulators on the “outside” connecting to a central RLlib Policy-Server that learns and serves actions.

Additionally, RLlib provides simple APIs to customize all aspects of your training and experimental workflows. For example, you may code your own environments in python using openAI’s gym or DeepMind’s OpenSpiel, provide custom TensorFlow/Keras or, Torch models, and write your own policy- and loss definitions, or define custom exploratory behavior.

Simple code for implementing RL training to address the inverted pendulum (CartPole) problem (the environment exists in OpenApi gym) is shown in Cart pole reinforcement learning.[10]

Example 13. Cart pole reinforcement learning
link:examples/ray_examples/ml/rl/simpleRL.py[role=include]

The code Cart pole reinforcement learning starts by creating a configuration for a trainer. The configuration defines an environment (here we use an existing OpenApi gym environment, so we can just use its name), the number of workers (we use 4), framework (we use Tensorflow 2), model, train batch size, and additional execution parameters. This configuration is used for the creation of the trainer. We then execute several training iterations and display results. That’s all it takes to implement a simple RL.

You can easily extend this simple example by creating your specific environment or introducing your own algorithms.

Numerous examples of Ray RLlIB usage are described here.

Hyperparameter tuning with Ray

When creating a machine learning model you are often faced with a variety of choices, from the type of model to feature selection techniques, and so on. A natural extension of machine learning is to use similar techniques to find the right values (or parameters) for the choices in building our model. Parameters that define the model architecture are referred to as hyperparameters and the process of searching for the ideal model architecture is referred to as hyperparameter tuning. Unlike the model parameters that specify how to transform the input data into the desired output, the hyperparameters define how to structure the model.

Like with boosting algorithms, hyperparameter tuning is especially well suited to parallelization as it involves training and comparing many models. Depending on the search technique, training these separate models can be an "embarrassingly parallel" problem as there is little to no communication needed between them.

Popular Hyperparameter tuning approaches

The most popular hyperparameter tuning methods are:

  • Grid search. This is the most basic one. In this case, a model is built for each possible combination of provided hyperparameter values. Every model is evaluated for given criteria and the one producing the best result is selected.

  • Random search. Unlike Grid Search, which uses a discrete set of hyperparameter values, Random search leverages a statistical distribution for each hyperparameter from which values may be randomly sampled. This approach defines the number of iterations for search and for each iteration hyperparameter values are picked by sampling-defined statistical distribution

  • Bayesian optimization. In the above methods, individual experiments are building models for different parameter’s hyperparameter values. The advantage of that approach is that such experiments are embarrassingly parallel and can be executed very efficiently. The disadvantage of such an approach is that it does not take advantage of information from previous experiments. Bayesian optimization belongs to a class of sequential model-based optimization (SMBO) algorithms that use the results of the previous iteration to improve the sampling method of the next iteration. This approach builds a probability model of the objective function and uses it to select the most promising hyperparameters to evaluate.

Some examples of hyperparameters include:

  • The degree of the polynomial feature that should be used for the linear model.

  • Maximum depth allowed for decision tree.

  • The minimum number of samples required at a leaf node in a decision tree.

  • The number of neurons for a neural network layer.

  • The number of layers for a neural network.

  • Learning rate for gradient descent.

Ray Tune is the Ray-based native library for hyperparameter tuning. The main features of Tune are:

  • It provides distributed asynchronous optimization out of the box leveraging Ray.

  • The same code can be scaled from a single machine to a large distributed cluster.

  • It offers state-of-the-art algorithms including (but not limited to) ASHA, BOHB, and Population-Based Training.

  • It integrates with TensorBoard or MLFlow to visualize tuning results.

  • It integrates with many optimization libraries such as Ax/Botorch, HyperOpt, and Bayesian Optimization and enables their transparently scaling.

  • It supports many machine learning frameworks, including PyTorch, TensorFlow, XGBoost, LightGBM, and Keras.

The main components of Tune are:

  • Trainable - a training function, with an objective function. Tune offers two interface APIs for trainable: functional and class.

  • Search space - valid values for your hyperparameters and can specify how these values are sampled (e.g. from a uniform distribution or a normal distribution). Tune offers various functions to define search spaces and sampling methods.

  • Search algorithm - an algorithm used for the optimization of hyperparameters. Tune has SearchAlgorithms that integrate with many popular optimization libraries, such as Nevergrad and HyperOpt. Tune automatically converts the provided search space into the search spaces the search algorithms/underlying library expect.

  • Trial - execution or run of a logical representation of a single hyperparameter configuration. Each trial is associated with an instance of a Trainable. And a collection of trials comprise an experiment. Tune uses Ray Actors as worker node’s processes to run multiple trials in parallel.

  • Experiment Analysis - an object, returned by Tune, which has methods that can be used for analyzing your training. It can be integrated with TensorBoard and MLFlow for results visualization.

To show how to use Tune, let’s optimize our PyTorch implementation of wine quality model building PyTorch model class for wine quality. We will try to optimize 2 parameters of the optimizer used to build the model - lr and momentum.

Meaning of the parameters we are optimizing

Lr stands for learning rate. Deep learning neural networks are trained using the stochastic gradient descent (SGD) algorithm. This algorithm estimates the error gradient for the current state of the model using examples from the training dataset, then updates the weights of the model using the back-propagation of errors algorithm, referred to as simply backpropagation.

The amount that the weights are updated during training is referred to as the step size or the “learning rate.”

Momentum is a hyperparameter of the SGD algorithm. It is an exponentially weighted average of the prior updates to the weight that can be included when the weights are updated. This change to stochastic gradient descent is called “momentum” and adds inertia to the update procedure, causing many past updates in one direction to continue in that direction in the future.

Example 14. Implementing support functions for Pytorch winequality model implementation
link:examples/ray_examples/ml/tune/wine-quality_tune.py[role=include]

In the code above we have introduced 3 supporting functions:

  • Model_train - this function encapsulates model training

  • Model_test - this function that encapsulates model quality evaluation

  • Train_winequality - this function implements all steps for model training and reporting them to Tune. This allows Tune to make decisions in the middle of training.

With these three functions in place, integration with Tune is very straightforward (Integration of model building with Tune):

Example 15. Integration of model building with Tune
link:examples/ray_examples/ml/tune/wine-quality_tune.py[role=include]

After loading the dataset, the code defines a search space - space for possible hypermarameters and invokes tuning using Tune run method. The parameters here are:

  • Callable defining a training function (train_winequaliy in our case)

  • Num_samples - maximum number of runs for Tune

  • Scheduler - here we are using use ASHA, a scalable algorithm for principled early stopping. To make optimization process more efficient, ASHA scheduler terminates trials that are less promising and allocates more time and resources to more promising trials.

  • Config - contains the search space for the algorithm

Running of the code above produces the following result (Tuning model result):

Example 16. Tuning model result
+-------------------------------+------------+-----------------+-------------+------------+----------+--------+------------------+

| Trial name | status | loc | lr | momentum | acc | iter | total time (s) |

|-------------------------------+------------+-----------------+-------------+------------+----------+--------+------------------|

| train_winequality_8f6ab_00000 | TERMINATED | 127.0.0.1:29952 | 2.84411e-07 | 0.170684 | 0.513258 | 50 | 4.6005 |

| train_winequality_8f6ab_00001 | TERMINATED | 127.0.0.1:29943 | 4.39914e-10 | 0.562412 | 0.530303 | 1 | 0.0829589 |

| train_winequality_8f6ab_00002 | TERMINATED | 127.0.0.1:29944 | 5.72621e-06 | 0.734167 | 0.587121 | 16 | 1.2244 |

| train_winequality_8f6ab_00003 | TERMINATED | 127.0.0.1:29947 | 0.104523 | 0.316632 | 0.729167 | 50 | 3.83347 |

……………………………..

| train_winequality_8f6ab_00037 | TERMINATED | 127.0.0.1:30049 | 5.87006e-09 | 0.566372 | 0.625 | 4 | 2.41358 |

|| train_winequality_8f6ab_00043 | TERMINATED | 127.0.0.1:30069 | 0.000225694 | 0.567915 | 0.50947 | 1 | 0.130516 |

| train_winequality_8f6ab_00044 | TERMINATED | 127.0.0.1:30052 | 2.01545e-07 | 0.525888 | 0.405303 | 1 | 0.208055 |

| train_winequality_8f6ab_00045 | TERMINATED | 127.0.0.1:30054 | 1.84873e-07 | 0.150054 | 0.583333 | 4 | 2.47224 |

| train_winequality_8f6ab_00046 | TERMINATED | 127.0.0.1:30070 | 0.136969 | 0.567186 | 0.742424 | 50 | 4.52821 |

| train_winequality_8f6ab_00047 | TERMINATED | 127.0.0.1:30071 | 1.29718e-07 | 0.659875 | 0.443182 | 1 | 0.0634422 |

| train_winequality_8f6ab_00048 | TERMINATED | 127.0.0.1:30072 | 0.00295002 | 0.349696 | 0.564394 | 1 | 0.107348 |

| train_winequality_8f6ab_00049 | TERMINATED | 127.0.0.1:30080 | 0.363802 | 0.290659 | 0.725379 | 4 | 0.227807 |

+-------------------------------+------------+-----------------+-------------+------------+----------+--------+------------------+

As you can see from the Tuning model result although we have defined 50 iterations for model search, using ASHA significantly improves performance because ASHA uses significantly fewer runs on average (in this example more than 50% only used 1 iteration).

Conclusion

In this chapter you have learned how Ray constructs are leveraged for scaling execution of the different ML libraries (Scitit-Learn, XGBoost, LightGBM and LightPytorch) using the full capabilities of multi-machine Ray clusters.

We showed you some of the very simple examples of porting your existing ML code to Ray, as well as some of the internals of how Ray extends ML libraries to scale. We also showed very simple examples of using Ray-specific implementations of Reinforcement learning and Hyperparameter tuning.

We hope that looking at these relatively simple examples will give you a better idea of how to best use Ray in your day-to-day implementations.


1. In this example we are using GridSearchCV, which implements an exhaustive search. Although this works for this simple example, Scikit-learn currently provides a new library Tune-sklearn, that provides more powerful tune algorithms providing a significant tuning speed up. This said, the same joblib backend works for these algorithms the same way.
2. Complete code for this implementation can be found on github
3. During testing we saw 8.2s with Joblib while with Ray we saw 25.1 s
4. The full code for this implementation can be found on github.
5. The full code for this implementation can be found on github.
6. In our testing, for XGBoost it is 0.15 vs 14.4 sec and for LightGBM it is 0.24 vs 12.4 sec.
7. The complete code is in github.
8. The full code is in github.
9. The full code is in github.
10. Complete code for this implementation is in github.
11. Complete code for this implementation is in github.