| copyright | lastupdated |
|---|---|
| | 2019-03-07 |
{:java: #java .ph data-hd-programlang='java'} {:swift: #swift .ph data-hd-programlang='swift'} {:ios: #ios data-hd-operatingsystem="ios"} {:android: #android data-hd-operatingsystem="android"} {:shortdesc: .shortdesc} {:new_window: target="_blank"} {:codeblock: .codeblock} {:screen: .screen} {:tip: .tip} {:pre: .pre}
{: #create-deploy-retrain-machine-learning-model} This tutorial walks you through building a predictive machine learning model, deploying it as an API for use in applications, testing the model, and retraining it with feedback data, all in an integrated, unified self-service experience on IBM Cloud.
In this tutorial, the Iris flower data set is used for creating a machine learning model to classify species of flowers.
In the terminology of machine learning, classification is considered an instance of supervised learning, i.e. learning where a training set of correctly identified observations is available. {:tip}
{:shortdesc}
![](images/solution22-build-machine-learning-model/architecture_diagram.png)
{: #objectives}
- Import data to a project.
- Build a machine learning model.
- Deploy the model and try out the API.
- Test a machine learning model.
- Create a feedback data connection for continuous learning and model evaluation.
- Retrain your model.
{: #services}
This tutorial uses the following runtimes and services:
- {{site.data.keyword.DSX_short}}
- {{site.data.keyword.sparkl}}
- {{site.data.keyword.cos_full_notm}}
- {{site.data.keyword.pm_full}}
- {{site.data.keyword.dashdblong}}
{: #prereqs}
- IBM Watson Studio and Watson Knowledge Catalog are applications that are part of IBM Watson. To create an IBM Watson account, begin by signing up for one or both of these applications.

  Go to Try IBM Watson and sign up for IBM Watson apps.
{:#import_data_project}
A project is how you organize your resources to achieve a particular goal. Your project resources can include data, collaborators, and analytic tools like Jupyter notebooks and machine learning models.
You can create a project to add data and open a data asset in the data refiner for cleansing and shaping your data.
Create a project:
- Go to the {{site.data.keyword.Bluemix_short}} catalog and select {{site.data.keyword.DSX_short}} under the AI section. Create the service. Click on the Get Started button to launch the {{site.data.keyword.DSX_short}} dashboard.
- Click Create a project, then click Create Project on the Standard tile. Add a name, say `iris_project`, and an optional description for the project.
- Leave the Restrict who can be a collaborator checkbox unchecked as there's no confidential data.
- Under Define Storage, click Add and choose an existing Cloud Object Storage service or create a new one (select the Lite plan > Create). Click Refresh to see the created service.
- Click Create. Your new project opens and you can start adding resources to it.
Import data:
As mentioned earlier, you will be using the Iris data set. The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository. This small dataset is often used for testing out machine learning algorithms and visualizations. The aim is to classify Iris flowers among three species (Setosa, Versicolor or Virginica) from measurements of length and width of sepals and petals. The iris data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant.
Download iris_initial.csv, which consists of 40 instances of each class. You will use the remaining 10 instances of each class to retrain your model.
- Under Assets in your project, click the Find and Add Data icon.
- Under Load, click on browse and upload the downloaded `iris_initial.csv`.
- Once added, you should see `iris_initial.csv` under the Data assets section of the project. Click on the name to see the contents of the data set.
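The 40/10 split per class described above can be reproduced locally. The following is a minimal sketch, assuming each row is a dictionary with a `species` key (the label column used later in this tutorial). The function name `split_initial_and_retrain` and the synthetic rows are hypothetical and are not part of the tutorial's files.

```python
from collections import defaultdict


def split_initial_and_retrain(rows, label_key="species", initial_per_class=40):
    """Split rows into an initial training set and a retraining set.

    The first `initial_per_class` rows seen for each label go to the initial
    set; any further rows for that label go to the retraining set.
    """
    seen = defaultdict(int)
    initial, retrain = [], []
    for row in rows:
        label = row[label_key]
        if seen[label] < initial_per_class:
            initial.append(row)
        else:
            retrain.append(row)
        seen[label] += 1
    return initial, retrain


# Synthetic stand-in for the full data set: 3 classes of 50 rows each.
# The measurement values are placeholders, not real Iris measurements.
species_names = ["setosa", "versicolor", "virginica"]
all_rows = [
    {"petal_length": 1.0, "petal_width": 0.1, "species": name}
    for name in species_names
    for _ in range(50)
]

initial_rows, retrain_rows = split_initial_and_retrain(all_rows)
print(len(initial_rows), len(retrain_rows))  # 120 30
```

With 50 rows per class, this yields 120 rows for initial training and 30 rows held back for retraining, matching the sizes described for the two CSV files.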
{:#associate_services}
- Under Settings, scroll to Associated services > click Add service > choose Spark.
- Select Lite plan and click Create. Use the default values and click Confirm.
- Click Add Service again and choose Watson. Click Add on Machine Learning tile > choose Lite plan > click Create.
- Leave the default values and click Confirm to provision a Machine Learning service.
{:#build_model}
- Click Add to project and select Watson Machine Learning model. In the dialog, add iris_model as the name and an optional description.
- Under the Machine Learning Service section, you should see the Machine Learning service you associated in the previous step.
- Select Model builder as your model type and, under the Spark Service or Environment section, choose the Spark service you created earlier.
- Select Manual to manually create a model. Click Create.

  For the automatic method, you rely on automatic data preparation (ADP) completely. For the manual method, in addition to some functions that are handled by the ADP transformer, you can add and configure your own estimators, which are the algorithms used in the analysis. {:tip}

- On the next page, select `iris_initial.csv` as your data set and click Next.
- On the Select a technique page, based on the data set added, label columns and feature columns are pre-populated. Select species (String) as your Label Col and petal_length (Decimal) and petal_width (Decimal) as your Feature columns.
- Choose Multiclass Classification as your suggested technique.
- For Validation Split, configure the following setting:

  Train: 50%, Test: 25%, Holdout: 25%

- Click on Add Estimators and select Decision Tree Classifier, then Add.

  You can evaluate multiple estimators in one go. For example, you can add Decision Tree Classifier and Random Forest Classifier as estimators to train your model and choose the best fit based on the evaluation output. {:tip}

- Click Next to train the model. Once you see the status as Trained & Evaluated, click Save.
- Click on Overview to check the details of the model.
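To make the Validation Split proportions concrete, the following minimal sketch divides 120 row indices (40 instances for each of the 3 classes) into 50% train, 25% test, and 25% holdout. It only illustrates the proportions. The function name `three_way_split`, the fixed seed, and the shuffle-then-slice approach are assumptions, not a description of how the model builder splits data internally.

```python
import random


def three_way_split(rows, train=0.50, test=0.25, seed=42):
    """Shuffle rows and split them into train, test, and holdout subsets.

    Whatever is left after the train and test fractions becomes the holdout.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )


# 120 row indices stand in for the rows of iris_initial.csv.
train_rows, test_rows, holdout_rows = three_way_split(range(120))
print(len(train_rows), len(test_rows), len(holdout_rows))  # 60 30 30
```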
{:#deploy_model}
- Under the created model, click on Deployments > Add Deployment.
- Choose Web Service. Add a name, say `iris_deployment`, and an optional description.
- Click Save. On the overview page, click on the name of the new web service. Once the status is DEPLOY_SUCCESS, you can check the scoring-endpoint, code snippets in various programming languages, and API Specification under Implementation.
- Click on View API Specification to see and test {{site.data.keyword.pm_short}} API endpoints.

  To start working with the API, you need to generate an access token using the username and password available on the Service Credentials tab of the {{site.data.keyword.pm_short}} service instance under {{site.data.keyword.Bluemix_short}} Resource List. Follow the instructions mentioned on the API specification page to generate an access token. {:tip}

- To make an online prediction, use the `POST /online` API call.
  - `instance_id` can be found on the Service Credentials tab of the {{site.data.keyword.pm_short}} service under {{site.data.keyword.Bluemix_short}} Resource List.
  - `deployment_id` and `published_model_id` are under Overview of your deployment.
  - For `online_prediction_input`, use the below JSON:

    { "fields": ["sepal_length", "sepal_width", "petal_length", "petal_width"], "values": [ [5.1, 3.5, 1.4, 0.2] ] }
  - Click on Try it out to see the JSON output.
- Using the API endpoints, you can now call this model from any application.
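Calling the model from an application might look like the following minimal sketch, which builds the online prediction request with Python's standard library but does not send it. The `SCORING_URL` and `ACCESS_TOKEN` values are hypothetical placeholders. Copy the real scoring endpoint from the Implementation tab and generate a token as described above. Passing the token in a Bearer `Authorization` header is an assumption to verify against the API specification page.

```python
import json
import urllib.request


def build_scoring_request(scoring_url, access_token, fields, values):
    """Build (but do not send) a POST request for an online prediction."""
    payload = {"fields": fields, "values": values}
    return urllib.request.Request(
        scoring_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the access token is passed as a Bearer token.
            "Authorization": "Bearer " + access_token,
        },
        method="POST",
    )


# Hypothetical placeholders, not real values.
SCORING_URL = "https://example.com/placeholder-scoring-endpoint/online"
ACCESS_TOKEN = "<access-token>"

request = build_scoring_request(
    SCORING_URL,
    ACCESS_TOKEN,
    ["sepal_length", "sepal_width", "petal_length", "petal_width"],
    [[5.1, 3.5, 1.4, 0.2]],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send the request once the placeholders are replaced:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```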
{:#test_model}
- Under Test, you should see input data (Feature data) being populated automatically.
- Click Predict and you should see the Predicted value for species in a chart.
- For JSON input and output, click on the icons next to the active input and output.
- You can change the input data and continue testing your model.
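When working with the JSON input and output by hand, it can help to pair each value with its column name. The sketch below assumes the JSON uses the `fields`/`values` layout of the payload from the online prediction step. The helper name `rows_as_dicts` is hypothetical, and the sample is the tutorial's input payload rather than real model output.

```python
def rows_as_dicts(payload):
    """Pair each row in payload["values"] with the names in payload["fields"]."""
    fields = payload["fields"]
    return [dict(zip(fields, row)) for row in payload["values"]]


# Input payload taken from the online prediction step of this tutorial.
sample_input = {
    "fields": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
    "values": [[5.1, 3.5, 1.4, 0.2]],
}

for record in rows_as_dicts(sample_input):
    print(record)
```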
{:#create_feedback_connection}
- For continuous learning and model evaluation, you need to store new data somewhere. Create a {{site.data.keyword.dashdbshort}} service with the Entry plan to act as the feedback data connection.
- On the {{site.data.keyword.dashdbshort}} Manage page, click Open. On the top navigation, select Load.
- Click on browse files under My computer and upload `iris_initial.csv`. Click Next.
- Select DASHXXXX (for example, DASH1234) as your Schema and then click on New Table. Name it `IRIS_FEEDBACK` and click Next.
- Datatypes are automatically detected. Click Next and then Begin Load.
- A new target DASHXXXX.IRIS_FEEDBACK is created.

You will use this table in the next step, where you retrain the model for better performance and precision.
{:#retrain_model}
- Return to your {{site.data.keyword.Bluemix_short}} Resource List and under the {{site.data.keyword.DSX_short}} service you have been using, click on Projects > iris_project > iris_model (under assets) > Evaluation.
- Under Performance Monitoring, click on Configure Performance Monitoring.
- On the configure Performance Monitoring page,
  - Select the Spark service. Prediction type should be populated automatically.
  - Choose weightedPrecision as your metric and set `0.98` as the optional threshold.
  - Click on Create new connection to point to the IBM Db2 Warehouse on Cloud service which you created in the above section.
  - Select the Db2 warehouse connection and once the connection details are populated, click Create.
  - Click on Select feedback data reference and point to the IRIS_FEEDBACK table and click Select.
  - In the Record count required for re-evaluation box, type the minimum number of new records to trigger retraining. Use 10 or leave blank to use the default value of 1000.
  - In the Auto retrain box, select one of the following options:
    - To start automatic retraining whenever model performance is below the threshold that you set, select when model performance is below threshold. For this tutorial, you will choose this option as our precision is below the threshold (0.98).
    - To prohibit automatic retraining, select never.
    - To start automatic retraining regardless of performance, select always.
  - In the Auto deploy box, select one of the following options:
    - To start automatic deployment whenever model performance is better than the previous version, select when model performance is better than previous version. For this tutorial, you will choose this option as our aim is to continuously improve the performance of the model.
    - To prohibit automatic deployment, select never.
    - To start automatic deployment regardless of performance, select always.
- Click Save.
- Download the file iris_retrain.csv. Thereafter, click Add feedback data, select the downloaded csv file, and click Open.
- Click New evaluation to begin.
- Once the evaluation completes, you can check the Last Evaluation Result section for the improved weightedPrecision value.
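The metric and the Auto retrain and Auto deploy options above can be sketched as plain decision logic. This is a minimal illustration, not the service's implementation. It assumes weighted precision is per-class precision weighted by each class's share of the actual labels, and it models the three options with the hypothetical strings `"never"`, `"always"`, and `"below_threshold"` or `"better"`. The sample labels are made up for the example.

```python
from collections import Counter


def weighted_precision(actual, predicted):
    """Per-class precision, weighted by each class's share of actual labels."""
    support = Counter(actual)
    predicted_counts = Counter(predicted)
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)
    score = 0.0
    for label, count in support.items():
        # Precision is taken as 0 for a class that was never predicted.
        if predicted_counts[label]:
            precision = correct[label] / predicted_counts[label]
        else:
            precision = 0.0
        score += precision * count / len(actual)
    return score


def should_retrain(option, metric, threshold, new_records, required_records=1000):
    """Decide on retraining once enough new feedback records have arrived."""
    if new_records < required_records or option == "never":
        return False
    if option == "always":
        return True
    return metric < threshold  # option == "below_threshold"


def should_deploy(option, new_metric, previous_metric):
    """Decide whether a retrained model replaces the deployed version."""
    if option == "never":
        return False
    if option == "always":
        return True
    return new_metric > previous_metric  # option == "better"


# Made-up labels for illustration only.
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

score = weighted_precision(actual, predicted)
print(round(score, 4))  # 0.8889
print(should_retrain("below_threshold", score, 0.98, new_records=30, required_records=10))  # True
```

In the example, 30 new records (10 per class) exceed the required count of 10 and the score is below 0.98, so retraining would be triggered under the "below threshold" option.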
{:removeresources}
- Navigate to {{site.data.keyword.Bluemix_short}} Resource List > choose the Location, Org and Space where you have created the services.
- Delete the respective {{site.data.keyword.DSX_short}}, {{site.data.keyword.sparks}}, {{site.data.keyword.pm_short}}, {{site.data.keyword.dashdbshort}} and {{site.data.keyword.cos_short}} services which you created for this tutorial.
{:related}