Azure ML Studio: get prediction probability
In this post we are going to create a sample experiment in Azure Machine Learning Studio and build a web service on top of it that returns the prediction probability. As usual, all the code and experiment materials are available online and free to download for Rookie's readers. Up we go!
Why may this be useful?
In certain cases one may want to filter out bad responses or responses with low certainty, especially when the model is deployed in production. Although Azure ML Studio is often considered a "toy" solution for data scientists, it has a huge advantage over other platforms: the ability to create a stable RESTful API from your prediction experiment in a couple of clicks.
In most cases your deployment experiment looks like this:
Your score model output contains the probabilities for each class, and the prediction label column is the class with the highest probability value.
But what if you needed to see the class probability and apply an arbitrary threshold on the client side to filter out weak predictions? Fortunately, we can rearrange some experiment components and, with the help of a Python script, get the needed output.
First we need to change the Select Columns module (you may have noticed that the image shows "Project Columns"; that is just the old name for this component): instead of selecting only the label, we need to exclude all the features.
This is needed to keep all the class probabilities so that we can later take the highest one. Then we need to add a Python script module and put the following code inside.
import pandas as pd

def azureml_main(dataframe1 = None, dataframe2 = None):
    # Row-wise maximum over the numeric columns (the class probabilities)
    dataframe1['Probability'] = dataframe1.max(axis=1, numeric_only = True)
    # Keep only the new column
    output = dataframe1[['Probability']]
    return output
This code snippet looks at all the numeric fields in each row and takes the highest value. Since the scored label is the class with the highest probability, that value is the probability of the scored label.
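To check the script outside of Studio, you can run it locally on a small hand-made dataframe. This is only a sketch: the column names and values below are made up to resemble a Score Model output and are not taken from the real experiment.

```python
import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # Row-wise maximum over the numeric columns (the class probabilities)
    dataframe1['Probability'] = dataframe1.max(axis=1, numeric_only=True)
    output = dataframe1[['Probability']]
    return output

# Hypothetical scored output: three class probabilities plus the label.
# Column names are assumptions made for this illustration.
scored = pd.DataFrame({
    'Scored Probabilities for Class "A"': [0.7, 0.2],
    'Scored Probabilities for Class "B"': [0.2, 0.5],
    'Scored Probabilities for Class "C"': [0.1, 0.3],
    'Scored Labels': ['A', 'B'],
})

# The text label column is skipped because of numeric_only=True
result = azureml_main(scored.copy())
print(result)
```

Note that the script relies on the class probabilities being the only numeric columns left, which is why the features are excluded first.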
However, we need to bind this value to the scored label, so we should add another "Select Columns" module and select the scored label.
Finally, it is necessary to join these two values into a single dataframe.
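In the experiment this join is done by connecting modules, but if it helps to picture it, the pandas equivalent is roughly a column-wise concatenation. The two dataframes below are hypothetical stand-ins for the outputs of the two branches, with assumed column names.

```python
import pandas as pd

# Hypothetical outputs of the two branches (same number of rows, same order)
labels = pd.DataFrame({'Scored Labels': ['A', 'B']})
probs = pd.DataFrame({'Probability': [0.7, 0.5]})

# Put the columns side by side; rows are matched by position (index)
combined = pd.concat([labels, probs], axis=1)
print(combined)
```

This only works because both branches come from the same scored dataset, so row N in one still corresponds to row N in the other.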
Here we are, the only thing left is to add the web output and deploy as a web service.
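Once the service is up, the client-side filtering mentioned earlier could look something like the sketch below. It assumes you have already parsed the web service response into a list of rows with a label and a probability; the field names, the threshold of 0.6 and the helper name filter_confident are illustrative choices, not part of the Azure ML API.

```python
def filter_confident(predictions, threshold=0.6):
    """Keep only the predictions whose probability reaches the threshold."""
    # float() is used in case the service returns values as strings
    return [p for p in predictions if float(p['Probability']) >= threshold]

# Hypothetical rows parsed from a response (shape and names are assumptions)
predictions = [
    {'Scored Labels': 'A', 'Probability': '0.7'},
    {'Scored Labels': 'B', 'Probability': '0.5'},
]

confident = filter_confident(predictions, threshold=0.6)
print(confident)
```

The right threshold depends on your model and on how costly a wrong answer is, so treat 0.6 as a placeholder.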
Hope this was useful. Enjoy