Custom AI on Azure: Brief overview with examples

Alibek Jakupov
Feb 21, 2019
6 min read

Updated: Nov 19, 2021

Tools available on Azure:

Azure ML, Spark on HDInsight, Azure Databricks, Batch AI, SQL Server Machine Learning Services, Data Science VM, Azure Notebooks, ML Server

Important:

Azure Databricks is the same as Spark
Batch AI is adapted for Big Data but different from Spark as uses very large clusters
SQLServer ML Services allow encapsulating ML in TSQL (for enterprise developers and mainly for data preparation).
Azure Notebooks is good but not for product development
ML Server: R server or Python Server
Data Science Virtual Machine - Ready to go client environment for Data Science. VM image pre-installed and configured with data science tools

Usage modes

3 modes available:

Pre-built AI: models are pre-trained using Microsoft data (e.g. sentiment analysis, keyword extraction, topic detection etc.) aka Cognitive Services
Customized AI: pre-built models customized with user data
Custom models: fully tailored for user scenario and trained with user data

N.B. Cosmos DB is adapted for web scale applications (huge amounts of data). ML Services are used for semantic leveling. Data factory allows running SSIS packages. In case you are using Azure Databricks (Spark ML) you may schedule it as a job.

Important: ML scripting on SQL Server is no longer available.

Agile process for Data Science team

Microsoft Team Data Science process is perfectly adapted to develop, deploy and manage AI applications in the cloud based on DevOps best practices.

The data science lifecycle in TDSP is shown in figure 1

Figure 1 – Data science life cycle (source https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview)

Thus, primary stages are: Business Understanding; Data Acquisition and Understanding; Modeling; Deployment.

After finishing the experiment, you may use your trained model for:

Device application

Serialized model

Desktop application

Serialized model
Containerized model

Web server application

Containerized model

Comparative Analysis

In this section we compare Azure ML Workbench and Azure ML Studio from the point of view of hackathon participant.

Azure ML Studio

Ideal for fast deployment of a web service. All the data science processes run on the virtual machine. With free subscription the developer has up to 10 Gigabyte of storage.

Pros: Easy to use; Fast web service deployment; All the development is done on the cloud – nothing to install; many algorithms are pre-developed and ready to use; ability to run R and Python Scripts; ability to add libraries that are not pre-installed.

Cons: Limited from the algorithmic point of view (if not using custom models); Limited visualization capacity; Not scalable

Azure ML Workbench

Installed on the local machine and allowing to use certain components locally (e.g. Data Preparation) or deploy to a target cluster and cloud as a runtime or execution environment. After training you may Deploy One Docker Container with one scoring Web Service per Model.

Pros: Scalable; Local-based; Rich from the algorithmic point of view; Git integration; Access control; Project roaming and sharing; Ability to run history information

Cons: Needs Azure ML Experimentation Service account to run the experiment (even locally); Needs Workbench installation (may cause difficulties for Linux users as no Linux installation available); Only real-time (request-response) mode for web service available, no batch scoring

Process of training and evaluating a predictive model

Azure ML Studio

All the experiments are organized in the form of data flow (similar to SSIS).

Source

Azure ML Studio allows loading data sets directly on the virtual machine or connect to the distant storage via the import data wizard. Currently the supported options are: Azure Blob storage, Web url via HTTP, Hive Query, Azure SQL Database, Azure table, Data Feed Provider, On-premises SQL Database (Preview feature), Azure DocumentDB.

Data Preparation

There are certain components that are already pre-built to perform data preparation and analysis. If the code already exists on the local storage (Python or R), it may be copied into the special scripting component. The input/output are only in the form of a dataframe.

Important: the module structure is special, so it is necessary to refactor the code as in Python scripting module all the logic is done via a single entry point (function main).

Data preprocessing

Special modules allow to preprocess data, clean noise, move missing values etc. At every point the output may be saved locally.

Training

For training you need to have special modules (it’s possible to train model in python or R modules, but to save it after training you need use training ones). The following train modules are available: Sweep Clustering, Train Anomaly Detection Model, Train Clustering Model, Train Matchbox

Recommender, Train model, Tune Model Hyperparameters (formerly known as sweep parameters).

Evaluation

Before evaluating it is necessary to score the model. The scoring process is the same as usual – we split data into test and evaluation and obtain the overall accuracy. Modules adapted for evaluation are: Cross Validate Model, Evaluate Model, Evaluate Recommender, Evaluate Probability function.

Web service deployment

After saving the trained model, we create an experiment with web service input and output (it is possible to omit one of them but not both). The experiment will only be aimed to define the input and output structure. You then may deploy your model as a web service.

Important: the output is in the Json format (usually it is a prediction score). The batch execution is also supported.

Azure ML Workbench

Prepare data

Data Sources

Files (CSV, TSV, JSON, Fixed Width)
Parquet
Excel file
Database (Azure SQL DB or SQL Server)

Utilizes Program Synthesis using Examples (PROSE) to support various kinds of data wrangling, where data transformations are performed by example instead of explicit code (e.g., regular expressions). Extend/customize transform and featurize through Python. Generate Python and PySpark for execution at scale. Export the data prep data flow as a file, re-use for later training and for scoring. Explore using notebooks.

Train and Test Model

Firstly, you need to load the prepared dataset and split it into test and train subsets. Then choose algorithm and train the model. The training is done using Python script.

Configure Compute Targets

Compute targets provide the resources that run your experiment. First, attach compute target. Then, prepare compute target and reference compute target. When submitting scripts via ML CLI provide the compute target name. When in a Jupyter Notebook, select the Kernel named after the compute target

Figure 2 – Notebook compute targets/kernels

For remote compute targets there are several possibilities

Attach a DSVM as a compute target

az ml computetarget attach remotedocker --name <compute target name> --address <ip or FQDN> --username <admin user> --password <admin password>

Attach an HDInsight cluster as a compute target

az ml computetarget attach cluster --name <compute target name> --address <name-ssh.azurehdinsight.net> --username <sshuser> --password <ssh password>

Prepare the compute target

az ml experiment prepare -c <compute target name>

Verify prepared status of the compute target

az ml experiment prepare -c <compute target name> --check

Evaluate

Measure model performance->Log the model performance->Chart the model performance

Deploy

Use the trained model to make predictions in:

Device application

Serialized model

Desktop application

Serialized model
Containerized model

Web server application

Containerized model

Deploy One Docker Container with one scoring Web Service per Model. Currently only real-time (request-response), no batch scoring. Scoring Web Service Script is implemented in Python with the following methods:

init() - loads the model from the file and registers the data collectors
run(input_dataframe) - prepares the input data, performs the scoring operation and logs both the inputs and prediction results with the data collectors

Text Analytics Example

Azure ML Studio

Generally, the work flow is split into several experiments. All the temporary results are stored on the virtual machine as datasets.

Step 1

Data preparation

Step 2

Text Preprocessing

Step 3

Feature Extraction

Step 4

Train and Evaluate model

Figure 4 – Example of Evaluation module output

Step 5

Deploy Web service

Azure ML + Power BI

The possible architecture is the following:

Figure 5 – Workflow for visualizing the Azure ML Predictive model result (source https://powerbi.microsoft.com/es-es/blog/power-bi-azure-ml/)

The steps are as follows:

Create a predictive model in Azure ML
Publish this model as a web service
Use R to extract data out of storage that has not yet been scored by your ML model (because Power BI supports R scripting for data extraction)
Use R to call the Azure ML web service and send it the unscored data (with the set of parameters previously defined as the web service input)
Write the output of the Azure ML model back into SQL
Use R to read scored data into Power BI
Publish the Power BI file to the Power BI service (it will allow your user see the report in the cloud)
Schedule a refresh of the data using the Personal Gateway (which triggers a scheduled re-run of our R script and brings in new predictions)

Important: Automatic refresh may be scheduled at most once per half an hour. However, there is no limit for manual refresh.

There is another solution, which does not involve sql connectivity. You may add R script directly in your Power BI Report and connect this new visual with your existing KPIs.