Tools available on Azure:
Azure ML, Spark on HDInsight, Azure Databricks, Batch AI, SQL Server Machine Learning Services, Data Science VM, Azure Notebooks, ML Server
Important:
Azure Databricks is essentially managed Apache Spark
Batch AI is adapted for Big Data but differs from Spark as it uses very large clusters
SQL Server Machine Learning Services allows encapsulating ML in T-SQL (for enterprise developers and mainly for data preparation)
Azure Notebooks is good for experimentation but not for product development
ML Server: R server or Python Server
Data Science Virtual Machine is a ready-to-go client environment for data science: a VM image pre-installed and configured with data science tools
Usage modes
3 modes available:
Pre-built AI: models are pre-trained using Microsoft data (e.g. sentiment analysis, keyword extraction, topic detection etc.) aka Cognitive Services
Customized AI: pre-built models customized with user data
Custom models: fully tailored for user scenario and trained with user data
N.B. Cosmos DB is adapted for web-scale applications (huge amounts of data). ML Services are used for semantic leveling. Data Factory allows running SSIS packages. If you are using Azure Databricks (Spark ML), you may schedule it as a job.
Important: ML scripting on SQL Server is no longer available.
Agile process for Data Science team
The Microsoft Team Data Science Process (TDSP) is well adapted to developing, deploying and managing AI applications in the cloud based on DevOps best practices.
The data science lifecycle in TDSP is shown in figure 1
Thus, primary stages are: Business Understanding; Data Acquisition and Understanding; Modeling; Deployment.
After finishing the experiment, you may use your trained model for:
Device application – serialized model
Desktop application – serialized model or containerized model
Web server application – containerized model
Comparative Analysis
In this section we compare Azure ML Workbench and Azure ML Studio from the point of view of a hackathon participant.
Azure ML Studio
Ideal for fast deployment of a web service. All the data science processes run on a virtual machine in the cloud. With a free subscription the developer has up to 10 GB of storage.
Pros: Easy to use; Fast web service deployment; All the development is done on the cloud – nothing to install; many algorithms are pre-developed and ready to use; ability to run R and Python Scripts; ability to add libraries that are not pre-installed.
Cons: Limited from the algorithmic point of view (if not using custom models); Limited visualization capacity; Not scalable
Azure ML Workbench
Installed on the local machine, it allows using certain components locally (e.g. Data Preparation) or deploying to a target cluster or cloud as a runtime/execution environment. After training you may deploy one Docker container with one scoring web service per model.
Pros: Scalable; Local-based development; Rich from the algorithmic point of view; Git integration; Access control; Project roaming and sharing; Run history information
Cons: Needs an Azure ML Experimentation Service account to run experiments (even locally); Needs Workbench installation (may cause difficulties for Linux users as no Linux installer is available); Only the real-time (request-response) mode is available for web services, no batch scoring
Process of training and evaluating a predictive model
Azure ML Studio
All the experiments are organized in the form of a data flow (similar to SSIS).
Source
Azure ML Studio allows loading data sets directly onto the virtual machine or connecting to remote storage via the Import Data wizard. Currently the supported options are: Azure Blob Storage, Web URL via HTTP, Hive Query, Azure SQL Database, Azure Table, Data Feed Provider, on-premises SQL Database (preview feature), Azure DocumentDB.
Data Preparation
There are certain components that are already pre-built to perform data preparation and analysis. If the code (Python or R) already exists in local storage, it may be copied into the special scripting component. The input/output is only in the form of a dataframe.
Important: the module structure is special, so it is necessary to refactor the code, as in the Python scripting module all the logic runs through a single entry point function.
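As an illustration, a minimal sketch of such a refactored scripting module is shown below, assuming the classic Execute Python Script convention where the entry point is named azureml_main and both inputs and outputs are pandas DataFrames:

import pandas as pd

# Entry point of the Execute Python Script module: the connected datasets
# arrive as pandas DataFrames and a tuple of DataFrames must be returned.
def azureml_main(dataframe1=None, dataframe2=None):
    # Example of refactored logic: drop rows with missing values.
    cleaned = dataframe1.dropna()
    return cleaned,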
Data preprocessing
Special modules allow you to preprocess data, clean noise, remove missing values, etc. At every point the output may be saved locally.
Training
For training you need special modules (it is possible to train a model in Python or R script modules, but to save it after training you need to use the training ones). The following training modules are available: Sweep Clustering, Train Anomaly Detection Model, Train Clustering Model, Train Matchbox Recommender, Train Model, Tune Model Hyperparameters (formerly known as Sweep Parameters).
Evaluation
Before evaluating it is necessary to score the model. The process is the usual one – the data is split into training and test sets, the model is scored on the test set and the overall accuracy is obtained. Modules adapted for evaluation are: Cross Validate Model, Evaluate Model, Evaluate Recommender, Evaluate Probability Function.
Web service deployment
After saving the trained model, we create an experiment with web service input and output modules (it is possible to omit one of them, but not both). This experiment is only aimed at defining the input and output structure. You may then deploy your model as a web service.
Important: the output is in JSON format (usually a prediction score). Batch execution is also supported.
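For illustration, a hedged sketch of calling such a deployed request-response web service from Python is shown below; the endpoint URL, API key and column names are placeholders to be copied from the web service dashboard:

import json
import urllib.request

# Placeholders: copy the real values from the web service dashboard.
url = "https://<region>.services.azureml.net/workspaces/<workspace>/services/<service>/execute?api-version=2.0&details=true"
api_key = "<api key>"

payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["feature1", "feature2"],
            "Values": [["0.5", "1.2"]]
        }
    },
    "GlobalParameters": {}
}

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer " + api_key
}

request = urllib.request.Request(url, json.dumps(payload).encode("utf-8"), headers)
response = urllib.request.urlopen(request)
print(json.loads(response.read()))  # JSON result, usually containing the prediction score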
Azure ML Workbench
Prepare data
Data Sources
Files (CSV, TSV, JSON, Fixed Width)
Parquet
Excel file
Database (Azure SQL DB or SQL Server)
The data preparation tooling utilizes Program Synthesis using Examples (PROSE) to support various kinds of data wrangling, where data transformations are performed by example instead of explicit code (e.g. regular expressions). You can extend/customize transforms and featurization through Python, generate Python and PySpark code for execution at scale, export the data prep data flow as a file and re-use it for later training and scoring, and explore the data using notebooks.
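A hedged sketch of re-using an exported data flow, assuming the Workbench-era azureml.dataprep package and an exported flow file named my_flow.dprep (both names are placeholders):

# Run an exported data preparation package and get the result as a pandas DataFrame.
from azureml.dataprep import package

# 'my_flow.dprep' is the exported data flow file; dataflow_idx selects
# which data flow inside the package to execute.
df = package.run('my_flow.dprep', dataflow_idx=0)
print(df.head())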
Train and Test Model
First, you need to load the prepared dataset and split it into train and test subsets. Then choose an algorithm and train the model. The training is done using a Python script.
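A minimal sketch of such a training script is shown below, assuming a prepared CSV file named prepared.csv with a label column and scikit-learn as the training library (all names are placeholders):

import pickle
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load the prepared dataset.
df = pd.read_csv("prepared.csv")
X = df.drop(columns=["label"])
y = df["label"]

# Split into train and test subsets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Choose an algorithm and train the model.
model = LogisticRegression()
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))

# Serialize the trained model so it can be used later for scoring.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)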
Configure Compute Targets
Compute targets provide the resources that run your experiments. First, attach a compute target; then prepare it and reference it. When submitting scripts via the ML CLI, provide the compute target name (see the example submit command after the attach and prepare commands below). When working in a Jupyter notebook, select the kernel named after the compute target.
For remote compute targets there are several possibilities:
Attach a DSVM as a compute target
az ml computetarget attach remotedocker --name <compute target name> --address <ip or FQDN> --username <admin user> --password <admin password>
Attach an HDInsight cluster as a compute target
az ml computetarget attach cluster --name <compute target name> --address <name-ssh.azurehdinsight.net> --username <sshuser> --password <ssh password>
Prepare the compute target
az ml experiment prepare -c <compute target name>
Verify prepared status of the compute target
az ml experiment prepare -c <compute target name> --check
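Once the target is prepared, a script can be submitted to it for execution with the same ML CLI; train.py below is a placeholder script name:
az ml experiment submit -c <compute target name> train.py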
Evaluate
Measure model performance -> log the model performance -> chart the model performance
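A hedged sketch of the logging step, assuming the Workbench-era azureml.logging run logger (the metric names and values are illustrative):

from azureml.logging import get_azureml_logger

# Obtain the run logger tied to the current experiment run.
run_logger = get_azureml_logger()

# Example metrics; in practice the values come from scoring the test set.
run_logger.log("Regularization Rate", 0.01)
run_logger.log("Accuracy", 0.87)
# Logged metrics appear in the run history, where they can be compared and charted.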
Deploy
Use the trained model to make predictions in:
Device application – serialized model
Desktop application – serialized model or containerized model
Web server application – containerized model
Deployment produces one Docker container with one scoring web service per model. Currently only real-time (request-response) scoring is supported, no batch scoring. The scoring web service script is implemented in Python with the following methods (a sketch follows the list):
init() - loads the model from the file and registers the data collectors
run(input_dataframe) - prepares the input data, performs the scoring operation and logs both the inputs and prediction results with the data collectors
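A minimal sketch of such a scoring script is shown below; the model file name is a placeholder and the data collector registration is omitted for brevity:

import pickle
import pandas as pd

model = None

def init():
    # Load the serialized model shipped with the container image
    # (a full script would also register the data collectors here).
    global model
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

def run(input_dataframe):
    # Prepare the input data and perform the scoring operation;
    # a full script would also log inputs and predictions with the data collectors.
    predictions = model.predict(input_dataframe)
    return pd.DataFrame({"prediction": predictions}).to_json(orient="records")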
Text Analytics Example
Azure ML Studio
Generally, the workflow is split into several experiments. All the intermediate results are stored on the virtual machine as datasets.
Step 1: Data preparation
Step 2: Text preprocessing
Step 3: Feature extraction
Step 4: Train and evaluate the model
Step 5: Deploy the web service
Azure ML + Power BI
The possible architecture is the following:
The steps are as follows:
Create a predictive model in Azure ML
Publish this model as a web service
Use R to extract data out of storage that has not yet been scored by your ML model (because Power BI supports R scripting for data extraction)
Use R to call the Azure ML web service and send it the unscored data (with the set of parameters previously defined as the web service input)
Write the output of the Azure ML model back into SQL
Use R to read scored data into Power BI
Publish the Power BI file to the Power BI service (this allows your users to see the report in the cloud)
Schedule a refresh of the data using the Personal Gateway (which triggers a scheduled re-run of your R script and brings in new predictions)
Important: automatic refresh may be scheduled at most once every half hour. However, there is no limit on manual refreshes.
There is another solution which does not involve SQL connectivity: you may add an R script visual directly in your Power BI report and connect this new visual with your existing KPIs.
Price
Azure ML Studio
Machine Learning Studio is offered in two tiers—Free and Standard.
Features by tier are compared in the table below:
Azure Machine Learning enables you to deploy predictive analytics solutions as web services.
The deployed web services (new version) are subject to the following plans:
Azure ML Services
The below pricing reflects a preview discount.
Experimentation pricing
Model Management pricing