There are plenty of scenarios in which a data scientist wants their scripts (written in Python or R) to be executed automatically. One might propose Azure Functions, but installing a data scientist's virtual environment inside the function's environment can be quite limiting. For instance, your script may depend on the Python joblib package, which in turn requires a C++ compiler; in that case it can get complicated to get your script up and running in Azure Functions. Another issue concerns the data source connection: you need to provide a connection string and therefore make your storage accessible from the cloud. And what if the storage is on-premises and can only be accessed from the local network or via a VPN connection? There is a workaround using Azure VPN, of course, but what if a developer doesn't want to rework code that already works perfectly? One may also simply prefer an environment that lets you define a script as an SSIS-like workflow. And there may be any number of other reasons why a user is not willing to use Azure Functions for script automation.
The main goal of this article is not to prove or disprove the efficiency of Azure Functions but to show the reader another way of doing automatic execution. If someone finds it helpful, we will be able to proudly claim that we made development a little bit better and easier.
Create an empty experiment in Azure ML Studio and put your script inside it
Create a storage for your input data (SQL Server or a simple file in blob storage)
Create a storage for your output data. This is what your Power BI report will connect to. There is a way to feed your script output directly into Power BI (e.g. using 'Convert to CSV' and 'Generate Data Access Code'), but in that case you would need to install a data gateway on your PC, making it impossible to refresh your dashboard when your working PC is off.
Add input and output connectors to your experiment
Add a sample web input as a second dataframe to your script component. Here is the trick: this dataframe will never be used and is only needed to deploy the experiment as a web service, so it makes sense to define a trivial data structure such as a single digit or character (see the sketch right after this list)
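For reference, here is a minimal sketch of what the script component might look like once the dummy web input is wired in. It assumes the 'Execute Python Script' module of AML Studio (classic), whose entry point is azureml_main; run_my_model is a hypothetical placeholder for your own logic.

import pandas as pd

# Entry point expected by the 'Execute Python Script' module in AML Studio (classic).
# dataframe1: the real input data (first port); dataframe2: the dummy web input (second port),
# which is never actually used -- it only exists so the experiment can be deployed as a web service.
def azureml_main(dataframe1=None, dataframe2=None):
    result = run_my_model(dataframe1)  # run_my_model is a hypothetical placeholder for your own code
    return result,  # the returned DataFrame flows to the output connector

def run_my_model(df):
    # Stand-in for the data scientist's original logic.
    return pd.DataFrame({"status": ["done"], "rows_processed": [len(df)]})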
After all the manipulations your experiment should look like this:
The only thing left is to deploy your experiment as a web service. The newly created web service will execute all the components (including your scripts) in the Azure environment and send back an execution status: HTTP 200 if everything went well, an error code otherwise.
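Before wiring the service into Power BI, you can sanity-check it from any machine. Below is a minimal sketch, assuming the Python requests package; the body mirrors the dummy web input, and the placeholders in the URL stand for your own workspace and service IDs shown on the web service page.

import requests

# Placeholders: take the actual URL and key from the AML Studio web service page
endpoint = "https://ussouthcentral.services.azureml.net/workspaces/<workspace-id>/services/<service-id>/execute?api-version=2.0&details=true"
api_key = "your-key"

# The body mirrors the dummy web input defined in the experiment; its values are irrelevant
body = {
    "Inputs": {"input1": {"ColumnNames": ["num_of_tags"], "Values": [["0"]]}},
    "GlobalParameters": {},
}
headers = {"Authorization": "Bearer " + api_key, "Content-Type": "application/json"}

resp = requests.post(endpoint, json=body, headers=headers)
print(resp.status_code)  # 200 means the whole experiment, including your script, ran successfully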
Now go to Power BI and create your report. Use the script output as a data source.
Here is another trick: add a new data source as a web input. This is needed to launch your scripts automatically. Just go to 'Get Data' -> 'Blank Query'.
In the blank query, paste the following code (I've left my API endpoint in place to make it clear to the reader where each part goes):
(text) =>
let
    apikey = "your-key",
    endpoint = "https://ussouthcentral.services.azureml.net/workspaces/7b1591095a27421eb716eca8aa6a9f96/services/36b645dcbb7942bbbb4e66597f7afb3b/execute",
    // Leftover from the AML Studio sample; not actually used, since the call is only a trigger
    jsontext = Text.FromBinary(Json.FromValue(Text.Start(Text.Trim(text), 5000))),
    // Request body matching the dummy web input of the experiment; its values are irrelevant
    jsonbody = "
    {
        ""Inputs"": {
            ""input1"": {
                ""ColumnNames"": [ ""num_of_tags"" ],
                ""Values"": [ [ ""0"" ] ]
            }
        },
        ""GlobalParameters"": {}
    }
    ",
    bytesbody = Text.ToBinary(jsonbody),
    headers = [
        #"Authorization" = Text.Combine({"Bearer", apikey}, " "),
        #"Content-Type" = "application/json"
    ],
    // POST the request to the AML Studio endpoint and decode the JSON response
    bytesresp = Web.Contents(endpoint, [Headers = headers, Content = bytesbody, Query = [#"api-version" = "2.0", details = "true"]]),
    jsonresp = Json.Document(bytesresp),
    weboutput = jsonresp[Results]
in
    weboutput
Again, the output doesn't really matter; we just need a trigger to launch our script directly from Power BI.
Awesome. Now we only need to add a sample table (containing a single digit in my case) to apply our query to, or simply invoke the function. So invoke your function and apply the changes.
So, whenever you invoke your function, it will launch your web service, which, in turn, will invoke your script. Simple and elegant (I suppose).
Important! There may be some difficulties in publishing your dashboard to the web, as Power BI first checks your endpoint, and if there is an error it will not authorize a refresh (a data credentials error). This is a real issue I am working on, and I will keep you updated. The error comes from the HTTP method used by the Power BI web service: the default method is GET, whereas the AML Studio web service supports only POST. There may be a workaround in setting the default method to POST in the Power BI settings, but I still need to give it a try. In the meantime, you can use Azure Functions. Yes, we are coming back to the very beginning of the article, but now the logic changes: you don't really need to develop anything, just copy the sample code generated by AML Studio, put it inside the function, and schedule it to run at the desired frequency.
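To make this concrete, here is a minimal sketch of such a function, assuming the Python timer-trigger programming model (the schedule itself, e.g. a CRON expression, lives in function.json) and that the requests package is listed in requirements.txt; AML_ENDPOINT and AML_API_KEY are hypothetical app settings holding the values from the web service page. Note that this function only sends a POST request, so none of the heavy data-science dependencies mentioned at the beginning are needed here.

# __init__.py of a timer-triggered Azure Function (Python)
import logging
import os

import requests
import azure.functions as func


def main(mytimer: func.TimerRequest) -> None:
    # AML_ENDPOINT and AML_API_KEY are hypothetical app settings
    endpoint = os.environ["AML_ENDPOINT"]
    api_key = os.environ["AML_API_KEY"]

    # Same dummy body as in the Power BI query: its only purpose is to trigger the experiment
    body = {
        "Inputs": {"input1": {"ColumnNames": ["num_of_tags"], "Values": [["0"]]}},
        "GlobalParameters": {},
    }
    headers = {"Authorization": "Bearer " + api_key, "Content-Type": "application/json"}

    resp = requests.post(endpoint, json=body, headers=headers)
    logging.info("AML web service returned status %s", resp.status_code)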