Azure Databricks: Credential Passthrough with R
Updated: Nov 19, 2021
In this short article we are going to discuss how to set up role-based access using Azure Active Directory credential passthrough. No deep stuff, up we go!
I've recently tried implementing different patterns to secure access to ADLS Gen 2 from Azure Databricks. There's an excellent reference explaining how to get them up and running.
I've been particularly interested in pattern 3, AAD credential passthrough. This approach is interesting because the user's credentials are passed through to ADLS Gen 2 and evaluated against the file and folder ACLs. This feature is enabled at the cluster level under the advanced options.
Here are the steps I went through:
1) Create an ADLS Gen 2 (in fact, it's simply an Azure Storage Account with hierarchical namespace enabled). Here's the reference.
2) Define Roles. I've created different containers and different user accounts in my Azure tenant. Then, at the container level, just click on Access Control (IAM) -> Add role assignment -> Storage Blob Data Contributor. Important: the role should be at least Storage Blob Data Contributor, otherwise it will not work.
3) Enable Credential Passthrough. I've got multiple users, so I will create a high-concurrency cluster. I also need to be able to run R scripts on my newly created cluster. This is the trickiest point. If we have a look at the documentation, it says:
When you create a cluster, set the Cluster Mode to High Concurrency.
Under Advanced Options, select Enable credential passthrough and only allow Python and SQL commands.
So it explicitly says that R is not allowed. And if I try running an R command, here's what I see:
Your administrator has only allowed python and sql commands on this cluster. This execution contained at least one disallowed language.
But where there's a will, there's a way. Instead of checking this option, we can customize the cluster configuration ourselves.
To do so, go to Edit -> Advanced Options -> Spark Config
spark.databricks.cluster.profile serverless
spark.databricks.repl.allowedLanguages python,sql,r
spark.databricks.delta.preview.enabled true
spark.databricks.passthrough.enabled true
spark.databricks.pyspark.enableProcessIsolation true
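If you would rather script the cluster than click through the UI, the same settings can be carried in the spark_conf field of a cluster definition. Below is a minimal sketch, not a definitive implementation: it only parses the Spark Config text and builds a JSON body shaped like the one the Databricks Clusters API (POST /api/2.0/clusters/create) accepts, without sending anything. The cluster name, the worker count and the two placeholder values are my assumptions, and a high-concurrency cluster created through the UI may carry additional fields that are not shown here.

```python
import json

# Spark Config text as entered in the cluster UI: one "key value" pair per line.
SPARK_CONFIG_TEXT = """\
spark.databricks.cluster.profile serverless
spark.databricks.repl.allowedLanguages python,sql,r
spark.databricks.delta.preview.enabled true
spark.databricks.passthrough.enabled true
spark.databricks.pyspark.enableProcessIsolation true
"""


def parse_spark_config(text):
    """Turn the 'key value' lines of the Spark Config box into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(None, 1)  # split on the first run of whitespace
        conf[key] = value
    return conf


def build_cluster_payload(name, spark_version, node_type_id, num_workers, spark_conf):
    """Assemble a request body; nothing is sent over the network here."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "spark_conf": spark_conf,
    }


spark_conf = parse_spark_config(SPARK_CONFIG_TEXT)
payload = build_cluster_payload(
    name="passthrough-with-r",          # hypothetical cluster name
    spark_version="<runtime version>",  # placeholder: fill in your runtime
    node_type_id="<node type>",         # placeholder: fill in your VM size
    num_workers=2,                      # assumption, adjust to your workload
    spark_conf=spark_conf,
)
print(json.dumps(payload, indent=2))
```

Keeping the parsing separate from the payload makes it easy to paste the exact text from the Spark Config box and reuse it across clusters.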
Now I have a cluster with Credential Passthrough enabled that also supports R code execution.
Now we can run:
# read data in delta format using direct path
readdf = (spark.read
    .format("<file format>")
    .load("abfss://<filesys>@<storageacc>.dfs.core.windows.net/<path>"))
And in another cell
%r
temp <- 2
Works like a charm.
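If you want to double-check from a Python cell that the custom settings are really in effect, something like the sketch below may help. It is only an illustration under assumptions: find_mismatches is a hypothetical helper name of mine, and I am assuming the cluster lets a notebook read these keys through spark.conf.get, which I have not verified for every runtime. The stand-in dictionary is there so the snippet also runs outside a notebook.

```python
# Settings we expect to find on the cluster, taken from the Spark Config above.
EXPECTED = {
    "spark.databricks.passthrough.enabled": "true",
    "spark.databricks.repl.allowedLanguages": "python,sql,r",
}


def find_mismatches(get_conf, expected=EXPECTED):
    """Return {key: actual_value} for every setting that differs from `expected`.

    `get_conf` is any callable taking (key, default), e.g. spark.conf.get.
    """
    mismatches = {}
    for key, want in expected.items():
        got = get_conf(key, "")  # empty string when the key is not set
        if got != want:
            mismatches[key] = got
    return mismatches


# In a Databricks Python cell (where `spark` is predefined) one would call:
#   find_mismatches(spark.conf.get)

# Stand-in configuration so the sketch runs outside a notebook:
fake_conf = {
    "spark.databricks.passthrough.enabled": "true",
    "spark.databricks.repl.allowedLanguages": "python,sql,r",
}
print(find_mismatches(lambda key, default: fake_conf.get(key, default)))
```

An empty result means both settings match; any key that shows up in the result is either missing or set to a different value.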
Even if it seems pretty easy (and it actually is!), I spent the whole morning configuring the cluster (still a rookie developer), so if it saves you a couple of hours, I will be completely satisfied. Enjoy!