Alibek Jakupov

Azure Databricks: Credential Passthrough with R

Updated: Nov 19, 2021


In this short article we are going to discuss how to set up role-based access using Azure Active Directory credential passthrough. No deep stuff, up we go!


 

I've recently tried implementing different patterns to secure access to ADLS Gen2 from Azure Databricks. There's an excellent reference explaining how to get them up and running.


I've been particularly interested in pattern 3, AAD credential passthrough. This approach is interesting because the user's own credentials are passed through to ADLS Gen2 and evaluated against the file and folder ACLs. The feature is enabled at the cluster level under Advanced Options.


Here are the steps I went through:


1) Create an ADLS Gen2 account (in fact, it's simply an Azure Storage Account with hierarchical namespace enabled). Here's the reference.



2) Define roles. I've created different containers and different user accounts in my Azure tenant. Then, at the container level, click Access Control (IAM) -> Add role assignment -> Storage Blob Data Contributor. Important: in my setup the role had to be at least Storage Blob Data Contributor, otherwise it did not work.
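The same role assignment can also be scripted instead of clicked. Below is a minimal sketch that only builds the `az role assignment create` command line for one container; it does not run it. It assumes the built-in role name Storage Blob Data Contributor, and every value in angle brackets is a hypothetical placeholder, not something taken from my setup.

```python
import shlex

# Built-in Azure RBAC role that grants read/write access to blob data.
ROLE = "Storage Blob Data Contributor"


def container_scope(subscription_id, resource_group, storage_account, container):
    """Build the ARM resource ID of a single container (used as the RBAC scope)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
        f"/blobServices/default/containers/{container}"
    )


def role_assignment_command(assignee, scope, role=ROLE):
    """Return the `az role assignment create` call as a list of arguments."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", assignee,
        "--role", role,
        "--scope", scope,
    ]


# Hypothetical placeholder values; replace them with your own.
scope = container_scope("<subscription-id>", "<resource-group>", "<storageacc>", "<filesys>")
command = role_assignment_command("<user>@<tenant-domain>", scope)

# Print a shell-quoted version that can be reviewed before running it.
print(shlex.join(command))
```

Scoping the assignment to a single container rather than the whole storage account keeps each user limited to the containers they were explicitly given.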



3) Enable credential passthrough. I've got multiple users, so I will create a High Concurrency cluster. I also need to be able to run R scripts on my newly created cluster. This is the trickiest point. If we look at the documentation, it says:

  1. When you create a cluster, set the Cluster Mode to High Concurrency.

  2. Under Advanced Options, select Enable credential passthrough and only allow Python and SQL commands.



So it explicitly says that R is not allowed. And if I try running an R command, here's what I see:

Your administrator has only allowed python and sql commands on this cluster. This execution contained at least one disallowed language.

But where there's a will, there's a way. Instead of checking this option, we can customize the cluster configuration ourselves.


To do so, go to Edit -> Advanced Options -> Spark Config and enter the following:

spark.databricks.cluster.profile serverless
spark.databricks.repl.allowedLanguages python,sql,r
spark.databricks.delta.preview.enabled true
spark.databricks.passthrough.enabled true
spark.databricks.pyspark.enableProcessIsolation true

I now have a cluster with credential passthrough enabled that also supports R code execution.
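If you keep cluster definitions in code, it can help to check the settings before pasting them. The sketch below is plain Python and runs anywhere, not only in a notebook. It parses the Spark Config lines into a dictionary, verifies that R is allowed and passthrough is on, and places the result under `spark_conf` in a cluster spec fragment. The cluster name, runtime version, node type and worker count are hypothetical placeholders.

```python
import json

# The Spark Config lines exactly as entered in the cluster's Advanced Options.
SPARK_CONFIG_TEXT = """
spark.databricks.cluster.profile serverless
spark.databricks.repl.allowedLanguages python,sql,r
spark.databricks.delta.preview.enabled true
spark.databricks.passthrough.enabled true
spark.databricks.pyspark.enableProcessIsolation true
"""


def parse_spark_config(text):
    """Turn 'key value' lines into a dict, skipping blank lines."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, value = line.split(None, 1)  # split on the first run of whitespace
        conf[key] = value.strip()
    return conf


def allowed_languages(conf):
    """Return the set of languages listed in spark.databricks.repl.allowedLanguages."""
    raw = conf.get("spark.databricks.repl.allowedLanguages", "")
    return {lang.strip().lower() for lang in raw.split(",") if lang.strip()}


spark_conf = parse_spark_config(SPARK_CONFIG_TEXT)

# Fail early if the two settings this article relies on are missing.
assert "r" in allowed_languages(spark_conf), "R is not in the allowed languages"
assert spark_conf.get("spark.databricks.passthrough.enabled") == "true"

# Hypothetical cluster spec fragment; the placeholder values are not from the article.
cluster_spec = {
    "cluster_name": "<cluster-name>",
    "spark_version": "<runtime-version>",
    "node_type_id": "<node-type>",
    "num_workers": 2,
    "spark_conf": spark_conf,
}
print(json.dumps(cluster_spec, indent=2))
```

The check is deliberately strict about the two keys that matter here, so a typo in the language list shows up before the cluster is created rather than as a "disallowed language" error in a notebook.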


Now we can run:


# read data using a direct abfss:// path (no mount needed)
readdf = (
    spark.read
    .format("<file format>")
    .load("abfss://<filesys>@<storageacc>.dfs.core.windows.net/<path>")
)
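The path has three parts: the container (file system), the storage account, and the path inside the container. Here is a small helper that assembles it, as a sketch. The function name `abfss_uri` and the sample values are my own illustration, not part of any Databricks API.

```python
def abfss_uri(filesystem, storage_account, path=""):
    """Build an abfss:// URI for a path inside an ADLS Gen2 container."""
    if not filesystem or not storage_account:
        raise ValueError("filesystem and storage_account are required")
    # Strip leading slashes so the URI never contains '//' after the host.
    clean_path = path.lstrip("/")
    return f"abfss://{filesystem}@{storage_account}.dfs.core.windows.net/{clean_path}"


# Hypothetical names, used only for illustration.
uri = abfss_uri("raw", "mystorageacc", "/sales/2021")
print(uri)  # abfss://raw@mystorageacc.dfs.core.windows.net/sales/2021
```

In a notebook, the resulting string can be passed straight to `.load(uri)`.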

And in another cell:


%r
temp <- 2

Works like a charm.

 

Even if it seems pretty easy (and it actually is!), I spent the whole morning configuring the cluster (still a rookie developer), so if it saves you a couple of hours, I will be completely satisfied. Enjoy!



5 comments


Pankaj Jadhav
Jan 18

When I try to edit the cluster config, it says passthrough can be enabled only through custom access mode. I don't see a custom access mode when creating a cluster.


Do we need passthrough enabled when mounting ADLS Gen2 in Databricks using a service principal?


deepakdeedev
Feb 16, 2022

Hi Alibek, Thank you very much for sharing this! My admin tried something similar but wasn't able to get R to work on the cluster with both High Concurrency and credential passthrough enabled. Wanted to understand if this still works for you! Many Thanks!

ajakupov
Feb 17, 2022 (reply)

Oh, that's great! I am glad you found it useful!
