How to set up a Databricks job using credential passthrough on Azure - azure-active-directory

I want to stop using mounts to tighten up security, and reduce cost by using job clusters.
I want to use credential passthrough. I've created a notebook that reads Avro and converts it to Parquet. It works when I run the notebook interactively; however, the job fails when it runs as a service principal or under my own account. The service principal has the Storage Blob Data Contributor role. I'm not able to run the working notebook as a job.
I've tried using a job cluster and an existing high-concurrency cluster, and I get the error below in both cases:
Error message:
Py4JJavaError: An error occurred while calling o498.load.
: com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token

AAD credential passthrough doesn't work for jobs, especially for jobs owned by service principals: passthrough relies on capturing the user's AAD token and forwarding it to ADLS...
But if you're already using a service principal, why not configure the job for direct access to ADLS, as described in the documentation? You just need to set the following Spark conf properties on the job cluster (replace the values in <> with actual values):
fs.azure.account.auth.type OAuth
fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<azure-tenant-id>/oauth2/token
fs.azure.account.oauth2.client.id <sp-client-id>
fs.azure.account.oauth2.client.secret {{secrets/<secret-scope>/<secret-key>}}
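For illustration, the same properties can also be set at runtime in the notebook itself and then used to do the Avro-to-Parquet conversion directly against an abfss:// path. This is only a sketch, not the documented one-and-only approach: the storage account, container, tenant ID, client ID, and secret scope/key names are placeholders, and the properties are scoped to a single storage account here.

# Sketch: direct ADLS Gen2 access with a service principal (no mounts, no passthrough).
# <storage-account>, <container>, <azure-tenant-id>, <sp-client-id> and the secret
# scope/key names are placeholders - replace them with your actual values.
storage_account = "<storage-account>"

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
    "https://login.microsoftonline.com/<azure-tenant-id>/oauth2/token",
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net",
    "<sp-client-id>",
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get(scope="<secret-scope>", key="<secret-key>"),
)

# Read Avro and write Parquet straight from/to the container.
src = f"abfss://<container>@{storage_account}.dfs.core.windows.net/input/"
dst = f"abfss://<container>@{storage_account}.dfs.core.windows.net/output/"
df = spark.read.format("avro").load(src)
df.write.mode("overwrite").parquet(dst)

Note that in cluster Spark conf the {{secrets/<scope>/<key>}} syntax is used, whereas in a notebook dbutils.secrets.get is the equivalent.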

Related

Can we use Graph API delegated permission for Azure Data Factory?

I am trying to call the "/groups" endpoint of the Microsoft Graph API in my tenant via Azure Data Factory. I have given "Delegated permission" to my service principal. To my knowledge, when there is no user to act on behalf of, we should use "Application permission".
However, organizational requirements do not allow me to use Application permissions.
Therefore, when I try to execute my pipeline, I get "Insufficient privileges to complete the operation."
Could the reason be that ADF does not allow delegated permissions, since there is no user to act on behalf of?
I tried to reproduce the same in my environment, calling the groups endpoint of the Graph API from ADF, and got the same error:
Insufficient privileges to complete the operation
Make sure you have the Microsoft Graph permissions required to query groups and users.
The user here does not have the Data Factory Contributor role.
Also make sure your ADF has proper access to the resources, so give it the Contributor role.
And if you are using a storage account to store the REST response, make sure the user, app, or group has the Storage Blob Data Contributor role.
Reference:
Copy and transform data from and to a REST endpoint - Azure Data Factory & Azure Synapse | Microsoft Learn
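As an aside, the same constraint can be reproduced outside ADF: with a client-credentials (app-only) call there is no signed-in user, so Graph only honors application permissions such as Group.Read.All. The snippet below is a minimal sketch of that flow using the requests library; the tenant ID, client ID, and secret are placeholders, and the app registration is assumed to have an application permission with admin consent.

# Minimal sketch of calling Microsoft Graph /groups with the client-credentials flow.
# There is no user to act on behalf of, so the app registration needs an
# *application* permission (e.g. Group.Read.All) with admin consent; a delegated
# permission alone yields "Insufficient privileges to complete the operation".
import requests

TENANT_ID = "<azure-tenant-id>"       # placeholder
CLIENT_ID = "<sp-client-id>"          # placeholder
CLIENT_SECRET = "<sp-client-secret>"  # placeholder - keep this in a key vault

token_resp = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "scope": "https://graph.microsoft.com/.default",
    },
)
token_resp.raise_for_status()
access_token = token_resp.json()["access_token"]

groups = requests.get(
    "https://graph.microsoft.com/v1.0/groups",
    headers={"Authorization": f"Bearer {access_token}"},
)
groups.raise_for_status()
for g in groups.json()["value"]:
    print(g["displayName"])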

Azure DevOps, Conditional Access Policies prevent pipelines from running

At the moment I have set up a build pipeline that pulls an artifact from Azure Artifacts.
Authentication is done using a Personal Access Token.
For a couple of days now, my pipeline has been erroring out with the message:
VS403463: The conditional access policy defined by your Azure Active Directory administrator has failed.
Looking into the debug logs, I can see the call that is made.
Running that call from my local machine works, but ONLY if I am within my organisation's network (if I run it from home, it does not work).
Looking at the pipeline, it mentions that it gets an agent "from the cloud".
I assume this agent is not within our network.
Is there any way we can set up Azure DevOps so that we can still access Azure Artifacts from a cloud build agent while this conditional access policy is in place?
Your administrator might have set a Conditional Access policy that blocks IPs outside the trusted IP range from accessing your organization's resources. Check the common signals Conditional Access takes into account.
You can try connecting to your organization's network over VPN when you work from home, or ask your administrator to add your home IP to the trusted IP range.
You can also try disabling "Enable Azure Active Directory Conditional Access Policy Validation" on your Azure DevOps organization settings page. Check the steps here.
Check here to learn more about conditional access policies. Hope you find it helpful.
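If you want to confirm that it is the caller's IP (rather than the PAT) that the policy is rejecting, you can reproduce the call yourself from inside and outside the corporate network. The script below is only a sketch; the organization name is a placeholder and the feeds endpoint and API version are assumptions, so adjust them to match the URL you see in the pipeline's debug logs.

# Sketch: call the Azure Artifacts feeds API with a PAT (basic auth, empty username).
# Run it once from the corporate network and once from outside to see whether the
# Conditional Access policy (VS403463) is triggered by the source IP.
# <organization> and the api-version are placeholders/assumptions.
import requests

ORG = "<organization>"
PAT = "<personal-access-token>"

resp = requests.get(
    f"https://feeds.dev.azure.com/{ORG}/_apis/packaging/feeds?api-version=7.1-preview.1",
    auth=("", PAT),  # Azure DevOps PATs are sent as the password of a basic-auth pair
)
print(resp.status_code)
print(resp.text[:500])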

Why "Data credentials" can't be blank for community connector when using service account to access underlying bigquery data?

I built a community connector that uses a service account to access the BigQuery service, and it works fine. However, when I was looking at Service.getEffectiveUser() I noticed that this always resolved to my user, even when accessing the published report without any session. When I set the "Data credentials" from "Owner" to "Viewer", it asks to log in. However, I plan to use separate token-based authentication by passing the token in a connector URL parameter. So, is there a way to execute the community connector script without setting Data credentials to Owner or Viewer? Note that I already return AuthTypes.NONE from getAuthType.
Note that this community connector will not be published and will only be used for a SaaS application where the report will be embedded and accessible to the users of the SaaS application.

How to automatically authenticate with Hadoop using Active Directory?

We have an application that accesses Hadoop via HDFS, YARN, and Hive interfaces. This application works fine against Kerberos-secured clusters if kinit has been run. It also works fine if we call UserGroupInformation.loginUserFromKeytab(). We are able to delegate the HDFS and Hive tokens to YARN applications. The thing we cannot figure out is the following scenario:
The Hadoop cluster is secured using Kerberos.
The Hadoop cluster either uses Active Directory as its KDC, or has established a one-way trust between its KDC and the AD controller.
Our software is running in a session that has been authenticated using AD directly on Windows, or via PAM or LDAP (or some other mechanism) on Linux.
Our software queries the active AD session to extract a TGT or equivalent, and relays that information to the Hadoop APIs (via UserGroupInformation, presumably).
Hadoop authentication is thus achieved without the need for the user to enter a principal, password, or keytab.
We know this is possible in theory, because there are two examples of software that achieve this. The first is HDFS Explorer from RedGate. The second is Hue. However, we just can't seem to figure out the right incantation, and even Hortonworks support can't seem to help.
Hue comes with an LDAP backend that can transparently authenticate users against your company directory.
Hue also comes with a kt_renewer command for keeping its Kerberos ticket up to date. It is even run automatically when using Cloudera Manager (CM).
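One way to check that the existing AD/Kerberos session is being picked up without prompting for a principal, password, or keytab is to hit WebHDFS with SPNEGO, which uses the ticket already in the session's credential cache (the Windows LSA cache or the Linux ccache). The snippet below is only a sketch, assuming the requests-kerberos package and a placeholder NameNode host; the principle is the same one UserGroupInformation relies on when it reads the ticket cache instead of a keytab.

# Sketch: authenticate to Hadoop (WebHDFS) via SPNEGO using the TGT already held
# by the logged-in AD session - no principal, password, or keytab entered here.
# "namenode.example.com" is a placeholder; requests-kerberos is assumed installed.
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

resp = requests.get(
    "http://namenode.example.com:9870/webhdfs/v1/user?op=LISTSTATUS",
    auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),
)
resp.raise_for_status()
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["pathSuffix"])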

Web service connection to SQL Server with AD account

I have a WCF web service that should always use a specific AD account, which has been granted access to the database, to execute SQL transactions. I read a couple of articles, but I'm obviously doing or understanding something wrong, because I can't get it to work the way I want.
I figured that the web service should impersonate the AD user, so I enabled impersonation in the web service's web.config:
<identity userName="dmn\wsusr" password="p#55w0rd" impersonate="true"/>
Then, since I'm technically using a Windows user to connect to SQL, I set the connection string as follows ("Integrated security=true;" for Windows authentication, right?):
Data Source=SQLSVR\INSTNC; Failover Partner=SQLSVR\INSTNC2; Initial Catalog=DB; Integrated Security=true;
For testing the connection, I insert some values into a table. One of the columns of the table I'm inserting into has the following definition:
[LogUser] VARCHAR(75) NOT NULL DEFAULT USER
So, theoretically, the AD username of the user who opened the connection should automatically be inserted into that column. Unfortunately, however, the column contains my own AD username every time.
I'm testing the web service via a web site that uses Windows authentication, so I'm assuming this plays a role in the cause of the problem. But the website authentication should be disregarded, since this will be an externally accessible web service and SQL transactions should never rely on authentication between the client and the web service.
Thanks!
EDIT
I also tried:
Adding Trusted_Connection to the connection string, but it yielded the same result as above.
Using User ID and Password in the connection string, but since those fields only accept SQL Server logins, this resulted in a login failure error.
EDIT2
I suggested to my superiors that we try the approach where you create a separate application pool for the service, set it up to run as the AD user, and allow the AD user to log on as a service (something I read somewhere), but they're not keen on that and reckon it should be a "last resort".
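A quick way to see which Windows identity actually reaches SQL Server (and therefore what the DEFAULT USER column will record) is to open a trusted connection and ask the server directly. The sketch below uses pyodbc rather than WCF, purely as an illustration; the server, instance, and database names are taken from the question and may need adjusting. If SUSER_SNAME() returns your own account instead of dmn\wsusr, the impersonated identity is not flowing through to the SQL connection.

# Sketch: verify the Windows identity SQL Server sees on an integrated-security
# connection. Server/database names are placeholders; pyodbc and the ODBC driver
# are assumed to be installed.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=SQLSVR\\INSTNC;DATABASE=DB;Trusted_Connection=yes;"
)
row = conn.cursor().execute("SELECT SUSER_SNAME(), ORIGINAL_LOGIN()").fetchone()
print("Connection login:", row[0])   # identity the connection authenticated as
print("Original login:  ", row[1])   # identity before any impersonation / EXECUTE AS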
