SageMaker fails to create the sagemaker_data_wrangler database because of Lake Formation permissions

Access to the Glue Data Catalog is managed by Lake Formation, but when trying to add a new SageMaker Data Wrangler flow that queries an Athena table, it fails with the following error:
CustomerError: An error occurred when trying to create
sagemaker_data_wrangler database in the Glue data catalog: An error
occurred (AccessDeniedException) when calling the CreateDatabase
operation: Insufficient Lake Formation permission(s): Required Create
Database on Catalog
The database sagemaker_data_wrangler does not exist, but we have added the default S3 bucket that SageMaker uses (sagemaker-{region}-{account}) as a Lake Formation data location, in order to give the SageMaker execution role CreateDatabase privileges.
The error persists even if we manually create the database (sagemaker_data_wrangler) and give privileges to the Data Wrangler execution role.

Solved.
The SageMaker execution role was missing the CreateDatabase privilege on the catalog, which must be granted through the AWS CLI because the AWS Console does not offer the option:
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::10223*******:role/sandbox-******-mlops-demo-sagemaker --permissions "CREATE_DATABASE" --resource '{ "Catalog": {}}'
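For anyone who prefers to script the grant, here is a minimal boto3 sketch of the same call; the role ARN is a placeholder for your own SageMaker execution role:
import boto3

# Grant CREATE_DATABASE on the Lake Formation catalog to the SageMaker execution role.
# The role ARN is a placeholder - replace it with your own execution role ARN.
lf = boto3.client("lakeformation")
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::<account-id>:role/<sagemaker-execution-role>"},
    Resource={"Catalog": {}},
    Permissions=["CREATE_DATABASE"],
)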

Related

Cannot find the CREDENTIAL ... because it does not exist or you do not have permission

I followed the instructions at https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-configure-s3-compatible?view=sql-server-ver16 to access files on an S3 bucket (running on MinIO) from SQL Server 2022. However, I run into the error message "Cannot find the CREDENTIAL ... because it does not exist or you do not have permission".
I assume it is related to the fact that it's a database scoped credential, which is stored in a different catalog view than regular credentials. However, the documentation doesn't give me a hint on how to address this.
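In case it helps with debugging, here is a minimal sketch, assuming pyodbc and a placeholder connection string, that lists the database scoped credentials in the database you are connected to (they live in sys.database_scoped_credentials, not sys.credentials), so you can confirm the credential actually exists where it is expected:
import pyodbc

# Database scoped credentials are per-database, so connect to the database
# where the external data source is being created. Connection values are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=<server>;DATABASE=<database>;UID=<user>;PWD=<password>"
)
for row in conn.execute("SELECT name, credential_identity FROM sys.database_scoped_credentials"):
    print(row.name, row.credential_identity)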

How to set up a Databricks job using credential passthrough on Azure

I want to stop using mounts, tighten up security, and reduce cost by using job clusters.
I want to use passthrough credentials. I've created a notebook that reads Avro and converts it to Parquet. It works when I run the notebook; however, the job fails when it runs as the service principal or as my account. The service principal is a Blob Contributor. I'm not able to run the working notebook as a job.
I've tried using a job cluster and an existing high-concurrency cluster, and I get the error below in both cases:
Error message:
Py4JJavaError: An error occurred while calling o498.load.
: com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token
AAD credentials passthrough doesn't work for jobs, especially for jobs owned by service principals. AAD passthrough relies on capturing the user's AAD token and forwarding it to ADLS...
But if you're already using the service principal, why not configure the job for direct access to ADLS as described in the documentation? You just need to set the following Spark conf properties on the job cluster (replace values in <> with actual values; a PySpark equivalent is sketched after the list):
fs.azure.account.auth.type OAuth
fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<azure-tenant-id>/oauth2/token
fs.azure.account.oauth2.client.id <sp-client-id>
fs.azure.account.oauth2.client.secret {{secrets/<secret-scope>/<secret-key>}}
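If it is easier to keep everything in the notebook, a rough PySpark equivalent is to set the same properties on the Spark session before reading. This assumes a Databricks notebook where spark and dbutils are predefined; the storage account, container, and paths below are placeholders, and the client secret should still come from a secret scope via dbutils:
# Hypothetical storage account / container / paths - replace with your own.
storage_account = "<storage-account>"
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
    "https://login.microsoftonline.com/<azure-tenant-id>/oauth2/token",
)
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", "<sp-client-id>")
spark.conf.set(
    f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get(scope="<secret-scope>", key="<secret-key>"),
)

# Read Avro and write Parquet, as in the original notebook.
df = spark.read.format("avro").load(f"abfss://<container>@{storage_account}.dfs.core.windows.net/<input-path>")
df.write.mode("overwrite").parquet(f"abfss://<container>@{storage_account}.dfs.core.windows.net/<output-path>")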

Automate AWS RDS User Management

Most of you will have encountered the problem of creating DB users for developers across multiple databases (using a common user is not allowed). We have around 90 databases on AWS and 200-250 developers. Every day someone needs access to a database, and this is a manual and repetitive task.
I am looking for a solution to automate the end-to-end lifecycle of user management. Scripting or creating a Terraform module are solutions I already have in mind, but how do other organizations manage DB users at scale?
I did look at AWS IAM authentication, but I am not sure how we can grant fine-grained access using IAM roles.
Cheers,
Fun Learn
The way I've done this is (high level):
Create your RDS Terraform config / module(s)
Create an SQL file with the user & grant creations needed
Create a wrapper script that deploys Terraform and then connects to the new instance to run your SQL file with the user creation
Your wrapper script will need to use Terraform outputs to get your newly created RDS endpoint to connect to. Say you created an output called rds_endpoint in your Terraform plan/config; this is how you grab it in bash: terraform output rds_endpoint
Assuming your new RDS DB is not publicly accessible, your wrapper script will need to tunnel in through a bastion or some other instance that is publicly accessible and has access to the DB. Example: ssh -oStrictHostKeyChecking=no -p 22 -i ~/.ssh/bastion-host-key.pem -C -N ec2-user@$bastion_ip -L 3306:$rds_endpoint:3306 &
Your wrapper script will then need to use the RDS user & password you created with Terraform to run the SQL file; a rough sketch of such a wrapper follows below.
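A very rough Python sketch of such a wrapper, assuming MySQL with the pymysql driver, the SSH tunnel from the previous step already forwarding localhost:3306, and a users.sql file; every name here is a placeholder:
import subprocess
import pymysql

# Grab the endpoint from the Terraform output defined in your config (placeholder output name).
rds_endpoint = subprocess.check_output(
    ["terraform", "output", "-raw", "rds_endpoint"], text=True
).strip()
print(f"Provisioned endpoint: {rds_endpoint}")

# With the SSH tunnel forwarding localhost:3306 to the RDS endpoint, connect with the
# admin credentials created by Terraform and apply the user/grant SQL file.
conn = pymysql.connect(host="127.0.0.1", port=3306, user="<admin-user>", password="<admin-password>")
with conn.cursor() as cur, open("users.sql") as f:
    # Naively split the file on semicolons; good enough for simple user/grant statements.
    for statement in f.read().split(";"):
        if statement.strip():
            cur.execute(statement)
conn.commit()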
In fact, IAM authentication could be the key to doing that.
What you can do is create all your databases with Terraform.
Do not forget to enable IAM authentication via your Terraform module.
Once all your databases are created via Terraform, you have to create local role(s) in all of these databases (either via Terraform using an SQL script, or via Terraform modules that allow you to create users/roles; for PostgreSQL you can use this module), and you have to grant them the pre-created, existing database role for IAM (for PostgreSQL it is named "rds_iam").
The interesting thing with IAM authentication is that all of your developers can connect using their own AWS accounts and request a token that is used as the password (the username is the role you created before). By doing this you create only one role, but each authentication is made with each developer's own account.
If your company really needs you to create a role per dev (even though the roles are exactly the same; it makes little sense since, by definition, we ASSUME a role, so having everyone assume ONE role is not awful), you can instead create a local database user (rather than a role) for each of your developers in all of your databases, using an SQL script that your Terraform will execute.
Of course, do not forget to grant the existing rds_iam role either to the single role used by all the developers (if you choose that solution) or to all the DB users you created before.
You will have to manage the IAM policies for all of these users so that they are accurate security-wise (or use * in the policy to let all your developers connect to all your DB users, though that is not great for security).
Then your developers will be able to use the aws rds CLI command to generate an auth token and connect as their local DB user, which will have the correct rights.
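As a minimal sketch of that flow, assuming PostgreSQL, psycopg2, and a DB user already granted rds_iam (the endpoint and names are placeholders):
import boto3
import psycopg2

# Generate a short-lived IAM auth token to use as the password (placeholders throughout).
rds = boto3.client("rds", region_name="<region>")
token = rds.generate_db_auth_token(
    DBHostname="<db-instance-endpoint>",
    Port=5432,
    DBUsername="<db-user-with-rds_iam>",
)

# Connect with the token; SSL is required for IAM authentication.
conn = psycopg2.connect(
    host="<db-instance-endpoint>",
    port=5432,
    dbname="<database>",
    user="<db-user-with-rds_iam>",
    password=token,
    sslmode="require",
)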
There is a whole bunch of information and details here:
https://aws.amazon.com/premiumsupport/knowledge-center/users-connect-rds-iam
Have a nice journey on AWS.

Why do I get a warehouse error when trying to read from a Snowflake table?

When trying to read from a table with the Snowflake Python connector, I am getting the following error:
*** snowflake.connector.errors.ProgrammingError: 000606 (57P03): No active warehouse selected in the current session. Select an active
warehouse with the 'use warehouse' command.
Searching the web for solutions, I saw that the main recommendation is to run USE WAREHOUSE <warehouse name>.
However, when I apply this command I get the following error:
*** snowflake.connector.errors.ProgrammingError: 002043 (02000): SQL compilation error: Object does not exist, or operation cannot be
performed.
I also granted "USAGE" privileges to the relevant user, but the errors still occurred.
On the other hand, there are no errors when I run the USE DATABASE and USE SCHEMA commands.
Also, I am able to read from the table from Snowflake web UI with another user.
Any idea what might be wrong?
So use SHOW WAREHOUSES:
show warehouses;
name:         COMPUTE_WH
state:        SUSPENDED
type:         STANDARD
size:         X-Small
running:      0
queued:       0
is_default:   Y
is_current:   Y
auto_suspend: 600
auto_resume:  true
The name has to be one you can use. And the warehouse needs to have AUTO_RESUME enabled, or it needs to be started, or you need to start it yourself:
ALTER WAREHOUSE <name> RESUME IF SUSPENDED;
It's also nice to have a DEFAULT_WAREHOUSE set on your user, so you don't have to set it each time:
ALTER USER <user_name> SET DEFAULT_WAREHOUSE = <warehouse_name>;
So I've finally managed to figure out what the problem was:
At first I solved it by granting 'USAGE' privileges to 'PUBLIC'. However, this did not make sense to me, because I had configured a user other than 'PUBLIC' for my Snowflake Python connector, and that user was assigned a different role.
Then I realized that I had not configured a role for my Snowflake Python connector. After I configured the relevant role in the Python connector, granting 'USAGE' privileges to that role solved the problem.
I suppose that when no role is configured for the Python connector, the 'PUBLIC' role is used by default.
Since I'm just starting out with Snowflake and this is all based on my trial-and-error experiments, anyone more experienced with Snowflake configuration may have more details to share, which could be helpful for others who run into similar problems.
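For reference, this is roughly what the connector configuration looks like with an explicit role and warehouse; all values are placeholders:
import snowflake.connector

# Specifying the role and warehouse explicitly avoids falling back to PUBLIC / no warehouse.
conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    role="<role_with_usage_on_warehouse>",
    warehouse="<warehouse_name>",
    database="<database>",
    schema="<schema>",
)
cur = conn.cursor()
cur.execute("SELECT CURRENT_WAREHOUSE(), CURRENT_ROLE()")
print(cur.fetchone())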

Permissions issue trying to create an external data source on Azure SQL Database

Please bear with me as I am trying to learn Azure. I have in my resource group a SQL Server database, and a blob storage account with a container. I am the owner of these resources.
I am trying to create an external data source on my SQL database to link to my blob storage account, but I am running into a permissions issue that I cannot seem to resolve. Running the query:
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://[redacted].blob.core.windows.net/'
);
Returns this error message:
Msg 15247, Level 16, State 1, Line 1
User does not have permission to perform this action.
My Google-fu seems to be betraying me, as I can't find any references to this issue. Am I missing something basic? I'm browsing through my Azure dashboard, but I can't find any obvious way to manage specific database permissions, although I would have assumed that, as the owner, I have the maximum possible permissions.
Please provide the credential as shown below:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'some strong password';
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2015-12-11&ss=b&srt=sco&sp=rwac&se=2017-02-01T00:55:34Z&st=2016-12-29T16:55:34Z&spr=https&sig=copyFromAzurePortal';
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://myazureblobstorage.blob.core.windows.net',
CREDENTIAL= MyAzureBlobStorageCredential);
I was having the same error when trying to create an EXTERNAL DATA SOURCE. What worked for me was granting CONTROL to the database user:
GRANT CONTROL to your_db_user
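If you want to check what the database user can actually do before and after that grant, one option is to query fn_my_permissions; here is a small sketch assuming pyodbc and a placeholder connection string:
import pyodbc

# List the current user's effective database-level permissions;
# CONTROL should appear in the list after the grant.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=<server>.database.windows.net;"
    "DATABASE=<database>;UID=<user>;PWD=<password>"
)
for row in conn.execute("SELECT permission_name FROM fn_my_permissions(NULL, 'DATABASE')"):
    print(row.permission_name)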
