Issue with ERRORFILE when BULK INSERTing from Azure Blob Storage - sql-server

I am trying to bulk insert a lot of CSV files from Azure Blob Storage into my Azure SQL database.
Here's how I am trying to achieve this:
IF EXISTS(SELECT * FROM SYSOBJECTS WHERE ID = OBJECT_ID('[sqldb1].[dbo].[TABLE_A_RAW]'))
DROP TABLE [sqldb1].[dbo].[TABLE_A_RAW];
CREATE TABLE [sqldb1].[dbo].[TABLE_A_RAW]
(
[COL1] varchar(60),
[COL2] varchar(60),
[COL3] varchar(60),
[COL4] varchar(60),
[COL5] varchar(60)
);
BULK INSERT [sqldb1].[dbo].[TABLE_A_RAW]
FROM 'TABLE_A.CSV'
WITH
(
DATA_SOURCE = 'myazureblobstoragecontainer',
FORMAT = 'CSV',
ERRORFILE = 'load_errors_TABLE_A',
ERRORFILE_DATA_SOURCE = 'myazureblobstoragecontainer',
FIRSTROW = 2,
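-- FIELDTERMINATOR 0xE29691 is the UTF-8 byte sequence for '░' (U+2591); ROWTERMINATOR 0x0a is a line feed (LF)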
FIELDTERMINATOR = '0xE29691',
ROWTERMINATOR = '0x0a'
)
GO
IF EXISTS(SELECT * FROM SYSOBJECTS WHERE ID = OBJECT_ID('[sqldb1].[dbo].[TABLE_B_RAW]'))
DROP TABLE [sqldb1].[dbo].[TABLE_B_RAW];
CREATE TABLE [sqldb1].[dbo].[TABLE_B_RAW]
(
[COL1] varchar(60),
[COL2] varchar(60),
[COL3] varchar(60),
[COL4] varchar(60),
[COL5] varchar(60),
[COL6] varchar(60),
[COL7] varchar(60),
[COL8] varchar(60),
[COL9] varchar(60)
);
BULK INSERT [sqldb1].[dbo].[TABLE_B_RAW]
FROM 'TABLE_B.CSV'
WITH
(
DATA_SOURCE = 'myazureblobstoragecontainer',
FORMAT = 'CSV',
ERRORFILE = 'load_errors_TABLE_B',
ERRORFILE_DATA_SOURCE = 'myazureblobstoragecontainer',
FIRSTROW = 2,
FIELDTERMINATOR = '0xE29691',
ROWTERMINATOR = '0x0a'
)
GO
The code above was developed when I worked on an almost identical project (with an identical deployment), and it worked without any issues. When I run it for my current project, the error log files and the tables are created (as expected), but they are all empty, and I get these errors:
Msg 4861, Level 16, State 1, Line 17
Cannot bulk load because the file "load_errors_TABLE_A" could not be opened. Operating system error code 80(The file exists.).
Msg 4861, Level 16, State 1, Line 17
Cannot bulk load because the file "load_errors_TABLE_A.Error.Txt" could not be opened. Operating system error code 80(The file exists.).
Msg 4861, Level 16, State 1, Line 50
Cannot bulk load because the file "load_errors_TABLE_B" could not be opened. Operating system error code 80(The file exists.).
Msg 4861, Level 16, State 1, Line 50
Cannot bulk load because the file "load_errors_TABLE_B.Error.Txt" could not be opened. Operating system error code 80(The file exists.).
The error files are only created when I run the code above, meaning they do not already exist before the run, contrary to what the error messages seem to indicate. When I comment out the ERRORFILE and ERRORFILE_DATA_SOURCE lines (i.e. ERRORFILE = 'load_errors_TABLE_A', ERRORFILE = 'load_errors_TABLE_B', and ERRORFILE_DATA_SOURCE = 'myazureblobstoragecontainer') and run the script again, the bulk insert finishes without any errors (though, of course, no error files are created).
I want to BULK INSERT WITH ERRORFILE so that I can track any truncations that occur during the operation, like I did in my previous project. I tried looking for similar posts, but they mostly relate to local BULK INSERT operations where the error log file is also created/stored locally. As mentioned above, the deployments for the previous project and this one are almost identical: both run SQL Server 2014 (12.0.2000.8), and I have read/write access to both the Azure DB and the Blob Storage account and container.

The culprit ended up being the permissions, as #joseph-xu suggested in his answer below.
Current project vs. old project: (screenshots of each SAS key's Allowed permissions omitted)
The SAS key I was using for this project's Blob Storage was missing the DELETE and DELETE VERSION permissions, which are necessary if you want to include ERRORFILE and ERRORFILE_DATA_SOURCE in your BULK INSERT statement. As far as I am aware, this is not mentioned in Microsoft's documentation (and the error message doesn't hint at this being the issue either).
I simply created a new SAS key with ALL permissions, used it to create a new DATABASE SCOPED CREDENTIAL and EXTERNAL DATA SOURCE, and ran my code again; this time it worked.
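For completeness, the fix amounted to something like the following (a sketch only; the credential name and SAS secret are placeholders, not my actual values):
-- Recreate the credential using the new SAS key (token stored without the leading '?')
CREATE DATABASE SCOPED CREDENTIAL BlobCredentialFullPerms
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=...';
GO
-- Recreate the external data source against the same container
CREATE EXTERNAL DATA SOURCE myazureblobstoragecontainer
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://<storage_account>.blob.core.windows.net/<container>',
CREDENTIAL = BlobCredentialFullPerms
);
GO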

Your SAS key has not expired, right? And please check the Allowed permissions.
Did you delete the question mark when you created the SECRET?
CREATE DATABASE SCOPED CREDENTIAL UploadInvoices
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2019-12-12******2FspTCY%3D'
I've tried the following test and it works well. My CSV file has no header.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '***';
go
CREATE DATABASE SCOPED CREDENTIAL UploadInvoices
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2019-12-12&ss=bfqt&srt=sco&sp******%2FspTCY%3D'; -- dl
CREATE EXTERNAL DATA SOURCE MyAzureInvoices
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://***.blob.core.windows.net/<container_name>',
CREDENTIAL = UploadInvoices
);
BULK INSERT production.customer
FROM 'bs140513_032310-demo.csv'
WITH
(
DATA_SOURCE = 'MyAzureInvoices',
FORMAT = 'CSV',
ERRORFILE = 'load_errors_TABLE_B',
ERRORFILE_DATA_SOURCE = 'MyAzureInvoices',
FIRSTROW = 2
)
GO

Related

Getting this error : 'The Remote Java Bridge has not been attached yet.' while connecting to hdfs from external table in sql server

I tried to create an external table in SQL Server pointing to HDFS, but I am getting the below error:
Msg 110813, Level 16, State 1, Line 16
105019;External file access failed due to internal error: 'The Remote Java Bridge has not been attached yet.'
I have configured Hadoop and SQL Server on Ubuntu 20.04 and installed PolyBase as well.
-> SQL Server version - 2019
-> hadoop-3.3.0
Below are the queries I have executed.
CREATE EXTERNAL DATA SOURCE [HadoopDFS1]
WITH (
TYPE = Hadoop,
LOCATION = N'hdfs://localhost:9000'
)
CREATE EXTERNAL FILE FORMAT CSVFF WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (FIELD_TERMINATOR =',',
USE_TYPE_DEFAULT = TRUE));
CREATE EXTERNAL TABLE [dbo].[Salary] (
[Company Name] nvarchar(200),
[Job Title] nvarchar(100),
[Salaries Reported] int,
[Location] nvarchar(50)
)
WITH (LOCATION='/Data/input/',
DATA_SOURCE = HadoopDFS1,
FILE_FORMAT = CSVFF
);
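(Not from the original post, but a common first sanity check for this class of PolyBase error is to confirm that PolyBase is actually installed and enabled on the instance:)
-- Returns 1 if PolyBase is installed
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
-- PolyBase must also be enabled as a server configuration option
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;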

CETAS in SQL Server 2022

I have followed the below steps in SQL Server 2022:
Step 1: Create a master key
Step 2:
CREATE DATABASE SCOPED CREDENTIAL [BlobSAS]
WITH
IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '?sv=2021-06-08&ss=bfqt&srt=c<<>>';
Step 3:
CREATE EXTERNAL DATA SOURCE [BlobSource]
WITH (
LOCATION = 'abs://<<>>.blob.core.windows.net/dummy',
CREDENTIAL = [BlobSAS]
);
Step 4:
CREATE EXTERNAL FILE FORMAT [CommaDelimited] WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = N',',
STRING_DELIMITER = N'"',
FIRST_ROW = 2,
USE_TYPE_DEFAULT = True
)
)
Step 5:
CREATE EXTERNAL TABLE Demo WITH (
LOCATION = '/Test',
DATA_SOURCE = [BlobSource],
FILE_FORMAT = [CommaDelimited]
)
AS
SELECT
2 AS C1,
2 as C2
I am getting this error:
Access check for 'CREATE/WRITE' operation against
'abs://<<>>.blob.core.windows.net/dummy/Test' failed with HRESULT = '0x80070005'
Note: All permissions were provided while generating the SAS.
I also tried the storage account key route:
CREATE DATABASE SCOPED CREDENTIAL [BlobKey] WITH
IDENTITY = '<<>>',
SECRET = '<<>>';
CREATE EXTERNAL DATA SOURCE [BlobKeySource] WITH (
LOCATION = 'abs://<<>>.blob.core.windows.net/dummy',
CREDENTIAL = [BlobKey]
);
and when executing the below statement via a sysadmin account:
CREATE EXTERNAL TABLE Demo WITH (
LOCATION = '/Test',
DATA_SOURCE = [BlobKeySource],
FILE_FORMAT = [CommaDelimited]
)
AS
SELECT
2 AS C1,
2 as C2
I'm getting the error:
Msg 15151, Level 16, State 1, Line 1
Cannot find the CREDENTIAL 'BlobKey', because it does not exist or you do not have permission.
I've just had loads of issues trying to do the same. Please ensure your code does all of the following (it looks like 2 and 3 are fine in your case, but this may be a useful reference for others); a corrected sketch follows the list:
1. The shared access signature is generated for the container and not the storage account (note this is different from how you would set it up in Azure Synapse SQL DW Pools).
2. No TYPE specified in the external data source (sometimes documentation states HADOOP or BLOB_STORAGE).
3. The location must be an abs:// link and not adls://.
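Putting those three points together, a corrected setup looks roughly like this (a sketch; the account, container, and SAS secret are placeholders):
-- Container-scoped SAS, stored without the leading '?'
CREATE DATABASE SCOPED CREDENTIAL [BlobSAS]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2021-06-08&ss=...';
-- No TYPE option, and an abs:// (not adls://) location
CREATE EXTERNAL DATA SOURCE [BlobSource]
WITH (
LOCATION = 'abs://<account>.blob.core.windows.net/<container>',
CREDENTIAL = [BlobSAS]
);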

Error while creating External File Format in Azure SQL Database

I am getting the following error while creating an External File Format in Azure SQL DB:
Incorrect syntax near 'EXTERNAL'.
I am using the following commands (taken from the T-SQL syntax in the Microsoft Docs link - https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-ver15&tabs=delimited) but am still getting the syntax error:
--Example 1
CREATE EXTERNAL FILE FORMAT textdelimited1
WITH ( FORMAT_TYPE = DELIMITEDTEXT
, FORMAT_OPTIONS ( FIELD_TERMINATOR = '|')
);
GO
--Example 2
CREATE EXTERNAL FILE FORMAT skipHeader_CSV
WITH (FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS(
FIELD_TERMINATOR = ',',
STRING_DELIMITER = '"',
FIRST_ROW = 2,
USE_TYPE_DEFAULT = True)
)
As #wBob mentioned, EXTERNAL FILE FORMAT is not supported on Azure SQL DB and Managed Instance; we can use an EXTERNAL DATA SOURCE with BULK INSERT instead. There are also many possible reasons for the related problem ("Cannot bulk load because the file ... could not be opened"):
Check whether the SAS key has expired, and please check the Allowed permissions.
Did you delete the question mark when you created the SECRET?
CREATE DATABASE SCOPED CREDENTIAL UploadInvoices
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2019-12-12******2FspTCY%3D'
I've tried the following test and it works well.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '***';
go
CREATE DATABASE SCOPED CREDENTIAL UploadInvoices
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2019-12-12&ss=bfqt&srt=sco&sp******%2FspTCY%3D'; -- dl
CREATE EXTERNAL DATA SOURCE MyAzureInvoices
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://***.blob.core.windows.net/<container_name>',
CREDENTIAL = UploadInvoices
);
BULK INSERT production.customer
FROM 'bs140513_032310-demo.csv'
WITH
(
DATA_SOURCE = 'MyAzureInvoices',
FORMAT = 'CSV',
FIRSTROW = 2
)
GO

Azure SQL: Adding from Blob Not Recognizing Storage

I am trying to load data from a CSV file to a table in my Azure Database following the steps in https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver15#f-importing-data-from-a-file-in-azure-blob-storage, using the Managed Identity option. When I run the query, I receive this error:
Failed to execute query. Error: Referenced external data source "adfst" not found.
This is the name of the container I created within my storage account. I have also tried using my storage account, with the same error. Reviewing https://learn.microsoft.com/en-us/sql/relational-databases/import-export/examples-of-bulk-access-to-data-in-azure-blob-storage?view=sql-server-ver15 does not provide any further insight as to what may be causing the issue. My storage account does not have public (anonymous) access configured.
I'm assuming that I'm missing a simple item that would resolve this issue, but I can't figure out what it is. My SQL query is below, modified to not include content that should not be required.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**************';
GO
CREATE DATABASE SCOPED CREDENTIAL msi_cred WITH IDENTITY = '***********************';
CREATE EXTERNAL DATA SOURCE adfst
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://**********.blob.core.windows.net/adfst'
, CREDENTIAL= msi_cred
);
BULK INSERT [dbo].[Adventures]
FROM 'Startracker_scenarios.csv'
WITH (DATA_SOURCE = 'adfst');
If you want to use Managed Identity to access Azure Blob Storage when you run the BULK INSERT command, you need to enable Managed Identity for the SQL server; otherwise, you will get the error Referenced external data source "***" not found. You also need to assign the Storage Blob Data Contributor role to the MSI; if you do not, you cannot access the CSV file stored in Azure Blob Storage.
For example
Enable Managed Identity for the SQL server
Connect-AzAccount
#Enable MSI for SQL Server
Set-AzSqlServer -ResourceGroupName your-database-server-resourceGroup -ServerName your-SQL-servername -AssignIdentity
Assign role via Azure Portal
Under your storage account, navigate to Access Control (IAM) and select Add role assignment. Assign the Storage Blob Data Contributor RBAC role to the server that you've registered with Azure Active Directory (AAD).
Test
a. Data
1,James,Smith,19750101
2,Meggie,Smith,19790122
3,Robert,Smith,20071101
4,Alex,Smith,20040202
b. script
CREATE TABLE CSVTest
(ID INT,
FirstName VARCHAR(40),
LastName VARCHAR(40),
BirthDate SMALLDATETIME)
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'YourStrongPassword1';
GO
--> Change to using Managed Identity instead of SAS key
CREATE DATABASE SCOPED CREDENTIAL msi_cred WITH IDENTITY = 'Managed Identity';
GO
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://jimtestdiag417.blob.core.windows.net/test'
, CREDENTIAL= msi_cred
);
GO
BULK INSERT CSVTest
FROM 'mydata.csv'
WITH (
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
DATA_SOURCE = 'MyAzureBlobStorage');
GO
select * from CSVTest;
GO

How to access data of one DB from another using Elastic Job?

I am trying to access data in one DB from another DB. For that I am using an Elastic Job. Using the Elastic Job I am able to create a table from one DB in another, but I am not able to access or transfer the data. I tried using an External Data Source and an External Table.
I used the below code :
External Data Source
CREATE EXTERNAL DATA SOURCE RemoteReferenceData
WITH
(
TYPE=RDBMS,
LOCATION='myserver',
DATABASE_NAME='dbname',
CREDENTIAL= JobRun
);
CREATE EXTERNAL TABLE [tablename] (
[Id] int null,
[Name] nvarchar(max) null
)
WITH (
DATA_SOURCE = RemoteReferenceData,
SCHEMA_NAME = N'dbo',
OBJECT_NAME = N'mytablename'
);
Getting error below:
> Error retrieving data from server.dbname. The underlying error
> message received was: 'The server principal "JobUser" is not able to
> access the database "dbname" under the current security context.
> Cannot open database "dbname" requested by the login. The login
> failed. Login failed for user 'JobUser'.
There are some errors in your statements:
The LOCATION value should be: LOCATION = '[servername].database.windows.net'.
Make sure when you create the CREDENTIAL that the username and password are the ones used to log in to the Customers database. Authentication using Azure Active Directory with elastic queries is not currently supported.
The whole T-SQL code example should be like this:
CREATE DATABASE SCOPED CREDENTIAL ElasticDBQueryCred
WITH IDENTITY = 'Username',
SECRET = 'Password';
CREATE EXTERNAL DATA SOURCE MyElasticDBQueryDataSrc WITH
(TYPE = RDBMS,
LOCATION = '[servername].database.windows.net',
DATABASE_NAME = 'Mydatabase',
CREDENTIAL = ElasticDBQueryCred
);
CREATE EXTERNAL TABLE [dbo].[CustomerInformation]
( [CustomerID] [int] NOT NULL,
[CustomerName] [varchar](50) NOT NULL,
[Company] [varchar](50) NOT NULL)
WITH
( DATA_SOURCE = MyElasticDBQueryDataSrc)
I use this code to query the table in Mydatabase from DB1:
SELECT * FROM CustomerInformation
For more details, see: Get started with cross-database queries (vertical partitioning) (preview)
Hope this helps.
